Do you want to get better at Linux? I have a few command-line tricks you can learn faster than drinking your early morning coffee. Here are my must-learn commands, not because you need to know them to use Linux, but because you’ll want to know them to make you better at Linux:
- find
- xargs and nproc
- taskset
- numactl
- inotify-tools
In this article, I’m going to present you with a series of challenges, and demonstrate the tools that solve each one.
1. Directories with lots of files
You may have encountered this problem once or twice. You tried to run `ls` on a directory with a very large number of files, and the command throws an "argument list too long" error:

```
$ ls *
-bash: /usr/bin/ls: Argument list too long
```
The reason is that a POSIX system has a limit on the maximum number of bytes you can pass as arguments:

```
$ getconf ARG_MAX
2097152
```
Two million bytes may not be enough, depending on who you ask, but it’s a protection against attacks or innocent mistakes with bad consequences. In any case, you can bypass this limitation with a few different tricks.
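If you want to reproduce the error yourself, you can fill a scratch directory with empty files. This is a sketch; the path and the file count are arbitrary, and the exact threshold depends on your system’s `ARG_MAX`:

```shell
# Create a scratch directory with enough files to overflow ARG_MAX
mkdir -p /tmp/argmax_demo && cd /tmp/argmax_demo

# seq feeds the names to xargs, which batches the touch calls,
# so creating the files doesn't hit the very limit we're demonstrating
seq -w 1 200000 | xargs touch

ls *   # fails: the expanded glob exceeds ARG_MAX
```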
Use a shell built-in
Bash built-ins like `echo` don’t have the `ARG_MAX` limitation, because no new program is executed:

```
$ echo *
...
test_file055554 test_file111110 test_file166666 test_file222222 test_file277778 test_file333334 test_file388890 test_file444446
test_file055555 test_file111111 test_file166667 test_file222223 test_file277779 test_file333335 test_file388891 test_file444447
test_file055556 test_file111112 test_file166668 test_file222224 test_file277780 test_file333336 test_file388892 test_file444448
```
This is probably the simplest solution, but let’s look at another way.
Use find when you want formatting options
You can use the well-known `-ls` flag of `find`:

```
find /data/test_xargs -type f -ls
```
Or with formatting, to mimic `ls`:

```
find /data/test_xargs -type f -printf '%f\n'
```
This is fast and also the most complete solution.
Use xargs
There is, of course, yet another way. The following works:
```
find /data/test_xargs -type f -print0 | xargs -0 ls
```
This works but is admittedly inefficient. You’re forking three processes just to display the contents of the directory, and on top of that, `xargs` is throttling how many files get passed to each `ls` invocation.
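You can see that batching with a toy pipeline; here `echo` stands in for `ls`, and `-n 3` caps each invocation at three arguments so the splitting is visible:

```shell
# Seven null-separated inputs, three per invocation: three echo runs
printf '%s\0' a b c d e f g | xargs -0 -n 3 echo
# a b c
# d e f
# g
```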
Let’s move on to a different problem.
2. Run more programs at once
First you walk, then you run. That’s a serial process. Suppose you want to compress all files in a given directory. You might first do this:
```
gzip *
```
That takes a long time, because `gzip` processes one file at a time. So you might instead try something that processes files in parallel:
```
$ for file in $(ls data/test_xargs/*); do gzip $file & done
-bash: /usr/bin/ls: Argument list too long
```
But `ARG_MAX` strikes again. So what if you do this:
```
for file in $(find $PWD); do gzip $file & done
wait
echo "All files compressed"
```
That either makes your server run out of memory, or it all but crushes your server under a very heavy CPU load, because you’re forking a `gzip` instance for every file found.
Is there a better way?
Parallelism and throttling (the art of self control)
What you need is a way to throttle your compression requests, so you don’t launch more processes than the number of CPUs you have.
Let’s try compressing files again with `find` and `xargs`:

```
find /data/test_xargs -type f -print0 | xargs -0 -P $(($(nproc)-1)) -I % gzip %
```
That looks like a fancy one-liner. Let me explain how it works:
- Use `find` to get all files in a given directory, with the null character as separator so that files with unusual names are handled correctly.
- `nproc` tells you how many CPUs you have; subtract 1 with Bash arithmetic and a sub-shell, like this: `$(($(nproc)-1))`.
- Finally, `xargs` runs no more than `-P` processes at a time. In my case, that’s 8 CPUs minus 1, for a total of 7 parallel jobs. The percent (`%`) character is dynamically replaced with the name of the file to compress.
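You can convince yourself that `-P` really runs jobs in parallel with a toy version of the pipeline, substituting `sleep` for the actual compression:

```shell
# Four 1-second "jobs" limited to 2 at a time: about 2 seconds total,
# instead of the 4 seconds a serial run would take
time (printf '%s\0' 1 2 3 4 | xargs -0 -P 2 -I % sleep 1)
```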
There are other ways to get the number of CPUs on a machine. You can parse `/proc/cpuinfo`, for example.
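For example, here are two alternatives to `nproc`, shown as a quick sketch:

```shell
# Count the processor entries in /proc/cpuinfo
grep -c ^processor /proc/cpuinfo

# Or ask for the number of online processors via getconf
getconf _NPROCESSORS_ONLN
```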
There are also more efficient compression algorithms out there, but `gzip` is available on pretty much any Linux or Unix system.
It is time to see our next problem.
3. Maximize execution time with taskset
Despite limiting the number of CPUs, intensive jobs can slow down other processes on your machine as they all compete for resources. There are a few things you can do to keep the performance of your server under control, like using taskset.
The taskset command can set or retrieve the CPU affinity of a running process (by PID), or it can launch a new command with a given CPU affinity.
CPU affinity is a scheduler property. It binds a process to a given set of CPUs on a system.
The kernel is normally pretty good about keeping running processes glued to a specific CPU to avoid context switching, but if you want to enforce which CPUs a process runs on, you can use `taskset`. In general, you want to leave one of your CPUs free for operating system tasks.
```
find /data/test_xargs -type f -print0 | \
    taskset -c 1,2,3,4,5,6,7 xargs -0 -P $(($(nproc)-1)) -I % gzip %
```

The affinity is applied to `xargs`, so the `gzip` processes it forks inherit it, and CPU 0 stays free for the rest of the system.
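Remember that `taskset` can also inspect or change the affinity of a process that’s already running. A quick sketch using a throwaway background process:

```shell
# Start a long-running process to experiment on
sleep 300 &
pid=$!

# Show its current affinity list
taskset -cp "$pid"

# Restrict it to CPU 0 only, then confirm
taskset -cp 0 "$pid"
taskset -cp "$pid"

# Clean up
kill "$pid"
```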
4. Overcome physical limitations with numactl
From What is NUMA and why you should care:
There are physical limitations to hardware that are encountered when many CPUs and lots of memory are required. The important limitation is that there is limited communication bandwidth between the CPUs and the memory. One architecture modification that was introduced to address this is Non-Uniform Memory Access (NUMA).
Most desktop machines only have a single NUMA node, like mine:
```
$ numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 15679 MB
node 0 free: 5083 MB
node distances:
node   0
  0:  10

# Or with lscpu
$ lscpu | rg NUMA
NUMA node(s):        1
NUMA node0 CPU(s):   0-7
```
If you have more than one NUMA node, you may want to “pin” (set the affinity) your program so that it uses a CPU and memory in the same node. For example, on a machine with 16 cores (0-7 on node 0, and 8-15 on node 1), you could force your compression program to run on all CPUs on node 1, and to use the memory of node 1:
```
find /data/test_xargs -type f -print0 | \
    numactl --physcpubind=8-15 --membind=1 \
    xargs -0 -P $(($(nproc)-1)) -I % gzip %
```

Here the binding is applied to `xargs`, so the forked `gzip` processes inherit it.
Enough CPU talk. Time to learn how to watch things.
5. Keep an eye on things
The `watch` command allows you to run a command periodically, and it can even show you the differences between calls. Here’s `watch` displaying the output of the `ls` command every 10 seconds:

```
$ watch -n 10 ls
Every 10.0s: ls                    orangepi5: Wed May 24 22:46:33 2023

test_file000001.gz  test_file000002.gz  test_file000003.gz
test_file000004.gz  test_file000005.gz  test_file000006.gz
test_file000007.gz  test_file000008.gz  test_file000009.gz
test_file000010.gz
...
```
That’s fine to detect changes within a directory, but it’s not easy to automate and it’s definitely not efficient. Wouldn’t it be nice if the kernel was able to tell you about changes to directories?
A better way to watch with inotify-tools
You may need to install this separately, but it should be easy to do. On Ubuntu:
```
sudo apt install inotify-tools
```
On Fedora or similar:
```
sudo dnf install inotify-tools
```
To monitor for events on a given directory, run `inotifywait`:

```
$ inotifywait --recursive /data/test_xargs/
Setting up watches.  Beware: since -r was given, this may take a while!
Watches established.
```
Open another terminal and touch some files to simulate an event:
```
$ pwd
/data/test_xargs
$ touch test_file285707.gz test_file357136.gz test_file428565.gz
```
The original terminal gets the first event and then exits:
```
Watches established.
/data/test_xargs/ OPEN test_file285707.gz
```
That’s not very useful. The command detects only the first event. To make it listen forever, add the `--monitor` option:

```
inotifywait --recursive --monitor /data/test_xargs/
```
If you touch a file again in a separate terminal, you see all events:
```
Setting up watches.  Beware: since -r was given, this may take a while!
Watches established.
/data/test_xargs/ OPEN test_file285707.gz
/data/test_xargs/ ATTRIB test_file285707.gz
/data/test_xargs/ CLOSE_WRITE,CLOSE test_file285707.gz
/data/test_xargs/ OPEN test_file357136.gz
/data/test_xargs/ ATTRIB test_file357136.gz
/data/test_xargs/ CLOSE_WRITE,CLOSE test_file357136.gz
/data/test_xargs/ OPEN test_file428565.gz
/data/test_xargs/ ATTRIB test_file428565.gz
/data/test_xargs/ CLOSE_WRITE,CLOSE test_file428565.gz
```
This is less taxing on the operating system than repeatedly polling a directory for changes and filtering the differences yourself.
Commands for quality of life
There is so much more to explore. The tips above have introduced you to some important concepts, so why not learn more about them?
- The Ubuntu forum has a great conversation about `xargs`, `find`, `ulimit`, and much more. Knowledge is power.
- Red Hat has a nice page about NUMA, `taskset`, and interrupt handling. If you’re serious about fine-tuning the performance of your processes, then you have to read it.
- If you liked `inotify` and want to use it from a Python script, take a look at pyinotify.
- The `find` command can be intimidating, but this tutorial makes it easy to understand.
- The source code for this tutorial is available in my Git repository.