maxcpu Architecture & Design

I frequently try to take full advantage of my SMP desktop machine, but I have been repeatedly disappointed by my shell's inability to run jobs on multiple CPUs. A shell "for" loop gives me no way to say that the commands in its body are independent of each other and can run at the same time on different CPUs.

The example I use most frequently is image manipulation. When I download the images I've taken with my digital camera, they're huge: 1600x1200 pixels. I'm not in the habit of making people download such giant images with their browser, so I resize the images that I post on the web to 640x480, which is a reasonable size for most web sites. To do the resizing quickly, I use the ImageMagick tool "convert", so I don't need to mouse around and click things to make it happen (as I would if I were using the Gimp). I generally type in my shell:

	for i in *.jpg; do
		convert -geometry 640x480 -quality 70 "$i" "$i"
	done

This gets the job done, but the work is mostly CPU intensive. There is disk access, but only to read the original file and write back the result. The rest is pure number crunching. Watching the CPU monitor on my dual-CPU machine is sad, because that "for" loop executes one "convert" at a time, leaving an entire CPU idle. As a result, I wait roughly twice as long as I would if both CPUs were working on the task. With maxcpu, I can use both CPUs for my image processing (or anything else, for that matter): instead of running each command, the loop prints it, and the printed command lines are piped to maxcpu. Note the added "echo" in the "for" loop:

	(for i in *.jpg; do
		echo convert -geometry 640x480 -quality 70 $i $i
	done) | maxcpu
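
The idea of reading command lines from standard input and keeping several of them running at once can be sketched in plain shell. This is only an illustration, not how maxcpu itself is written: the function name "run_batches" is made up for this example, and it uses a simple batch strategy (start N commands, wait for all of them, start the next N), which can leave a CPU idle while the slowest job in a batch finishes.

```shell
#!/bin/sh
# Sketch only; run_batches is a hypothetical helper, not part of maxcpu.
# Usage: run_batches N
# Reads one command line per line from stdin and runs them N at a time.
run_batches() {
	limit=$1
	running=0
	while IFS= read -r cmd; do
		# Start the command in the background. In a non-interactive
		# shell its stdin is /dev/null, so it cannot eat the command list.
		sh -c "$cmd" &
		running=$((running + 1))
		if [ "$running" -ge "$limit" ]; then
			wait		# batch is full: wait for every job in it
			running=0
		fi
	done
	wait			# wait for the last, possibly partial, batch
}
```

On a dual-CPU machine, piping the "echo" loop above into "run_batches 2" instead of "maxcpu" would run two "convert" commands at a time.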

To portably detect how many CPUs are available, I use the POSIX function "sysconf". Unfortunately, Perl's POSIX module doesn't include the constant required for CPU detection (_SC_NPROCESSORS_ONLN). As a result, I use Inline::C to run the detection function in C.
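
For comparison, the same number can often be read from the shell with "getconf", which asks for the value that sysconf reports. The sketch below is not part of maxcpu, and the function name "cpu_count" is made up for this example. It assumes that the local "getconf" understands the name _NPROCESSORS_ONLN (glibc-based Linux and macOS do), and falls back to 1 when it does not.

```shell
#!/bin/sh
# Sketch only; cpu_count is a hypothetical helper, not part of maxcpu.
# Prints the number of online CPUs, or 1 if it cannot be determined.
cpu_count() {
	# Assumption: this getconf knows _NPROCESSORS_ONLN; if not, n is empty.
	n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=
	case $n in
		''|0|*[!0-9]*) n=1 ;;	# empty, zero, or not a number
	esac
	echo "$n"
}
```

Calling "cpu_count" on a dual-CPU machine with both CPUs online should print 2.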