Clustering is combining multiple (usually cheap) CPUs into one computing system that is much more capable than any single CPU. The first popular cluster (that I know of) was built by a group at NASA that had huge computing needs but a budget that had been cut. They could not afford the large supercomputers they needed, so they linked cheaper CPUs in parallel into something they called Beowulf. Go here to find out more about Beowulf class clusters.

Not all computing tasks lend themselves to running on a cluster. The task has to be parallelizable. One example is processing a huge list (DNA chains, say). The list can be split into halves, thirds, or N sections, N being the number of nodes in your cluster. Suppose a million dollar supercomputer can churn through the whole list in 10 days. If you had 100 nodes at $1,000 each, and each node could churn through 1/100th of the list in those same 10 days, you would have solved the problem for $100,000 instead of $1,000,000.
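To make the list-splitting idea concrete, here is a minimal sketch in Python. The function names, the DNA string, and the "count G and C bases" task are all made up for illustration. The sections are processed one after another in a plain loop, where a real cluster would hand each section to its own node.

```python
def split_into_sections(items, n_nodes):
    """Split items into n_nodes contiguous sections whose sizes differ by at most one."""
    base_size, extra = divmod(len(items), n_nodes)
    sections = []
    start = 0
    for i in range(n_nodes):
        # The first `extra` sections each take one leftover item.
        size = base_size + (1 if i < extra else 0)
        sections.append(items[start:start + size])
        start += size
    return sections


def count_gc(section):
    """Hypothetical per-node task: count the G and C bases in one section."""
    return sum(1 for letter in section if letter in "GC")


if __name__ == "__main__":
    dna = "ATGCGCTA" * 1000                    # made-up data
    sections = split_into_sections(dna, 100)   # one section per node
    # On a cluster, each count_gc call would run on a different node.
    partial_counts = [count_gc(s) for s in sections]
    total = sum(partial_counts)
    print(len(sections), "sections, total G/C count:", total)
```

This only works so neatly because each section can be processed without looking at any other section, and the partial results combine with a simple sum. Tasks without that property are the ones that do not parallelize well.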

Clustering mostly involves writing your application from the ground up to parallelize your task. An application written without parallelization in mind has little chance (with exceptions) of benefiting much from running on a cluster. There are libraries and development systems that help you build parallelization into your apps without a lot of effort. MPI is one such library, PVM is another. Mosix is a set of patches to the standard Linux kernel that tries to take some of this programming overhead off your hands. It makes a cluster of Mosix patched Linux machines look like one machine with X processors. With Mosix, your application only has to split its work across multiple processes to take advantage of the cluster. A perfect example of this is the Apache web server. Apache by default forks off multiple instances of itself to serve user requests. In a Mosix cluster, these instances could be spread across the nodes in the cluster. If you don't know what forking is, go here. It's not a very easy explanation for non-programmers, but it's the only thing I could find.
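For a rough feel of what forking looks like in practice, here is a small Python sketch for Unix-like systems. It is not Apache's code and it uses nothing Mosix-specific. The `fork_workers` function and the `handle` callback are hypothetical names invented for this example, and `handle` is assumed to return a small integer (0 to 255) that the child passes back as its exit code.

```python
import os


def fork_workers(tasks, handle):
    """Fork one child process per task and return {pid: exit_code}.

    `handle` is a hypothetical callback assumed to return an int in 0..255.
    """
    pids = []
    for task in tasks:
        pid = os.fork()          # after this call there are two processes
        if pid == 0:
            # Child process: do the work, then exit immediately so it
            # never falls back into the parent's code.
            code = 1             # reported if handle() raises
            try:
                code = handle(task)
            finally:
                os._exit(code)
        # Parent process: remember the child and keep forking.
        pids.append(pid)

    results = {}
    for pid in pids:
        _, status = os.waitpid(pid, 0)   # wait for this child to finish
        results[pid] = os.WEXITSTATUS(status)
    return results


if __name__ == "__main__":
    # Each "request" is handled by its own forked process.
    print(fork_workers([1, 2, 3], lambda n: n * 2))
```

Each child here is a separate process. Separate processes like these are what a Mosix patched kernel could place on different nodes.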