I didn't really agree with this article's use of Unix pipes as a general approach to concurrency, but there was a "bang-on" comment below it:
This is a classic hammer-and-nail kind of thing: pipes and filters are a particularly old hammer, and MapReduce is about the same age but only recently rediscovered. Neither will solve all your problems.
Yes, concurrency is hard, especially if you have no background in computer science, i.e. if you lack a basic understanding of the abstractions that can make your life easier. If you do have such a background, the next step is understanding the different patterns that exist in this space: producer-consumer, semaphores, message queues, callbacks, functions without side effects, threads, blocking/non-blocking I/O, etc.
Still with me? Now the good news. Your needs are probably quite modest and well covered by some existing framework. Using the java.util.concurrent API is not exactly easy, but if used properly it will allow you to dodge most synchronization issues.
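To make that concrete, here is a minimal sketch of leaning on java.util.concurrent instead of writing synchronized blocks by hand. The WordCounter class, the pool size of 4 and the 10-second timeout are illustrative assumptions, not anything from the article; the library calls (Executors.newFixedThreadPool, ConcurrentHashMap.merge, awaitTermination) are standard JDK.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical example class: counts words across lines with a thread pool.
class WordCounter {
    static Map<String, Integer> count(List<String> lines) {
        // The map handles its own thread safety; merge() is an atomic update.
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(4); // assumed pool size
        try {
            for (String line : lines) {
                pool.submit(() -> {
                    for (String word : line.trim().split("\\s+")) {
                        if (!word.isEmpty()) {
                            counts.merge(word, 1, Integer::sum);
                        }
                    }
                });
            }
        } finally {
            pool.shutdown(); // accept no new tasks, let queued ones finish
        }
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counts;
    }
}
```

Calling WordCounter.count on a list of lines returns the per-word totals. There is no explicit lock anywhere in the code; the pool and the concurrent map do that work.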
A few useful tricks that you should practice regardless of whether you are going to run with multiple threads:
- Don't use global variables.
- Don't share state through mutable objects.
- Use dependency injection (i.e. don't initialize objects yourself). Keep the number of dependencies per class low.
- Separation of concerns. Make each method do only one thing and keep your classes cohesive (i.e. don't dump random methods in one class).
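A small sketch of what these tricks can look like together in Java. The names RateSource, Order and OrderPricer are hypothetical, made up purely for illustration: an immutable value object, one injected dependency, and a class that does a single job.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical dependency: anything that can supply a tax rate in percent.
interface RateSource {
    int percentFor(String region);
}

// Immutable value: final fields, no setters, defensive copy of the list.
final class Order {
    private final String region;
    private final List<Long> pricesInCents;

    Order(String region, List<Long> pricesInCents) {
        this.region = region;
        this.pricesInCents =
            Collections.unmodifiableList(new ArrayList<>(pricesInCents));
    }

    String region() { return region; }
    List<Long> pricesInCents() { return pricesInCents; }
}

// One job only: price an order. Its single dependency is passed in,
// not constructed here and not read from a global.
final class OrderPricer {
    private final RateSource rates;

    OrderPricer(RateSource rates) { this.rates = rates; }

    long totalInCents(Order order) {
        long net = 0;
        for (long price : order.pricesInCents()) {
            net += price;
        }
        return net + net * rates.percentFor(order.region()) / 100;
    }
}
```

Because Order cannot change after construction and OrderPricer keeps no mutable state of its own, instances can be handed to any number of threads without locking, provided the injected RateSource is itself thread-safe.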
If you do all this properly, your design will make a shared-nothing approach a lot easier. Shared nothing is what you need to parallelize. Shared something means context switches and synchronization, and these are the two things that make concurrent programming hard. If you can do shared nothing, concurrency is easy.
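As a rough illustration of shared nothing, the sketch below splits an array into slices and gives each task its own slice. PartitionedSum is a hypothetical name and the partitioning scheme is just one simple choice. The input array is only ever read, each task writes only to its own local variable, and the partial results are combined once the tasks are done.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class: sums an array in independent slices.
class PartitionedSum {
    // parts must be at least 1.
    static long sum(int[] data, int parts) {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            int chunk = (data.length + parts - 1) / parts; // ceiling division
            for (int p = 0; p < parts; p++) {
                final int from = Math.min(data.length, p * chunk);
                final int to = Math.min(data.length, from + chunk);
                tasks.add(() -> {
                    long local = 0; // private to this task, never shared
                    for (int i = from; i < to; i++) {
                        local += data[i];
                    }
                    return local;
                });
            }
            long total = 0;
            // invokeAll blocks until every task has completed.
            for (Future<Long> partial : pool.invokeAll(tasks)) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while summing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a slice failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Nothing here needs a lock, because no two tasks ever touch the same mutable data.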
If you can't, work on how you share between processes/threads. Asynchronous is great here. Use callback mechanisms or some kind of queueing solution. Avoid manipulating semaphores and locks yourself; leave that to an off-the-shelf solution.
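One possible shape for the queueing option, sketched with the JDK's BlockingQueue. The QueueHandoff class, the capacity of 2 and the use of an empty Optional as an end-of-stream marker are assumptions made for this example. The queue does all the locking and signalling internally, so the producer and consumer never touch a lock directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical example class: hands work to a consumer thread via a queue.
class QueueHandoff {
    static List<String> run(List<String> inputs) {
        // Bounded queue: put() blocks when the consumer falls behind.
        BlockingQueue<Optional<String>> queue = new ArrayBlockingQueue<>(2);
        List<String> results = new ArrayList<>(); // written only by the consumer

        Thread consumer = new Thread(() -> {
            try {
                Optional<String> item;
                // An empty Optional marks the end of the stream.
                while ((item = queue.take()).isPresent()) {
                    results.add(item.get().toUpperCase(Locale.ROOT));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (String input : inputs) {
                queue.put(Optional.of(input));
            }
            queue.put(Optional.empty());
            consumer.join(); // after join, the consumer's writes are visible here
        } catch (InterruptedException e) {
            consumer.interrupt();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during handoff", e);
        }
        return results;
    }
}
```

The small capacity is deliberate: a bounded queue makes a slow consumer visible as a blocked producer rather than as unbounded memory growth.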
Unix pipelines are great if all processes in the pipeline are independent, don't contend for the same resources, need to do about the same amount of work, and can work on partial results as they are streamed from the predecessor. If not, you've got a clogged pipe and a bunch of processes waiting for it to become unclogged.