Partitioning vs. Parallel Processing
Just spent more time reading the Spring Batch documentation. It's taking a while to sink in. What struck me as amusing is that I was trying to understand the difference between "Remote Chunking" and Partitioning, and I had forgotten that I had read how they differ just yesterday...
To summarize my learning, in non-PhD terminology:
You can increase throughput of a batch process by adding parallelism to your processing, that is, multiple processes that are handling different pieces of data. How you decide to break up the data for the different parallel processes is where you get into "remote chunking" and partitioning.
Remote Chunking is what you use if you don't have a structural understanding of the data. Flat files could be such an example.
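To make that concrete for myself, here is a rough sketch in plain Java of the idea behind chunking: one reader pulls lines off a flat file in order, cuts them into fixed-size chunks without looking at what is in them, and hands each chunk to a worker. This is not the Spring Batch API. The class name `ChunkDispatchSketch` and its methods are made up for illustration, and a local thread pool stands in for what would be remote worker processes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not a Spring Batch class.
final class ChunkDispatchSketch {

    /** Cuts the items into chunks of at most chunkSize, purely by position. */
    static <T> List<List<T>> toChunks(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, items.size());
            chunks.add(new ArrayList<>(items.subList(i, end)));
        }
        return chunks;
    }

    /** Sends each chunk to a worker thread and returns how many items were handled. */
    static int processAll(List<String> lines, int chunkSize, int workers)
            throws InterruptedException, ExecutionException {
        // Assumption: a thread pool stands in for remote workers.
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<String> chunk : toChunks(lines, chunkSize)) {
                // Placeholder "work": a real worker would process and write the items.
                Callable<Integer> task = () -> chunk.size();
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // wait for each chunk to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

The point to notice is that the split needs nothing but a position in the input, which is why it works even when you know nothing about the records themselves.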
Partitioning is what you can use if you know something about the structure of your data. For example, if you have an order table, columns such as order date, order ID or SKU could each serve as a key for partitioning your data.
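As a sketch of what partitioning by order ID might look like, the plain Java below splits an ID range into roughly equal slices, one per worker. It is only an illustration of the idea, not the Spring Batch API: the `OrderIdPartitioner` class, its `Range` type, and the `gridSize` parameter name are my own assumptions, and in practice the minimum and maximum IDs would come from a query against the order table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not a Spring Batch class.
final class OrderIdPartitioner {

    /** One contiguous slice of order IDs; both ends are inclusive. */
    static final class Range {
        final long minId;
        final long maxId;

        Range(long minId, long maxId) {
            this.minId = minId;
            this.maxId = maxId;
        }
    }

    /** Splits [minId, maxId] into at most gridSize contiguous ranges. */
    static List<Range> partition(long minId, long maxId, int gridSize) {
        if (gridSize <= 0 || maxId < minId) {
            throw new IllegalArgumentException("need gridSize > 0 and minId <= maxId");
        }
        // Number of IDs each slice should cover.
        long targetSize = (maxId - minId) / gridSize + 1;
        List<Range> ranges = new ArrayList<>();
        long start = minId;
        while (start <= maxId) {
            long end = Math.min(start + targetSize - 1, maxId);
            ranges.add(new Range(start, end));
            start = end + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Assumed example values: order IDs 1..100 spread over 4 workers.
        for (Range r : partition(1, 100, 4)) {
            System.out.println("worker handles order ids " + r.minId + " to " + r.maxId);
        }
    }
}
```

Each worker would then read only the orders whose ID falls in its own range. Equal ID ranges only give equal workloads if the IDs are spread fairly evenly; with large gaps, some workers end up with far fewer rows than others.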
