A look at the variations of batch-running tasks in parallel
I have encountered the batch-execution problem many times, in many variations, and you probably have too. It is one of those problems that is more interesting to tackle in day-to-day technical work than in interviews.
Some languages, such as Java, make life easier here, while others turn it into a challenge. In this post, I will discuss the problem using TypeScript.
Given an array of tasks to execute with limited resources, what is the best way to run these tasks in batches so that whenever a task finishes, a pending task takes its place?
- You have many SQL queries to execute through a JDBC connection with a limited number of sessions.
- You want to batch-run tasks with a retry mechanism.
- The tasks have dependency relations among each other.
The complexity of the solution grows with the complexity of the task definition. If all tasks consume the same amount of time, the following solution is the simplest.
A simple solution is to divide the tasks into groups and execute the groups one after the other, synchronously, while the tasks within the current group run simultaneously. When all the tasks in the current group have finished, the next group comes in.
Let’s demonstrate this solution with a simple problem: given 10 tasks, each of which consumes a constant amount of time, we want to batch-run these tasks 3 at a time.
The definition of the tasks would look like this:
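A minimal sketch of such a task list (the `Task` type, names, and the 100 ms duration are illustrative assumptions, not the post's original code):

```typescript
// Hypothetical definitions: each task is an async function that
// resolves after the same constant delay.
type Task = () => Promise<string>;

const TASK_DURATION_MS = 100; // every task takes the same amount of time

const tasks: Task[] = Array.from({ length: 10 }, (_, i) => {
  return () =>
    new Promise<string>((resolve) =>
      setTimeout(() => resolve(`task ${i + 1} done`), TASK_DURATION_MS)
    );
});
```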
We would like to run these tasks in batches of three, and the task executor for that is straightforward:
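One way such an executor could look (a sketch; the function name and result shape are assumptions): split the tasks into fixed-size groups, run each group concurrently with `Promise.all`, and start the next group only after the whole group has settled.

```typescript
type Task = () => Promise<string>;

// Ten constant-time tasks, as above.
const tasks: Task[] = Array.from({ length: 10 }, (_, i) =>
  () =>
    new Promise<string>((resolve) =>
      setTimeout(() => resolve(`task ${i + 1} done`), 100)
    )
);

// Run the tasks group by group; the tasks inside a group run concurrently.
async function runInBatches(tasks: Task[], batchSize: number): Promise<string[]> {
  const results: string[] = [];
  for (let i = 0; i < tasks.length; i += batchSize) {
    const group = tasks.slice(i, i + batchSize);
    // The next group starts only after every task in this group settles.
    results.push(...(await Promise.all(group.map((task) => task()))));
  }
  return results;
}

// With 10 tasks and a batch size of 3, the groups are 3 + 3 + 3 + 1.
runInBatches(tasks, 3).then((results) => console.log(results));
```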
That solution is neat. With a batch size of 3, the tasks start three at a time, and each group begins only after the previous one finishes.
What if one of the tasks finishes earlier than the others?
This solution will not replace a finished task with a pending one, and that is a waste of time.
Therefore, solution #2 comes in handy when the tasks’ run times vary.
Think of a coffee grinder: each time a bean is crushed, another bean replaces it. With the grinder in mind, how about creating several workers, each of which pulls a task from a queue and executes it?
Upon finishing, the worker either takes the next task or, if the current one failed, enqueues it back. The retry mechanism may become a deal-breaker for this solution, so let’s tackle it without retries.
The first solution wastes time when a task’s run time varies; this solution solves that issue. Let’s create our tasks first:
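A sketch of such tasks (the 1–30 ms range is from the post; the `Task` type and message format are illustrative):

```typescript
type Task = () => Promise<string>;

// Ten tasks, each with a random run time between 1 and 30 ms.
const tasks: Task[] = Array.from({ length: 10 }, (_, i) => {
  const runtimeMs = Math.floor(Math.random() * 30) + 1; // 1..30
  return () =>
    new Promise<string>((resolve) =>
      setTimeout(() => resolve(`task ${i + 1} took ${runtimeMs}ms`), runtimeMs)
    );
});
```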
We have tasks with a random run time between 1 and 30 ms.
We want to process our 10 tasks using 4 workers, each with an index between 1 and 4. Each worker runs in a loop until the task queue is empty:
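A minimal sketch of the worker loop (function and variable names are assumptions). Note that JavaScript’s single-threaded event loop makes `shift()` on the shared queue safe here; there is no data race:

```typescript
type Task = () => Promise<string>;

// A shared queue of tasks with random run times (1..30 ms).
const queue: Task[] = Array.from({ length: 10 }, (_, i) => {
  const runtimeMs = Math.floor(Math.random() * 30) + 1;
  return () =>
    new Promise<string>((resolve) =>
      setTimeout(() => resolve(`task ${i + 1}`), runtimeMs)
    );
});

// Each worker loops, pulling the next task off the queue until it is empty.
async function worker(index: number, queue: Task[]): Promise<void> {
  while (queue.length > 0) {
    const task = queue.shift()!;
    const result = await task();
    console.log(`worker ${index} finished ${result}`);
  }
}

// Trigger 4 workers once; Promise.all settles when the queue is drained.
const allDone = Promise.all([1, 2, 3, 4].map((index) => worker(index, queue)));
```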
The workers are triggered once, using a single `Promise.all` over the worker promises.
After running this example, the output shows the workers interleaving: as soon as a worker finishes a task, it pulls the next one from the queue.
- Updating pending tasks: The code is simple. However, if we want to add new or failed tasks back to the queue, we may need to employ new workers in place of retired ones to keep up with the expected level of batching. This is less appealing, since it requires additional management: each worker must notify on its retirement, and each newly employed worker must be registered, in order to keep the batch balanced.
- Dependency: What if there is a dependency relationship between the tasks? A constant number of workers won’t be the right fit for dependencies. For example, suppose we run a topological sort on the tasks and find, on the first level, a single task on which all the rest depend. Creating several workers of which only one is actually working, while the rest are locked waiting for it to finish, is madness.
I remember the first time I had to solve this problem, back when I worked at Amazon. I had to design and implement a ranking model for search results. The results would go through a series of ranking processes; some depended on each other, while others could run in parallel. In my design, I used the Observer design pattern.
The idea is to run a topological sort on the tasks; whenever a task or a group of tasks finishes executing, it publishes a subject to notify its dependents. I implemented this mechanism in Java, relying on `CompletableFuture`: a task blocks on its prerequisite subjects until they are published, and then continues its execution.
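In TypeScript, plain promises can play the role of `CompletableFuture`. A minimal sketch, assuming the jobs are already topologically sorted so that every prerequisite appears earlier in the array (the `Job` shape and all names are hypothetical, not the original Java design):

```typescript
type Job = {
  name: string;
  deps: string[];            // names of prerequisite jobs
  run: () => Promise<void>;  // the actual work
};

// Wire each job to start only after all of its prerequisites have finished.
// The map of promises plays the role of the published "subjects".
function executePlan(jobs: Job[]): Promise<void> {
  const finished = new Map<string, Promise<void>>();
  for (const job of jobs) {
    const prerequisites = job.deps.map((dep) => finished.get(dep)!);
    finished.set(job.name, Promise.all(prerequisites).then(() => job.run()));
  }
  return Promise.all(finished.values()).then(() => undefined);
}

// Usage: b and c depend on a; d depends on both b and c.
const order: string[] = [];
const job = (name: string, deps: string[]): Job => ({
  name,
  deps,
  run: async () => { order.push(name); },
});
executePlan([job("a", []), job("b", ["a"]), job("c", ["a"]), job("d", ["b", "c"])]);
```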
Allow me to brag about it: I was really proud of the outcome. The design document was a masterpiece, and the work plan stayed accurate despite all the disturbance around it. It was a success that made me believe in my capabilities.
This solution is great when the execution plan is well-defined. However, if new tasks need to be added, or a retry mechanism with a FIFO pattern is required, that introduces management overhead.
We cannot use this solution when there is no dependency among the tasks and the task queue is volatile. In that case, we may easily create starvation: a task may fail, and due to hopeless retries, its successor remains pending. During that wait, the rest of the tasks may have already completed while the successor starves, waiting to be executed.
The starvation drawback can be solved by eliminating the chain idea, where each subject has a single subscriber. Once a task finishes execution, it picks a task from the pending group and triggers it; at that point, the first task’s lifetime ends. This resembles the second solution, without the loops.
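A sketch of that handoff idea, assuming nothing beyond plain promises (the function name and counters are illustrative): each finishing task triggers exactly one pending task, so there is no worker loop and no idle chain.

```typescript
type Task = () => Promise<void>;

// Run tasks at a fixed concurrency level: when a task settles, it pulls
// the next pending task and triggers it, then its own lifetime ends.
function runWithHandoff(tasks: Task[], concurrency: number): Promise<void> {
  const pending = [...tasks];
  let active = 0;

  return new Promise<void>((resolve) => {
    const startNext = (): void => {
      if (pending.length === 0) {
        if (active === 0) resolve(); // nothing pending, nothing running
        return;
      }
      active++;
      const task = pending.shift()!;
      task().finally(() => {
        active--;
        startNext(); // the finishing task triggers its successor
      });
    };
    // Seed the initial batch; every later start is a handoff.
    for (let i = 0; i < concurrency; i++) startNext();
  });
}
```

Unlike the worker loop of solution #2, no worker sits blocked on an empty queue here: capacity is handed from a finished task directly to a pending one.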
I introduced several variations of the batch-running problem, with an emphasis on a technical approach rather than a theoretical one. The technical approach best fits a programmer’s daily tasks, while the theoretical approach shows up in interviews.
If you would like to have a look at the code, visit my repository here.
The code may not be final.