Today we're here to discuss batching asynchronous tasks in Vanilla JS.

// One at a time, Synchronous: Super Easy
arr.forEach(fn);
// All at a time, Async: Super Easy
Promise.all(arr.map(doThing));
// Some at a time... Not as easy...
// insert code here...

Not too long ago I shared my secret recipe for getting over a million downloads each week on npm: forEachAsync in just 7 lines of Vanilla JS.

This week I'll be upping the ante to talk about batching requests. Unfortunately, it's going to take more than 7 lines.

What we want

// Easy async (and sync) bounded task batching
batchAsync(batchSize, things, doStuff)

TL;DR batchasync.js

Check out https://git.coolaj86.com/coolaj86/batchasync.js (also on npm)

There are a number of concerns to address here:

  • Starting multiple tasks at once
  • Tracking how many tasks are in progress
  • Keeping the task queue full
  • Keeping results in order
  • Knowing when it's done
  • Stopping early when there's an error

This is tested and addresses them all:

'use strict';

function batchAsync(limit, arr, doStuff) {
    arr = arr.slice(0);
    return new Promise(function(resolve, reject) {
        var total = arr.length;
        var active = 0;
        var results = [];
        var error;

        function doMoreStuff() {
            // Don't take on any more tasks if we've errored,
            // or if too many are already in progress
            if (error || active >= limit) {
                return;
            }

            // If there are no more tasks to start, return
            if (!arr.length) {
                // If everything is also *finished*, resolve
                if (active < 1) {
                    resolve(results);
                }
                return;
            }

            // We need to dequeue the task here so the index is correct
            // (keep in mind we want to support sync and async)
            var index = total - arr.length;
            var task = arr.shift();
            active += 1;

            // Spawn another task immediately,
            // which will be stopped if we're at the limit
            doMoreStuff();

            var p;
            try {
                p = doStuff(task);
            } catch (e) {
                // we need to handle, and bubble, synchronous errors
                error = e;
                reject(e);
                throw e;
            }
            // Do stuff and then decrease the active counter when done
            // add support for sync by wrapping in a promise
            Promise.resolve(p)
                .then(function(result) {
                    if ('undefined' === typeof result) {
                        throw new Error(
                            "result was 'undefined'. Please return 'null' to signal that you didn't just forget to return another promise."
                        );
                    }
                    active -= 1;
                    results[index] = result;
                })
                .then(doMoreStuff)
                .catch(function(e) {
                    // handle async errors
                    error = e;
                    reject(e);
                });
        }

        doMoreStuff();
    });
}

Provided that you remove the comments and spaces, it's 49 lines long as promised. Yay!
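To see the concerns from the list above in action, here is a small sanity check. It is a sketch rather than the project's test suite: slowDouble, running and peak are made-up names, and the copy of batchAsync is condensed (no undefined check) so the snippet runs on its own. Its guard is written as `active >= limit`, which is what keeps the peak at exactly `limit`.

```javascript
'use strict';

// Condensed copy of batchAsync so this snippet is standalone.
// Assumption: the guard is `active >= limit`, so at most `limit` tasks run.
function batchAsync(limit, arr, doStuff) {
    arr = arr.slice(0);
    return new Promise(function(resolve, reject) {
        var total = arr.length;
        var active = 0;
        var results = [];
        var error;

        function doMoreStuff() {
            if (error || active >= limit) {
                return;
            }
            if (!arr.length) {
                if (active < 1) {
                    resolve(results);
                }
                return;
            }
            var index = total - arr.length;
            var task = arr.shift();
            active += 1;
            doMoreStuff();

            var p;
            try {
                p = doStuff(task);
            } catch (e) {
                error = e;
                reject(e);
                throw e;
            }
            Promise.resolve(p)
                .then(function(result) {
                    active -= 1;
                    results[index] = result;
                })
                .then(doMoreStuff)
                .catch(function(e) {
                    error = e;
                    reject(e);
                });
        }

        doMoreStuff();
    });
}

// Hypothetical instrumented task: counts how many copies overlap
var running = 0;
var peak = 0;
function slowDouble(n) {
    running += 1;
    peak = Math.max(peak, running);
    return new Promise(function(resolve) {
        // Larger inputs finish sooner, so completion order differs from input order
        setTimeout(function() {
            running -= 1;
            resolve(n * 2);
        }, 30 - n * 5);
    });
}

var demo = batchAsync(2, [1, 2, 3, 4, 5], slowDouble).then(function(results) {
    console.log(results, 'peak concurrency:', peak);
    // [ 2, 4, 6, 8, 10 ] peak concurrency: 2
    return { results: results, peak: peak };
});
```

The results come back in input order even though the tasks finish out of order, and no more than two tasks ever overlap.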

How is it done?

Because I'm a nice guy, and because I enjoy trying to break things down, I'm going to try to explain the principles behind the problem solving that led to that snippet of code.

Starting multiple tasks at once

If you wanted to fire off a bunch of tasks at once you might try the trusty while loop:

while (arr.length) {
    doStuff(arr.shift());
}

That's no good for a number of reasons, the first of which is that there's no control over how many things happen at once.
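If you want to see the lack of control for yourself, this rough sketch counts how many tasks overlap when the bare while loop is used. The doStuff here is a made-up stand-in that just waits 10ms, and running, peak and started are names invented for the demo.

```javascript
'use strict';

var running = 0;
var peak = 0;

// Hypothetical task that records how many copies of itself overlap
function doStuff(item) {
    running += 1;
    peak = Math.max(peak, running);
    return new Promise(function(resolve) {
        setTimeout(function() {
            running -= 1;
            resolve(item);
        }, 10);
    });
}

var arr = [1, 2, 3, 4, 5, 6];
var started = [];
while (arr.length) {
    started.push(doStuff(arr.shift()));
}

var naiveDemo = Promise.all(started).then(function() {
    // Every task was started before any of them finished
    console.log('peak concurrency:', peak); // peak concurrency: 6
    return peak;
});
```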

We could also try limiting how many things are happening like this:

var i = 0;
var limit = 4;
while (i < limit) {
    i += 1;
    doStuff(arr.shift());
}

That's fine, but it will only ever fire off 4 items, which means that we need some way of starting things again.

Keeping the task queue full

We could add a doMoreStuff function. Our new code might look like this:

var i = 0;
var limit = 4;

function doMoreStuff() {
    if (arr.length) {
        doStuff(arr.shift()).then(doMoreStuff);
    }
}

while (i < limit) {
    i += 1;
    doStuff(arr.shift()).then(doMoreStuff);
}

The nice thing about that is that it does give us a constant stream of 4 things to do at a time. However, if any of the doing results in synchronous behavior (such as an item being retrieved from an in-memory cache, or being a null item that needs nothing done), we'll get a number of tasks at once above the limit.

And, if we solve that, the while loop becomes redundant.

This brings us to our next issue:

Tracking how many tasks are in progress

We'll leave the while loop in place for just a moment more, but begin enforcing a limit on the number of active tasks.

var i = 0;
var limit = 4;
var active = 0;

function doMoreStuff() {
    if (arr.length && active < limit) {
        active += 1;
        doStuff(arr.shift())
            .then(function(result) {
                active -= 1;
                return result;
            })
            .then(doMoreStuff);
    }
}

while (i < limit) {
    i += 1;
    active += 1;
    doStuff(arr.shift())
        .then(function(result) {
            active -= 1;
            return result;
        })
        .then(doMoreStuff);
}

At this point you may notice that we have some bad duplication. Not all duplication is bad, but keeping track of how many things are being done in two places is bound to lead to bugs, so we can simplify by getting rid of the while loop altogether:

var limit = 4;
var active = 0;

function doMoreStuff() {
    if (arr.length && active < limit) {
        active += 1;
        doStuff(arr.shift())
            .then(function(result) {
                active -= 1;
                return result;
            })
            .then(doMoreStuff);
    }
}

doMoreStuff();

This is also a good time to invert our if condition, because now the complexity inside the if is much greater than the complexity outside of it:

var limit = 4;
var active = 0;

function doMoreStuff() {
    if (!arr.length || active >= limit) {
        return;
    }
    active += 1;
    doStuff(arr.shift())
        .then(function(result) {
            active -= 1;
            return result;
        })
        .then(doMoreStuff);
}

doMoreStuff();

This is much more concise, but we're back to the original problem: only one thing happens at a time.

Since we have good over-limit protection, however, we can now be over-eager in how we start new tasks:

var limit = 4;
var active = 0;

function doMoreStuff() {
    // Don't take on any more tasks
    // if too many are in progress
    if (!arr.length || active >= limit) {
        return;
    }
    active += 1;

    // Spawn another task immediately,
    // which will be stopped if we're at the limit
    doMoreStuff();

    // Do stuff and then decrease the active counter when done
    doStuff(arr.shift())
        .then(function(result) {
            active -= 1;
            return result;
        })
        .then(doMoreStuff);
}

doMoreStuff();

This is looking pretty good, except that we have no idea when this will complete.

Knowing when it's done

For this we'll want to check when active is back down to zero and there are no more things to do. This way if the task limit is 1, we won't quit early when the counter momentarily goes down to 0.

We'll also use a Promise to give us the ability to resolve the whole process:

var limit = 4;
new Promise(function(resolve, reject) {
    var active = 0;

    function doMoreStuff() {
        // Don't take on any more tasks
        // if too many are in progress
        if (active >= limit) {
            return;
        }

        // If there are no more tasks to start, return
        if (!arr.length) {
            // If everything is also *finished*, resolve
            if (active < 1) {
                resolve();
            }
            return;
        }

        active += 1;

        // Spawn another task immediately,
        // which will be stopped if we're at the limit
        doMoreStuff();

        // Do stuff and then decrease the active counter when done
        doStuff(arr.shift())
            .then(function(result) {
                active -= 1;
                return result;
            })
            .then(doMoreStuff);
    }

    doMoreStuff();
});

Things are really coming together, but we still have no way to get the results of all the work that's being done.

We could "leave it as an exercise for the implementor" and just say "your doStuff function needs to keep track of results", but that's not very nice - and like I said, I'm a nice guy.

Keeping results, in order

It's about time to wrap this in a function, so we can actually return results. Additionally, we want those results in order.

What we need to do is fairly simple - just tack the results onto an array, just like Promise.all() would do.

However, if we did just that we'd get things in random order (which is bad). Instead we'll need to have each task in a closure that keeps track of what number it is and assign results to that index.
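Here's a tiny illustration of the difference, independent of the batching code. The delay helper and the three made-up tasks exist only for the demo: pushed is filled in whatever order the tasks finish, while indexed uses the index captured in each closure.

```javascript
'use strict';

// Hypothetical task: resolves with `value` after `ms` milliseconds
function delay(value, ms) {
    return new Promise(function(resolve) {
        setTimeout(function() {
            resolve(value);
        }, ms);
    });
}

var tasks = [
    { name: 'a', ms: 30 },
    { name: 'b', ms: 10 },
    { name: 'c', ms: 20 }
];
var pushed = []; // filled in completion order
var indexed = []; // filled by original position

var orderDemo = Promise.all(
    tasks.map(function(task, index) {
        // `index` is captured by this closure, one per task
        return delay(task.name, task.ms).then(function(result) {
            pushed.push(result);
            indexed[index] = result;
        });
    })
).then(function() {
    console.log(pushed); // [ 'b', 'c', 'a' ]
    console.log(indexed); // [ 'a', 'b', 'c' ]
    return { pushed: pushed, indexed: indexed };
});
```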

We'll introduce total to store the original number of tasks and subtract the queue's shrinking length from it, which gives an index that counts up by one with each task:

function batchAsync(limit, arr, doStuff) {
    // make a copy of the original array to preserve it
    arr = arr.slice(0);
    return new Promise(function(resolve, reject) {
        var total = arr.length;
        var active = 0;
        var results = [];

        function doMoreStuff() {
            // Don't take on any more tasks
            // if too many are in progress
            if (active >= limit) {
                return;
            }

            // If there are no more tasks to start, return
            if (!arr.length) {
                // If everything is also *finished*, resolve
                if (active < 1) {
                    resolve(results);
                }
                return;
            }

            // We need to dequeue the task here so the index is correct
            // (keep in mind we want to support sync and async)
            var index = total - arr.length;
            var task = arr.shift();
            active += 1;

            // Spawn another task immediately,
            // which will be stopped if we're at the limit
            doMoreStuff();

            // Do stuff and then decrease the active counter when done
            // add support for sync by wrapping in a promise
            Promise.resolve(doStuff(task))
                .then(function(result) {
                    active -= 1;
                    results[index] = result;
                })
                .then(doMoreStuff);
        }

        doMoreStuff();
    });
}
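The Promise.resolve() wrapper is what lets doStuff be sync or async. As a rough illustration, assume a hypothetical lookup function backed by an in-memory cache (both names are invented here): on a cache hit it returns a plain value, on a miss it returns a promise.

```javascript
'use strict';

var cache = { a: 1 }; // hypothetical in-memory cache

// Hypothetical doStuff: synchronous on a cache hit, asynchronous on a miss
function lookup(key) {
    if (key in cache) {
        return cache[key]; // plain value, not a promise
    }
    return new Promise(function(resolve) {
        setTimeout(function() {
            cache[key] = key.length;
            resolve(cache[key]);
        }, 10);
    });
}

// lookup('a').then(...) would throw a TypeError, because a number has no .then().
// Promise.resolve() turns both kinds of return value into a promise.
var syncDemo = Promise.all([
    Promise.resolve(lookup('a')),
    Promise.resolve(lookup('bb'))
]).then(function(values) {
    console.log(values); // [ 1, 2 ]
    return values;
});
```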

With those few additions, we've got a complete solution for the cases when everything goes right.

The only thing we're missing is how to handle it when things go horribly wrong.

Handling errors, and stopping early on errors

Here's what could go wrong:

  • doStuff() throws a synchronous error
  • doStuff() gives a rejection
  • tasks keep batching after doStuff() gives a rejection

We'll need to use a try/catch to handle possible synchronous errors and a .catch(e) for possible Promise rejections.
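To make the two failure modes concrete, here is a minimal sketch with two made-up tasks, throwsSync and rejectsAsync, and a hypothetical attempt helper that reports which mechanism caught the error. It is only an illustration of why both are needed.

```javascript
'use strict';

// Hypothetical task that fails before it can return anything
function throwsSync() {
    throw new Error('sync failure');
}

// Hypothetical task that returns a promise which rejects
function rejectsAsync() {
    return Promise.reject(new Error('async failure'));
}

// Call a task defensively: try/catch for throws, .catch() for rejections
function attempt(doStuff) {
    var p;
    try {
        p = doStuff();
    } catch (e) {
        return Promise.resolve('try/catch: ' + e.message);
    }
    return Promise.resolve(p).catch(function(e) {
        return '.catch(): ' + e.message;
    });
}

var errorDemo = Promise.all([
    attempt(throwsSync),
    attempt(rejectsAsync)
]).then(function(caught) {
    console.log(caught);
    // [ 'try/catch: sync failure', '.catch(): async failure' ]
    return caught;
});
```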

In this implementation I'll also cause a single rejection to stop doing tasks. That may or may not be desirable for your use case.

function batchAsync(limit, arr, doStuff) {
    // make a copy of the original array to preserve it
    arr = arr.slice(0);
    return new Promise(function(resolve, reject) {
        var total = arr.length;
        var active = 0;
        var results = [];
        // to keep track of error state
        var error;

        function doMoreStuff() {
            // bail immediately if there's been an error
            if (error || active >= limit) {
                return;
            }

            if (!arr.length) {
                if (active < 1) {
                    resolve(results);
                }
                return;
            }

            var index = total - arr.length;
            var task = arr.shift();
            active += 1;

            doMoreStuff();

            var p;
            try {
                p = doStuff(task);
            } catch (e) {
                // handle synchronous errors
                error = e;
                reject(e);
                // we need synchronous errors to bubble up
                throw e;
            }
            Promise.resolve(p)
                .then(function(result) {
                    if ('undefined' === typeof result) {
                        throw new Error(
                            "result was 'undefined'. Please return 'null' to signal that you didn't just forget to return another promise."
                        );
                    }
                    active -= 1;
                    results[index] = result;
                })
                .then(doMoreStuff)
                .catch(function(e) {
                    // handle async errors
                    error = e;
                    reject(e);
                });
        }

        doMoreStuff();
    });
}
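If stopping at the first rejection is not what you want, one option (not part of batchasync.js, just a sketch) is to wrap doStuff so that it never rejects and instead reports what happened. The settle helper and the flaky task are hypothetical names, and Promise.all stands in for the batching function to keep the snippet short.

```javascript
'use strict';

// Hypothetical helper: wraps doStuff so it always resolves to a report object
function settle(doStuff) {
    return function(task) {
        var p;
        try {
            p = doStuff(task);
        } catch (e) {
            p = Promise.reject(e);
        }
        return Promise.resolve(p).then(
            function(value) {
                return { ok: true, value: value };
            },
            function(error) {
                return { ok: false, error: error };
            }
        );
    };
}

// Hypothetical task: odd numbers fail, even numbers succeed
function flaky(n) {
    if (n % 2) {
        return Promise.reject(new Error('odd: ' + n));
    }
    return Promise.resolve(n * 10);
}

var settleDemo = Promise.all([1, 2, 3, 4].map(settle(flaky))).then(function(reports) {
    var summary = reports.map(function(r) {
        return r.ok ? r.value : r.error.message;
    });
    console.log(summary); // [ 'odd: 1', 20, 'odd: 3', 40 ]
    return summary;
});
```

Since the wrapper always resolves to an object, it never trips the undefined check, and passing settle(doStuff) in place of doStuff should let every task run to completion.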

As icing on the cake, I also throw when the result is undefined. This can save hours of headaches when debugging async code:

If you always expect a return of null or false as a way of explicitly signaling that there is no data to pass back, you'll never get caught forgetting to return a promise.
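Here's the kind of mistake that check is meant to catch, as a small sketch with made-up functions. forgetful starts the work but forgets to return the promise, so its result is undefined and arrives before the work has finished. careful returns the promise, which resolves to an explicit null.

```javascript
'use strict';

var finished = [];

// Hypothetical async task: records the item after 10ms, resolves to null
function save(item) {
    return new Promise(function(resolve) {
        setTimeout(function() {
            finished.push(item);
            resolve(null); // explicit "nothing to report"
        }, 10);
    });
}

function forgetful(item) {
    save(item); // bug: missing `return`
}

function careful(item) {
    return save(item);
}

var guardDemo = Promise.resolve(forgetful('x')).then(function(result) {
    // `result` is undefined and 'x' has not been saved yet
    var early = { result: result, doneYet: finished.indexOf('x') !== -1 };
    return Promise.resolve(careful('y')).then(function(result2) {
        // `result2` is null and 'y' really is saved
        var late = { result: result2, doneYet: finished.indexOf('y') !== -1 };
        return { early: early, late: late };
    });
});
```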

Where to take this next

My batching function is very generic and appropriate for general use, but there are some special things that you might want to do:

  • make an object that tracks a live task queue (not copy of a static array)
  • allow one-off tasks over the limit
  • on-the-fly prioritization (re-sort()ing before each thing, based on some criteria)
  • signal a cancel event to doStuff()

If you're reading this article then... maybe those are a bit more advanced, but at the very least they give you some ideas.
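As a starting point for the first idea in that list, a live queue might be sketched roughly like this. createQueue and its push method are hypothetical, not part of batchasync.js, and tasks here are functions that return a value or a promise rather than array items.

```javascript
'use strict';

// Hypothetical createQueue(limit): accepts tasks at any time, runs `limit` at once
function createQueue(limit) {
    var pending = [];
    var active = 0;

    function next() {
        if (active >= limit || !pending.length) {
            return;
        }
        var job = pending.shift();
        active += 1;
        // Start the task inside a .then() so synchronous throws become rejections
        Promise.resolve()
            .then(job.task)
            .then(job.resolve, job.reject)
            .then(function() {
                active -= 1;
                next();
            });
        next();
    }

    return {
        // Resolves or rejects with the outcome of this one task
        push: function(task) {
            return new Promise(function(resolve, reject) {
                pending.push({ task: task, resolve: resolve, reject: reject });
                next();
            });
        }
    };
}

// Demo with made-up jobs that track how many overlap
var queue = createQueue(2);
var running = 0;
var peak = 0;
function job(n) {
    return function() {
        running += 1;
        peak = Math.max(peak, running);
        return new Promise(function(resolve) {
            setTimeout(function() {
                running -= 1;
                resolve(n);
            }, 10);
        });
    };
}

var queueDemo = Promise.all(
    [1, 2, 3].map(function(n) {
        return queue.push(job(n));
    })
).then(function(values) {
    // A task added after the first batch has drained still runs
    return queue.push(job(4)).then(function(last) {
        return { values: values, last: last, peak: peak };
    });
});
```

Unlike batchAsync, this sketch lets one failed task reject only its own promise and keeps the queue going, which may or may not be what you want.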

Have at it.

Tests

If you want to try your hand at rolling your own batchAsync function, you may find the zero-framework test.js found at https://git.coolaj86.com/coolaj86/batchasync.js to be useful.

This type of async magic is difficult to get right, especially when something like caching or null inputs rightfully causes your code to execute a synchronous Promise (or other synchronous code) - so this will help, I promise.


By AJ ONeal




Published

2019-6-13


