Batching async requests in under 50 lines of VanillaJS
Published 2019-6-13
Today we're here to discuss batching asynchronous tasks in Vanilla JS.
// One at a time, Synchronous: Super Easy
arr.forEach(fn);

// All at a time, Async: Super Easy
Promise.all(arr.map(doThing));

// Some at a time... Not as easy...
// insert code here...
Not too long ago I shared my secret recipe for getting over a million downloads each week on npm: forEachAsync in just 7 lines of VanillaJS.
This week I'll be upping the ante to talk about batching requests. Unfortunately, it's going to take more than 7 lines.
What we want
// Easy async (and sync) bounded task batching
batchAsync(batchSize, things, doStuff)
TL;DR batchasync.js
Check out https://git.coolaj86.com/coolaj86/batchasync.js (also on npm)
There are a number of concerns to address here:
- Starting multiple tasks at once
- Tracking how many tasks are in progress
- Keeping the task queue full
- Keeping results in order
- Knowing when it's done
- Stopping early when there's an error
This is tested and addresses them all:
'use strict';
function batchAsync(limit, arr, doStuff) {
arr = arr.slice(0);
return new Promise(function(resolve, reject) {
var total = arr.length;
var active = 0;
var results = [];
var error;
function doMoreStuff() {
// Don't take on any more tasks if we've errored,
// or if too many are already in progress
if (error || active >= limit) {
return;
}
// If there are no more tasks to start, return
if (!arr.length) {
// If everything is also *finished*, resolve
if (active < 1) {
resolve(results);
}
return;
}
// We need to dequeue the task here so the index is correct
// (keep in mind we want to support sync and async)
var index = total - arr.length;
var task = arr.shift();
active += 1;
// Spawn another task immediately,
// which will be stopped if we're at the limit
doMoreStuff();
var p;
try {
p = doStuff(task);
} catch (e) {
// we need to handle, and bubble, synchronous errors
error = e;
reject(e);
throw e;
}
// Do stuff and then decrease the active counter when done
// add support for sync by wrapping in a promise
Promise.resolve(p)
.then(function(result) {
if ('undefined' === typeof result) {
throw new Error(
"result was 'undefined'. Please return 'null' to signal that you didn't just forget to return another promise."
);
}
active -= 1;
results[index] = result;
})
.then(doMoreStuff)
.catch(function(e) {
// handle async errors
error = e;
reject(e);
});
}
doMoreStuff();
});
}
Provided that you remove the comments and blank lines, it's 49 lines long, as promised. Yay!
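To see the finished function in action, here's a small usage sketch. It repeats batchAsync (with a shortened error message) with the limit check written as `active >= limit`, so that no more than `limit` tasks are ever in flight, and feeds it a hypothetical slowSquare task that records the peak number of calls running at once. The tasks finish out of order, but the results come back in input order.

```javascript
'use strict';

function batchAsync(limit, arr, doStuff) {
  arr = arr.slice(0);
  return new Promise(function (resolve, reject) {
    var total = arr.length;
    var active = 0;
    var results = [];
    var error;
    function doMoreStuff() {
      // stop on error, or once `limit` tasks are already in flight
      if (error || active >= limit) {
        return;
      }
      if (!arr.length) {
        if (active < 1) {
          resolve(results);
        }
        return;
      }
      var index = total - arr.length;
      var task = arr.shift();
      active += 1;
      doMoreStuff();
      var p;
      try {
        p = doStuff(task);
      } catch (e) {
        error = e;
        reject(e);
        throw e;
      }
      Promise.resolve(p)
        .then(function (result) {
          if ('undefined' === typeof result) {
            throw new Error("result was 'undefined'");
          }
          active -= 1;
          results[index] = result;
        })
        .then(doMoreStuff)
        .catch(function (e) {
          error = e;
          reject(e);
        });
    }
    doMoreStuff();
  });
}

// Hypothetical task: squares n after a delay that varies by item,
// while tracking how many calls are running at the same time
var running = 0;
var peak = 0;
function slowSquare(n) {
  running += 1;
  peak = Math.max(peak, running);
  return new Promise(function (resolve) {
    setTimeout(function () {
      running -= 1;
      resolve(n * n);
    }, (n % 3) * 5);
  });
}

var run = batchAsync(3, [1, 2, 3, 4, 5, 6, 7], slowSquare);
run.then(function (results) {
  console.log(results.join(', ')); // 1, 4, 9, 16, 25, 36, 49
  console.log('peak:', peak);      // peak: 3
});
```

Changing the limit to 1 turns this into plain one-at-a-time processing, and a very large limit behaves much like Promise.all().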
How is it done?
Because I'm a nice guy, and because I enjoy trying to break things down, I'm going to try to explain the principles behind the problem solving that led to that snippet of code.
Starting multiple tasks at once
If you wanted to fire off a bunch of tasks at once you might try the trusty while loop:
while (arr.length) {
doStuff(arr.shift());
}
That's no good for a number of reasons, the first of which is that there's no control over how many things happen at once.
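To make that concrete, here's a quick sketch with a hypothetical doStuff that counts how many calls are in flight. Because the loop never waits, all ten tasks are started in the same synchronous pass:

```javascript
'use strict';

// Hypothetical doStuff: each call counts as "in flight" for 10ms,
// and we record the highest number of calls in flight at once
var active = 0;
var peak = 0;
function doStuff(item) {
  active += 1;
  peak = Math.max(peak, active);
  return new Promise(function (resolve) {
    setTimeout(function () {
      active -= 1;
      resolve(item * 2);
    }, 10);
  });
}

var arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
while (arr.length) {
  doStuff(arr.shift());
}

// Every call was started before any of them could finish
console.log('peak in flight:', peak); // peak in flight: 10
```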
We could also try limiting how many things are happening like this:
var i = 0;
var limit = 4;
while (i < limit) {
i += 1;
doStuff(arr.shift());
}
That's fine, but it will only ever fire off 4 items, which means that we need some way of starting things again.
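For contrast, a common shortcut for "some at a time" is to cut the array into fixed-size chunks and run each chunk through Promise.all() before starting the next. This is a swapped-in technique, not the approach this article builds, and the chunkedAsync helper and doThing task below are hypothetical. The sketch shows its weakness: one slow task stalls its whole chunk, so free slots sit idle instead of the queue staying full.

```javascript
'use strict';

// Alternative approach: fixed-size chunks, one chunk after another
function chunkedAsync(size, arr, doThing) {
  var chunks = [];
  for (var i = 0; i < arr.length; i += size) {
    chunks.push(arr.slice(i, i + size));
  }
  var results = [];
  return chunks
    .reduce(function (prev, chunk) {
      // the next chunk starts only after every task in this one settles
      return prev.then(function () {
        return Promise.all(chunk.map(doThing)).then(function (chunkResults) {
          results = results.concat(chunkResults);
        });
      });
    }, Promise.resolve())
    .then(function () {
      return results;
    });
}

// Hypothetical task: logs when it starts and ends; item 0 is slow
var events = [];
var delays = [50, 5, 5, 5];
function doThing(n) {
  events.push('start ' + n);
  return new Promise(function (resolve) {
    setTimeout(function () {
      events.push('end ' + n);
      resolve(n);
    }, delays[n]);
  });
}

var run = chunkedAsync(2, [0, 1, 2, 3], doThing).then(function () {
  // task 2 only starts after slow task 0 ends, even though
  // task 1 freed up a slot long before that
  console.log(events.join(', '));
});
```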
Keeping the task queue full
We could add a doMoreStuff function. Our new code might look like this:
var i = 0;
var limit = 4;
function doMoreStuff() {
if (arr.length) {
doStuff(arr.shift()).then(doMoreStuff);
}
}
while (i < limit) {
i += 1;
doStuff(arr.shift()).then(doMoreStuff);
}
The nice thing about that is that it does give us a constant stream of 4 things to do at a time. However, if any of the doing results in synchronous behavior (such as an item being retrieved from an in-memory cache, or being a null item that needs nothing done), we'll get more tasks running at once than the limit allows.
And, if we solve that, the while loop becomes redundant.
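One concrete snag with synchronous behavior: .then() only exists on promises, so doStuff(arr.shift()).then(doMoreStuff) throws a TypeError when doStuff hands back a plain value, such as a cache hit. The final version sidesteps this by wrapping the return value in Promise.resolve(). This sketch uses a hypothetical cache-backed doStuff to show the difference:

```javascript
'use strict';

// Hypothetical doStuff: answers from an in-memory cache synchronously,
// and only returns a promise on a cache miss
var cache = { a: 'cached-a' };
function doStuff(key) {
  if (Object.prototype.hasOwnProperty.call(cache, key)) {
    return cache[key]; // a plain string, which has no .then()
  }
  return new Promise(function (resolve) {
    setTimeout(resolve, 10, 'fetched-' + key);
  });
}

// Calling doStuff('a').then(...) directly would throw a TypeError
console.log(typeof doStuff('a').then); // undefined

// Promise.resolve() turns plain values and promises into promises alike
var results = [];
var done = Promise.all(
  ['a', 'b'].map(function (key) {
    return Promise.resolve(doStuff(key)).then(function (value) {
      results.push(value);
    });
  })
);
done.then(function () {
  console.log(results.join(', ')); // cached-a, fetched-b
});
```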
This brings us to our next issue:
Tracking how many tasks are in progress
We'll leave the while loop in place for just a moment more, but begin enforcing a limit on the number of active tasks.
var i = 0;
var limit = 4;
var active = 0;
function doMoreStuff() {
if (arr.length && active < limit) {
active += 1;
doStuff(arr.shift())
.then(function(result) {
active -= 1;
return result;
})
.then(doMoreStuff);
}
}
while (i < limit) {
i += 1;
active += 1;
doStuff(arr.shift())
.then(function(result) {
active -= 1;
return result;
})
.then(doMoreStuff);
}
At this point you may notice that we have some bad duplication. Not all duplication is bad, but keeping track of how many things are being done in two places is bound to lead to bugs, so we can simplify by getting rid of the while altogether now:
var limit = 4;
var active = 0;
function doMoreStuff() {
if (arr.length && active < limit) {
active += 1;
doStuff(arr.shift())
.then(function(result) {
active -= 1;
return result;
})
.then(doMoreStuff);
}
}
doMoreStuff();
This is also a good time to invert our if condition, because now the complexity inside the if is much greater than the complexity outside of it:
var limit = 4;
var active = 0;
function doMoreStuff() {
if (!arr.length || active >= limit) {
return;
}
active += 1;
doStuff(arr.shift())
.then(function(result) {
active -= 1;
return result;
})
.then(doMoreStuff);
}
doMoreStuff();
This is much more concise, but now we're back to only doing one thing at a time.
Since we have good over-limit protection, however, we can now be over-eager in how we start new tasks:
var limit = 4;
var active = 0;
function doMoreStuff() {
// Don't take on any more tasks
// if too many are in progress
if (!arr.length || active >= limit) {
return;
}
active += 1;
// Spawn another task immediately,
// which will be stopped if we're at the limit
doMoreStuff();
// Do stuff and then decrease the active counter when done
doStuff(arr.shift())
.then(function(result) {
active -= 1;
return result;
})
.then(doMoreStuff);
}
doMoreStuff();
This is looking pretty good, except that we have no idea when this will complete.
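Because the recursive call happens before any task has a chance to finish, the very first doMoreStuff() fills every slot in one synchronous pass. That makes the limit check easy to verify. The hypothetical startedAtOnce helper below runs just that pass and counts how many tasks it admits: written as active > limit, the check lets one extra task in, while active >= limit stops at exactly limit.

```javascript
'use strict';

// Runs only the synchronous start-up pass of the recursive doMoreStuff()
// and returns how many tasks it admitted. guard(active, limit) returns
// true when no more tasks should start. No task finishes during this pass.
function startedAtOnce(limit, items, guard) {
  var arr = items.slice(0);
  var active = 0;
  function doMoreStuff() {
    if (!arr.length || guard(active, limit)) {
      return;
    }
    active += 1;
    arr.shift(); // a real version would hand this task to doStuff()
    doMoreStuff();
  }
  doMoreStuff();
  return active;
}

var items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
var loose = startedAtOnce(4, items, function (active, limit) {
  return active > limit;
});
var tight = startedAtOnce(4, items, function (active, limit) {
  return active >= limit;
});
console.log(loose, tight); // 5 4
```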
Knowing when it's done
For this we'll want to check when active is back down to zero and there are no more things to do. This way, if the task limit is 1, we won't quit early when the counter momentarily goes down to 0.
We'll also use a Promise to give us the ability to resolve the whole process:
var limit = 4;
new Promise(function(resolve, reject) {
var active = 0;
function doMoreStuff() {
// Don't take on any more tasks
// if too many are in progress
if (active >= limit) {
return;
}
// If there are no more tasks to start, return
if (!arr.length) {
// If everything is also *finished*, resolve
if (active < 1) {
resolve();
}
return;
}
active += 1;
// Spawn another task immediately,
// which will be stopped if we're at the limit
doMoreStuff();
// Do stuff and then decrease the active counter when done
doStuff(arr.shift())
.then(function(result) {
active -= 1;
return result;
})
.then(doMoreStuff);
}
doMoreStuff();
});
Things are really coming together, but we still have no way to get the results of all the work that's being done.
We could "leave it as an exercise for the implementor" and just say "your doStuff function needs to keep track of results", but that's not very nice - and like I said, I'm a nice guy.
Keeping results, in order
It's about time to wrap this in a function, so we can actually return results. Additionally, we want those results in order.
What we need to do is fairly simple: just tack the results onto an array, just like Promise.all() would do.
However, if we did just that, we'd get things in whatever order they happen to finish (which is bad). Instead we'll need to have each task in a closure that keeps track of what number it is and assign results to that index.
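Here's a small sketch of why the index matters, using a hypothetical doStuff where later items finish sooner. Pushing results records them in completion order, while assigning by index keeps input order:

```javascript
'use strict';

// Hypothetical doStuff: resolves with the item's name after its delay
function doStuff(item) {
  return new Promise(function (resolve) {
    setTimeout(resolve, item.delay, item.name);
  });
}

var items = [
  { name: 'first', delay: 30 },
  { name: 'second', delay: 10 },
  { name: 'third', delay: 20 }
];

var pushed = [];  // filled in the order tasks finish
var indexed = []; // filled by each task's original position
var check = Promise.all(
  items.map(function (item, index) {
    return doStuff(item).then(function (result) {
      pushed.push(result);
      indexed[index] = result;
    });
  })
).then(function () {
  console.log(pushed.join(', '));  // second, third, first
  console.log(indexed.join(', ')); // first, second, third
});
```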
We'll introduce total to store the original number of tasks and subtract our decreasing length to get an index that counts up each time:
function batchAsync(limit, arr, doStuff) {
// make a copy of the original array to preserve it
arr = arr.slice(0);
return new Promise(function(resolve, reject) {
var total = arr.length;
var active = 0;
var results = [];
function doMoreStuff() {
// Don't take on any more tasks
// if too many are in progress
if (active >= limit) {
return;
}
// If there are no more tasks to start, return
if (!arr.length) {
// If everything is also *finished*, resolve
if (active < 1) {
resolve(results);
}
return;
}
// We need to dequeue the task here so the index is correct
// (keep in mind we want to support sync and async)
var index = total - arr.length;
var task = arr.shift();
active += 1;
// Spawn another task immediately,
// which will be stopped if we're at the limit
doMoreStuff();
// Do stuff and then decrease the active counter when done
// add support for sync by wrapping in a promise
Promise.resolve(doStuff(task))
.then(function(result) {
active -= 1;
results[index] = result;
})
.then(doMoreStuff);
}
doMoreStuff();
});
}
With those few additions, we've got a complete solution for the cases when everything goes right.
The only thing we're missing is how to handle it when things go horribly wrong.
Handling errors, and stopping early on errors
Here's what could go wrong:
- doStuff() throws a synchronous error
- doStuff() gives a rejection
- tasks keep batching after doStuff() gives a rejection
We'll need to use a try/catch to handle possible synchronous errors and a .catch(e) for possible Promise rejections.
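The reason both are needed: a synchronous throw happens while doStuff(task) is being evaluated, before Promise.resolve() ever sees a value, while a rejection never throws at the call site at all. This sketch uses a hypothetical doStuff that fails both ways:

```javascript
'use strict';

// Hypothetical doStuff that can fail in two different ways
function doStuff(task) {
  if (task === 'bad-sync') {
    throw new Error('sync failure'); // thrown before any promise exists
  }
  if (task === 'bad-async') {
    return Promise.reject(new Error('async failure'));
  }
  return Promise.resolve(task);
}

// The throw escapes before Promise.resolve() runs, so only try/catch sees it
var caughtSync = null;
try {
  Promise.resolve(doStuff('bad-sync'));
} catch (e) {
  caughtSync = e.message;
}
console.log(caughtSync); // sync failure

// A rejection never throws here; it only shows up in .catch()
var caughtAsync = Promise.resolve(doStuff('bad-async')).catch(function (e) {
  return e.message;
});
caughtAsync.then(function (message) {
  console.log(message); // async failure
});
```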
In this implementation I'll also cause a single rejection to stop doing tasks. That may or may not be desirable for your use case.
function batchAsync(limit, arr, doStuff) {
// make a copy of the original array to preserve it
arr = arr.slice(0);
return new Promise(function(resolve, reject) {
var total = arr.length;
var active = 0;
var results = [];
// to keep track of error state
var error;
function doMoreStuff() {
// bail immediately if there's been an error
if (error || active >= limit) {
return;
}
if (!arr.length) {
if (active < 1) {
resolve(results);
}
return;
}
var index = total - arr.length;
var task = arr.shift();
active += 1;
doMoreStuff();
var p;
try {
p = doStuff(task);
} catch (e) {
// handle synchronous errors
error = e;
reject(e);
// we need synchronous errors to bubble up
throw e;
}
Promise.resolve(p)
.then(function(result) {
if ('undefined' === typeof result) {
throw new Error(
"result was 'undefined'. Please return 'null' to signal that you didn't just forget to return another promise."
);
}
active -= 1;
results[index] = result;
})
.then(doMoreStuff)
.catch(function(e) {
// handle async errors
error = e;
reject(e);
});
}
doMoreStuff();
});
}
As icing on the cake, I also throw when the result is undefined. This can save hours of headaches when debugging async code: if you always expect a return of null or false as a way of explicitly signaling that there is no data to pass back, you'll never get caught out by forgetting to return a promise.
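Here's a sketch of that check catching a forgotten return. The forgetful and explicit functions are hypothetical, and checkResult repeats the undefined check from batchAsync with a shortened message:

```javascript
'use strict';

// Hypothetical async lookup
function fetchDouble(x) {
  return new Promise(function (resolve) {
    setTimeout(resolve, 5, x * 2);
  });
}

// Bug: the inner promise is not returned, so this resolves to undefined
function forgetful(x) {
  return Promise.resolve(x).then(function (n) {
    fetchDouble(n);
  });
}

// Intentional: null explicitly means "nothing to pass back"
function explicit(x) {
  return Promise.resolve(x).then(function (n) {
    if (n < 0) {
      return null;
    }
    return fetchDouble(n);
  });
}

// The same kind of check batchAsync applies to every result
function checkResult(result) {
  if ('undefined' === typeof result) {
    throw new Error("result was 'undefined'");
  }
  return result;
}

var forgotten = forgetful(1)
  .then(checkResult)
  .catch(function (e) {
    return 'caught: ' + e.message;
  });
var intended = Promise.all([explicit(1), explicit(-1)]).then(function (rs) {
  return rs.map(checkResult);
});

var summary = Promise.all([forgotten, intended]);
summary.then(function (out) {
  console.log(out[0]); // caught: result was 'undefined'
  console.log(out[1]); // [ 2, null ]
});
```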
Where to take this next
My batching function is very generic and appropriate for general use, but there are some special things that you might want to do:
- make an object that tracks a live task queue (not a copy of a static array)
- allow one-off tasks over the limit
- on-the-fly prioritization (re-.sort()ing before each thing, based on some criteria)
- signal a cancel event to doStuff()
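As a starting point for the first idea, here's a minimal sketch of a live queue. The createQueue name and its push() method are hypothetical, not part of batchasync.js: tasks can be pushed at any time, at most limit run at once, and each push() returns a promise for that task's own result. Starting each task inside .then() also turns synchronous throws into rejections, which is a different choice from batchAsync, where synchronous errors bubble up.

```javascript
'use strict';

// Hypothetical live queue: tasks may be pushed at any time, at most `limit`
// run at once, and each push() returns a promise for that task's result
function createQueue(limit, doStuff) {
  var waiting = [];
  var active = 0;

  function doMoreStuff() {
    if (active >= limit || !waiting.length) {
      return;
    }
    var job = waiting.shift();
    active += 1;
    Promise.resolve()
      .then(function () {
        return doStuff(job.task); // sync throws become rejections here
      })
      .then(job.resolve, job.reject)
      .then(function () {
        active -= 1;
        doMoreStuff(); // a slot is free: start the next waiting task
      });
  }

  return {
    push: function (task) {
      return new Promise(function (resolve, reject) {
        waiting.push({ task: task, resolve: resolve, reject: reject });
        doMoreStuff();
      });
    }
  };
}

// Usage with a hypothetical task that tracks how many calls are in flight
var running = 0;
var peak = 0;
function square(n) {
  running += 1;
  peak = Math.max(peak, running);
  return new Promise(function (resolve) {
    setTimeout(function () {
      running -= 1;
      resolve(n * n);
    }, 5);
  });
}

var queue = createQueue(2, square);
var firstBatch = Promise.all([1, 2, 3].map(function (n) { return queue.push(n); }));
// Tasks pushed later join the same queue
var laterBatch = new Promise(function (resolve) {
  setTimeout(resolve, 7);
}).then(function () {
  return Promise.all([4, 5].map(function (n) { return queue.push(n); }));
});

var all = Promise.all([firstBatch, laterBatch]);
all.then(function (out) {
  console.log(out[0], out[1], 'peak:', peak); // [ 1, 4, 9 ] [ 16, 25 ] peak: 2
});
```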
If you're reading this article then... maybe those are a bit more advanced, but at the very least they give you some ideas.
Have at it.
Tests
If you want to try your hand at rolling your own batchAsync function, you may find the zero-framework test.js in https://git.coolaj86.com/coolaj86/batchasync.js to be useful.
This type of async magic is difficult to get right, especially when something like caching or null inputs rightfully causes your code to execute a synchronous Promise (or other synchronous code) - so this will help, I promise.
By AJ ONeal