Today I released Simple Statistics 3.0.0.
Like many other projects, it follows semver, so the jump from 2.5.0 to 3.0.0 was required because I changed something in a backwards-incompatible way. That thing is how Simple Statistics handles invalid input, like what it does if you request the maximum number out of an empty array. Until 3.0.0, I chose to return NaN when given invalid input. In 3.0.0 and beyond, Simple Statistics will throw an Error instead.
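To make the difference concrete, here is a minimal sketch of the two behaviors. The function names and the error message are hypothetical stand-ins written for this example, not the library's actual source:

```javascript
// Hypothetical stand-ins for illustration, not the library's real code.

// 2.x-style behavior: signal invalid input by returning NaN.
function maxReturningNaN(values) {
  if (values.length === 0) {
    return NaN;
  }
  return values.reduce((a, b) => (b > a ? b : a));
}

// 3.x-style behavior: signal invalid input by throwing an Error.
// The message text is an assumption made for this sketch.
function maxThrowing(values) {
  if (values.length === 0) {
    throw new Error('max requires at least one data point');
  }
  return values.reduce((a, b) => (b > a ? b : a));
}

// A caller of the throwing version uses try/catch instead of an isNaN check.
function describeMax(values) {
  try {
    return 'max is ' + maxThrowing(values);
  } catch (err) {
    return 'invalid input: ' + err.message;
  }
}

console.log(maxReturningNaN([]));    // NaN
console.log(describeMax([3, 1, 2])); // max is 3
console.log(describeMax([]));        // invalid input: max requires at least one data point
```

With the first style the caller has to remember to check the result; with the second, forgetting to handle the failure stops the program at the point where it happened.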
There are a few additional improvements in the release:
- combineMeans, subtractFromMean, and combineVariances: methods contributed by Guillaume Plique that make online statistics easier to implement, because they let you incrementally calculate new aggregates with new data instead of completely re-calculating them.
- Benchmarks that compare simple-statistics performance to that of other libraries. This experiment will continue. I’d love feedback on the methodology, to make sure that it doesn’t bias toward any implementation. I’m still doing research to determine why jStat is winning in several benchmarks, and having a great time reading other implementations for inspiration, and finding a potential bug because of some suspiciously good performance.

Here are the benchmark results so far:
| | Simple Statistics | science.js | jStat | mathjs |
|---|---|---|---|---|
| variance | 99,565 | 92,064 | 305,801 | |
| median | 54,497 | 5,199 | 17,215 | 1,432 |
| mode | 4,595 | 2,311 | 10,078 | 1,049 |
| medianAbsoluteDeviation | 17,373 | 522 | | |
| min | 384,394 | 528,290 | 41,598 | |
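As a rough illustration of the incremental aggregates mentioned in the release notes above, two means can be merged using only each mean and its sample count. The helper below is a standalone sketch with a hypothetical name; it is not the library's implementation, and its argument order is an assumption made for this example:

```javascript
// Sketch: merge two means without revisiting the raw data.
// combineMeansSketch is a hypothetical name used only for this example.
function combineMeansSketch(mean1, n1, mean2, n2) {
  // Weight each mean by the number of values it summarizes.
  return (mean1 * n1 + mean2 * n2) / (n1 + n2);
}

// Plain mean, used here only to check the result.
function mean(values) {
  return values.reduce((sum, x) => sum + x, 0) / values.length;
}

const first = [1, 2, 3];     // mean 2
const second = [4, 5, 6, 7]; // mean 5.5

const combined = combineMeansSketch(
  mean(first), first.length,
  mean(second), second.length
);

console.log(combined);                   // 4
console.log(mean(first.concat(second))); // 4
```

The point of the incremental form is that only the two summaries need to be kept around, which matters when the earlier data is large or no longer available.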
The unfortunate truth is that JavaScript doesn’t have solid norms for error handling.
Even in the core language, some methods throw error objects when things go wrong, others return undefined, and others, like indexOf, return special values like -1.
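A quick sketch of those three conventions, using only built-in APIs (the sample inputs are arbitrary):

```javascript
// Three failure conventions found in built-in JavaScript APIs.

// 1. Throwing: JSON.parse throws a SyntaxError on malformed input.
let parseError = null;
try {
  JSON.parse('{');
} catch (err) {
  parseError = err;
}
console.log(parseError instanceof SyntaxError); // true

// 2. Returning undefined: Array.prototype.find when nothing matches.
const found = [1, 2, 3].find((x) => x > 10);
console.log(found); // undefined

// 3. Returning a sentinel value: indexOf gives -1 when the item is absent.
const index = ['a', 'b'].indexOf('z');
console.log(index); // -1
```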
This confusion was worsened by word that using thrown errors (exceptions) was a performance drag, as documented by the bluebird project.
Thankfully, the V8 project, the JavaScript engine that powers Node.js & Chrome, fixed that performance uncertainty and try/catch is now performant.
For Simple Statistics, I decided to try using NaN as the ‘invalid’ value. Since the library is performance-related, I wanted to avoid what I thought was a potential performance drag, and NaN conveniently is considered to be a number, by JavaScript and Flow’s convention.
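A small sketch of why NaN is both convenient and awkward as an 'invalid' value. It uses only standard JavaScript, and the variable names are made up for the example:

```javascript
// NaN's type is 'number', so it fits anywhere a number is expected.
console.log(typeof NaN); // number

// NaN quietly propagates through arithmetic instead of stopping the program.
const invalid = NaN;
const downstream = (invalid + 1) * 2;
console.log(downstream); // NaN

// NaN is never equal to itself, so === cannot detect it.
console.log(downstream === NaN); // false

// Number.isNaN (or the global isNaN) is the reliable check.
console.log(Number.isNaN(downstream)); // true
```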
Previously, then, you might write a command-line variance utility like:

    #!/usr/bin/env node
    var variance = require('simple-statistics').variance;
    // Parse every command-line argument as a number.
    var inputs = process.argv.slice(2).map(parseFloat);
    var result = variance(inputs);
    // In 2.x, invalid input such as an empty array yields NaN.
    if (isNaN(result)) {
      console.log('Something went wrong');
      process.exit(1);
    }
    console.log(result);

Then you could try it out:

    /tmp/test〉variance.js 1 2
    0.25
    /tmp/test〉variance.js
    Something went wrong

With simple-statistics 3, you can skip the isNaN check: simple-statistics itself will throw an error if there is one. I could go on about the pros & cons, but I’ll just list what’s top of mind:

- Thrown errors stop execution right away, unlike undefined or NaN, which can propagate through an equation, leaving you wondering where it went wrong.
- It’s tempting to test for NaN by comparing it to NaN, the same way you could test for undefined. Unfortunately, NaN === NaN is false.

Check it out: Simple Statistics 3.0.0.