An Ode to Little Data

11001 · on Nov 22, 2013

Little innovation in "little data"? What about the whole academic field of Statistics? Most of its innovation is about how to make maximal use of limited, censored, or missing data. I don't hear "big data" folks talking much about multiple imputation, structural equation modeling, mixed-effects regression, etc.

glutamate · on Nov 22, 2013

Michael Jordan (the other Michael Jordan) wrote a nice article about how the long tail of big data is little data and hierarchical models etc. should work really well there.

http://bayesian.org/sites/default/files/fm/bulletins/1106.pd...

hackula1 · on Nov 22, 2013

As someone who writes predictive analytics software, I can say that you are dead on. Most of analytics is a process of doing one of two things:

1) Extrapolation of data where samples are lacking.

2) Taking big data and making it little data so that you can actually comprehend it.

My goal in practically every algorithm/tool I write is to take several billion records and condense them into something that can be put in a spreadsheet, put on a chart, or rendered on a map. Analysts roll their eyes every time some big data guru releases another map with 7 trillion points on it. "Oh, so you took 3 weeks rendering a map that looks like yet another population map, when you could have rendered this instantaneously with a choropleth?"
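To make the "big data into little data" step concrete, here is a minimal sketch of that kind of condensing. It assumes each record is a hypothetical (region, value) pair; the function name `condense` and the sample data are made up for illustration and are not taken from any real tool. It streams over the records once and keeps only one summary row per region, which is the sort of table you could drop into a spreadsheet or feed to a choropleth.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def condense(records: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Reduce a stream of (region, value) records to one summary row per region.

    Only two running numbers per region are kept in memory, so the input
    can be an arbitrarily long iterator (for example, rows read from a file).
    """
    counts = defaultdict(int)    # number of records seen per region
    totals = defaultdict(float)  # running sum of values per region
    for region, value in records:
        counts[region] += 1
        totals[region] += value
    return {
        region: {"n": counts[region], "mean": totals[region] / counts[region]}
        for region in counts
    }


# Hypothetical sample data; a real input would be billions of rows.
sample = [("CA", 10.0), ("CA", 20.0), ("NY", 5.0)]
summary = condense(sample)
for region, row in sorted(summary.items()):
    print(region, row["n"], row["mean"])
```

The result has as many rows as there are regions, no matter how many records went in, so rendering it is cheap.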

pskomoroch · on Nov 22, 2013

"90% of the world is using Excel and similar tools to analyze and visualize their data."

These types of statements always prompt me to ask "what is the specific problem you are trying to solve?". Replacing Excel? What is the unique advantage beyond a different UI? I think the real product challenge here is identifying why using a different UI or tool would have ROI for the average business user the author is describing. If it doesn't, why would they switch? What is your startup's tool going to do to increase my profits and justify the investment and switching cost?

I've often found it is the analyst and approach, not the tools, that make a huge difference on these problems, and for those who care deeply about tools there are many open source options (R, SciPy, etc.). Creating more generic "small data" tools runs the risk of solving a problem customers don't have.

zrail · on Nov 22, 2013

I posted some thoughts about "little data" from a more personal angle earlier this month[1] (hn discussion [2]). The solution that the OP is hinting at might very well be something that helps normal people organize and ask questions about the data they have on hand and scattered around the Internet. It'd be cool if I could mix and mash up little applications and visualizations with my data without actually pushing that data anywhere I don't have control over, though.

[1]: https://www.petekeen.net/little-data

[2]: https://news.ycombinator.com/item?id=6718422

glamp · on Nov 22, 2013

I've never understood why people like "big data". Small/medium data is way more fun!

glaugh · on Nov 22, 2013

Definitely a lot of different angles on little data. Our aim at Statwing is to take the power of statistical analysis and embed it in a UI that's easier to use than Excel's PivotTables (via automated selection and interpretation of statistical tests). And hopefully in doing so we're filling in the gap between Excel and tools like R.

https://www.statwing.com/demo

That said, I certainly think Paul's approach makes a lot of sense, and I hope someone builds that. Closest thing I can think of is http://anapsis.com/

</pitch>

glutamate · on Nov 22, 2013

On bayeshive.com, we have this cool little feature where if people want to build a more complicated statistical model that involves entering an equation - think non-linear regression or a dynamical system - then you can save that equation and share it with the other users, or run it against other datasets.

We haven't figured out how to extend that to visualisations, which I think is what the OP is really looking for as well.
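For readers wondering what "save an equation and run it against other datasets" could look like, here is a rough sketch. It is not how bayeshive.com works; the registry `SAVED_MODELS`, the `fit` helper, and the datasets are all assumptions made up for illustration. The fitter is a naive compass (pattern) search on the sum of squared errors, swapped in so the example needs only the standard library; a real tool would more likely use a proper optimiser or Bayesian inference.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical registry of shared model equations, keyed by name.
SAVED_MODELS = {
    "exp_decay": lambda x, a, b: a * math.exp(-b * x),
}


def fit(model: Callable[..., float], xs: Sequence[float], ys: Sequence[float],
        p0: Sequence[float], step: float = 0.5, tol: float = 1e-9,
        max_iter: int = 100_000) -> List[float]:
    """Least-squares fit by compass search: nudge one parameter at a time,
    keep any move that lowers the error, halve the step when nothing helps."""
    def sse(p: Sequence[float]) -> float:
        return sum((y - model(x, *p)) ** 2 for x, y in zip(xs, ys))

    params = list(p0)
    best = sse(params)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = params.copy()
                trial[i] += delta
                err = sse(trial)
                if err < best:
                    params, best, improved = trial, err, True
        if not improved:
            step /= 2
    return params


# The same saved equation run against two made-up, noise-free datasets.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
dataset_a = [2.0 * math.exp(-0.5 * x) for x in xs]
dataset_b = [3.0 * math.exp(-1.0 * x) for x in xs]

model = SAVED_MODELS["exp_decay"]
params_a = fit(model, xs, dataset_a, p0=[1.0, 1.0])
params_b = fit(model, xs, dataset_b, p0=[1.0, 1.0])
print("dataset A:", params_a)
print("dataset B:", params_b)
```

The point of the design is that the equation lives in one place as a plain function, and the fitting code never needs to know which equation or which dataset it is handed.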

christopheraden · on Nov 22, 2013

False dichotomy about Excel vs. DSLs like R. "90% of the world is using Excel and similar tools to analyze and visualize their data. [...] What’s the next step-up from the spreadsheet? Learning how to code. Learning stats. Leaving the comfort of a spreadsheet’s visual display for R, maybe?" This statement ignores that numerous non-free statistical packages have offered GUI front-ends for years. SAS/EG, JMP, Minitab, and SPSS instantly come to mind. Two out of those four are even marketed directly to people as statistical extensions of spreadsheets. Granted, the "long tail" will still need to learn a little bit of stats, but I fail to see how this is a problem that Excel solves (unless we're talking only about plots).

I don't think it's a hard leap to consider that if the big-box statistical package companies realized how much of their money came from industry, they'd do what they could to make their software seem like an alluring proposition. Statistical software costs an order of magnitude more than Excel, so they'd need pretty good arguments to convince upper management that the business team actually needed an 8000 dollar piece of software.

I'm not sold that there's nothing in between Excel and R. From my experience, these GUI packages require a slight learning curve (nowhere near the learning curve of going from Excel to R), but not an insurmountable one. What these solutions lack is the name recognition that Excel has, or decent integration into an MS Office stack (exceptions, of course--I remember seeing a statistics toolbox for Excel once), or they cost too much.

I think part of the real problem is that for a lot of companies, Excel is "good enough". There's plenty of stuff it can't do well. It chokes on larger data sets, has limited statistical functionality, poor scripting capabilities, and shaky random number generators. But it's good enough for people who don't want to do much with their data.

If they wanted to do harder-core analysis, they'd outsource it to an analytics team. This perspective comes from a viewpoint inside BigCo. The prohibitive cost of some of these solutions might be a harder pill for a small firm to swallow.

hax0rsehat · on Nov 22, 2013

Interesting