
I'm by no means an expert at machine learning, but given how well organized the scikit-learn libraries are, building a simple classifier as shown in the example would be a few days of work at most. In fact, an initial version can be built within a day. After that, one has to tune the hyperparameters and spend time on feature selection to improve the baseline accuracy.

The most important thing will be the training data. You need a good number of samples, and the data also needs to be reasonably "clean".
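
For a sense of scale, here is a minimal sketch of what such a first-day baseline might look like. The dataset (scikit-learn's bundled breast cancer set), the choice of logistic regression, and the helper name `train_baseline` are all stand-in assumptions, not the example discussed above; real data would need its own loading and cleaning step.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_baseline(random_state=0):
    """Fit a simple baseline classifier and return (model, held-out accuracy)."""
    # Stand-in data: a small, already-clean dataset that ships with scikit-learn.
    X, y = load_breast_cancer(return_X_y=True)

    # Hold out a quarter of the samples so the score reflects unseen data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )

    # Scale the features, then fit a linear classifier with default settings.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)

    return model, model.score(X_test, y_test)


if __name__ == "__main__":
    _, accuracy = train_baseline()
    print(f"baseline held-out accuracy: {accuracy:.3f}")
```

The code itself is short; the hyperparameter tuning, feature selection, and data cleaning mentioned above are where the remaining time goes.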



At US rates, that smells like a few tens of thousands of dollars. At the core of my "concern" is that the magnitudes of turnover, margins, and the increase in sales the analysis would have to produce make applying the idea uneconomical.

To put it another way, the business case feels weak for most [i.e. small] profit-seeking enterprises.


"few days of work" for "tens of thousands of dollars" seems a bit absurd. Are you assuming they are making 10K per day? Seems a bit high. I would assume 200-300/hour tops.


The amount of time you can spend preparing the training data is unbounded. The number of times you can train on data that ends up not looking like what you see in the wild a week later is unbounded. When all is said and done, the yak-shaving alone will be tens of thousands.


At the rate of a couple of hundred bucks an hour that I'd expect to pay for a qualified consultant, 200 hours works out toward the high end of the few in "a few tens of thousands".
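
The two estimates being compared in this thread can be worked through in a few lines. This is only a sketch of the arithmetic: the 8-hour working day and reading "a few days" as 3 days are assumptions, and the rates are the ones quoted in the comments above.

```python
def consulting_cost(hours, hourly_rate):
    """Total cost in dollars for a given number of billed hours."""
    return hours * hourly_rate


HOURS_PER_DAY = 8  # assumption: a standard working day

# "A few days" (assumed to be 3) at the top quoted rate of $300/hour.
few_days_cost = consulting_cost(hours=3 * HOURS_PER_DAY, hourly_rate=300)

# 200 hours at "a couple of hundred bucks an hour" ($200/hour).
two_hundred_hours_cost = consulting_cost(hours=200, hourly_rate=200)

# Hourly rate implied by "10K per day" under the same 8-hour assumption.
implied_hourly_rate = 10_000 / HOURS_PER_DAY

print(few_days_cost)           # 7200
print(two_hundred_hours_cost)  # 40000
print(implied_hourly_rate)     # 1250.0
```

Under these assumptions the gap between the two figures comes almost entirely from the hours billed (24 versus 200), not from the rate.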


How do you cram 200 hours into a few days?


Just like with the vast majority of project forecasts, the "few days" is what you say to get the sale, whether internally or to outside clients. If you think it is that simple, well, I would like to sell you just a few days of machine learning expertise if you have a project... :) Even very simple tasks that you can hand to the intern can, and often do, take days longer than projected.


I think I misread the comment as hours.


There are lots of tools to help with the hyperparameter problem to make that faster/cheaper as well. This problem is often orthogonal to the domain expertise required to do good feature selection.

Scikit-learn implements things like grid search natively [0], and tools like SigOpt [1] (YC W15, disclosure: I'm a founder) do this automatically as well.

[0]: http://scikit-learn.org/stable/modules/generated/sklearn.gri...

[1]: https://github.com/sigopt/sigopt_sklearn
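
As a rough illustration of the grid search mentioned above, here is a minimal sketch using scikit-learn's `GridSearchCV`, which in current releases is imported from `sklearn.model_selection`. The dataset, the SVM model, the parameter grid, and the helper name `tune_svm` are illustrative assumptions; a real project would pick values suited to its own data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def tune_svm():
    """Exhaustively search a small hyperparameter grid with cross-validation."""
    # Stand-in data: a small dataset bundled with scikit-learn.
    X, y = load_breast_cancer(return_X_y=True)

    # Scaling lives inside the pipeline so it is refit on every CV fold.
    pipeline = Pipeline([("scale", StandardScaler()), ("svm", SVC())])

    # Hypothetical grid: keys follow the "<step name>__<parameter>" convention.
    param_grid = {
        "svm__C": [0.1, 1, 10],
        "svm__gamma": ["scale", 0.01, 0.001],
    }

    # 5-fold cross-validation over all 9 combinations.
    search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
    search.fit(X, y)

    return search.best_params_, search.best_score_


if __name__ == "__main__":
    best_params, best_score = tune_svm()
    print(best_params, f"{best_score:.3f}")
```

Exhaustive search grows multiplicatively with each added parameter, which is the cost that automated tuning tools aim to reduce.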



