Yeah, I tend to agree. Real value comes from carefully curating the data and applying smart optimizations, which is something few companies focus on. But I also get the sense that a lot of energy ends up being spent elsewhere: on integration, infrastructure, lots of fragmented open-source libraries, etc., at the expense of iteration speed and relevance-focused experimentation.


I was frustrated with enterprise search vendors and their customers because they didn't see it my way. Here are some ways of thinking about it.

Most cynically, enterprise software is bought by different people than those who use it. The buyers have a checklist, and the fastest way to get eliminated is to lack an integration for one of their data sources, so vendors put a comprehensive list of connectors on their web sites. The buyers will never test the relevance of the results against their own data, but the users will feel it every day, unless the search engine is so bad that they just don't use it. (Common!)

On the other hand, if the integration doesn't work, you get recall of 0% no matter how smart and well tuned your search engine is.

I think a lot of founders and data scientists believe in a variant of the Pareto principle, which comes down to "I want to do the 20% of the work that gets me 80% of the way there". The trouble is that a minimum viable product has to be viable, and you have to get to 100% of that minimum or you are always going to be a bridesmaid and never a bride.

The awful truth about data science, relevance, ML and all that is that data is dirty and takes a huge amount of work to wrangle. If you want "iteration speed and relevance-focused experimentation" you have to make investments in product, people and process to run more cycles in less calendar time. Look up my profile and ping me if you want to hear war stories.
