
OK, fair enough.

But it looks like there's some awareness of these issues in that project. I thought you were trying to pigeonhole this as a glorified hashtable or something. Quoting from the Whoosh docs:

"Like its ancestor Lucene, Whoosh is not really a search engine, it's a toolkit for creating a search engine."

Lucene has support for proximity queries, fuzzy "minimumSimilarity", etc., and it looks like Whoosh wants to be a tool like this.

Would "search engine library" be a more appropriate term for you? That is, something used to construct a useful search engine.



Search engine library is better, ya.

However, creating the search engine is the hard problem. It helps to have tools to make some of the outlying problems a bit easier, but the challenge is being able to calculate a configurable set of multi-dimensional scores.

Scoring takes place in three places: indexing-time, run-time, and search-time. Every solution (commercial and open-source) has an indexing process. Most have some run-time processing, but only a few have multi-dimensional scoring at search-time, and that is the coup de grâce.
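
To make "configurable multi-dimensional scoring at search-time" concrete, here is a rough sketch, not how any particular product does it. The dimension names (`text_match`, `freshness`, `popularity`), the `Doc` class, and the weighted-sum combination are all assumptions made for illustration; real engines use far more elaborate relations.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    # Hypothetical per-dimension scores in [0, 1], e.g. precomputed at indexing time.
    scores: dict


def combined_score(doc, weights):
    """Weighted sum over the dimensions named in `weights` (missing ones count as 0)."""
    return sum(w * doc.scores.get(dim, 0.0) for dim, w in weights.items())


def rank(docs, weights):
    """Order documents by combined score, best first; weights are chosen per query."""
    return sorted(docs, key=lambda d: combined_score(d, weights), reverse=True)


docs = [
    Doc("a", {"text_match": 0.9, "freshness": 0.1, "popularity": 0.3}),
    Doc("b", {"text_match": 0.6, "freshness": 0.8, "popularity": 0.5}),
]

# Same documents, two different search-time configurations.
by_text = [d.doc_id for d in rank(docs, {"text_match": 1.0})]
by_fresh = [d.doc_id for d in rank(docs, {"text_match": 0.2, "freshness": 0.8})]
```

Changing the weights per query reorders the same documents without reindexing, which is the easy part. Doing that over GB or TB of data with arbitrary relations is where it gets hard.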

When you make the relations configurable and increase the data set to GB or TB you have several incredibly difficult problems. There are commercial products that do it, big and small, but to my knowledge there are no open source projects that can.

In other words, the state of commercial products is so far ahead of open source ones that it's misleading to refer to both as "search engines".



