
While Whoosh isn't there yet, I don't think all open source libraries have the same weaknesses.

The two dimensions you described are synonym expansion and stemming. Lucene (and Solr, a search server built on Lucene) can handle both by ordering the analysis of indexed text and query terms appropriately: perform synonym expansion first, then apply stemming. The default scoring algorithm works well in a lot of cases, and Lucene/Solr's architecture allows plugging in a custom scoring implementation to boost individual keywords. Misspelling and wildcard handling are also available out of the box, and you can override most of the functionality easily. Reddit, Netflix, CNET, Digg, etc. have all deployed Solr/Lucene on their public sites (http://wiki.apache.org/solr/PublicServers).
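
To make the ordering concrete, here is a rough sketch in plain Python. It is not Lucene or Solr code: the synonym table, the toy suffix-stripping stemmer, and the function names are all made up for illustration. It only shows the idea of running the same chain (tokenize, expand synonyms, then stem) over both the indexed text and the query.

```python
import re

# Hypothetical synonym table, for illustration only.
# Keys are unstemmed surface forms, because expansion runs before stemming.
SYNONYMS = {
    "laptop": ["notebook"],
    "notebook": ["laptop"],
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def expand_synonyms(tokens):
    """Keep every token and append any synonyms listed for it."""
    expanded = []
    for token in tokens:
        expanded.append(token)
        expanded.extend(SYNONYMS.get(token, []))
    return expanded

def toy_stem(token):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token

def analyze(text):
    """Analysis chain: tokenize, expand synonyms first, then stem."""
    return {toy_stem(token) for token in expand_synonyms(tokenize(text))}

def matches(query, document):
    """A document matches if it shares at least one analyzed term with the query."""
    return bool(analyze(query) & analyze(document))

if __name__ == "__main__":
    print(matches("laptop", "Cheap notebooks for sale"))
    print(matches("laptop", "Cheap desktops for sale"))
```

Here the query "laptop" is expanded to include "notebook", and stemming reduces the document's "notebooks" to the same term, so the two meet. One caveat of this toy version: since the synonym table is keyed on unstemmed words, an inflected form like "laptops" is not expanded unless the table lists it.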


