Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Ask HN: need your help optimizing substring MySQL query
1 point by rorrr on Dec 10, 2010 | hide | past | favorite | 3 comments
Imagine we have this table with a million unique rows.

    | id |        url        |
    +----+-------------------+
    | 1  | google.com        |
    | 2  | yahoo.com         |
    | 3  | reddit.com        |
    | 4  | digg.com          |
    | ...| ...               |
I'm running queries like

    SELECT id,url 
    FROM table
    WHERE url LIKE '%hoo%'
When you have a million rows, this query is slow (around 2-5 sec depending on what you search for). I need to make it fast.

So far the fastest solution I found is to concatenate everything into one long string, and do regular sub-string searches in the app (not the DB). It's a horrible and ugly solution, and I have re-build that string every time we insert a new row.

So the string looks like

    1|google.com#2|yahoo.com#3|reddit.com|4#digg.com ...
It's pretty long, but substring searches on it are pretty fast. I get the same results in around 70-200ms.

I would prefer to find a non-ugly solution.

thanks!



The main issue is that since you are using %searchterm% you aren't getting any real use out of indexes as you would when using something like searchterm%.

You could have a second table that has permutations of the data allowing indexing of the content, but your records would increase dramatically in that second table;

i.e.

    | id | link_id | url_permuted |
    +----+---------+--------------+
    | 1  | 1       | oogle.com    |
    | 2  | 1       | ogle.com     |
    | 3  | 1       | gle.com      |
    | 4  | 1       | le.com       |
This would allow you to search by searchterm% while gaining use of indexing...

Not sure if this is the most elegant solution to your current problem though - and it could result in multiple hits to the same record...


Thanks for the idea.

Many of the URLs are long. The average length is 43.9 characters. That would make the auxiliary table huge.


Just curious... are you using innodb or myisam? also stackoverflow.com is a perfect place for this sort of question, too.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: