The Silver Searcher: An attempt to make something better than ack (github.com/ggreer)
29 points by robin_reala on April 13, 2012 | 22 comments


I haven't used ack since I discovered git's built-in `git grep` command, which, as I understand it, doesn't even need to sift through your files; it just examines your git index, so it's waaaay faster than either grep or ack. It's also recursive by default, doesn't require a file-matching parameter, and colorizes its output, which together were the reasons that I used ack over grep in the first place. Pretty brilliant, all told.


The index is just a list of the files and associated metadata; it does not hold their contents. So git grep still needs to read the files from disk, though it doesn't need to walk the filesystem to locate them[@].

[@] Technically git grep has five modes of operation:

1. Search the contents of the tracked files as they currently are on disk. This is the default.

2. With --cached, search the contents of the tracked files as they are in the index (i.e. ignore any un-added changes).

3. With --no-index, search all files recursively from the current directory down. This allows you to use "git grep" as a "grep -R" replacement even when your CWD is not inside a repo.

4. With --untracked, search all files recursively from the current directory down in addition to files in the index. (The difference between this and --no-index when used inside a repo is that --untracked honors the .gitignore mechanism by default, i.e., --untracked is a synonym for --no-index --exclude-standard when inside a repo.)

5. With a tree'ish (commit, tag, branch name, etc), search all files in the tree.
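To make the five modes concrete, here is a minimal sketch that maps each one to a `git grep` command line. The helper name `git_grep_argv` and the mode labels are made up for illustration; only the flags themselves (`--cached`, `--no-index`, `--untracked`, `-e`) come from git.

```python
def git_grep_argv(pattern, mode="worktree", tree=None):
    """Build the argv for one of the five git grep modes described above.

    The mode names are hypothetical labels, not git terminology.
    """
    argv = ["git", "grep"]
    if mode == "cached":
        argv.append("--cached")      # search the index, ignoring un-added changes
    elif mode == "no-index":
        argv.append("--no-index")    # plain recursive search, works outside a repo
    elif mode == "untracked":
        argv.append("--untracked")   # tracked files plus untracked ones
    elif mode not in ("worktree", "tree"):
        raise ValueError(f"unknown mode: {mode}")
    argv += ["-e", pattern]          # -e keeps patterns starting with '-' safe
    if mode == "tree":
        if tree is None:
            raise ValueError("tree mode needs a commit, tag, or branch name")
        argv.append(tree)            # search the files in that tree
    return argv
```

The returned list can be passed to `subprocess.run` as-is, for example `subprocess.run(git_grep_argv("TODO", "cached"))`.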


"git grep" only searches tracked files by default, so to get correct behaviour, you will need to create an alias (I don't think there's a way to change the default).

It also only works on Git repos; having to think before choosing between "git grep" and "ack" just adds unnecessary mental context switching.

Personally, I prefer ack's output, which puts the file name on a separate line, and includes line numbers by default. Together with -C you get a much more readable output.

Mostly I end up just using Sublime Text's built-in file search, which is like ack, but has a generous -C setting enabled by default, and supports replacing.


Oooo. I've still got to search lots of things that aren't in git, so it's not a complete ack replacement for me, but wow, git grep is like two orders of magnitude faster in my quick tests!


Much as I've loved ack, I'd be all in favor of a replacement that was really significantly better.

That said, the readme there lists five reasons why Silver Searcher is better than ack. Two of them are nonsense (who cares what language it's written in, or same-order-of-magnitude differences in how big the executables are?), two sound like they would be pretty trivial changes to ack, and the last is a significant speed improvement. But then you read further down the readme, and it says the current development state is somewhere between "Runs" and "Behaves correctly". Isn't it kind of premature to be bragging about how fast you are before your code actually behaves correctly?

Also, it makes me wonder how much of the speed increase is based on the easy changes filtering out more files...


Not arguing with your basic point, which is good, but when the developer says "The binary name is 33% shorter than ack!", he's referring to the difference between typing "ag" vs. typing "ack". Not the size in bytes of the executable.

It's a little inside joke, because one of the points in favor of "ack", offered jokingly years ago by its developer, is that "ack" is 25% fewer letters to type compared to "grep".

(This feature is still there as point #10 in http://betterthangrep.com/why-ack/, now not offered 100% jokingly.)


It's a joke but not. The less typing you have to do, the better. Less typing means fewer mistakes and less time waiting for the search to start.

Defaults matter. ack is all about having sensible defaults for your most common uses.


  (who cares what language it's written in, or same-order-of-magnitude differences in how big the executables are?)
That last reason there is about the length of the filename, not about the size of the file. I don't think anyone's supposed to really care, no.


Oh! Well, then that's another one for the trivial-to-make-ack-do-it list... :)


The list is somewhat tongue-in-cheek. Also I haven't updated the README in a while. While ag certainly isn't as stable or feature-rich as ack, I have fixed bugs in ag that are still in ack. https://github.com/petdance/ack/issues/220 for example.


Hey, if you've got a solution for it, I'd love to hear.



I'm the author. People are asking how this thing is faster than ack or grep. Here's how:

- Literal matches use Boyer-Moore-Horspool strstr.[1]

- Files are mmap()ed instead of read into a buffer.

- If you're building with PCRE 8.21 or greater, regex searches use the JIT compiler.[2] Also I call pcre_study() before executing the regex on a jillion files.

- Ag reads your .gitignore and .hgignore files to ignore code you don't care about.

- Instead of calling fnmatch() on every pattern in your ignore files, non-regex patterns are loaded into an array and binary searched.
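As a rough illustration of the first two bullets, here is a minimal Python sketch of a Boyer-Moore-Horspool literal search run over a memory-mapped file. It is not ag's C code; the function names `horspool_find` and `search_file` are made up for the example, and the window comparison uses a slice instead of the usual right-to-left byte loop to keep it short.

```python
import mmap
import os

def horspool_find(haystack, needle):
    """Return the index of the first occurrence of needle in haystack, or -1.

    haystack may be bytes or an mmap object; needle must be bytes.
    """
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Bad-character table: shift distance keyed by the byte currently aligned
    # with the last pattern position. Bytes absent from the pattern shift by m.
    shift = [m] * 256
    for i in range(m - 1):
        shift[needle[i]] = m - 1 - i
    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            return pos
        pos += shift[haystack[pos + m - 1]]
    return -1

def search_file(path, needle):
    """Search a file through mmap instead of reading it into a buffer."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return -1  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return horspool_find(mm, needle)
```

The point of the table is that a mismatch usually lets the search skip ahead by several bytes at once, so long literals tend to be found with fewer comparisons than a byte-by-byte scan.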

I wrote a couple of blog posts about profiling The Silver Searcher and improving performance. http://geoff.greer.fm/2012/01/23/making-programs-faster-prof... is the most informative one, IMO.

1. http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore%E2%80%93Hor...

2. http://sljit.sourceforge.net/pcre.html
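The ignore-file trick from the last bullet of the list above can be sketched like this: patterns with no glob characters go into a sorted array that is binary searched, and only the remaining patterns fall back to fnmatch-style matching. The class name `IgnoreList` and the rule for telling literal patterns from globs are assumptions for the example, not ag's actual logic.

```python
import bisect
import fnmatch

GLOB_CHARS = set("*?[")  # assumption: anything without these is a literal name

class IgnoreList:
    def __init__(self, patterns):
        literals, globs = [], []
        for pattern in patterns:
            if GLOB_CHARS & set(pattern):
                globs.append(pattern)
            else:
                literals.append(pattern)
        self.literals = sorted(literals)  # sorted once, binary searched per lookup
        self.globs = globs

    def is_ignored(self, name):
        # O(log n) exact lookup for the literal names.
        i = bisect.bisect_left(self.literals, name)
        if i < len(self.literals) and self.literals[i] == name:
            return True
        # Only the glob patterns pay the per-pattern matching cost.
        return any(fnmatch.fnmatchcase(name, g) for g in self.globs)
```

For example, with `IgnoreList(["node_modules", "build", "*.o"])`, a lookup for `build` is answered by the binary search alone, while `main.o` is caught by the one glob pattern.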


Oooh, I forgot about your project when I did the betterthangrep.com website overhaul. I just added it to http://betterthangrep.com/more-tools/


FWIW, git grep also uses threads in some circumstances to get better performance.

Also, obligatory link whenever Boyer-Moore is mentioned: http://ridiculousfish.com/blog/posts/old-age-and-treachery.h...


That's a cool blog post. I've tried not to look at the grep source code until I've written my own solutions, so I didn't know grep made that tradeoff.


I prefer the much simpler approach of reusing the existing utils with a simple shell script, which is much faster than ack:

http://www.pixelbeat.org/scripts/findrepo

A quick test on a moderately big repo:

    $ time findrepo test '*' | wc -l
    158819
    real	0m0.532s
    $ time ack -a test | wc -l
    76526
    real	0m8.762s


I'm more than happy to include findrepo on the betterthangrep.com/more-tools page. If you'll make a page for it, or at least have something for newbies to read, I'll add a link.

My concern is that someone who's new to all of this isn't going to understand what to do if I just link to http://www.pixelbeat.org/scripts/findrepo


Wait, how is anything faster than grep?


I believe it's because ack and friends skip files you ordinarily don't want by default. So there's no .svn-base duplicates, no cache files generated by tools, etc.
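A minimal sketch of that idea: prune version-control directories during the walk and drop backup or cache files before anything is opened. The skip lists here are illustrative assumptions, not ack's actual defaults.

```python
import os

# Assumed skip lists for illustration only.
SKIP_DIRS = {".git", ".svn", ".hg", "CVS"}
SKIP_SUFFIXES = (".svn-base", "~", ".pyc")

def candidate_files(root):
    """Yield paths under root, skipping VCS directories and junk files."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if not name.endswith(SKIP_SUFFIXES):
                yield os.path.join(dirpath, name)
```

Every file filtered out here is a file the search never has to read, which is where the speedup over a plain recursive grep would come from.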


So `find ... | xargs grep ...` is still faster, right?


Depends on if you include time to type the command.

If your find command looks like

    find . -name '*.pl' -o -name '*.pm' | xargs grep foo
and it takes 1 second to finish, and your ack command is

    ack foo --perl
and it takes 1.5 seconds to finish, you can say the grep is faster.

But I just timed how long it takes to type those, and they took me 9.2 vs. 2.3 seconds.

So which is faster: 9.2+1.0 or 2.3+1.5?
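Spelled out with the numbers above, the totals come to 10.2 seconds for find/grep and 3.8 seconds for ack. A tiny sketch of the same sum (the function name `total_seconds` is just for the example):

```python
def total_seconds(typing, running):
    """Wall-clock cost of one search: time to type it plus time to run it."""
    return typing + running

find_grep = total_seconds(9.2, 1.0)  # typing and run times quoted above
ack = total_seconds(2.3, 1.5)

print(f"find | xargs grep: {find_grep:.1f}s, ack: {ack:.1f}s")
```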



