Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

But if it wasn't for the GROUP BY, filtering 100 results of a million down to 10 results wouldn't change performance much even if you read every column of every row of those 100.

The trick is the fact that the GROUP BY means that "It used to return 100 it now returns 10" is a red herring, it still has to read every row to make up those 10.



I don't understand what you mean "trick".

  SELECT date_column, count(*)
  FROM tbl
  WHERE a = @a
    AND b = @b
  GROUP BY date_column;
The "AND b = @b" causes the sql engine to access data in the table instead of solely relying on the index. GROUP BY has 0 to do with it. If you changed the query to

  SELECT a, date_column
  FROM tbl
  WHERE a = @a
and

  SELECT a, date_column
  FROM tbl
  WHERE a = @a
    AND b = @b
The answer would be the same.


No, it wouldn't.

If we're told that:

SELECT a, date_column FROM tbl WHERE a = @a

Returns 100 rows.

Then:

SELECT a, date_column FROM tbl WHERE a = @a AND b = @b

Will only have to scan column b over 100 rows.

Even without an index that will always be neglible, not compared to using the index to grab 100 rows from 10million but just compared to running a query and returning results at all.

The reason that the original can be a lot slower is that the 100 and 10 rows of results are comprised of a lot more rows of actual information, because of the grouping.

You're right that:

SELECT a, date_column FROM tbl WHERE a = @a AND b = @b

would be a lot slower, given the same data, but that isn't the scenario, the group by has implications about what "returns 100 rows, returns 10 rows" actually means in terms of data read.


Query 1 is an index seek only. It does not access the table data.

Query 2 will perform the same index seek but will need to do a key lookup on each row and filter.

It's not negligible. The 100 results are not comprised of a lot more information in this case, regardless of the grouping, because the 1st query does not access the table.

Edit:

I happen to have a table laying around with a little over a million rows and set up a similar set of queries.

The query optimizer suggested the index seek taking 6% of total operation time while the key lookup taking up the other 94%. The rest was negligible.


While you're correct, GROUP BY apparently also does kill the indexing:

http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization...


What are you talking about? According to that page, the example from the quiz would result in a tight index scan:

> The GROUP BY does not begin with the first part of the key, but there is a condition that provides a constant for that part:

> SELECT c1, c2, c3 FROM t1 WHERE c1 = 'a' GROUP BY c2, c3;




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: