But if it wasn't for the GROUP BY, filtering 100 results of a million down to 10...

buckbova · on Feb 20, 2014

I don't understand what you mean "trick".

  SELECT date_column, count(*)
  FROM tbl
  WHERE a = @a
    AND b = @b
  GROUP BY date_column;

The "AND b = @b" causes the sql engine to access data in the table instead of solely relying on the index. GROUP BY has 0 to do with it. If you changed the query to

  SELECT a, date_column
  FROM tbl
  WHERE a = @a

and

  SELECT a, date_column
  FROM tbl
  WHERE a = @a
    AND b = @b

The answer would be the same.

eterm · on Feb 21, 2014

No, it wouldn't.

If we're told that:

SELECT a, date_column FROM tbl WHERE a = @a

Returns 100 rows.

Then:

SELECT a, date_column FROM tbl WHERE a = @a AND b = @b

Will only have to scan column b over 100 rows.

Even without an index that will always be neglible, not compared to using the index to grab 100 rows from 10million but just compared to running a query and returning results at all.

The reason that the original can be a lot slower is that the 100 and 10 rows of results are comprised of a lot more rows of actual information, because of the grouping.

You're right that:

SELECT a, date_column FROM tbl WHERE a = @a AND b = @b

would be a lot slower, given the same data, but that isn't the scenario, the group by has implications about what "returns 100 rows, returns 10 rows" actually means in terms of data read.

buckbova · on Feb 21, 2014

Query 1 is an index seek only. It does not access the table data.

Query 2 will perform the same index seek but will need to do a key lookup on each row and filter.

It's not negligible. The 100 results are not comprised of a lot more information in this case, regardless of the grouping, because the 1st query does not access the table.

Edit:

I happen to have a table laying around with a little over a million rows and set up a similar set of queries.

The query optimizer suggested the index seek taking 6% of total operation time while the key lookup taking up the other 94%. The rest was negligible.

Alex3917 · on Feb 21, 2014

While you're correct, GROUP BY apparently also does kill the indexing:

http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization...

henrikschroder · on Feb 21, 2014

What are you talking about? According to that page, the example from the quiz would result in a tight index scan:

> The GROUP BY does not begin with the first part of the key, but there is a condition that provides a constant for that part:

> SELECT c1, c2, c3 FROM t1 WHERE c1 = 'a' GROUP BY c2, c3;