Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I'm a software developer, but the code I work does a lot of DB access. I always ask a "two joins and an aggregate question."

Given three tables:

    authors     books       sales
    ----------  ----------  ----------
    id          id          id
    name        author_id   book_id
                title       date
                            price
find the total sales by author in November 2013. The result set should have two columns: author's name and their total sales.


I would make a couple of assumptions:

- price could be 0.0 for a book

- Total sales means the sum and not count

- Date format will be yyyymmdd (if not, assume that we converted in Oracle or Sybase)

    SELECT a.name, SUM(s.price) FROM authors a
    INNER JOIN books b ON (a.id=b.author_id)
    LEFT OUTER JOIN sales s ON (b.id = s.book_id)
    GROUP BY a.name
    HAVING (s.date >= '20131101' AND s.date <= '20131130')


In SQL, this would be an issue due to the date column not existing in the group by. Additionally, because of the having statement, the left outer join is being treated as an inner join with respect to the results given.

Something like this could work (without going for an alternate approach such as Unions or sub-selects):

    SELECT a.name, ISNULL(SUM(s.price), 0.0) FROM authors a
    INNER JOIN books b ON (a.id=b.author_id)
    LEFT OUTER JOIN sales s ON (b.id = s.book_id)
    GROUP BY a.name, s.date
    HAVING (ISNULL(s.date, '20131101') >= '20131101' AND ISNULL(s.date, '20131101') <= '20131130')
Edit: Actually might be better to keep the null in the results like the original query. It's definitely better to know whether a book sold 1000 copies for $0.00 each or if no copies were sold.


Instead of the Left Outer Join, couldn't you also make it an Inner Join to S.BOOK_ID? The question, if I read it correctly, only requires you to return sales in Nov 2013. If there is no data for a book_id, you want to exclude it from your returned data, no?


I would imagine it is up to interpretation. It would probably be a good thing to ask the interviewer about to show that you are thinking about edge cases.

If it were me, I'd prefer to see those that made no sales based on the question. I guess if I didn't want to see them, I would ask for "where sales are greater than $0.00".


s.date is not a group column and a.name seems like a bad PK


My solution:

    SELECT authors.name, SUM(sales.price) FROM authors
      INNER JOIN books ON authors.id = books.author_id
      INNER JOIN sales ON books.id = sales.book_id
      WHERE sales.date >= Nov 2013 AND sales.date < Dec 2013
      GROUP BY authors.name
Although a coworker pointed out that inner joins would drop authors who haven't written books, and books that have no sales. If you want to include all of those, then use LEFT OUTER JOIN instead of INNER JOINs.


I think the fact that all solutions posted so far are broken and even repeat bugs I already pointed out validates this as a useful FizzBuzz test :-)


By "total sales", do you mean number sold or revenue?


I wonder if that could be part of the assessment, to tell if the candidate is likely to verify uncertainties or make assumptions.


Now that you mention it, probably so.


I mean total revenue, but I leave it unspecified in the hope that someone will ask for clarification. Same with if they should include authors with no sales during the time period.

I like this question because the base query, however you interpret the unclear parts, is easy, but it still allows candidates to impress me by asking about the "edge" cases. And even if they just makes assumptions, I can still ask them to do the other versions to see if they really know what they're talking about. I'm always amazed at how many people can't convert an inner join to a left join, or can't change a sum() to a count().




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: