Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The exercise is a bit weird. The problem statement says "find the record" but all the answers are about printing the median age.

So answer to the question posed? None of the above. One can make a simplifying observation that ages are all < 255. I'd then use a radix sort.

   from random import randint
   from time import time

   x = [(randint(20,80), "John Smith")  for x in range(2_300_000)]

   start = time.time()
   ages = [0] * 256
   for entry in x:
       ages[entry] += 1
   for age, count in enumerate(ages):
       total += count
       if total >= 2_300_000//2:
           print(age)
           break
   print(time.time() - start)
This is ~1.6x faster than the variant that uses sort with a custom key that someone else posted. If you simplify it into an array of numbers, it's still 5-10% faster on my machine than sort (i.e. comparing pure native code running sort vs running radix sort in interpreted Python).

If you converted this to native code I'm sure this would be over an order of magnitude faster, both because O(N) is faster than O(n log n) and because you're doing a linear scan through the records (just jumping randomly around within a 255 byte array which can be 1 cache line on some CPUs). Indeed, when I compare this in native code [1], on my machine it's 3ms for radix sort processing of 2.3M records vs 77ms (~25x faster). When run with -O0, it's 24ms vs 724ms (~30x faster).

If I needed to find the actual record for the median employee(s) (there can be more than one obviously), I'd just scan over the data linearly a second time which should STILL be faster [2] (in the benchmark it's still 7x faster than comparison sort).

[1] https://quick-bench.com/q/Bra8T5nv9nFzgUUUKxjVLNCie6I [2] https://quick-bench.com/q/CN0WBMpyICM2ZwnwgQadBX3cPj4



I had to reread this comment multiple times since I felt that I had a fundamental failure of reading comprehension somewhere along the line.

> The problem statement says "find the record" but all the answers are about printing the median age.

> So answer to the question posed? None of the above.

The problem statement does not say "find the record". The problem statement says, fully quoted:

> Rank the responses in the order of their relative concern with programming considerations, economic considerations, or other important considerations. If you were the programmer, which approach would you prefer? If you were Mary Jones, which approach would you prefer?


Indeed. The question asks you to rank responses. Furthermore,

> The problem statement says "find the record" but all the answers are about printing the median age.

No. it doesn't. It asks you to "compute the median age" without asking about any records.

> she has a file of personnel records [2022 update: an Excel sheet] and asks you to develop a program to compute the median age of the personnel in the file. Here are four different ways you might respond:

So the textbook asks you to rank possible ways to respond, and the hypothetical is just asking "can you process this list and tell me the median age". No one in any context is asking for a record.


You’re right. I was looking at option A. Therefore radix sort is strictly the right answer. Simple, easy to read, easy to understand and an order of magnitude faster.


If you cared about speed, you wouldn't sort at all.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: