Python development is done in public, so you can simply benchmark the development version yourself to see the improvement. In fact, daily benchmark results are already posted at [1]; they indicate around a 20% improvement (corresponding to 1.25x in the table) since 3.10. The only thing you can't easily verify is whether the GIL was indeed necessary historically.
[1] https://github.com/faster-cpython/benchmarking-public
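To spell out how "around 20%" lines up with "1.25x": assuming the 20% refers to a reduction in run time and the table reports speedup as old time divided by new time (my reading, not something the repo states in those words), the conversion is just 1 / (1 - 0.20). A rough sketch, with function names made up for illustration:

```python
def speedup_from_time_reduction(reduction: float) -> float:
    """Convert a fractional run-time reduction (0.20 means 20% less time)
    into a speedup factor, defined here as old_time / new_time.

    Assumption: this is how the percentage and the "x" figure relate.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def time_reduction_from_speedup(speedup: float) -> float:
    """Inverse conversion: a speedup factor back to a fractional time reduction."""
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    # 20% less run time corresponds to a 1.25x speedup under this definition.
    print(f"{speedup_from_time_reduction(0.20):.2f}x")
    # And 1.25x corresponds to 20% less run time.
    print(f"{time_reduction_from_speedup(1.25):.0%}")
```

Note that under this definition a 1.25x speedup is not "25% less time"; the two figures only coincide for small changes.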