I don't understand why you'd take on the complication of a database. The sensible approach would be something like BMDiff with [page] indexing on top for random access.
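To make that concrete, here is a rough sketch of the page-index idea. It is only an illustration under some assumptions: Python's standard `zlib` stands in for BMDiff (which is not in the standard library), and the names `PAGE_SIZE`, `compress_paged`, and `read_range` are made up for this example.

```python
import zlib

PAGE_SIZE = 4096  # uncompressed bytes per page; an arbitrary choice for this sketch


def compress_paged(data: bytes, page_size: int = PAGE_SIZE):
    """Compress data one page at a time and return (blob, index).

    index[i] is the byte offset in blob where compressed page i starts;
    a final extra entry marks the end of the blob.
    zlib is a stand-in here, not BMDiff.
    """
    chunks = []
    index = [0]
    for start in range(0, len(data), page_size):
        chunk = zlib.compress(data[start:start + page_size])
        chunks.append(chunk)
        index.append(index[-1] + len(chunk))
    return b"".join(chunks), index


def read_range(blob: bytes, index: list, offset: int, length: int,
               page_size: int = PAGE_SIZE) -> bytes:
    """Return data[offset:offset + length], decompressing only the pages touched."""
    if length <= 0:
        return b""
    first = offset // page_size
    # Clamp to the last real page (index has one more entry than there are pages).
    last = min((offset + length - 1) // page_size, len(index) - 2)
    pages = [zlib.decompress(blob[index[i]:index[i + 1]])
             for i in range(first, last + 1)]
    joined = b"".join(pages)
    skip = offset - first * page_size
    return joined[skip:skip + length]
```

Usage would look like `blob, index = compress_paged(data)` followed by `read_range(blob, index, 5000, 3000)`. The obvious trade-off is that compressing each page independently throws away matches that span pages, which is exactly where a long-range scheme like BMDiff would help, so a real design would probably need bigger pages or a shared dictionary.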
I remember a spate of academic articles a few (3-7?) years ago about how all filesystems were going to be replaced by single huge databases holding all our "files". Maybe this is partly a continuation of that research.