
I don’t understand how this works. If the trace API gave you the data in the writes then I can see how it works. You run your copy, then just run the writes over your copy and you have a snapshot that is consistent at some point in time.

However, if you just have a page modification flag, then recopying the data that has been modified seems like it could end up in a loop where you make no progress, because the disk is continually being modified. If none of the modified pages were modified again during your second pass then everything is OK, but if some were, that could invalidate other pages that weren't modified during the first pass but were modified during the second pass.



The first pass, reading the whole disk, takes a long time (often hours for HDDs), so you'll have to deal with a lot of modifications.

As long as the system was only under normal near-idle load, there will be some modifications, but not too many, so you'll be able to catch up much more quickly, leaving the window for new modifications even smaller.

Of course, if the disk is continuously receiving a high write volume, the race might never end. But if it's idle long enough for you to do a sync and a final collection pass before the next write, you'll have a full image.


Plus, if the same few pages are getting written over and over (as often happens in high-write-load scenarios), you don’t need to queue up N reads, but instead can just do one, to capture the latest value. Similar to POSIX signal coalescing—once you know “X happened”, any other “X happened” notifications are redundant and can be dropped, until you actually handle the first one.
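To make the coalescing point concrete, here is a minimal sketch. `DirtyTracker` and its method names are hypothetical, not anything from the tool itself; the only assumption is that the trace tells you which block was written.

```python
class DirtyTracker:
    """Coalesces repeated write notifications per block (hypothetical helper)."""

    def __init__(self):
        self._dirty = set()

    def on_write(self, block):
        # A block that is already marked dirty stays marked once;
        # further notifications for it are redundant.
        self._dirty.add(block)

    def drain(self):
        # Hand back the blocks to re-read and start a fresh set, so that
        # writes arriving during the re-read are tracked for the next pass.
        pending, self._dirty = self._dirty, set()
        return sorted(pending)


tracker = DirtyTracker()
for block in [7, 7, 7, 3, 7, 3]:
    tracker.on_write(block)
to_copy = tracker.drain()  # six notifications, but only two blocks to read
```

Swapping in a fresh set inside `drain` matters: a write that lands while you are re-reading must count for the next pass, not get lost.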


As long as you can catch up to a point where you read all busy pages and none were written in the meantime, yes.

Generally, the way VM migrations deal with this is by suspending the VM once a small enough set of dirty pages remain. I think the same can be done with block devices on Linux; if I remember correctly, there's some file in sysfs that you can poke to quiesce a block device. Just make sure your app is fully in RAM/cache, ideally locked, if you do this to your root filesystem :)


Except that's not how this tool works. It only does a single pass over the modified blocks as far as I can tell.[1][2]

[1] https://github.com/benjojo/hot-clone/blob/6a019efe28bdbbeeb1...

[2] https://github.com/benjojo/hot-clone/blob/6a019efe28bdbbeeb1...


You could potentially also speed it up a bit by, after the first pass, first reading the blocks that are unlikely to be written again soon (normally, the longer it has been since a block was last written, the less likely it is to be written again soon). This would implicitly leave blocks that are being written over and over for last.


You missed the step where the author unmounts the filesystem. That idles all writes.

But yeah, the consistency test doesn't match the real scenario...

I guess it may work on a live filesystem if, by iterating, you eventually get to a point where you read the changed data and no new changes show up in the trace, and you stop right there.


The unmount was just for verification. The tool is clearly intended to be used without unmounting the disk at any point.

Repeatedly copying until there are no new changes would produce a consistent image (if it terminates -- it requires the writing speed to the output device to be faster than the rate at which new data is being written to the input disk).
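A rough back-of-the-envelope model of that termination condition, as a sketch only. It assumes constant rates and that every write dirties a distinct block (pessimistic, since repeated writes to one block coalesce); the function name and parameters are made up for illustration.

```python
def passes_until_clean(total_blocks, copy_rate, write_rate, max_passes=50):
    """Return how many passes the loop needs, or None if it never settles.

    copy_rate and write_rate are in blocks per second. Integer arithmetic
    is used so the result does not depend on float rounding.
    """
    pending = total_blocks
    for n in range(1, max_passes + 1):
        # This pass takes pending / copy_rate seconds; during that time
        # write_rate blocks per second become dirty again.
        pending = pending * write_rate // copy_rate
        if pending == 0:
            return n
    return None


# 1M blocks, copying 1000 blocks/s while 10 blocks/s are being written.
fast_enough = passes_until_clean(1_000_000, 1000, 10)
# Writes arrive as fast as we can copy: the backlog never shrinks.
too_slow = passes_until_clean(1_000_000, 1000, 1000)
```

Each pass shrinks the backlog by a factor of write_rate / copy_rate, so the passes form a geometric series when that ratio is below 1 and never converge otherwise.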

It may be possible to stop earlier. The precise condition for producing a consistent image is as follows:

Let A[i] be the most recent modification time of block i, let B[i] be the modification time of block i at the time it was copied, and let C[i] be the time of the first modification to block i after it was copied (or infinity if the block hasn't been modified since it was copied).

We can stop if max(B) < min(C).

In detail, here is how to compute A, B, and C:

1. Initially, A[i] = B[i] = C[i] = 0.

2. When we are notified that block i has been altered, set A[i] to the modification timestamp. If C[i] == infinity, set C[i] to A[i].

3. When we copy block i, set B[i] to A[i] and set C[i] to infinity.
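The three steps above can be sketched in a few lines. `ConsistencyTracker` and its method names are hypothetical; the sketch assumes modification timestamps are positive numbers and that notifications arrive in order.

```python
import math


class ConsistencyTracker:
    """Tracks A, B, C per block as described in the steps above."""

    def __init__(self, num_blocks):
        # Step 1: everything starts at 0. A block that was never copied
        # keeps C == 0, which makes the stop condition fail.
        self.A = [0] * num_blocks
        self.B = [0] * num_blocks
        self.C = [0] * num_blocks

    def on_modified(self, i, timestamp):
        # Step 2: record the latest write; if this is the first write
        # since the block was copied, remember when it happened.
        self.A[i] = timestamp
        if self.C[i] == math.inf:
            self.C[i] = timestamp

    def on_copied(self, i):
        # Step 3: the copy now reflects the latest write to this block.
        self.B[i] = self.A[i]
        self.C[i] = math.inf

    def can_stop(self):
        # Consistent if every copied version predates every missed write.
        return max(self.B) < min(self.C)


t = ConsistencyTracker(3)
t.on_copied(0)
t.on_modified(0, 3)   # block 0 changes after it was copied
t.on_modified(1, 4)   # block 1 changes before it is copied
t.on_copied(1)
t.on_copied(2)
stale = t.can_stop()  # copy has the t=4 write but lacks the t=3 write
t.on_copied(0)        # re-copy block 0
done = t.can_stop()
```

In this run the first check fails because block 1's copy includes a write that is later than the write block 0's copy is missing; re-copying block 0 fixes that.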



