
It is possible to design a Dropbox-like system with the following properties:

1. Files are stored encrypted.

2. The service provider does not have the ability to arbitrarily decrypt the files. By "arbitrarily decrypt" I mean decrypt at any time they wish. They will be able to decrypt if the owner's client is actively connected.

3. When someone uploads a file that is identical to an existing file, it is initially stored separately, but in most cases it can eventually be de-duplicated, without compromising #1 or #2.

I'll leave the details as a fun exercise.



Scratch that. I've got an even better design than what I was thinking of above. It makes it so the service provider never has access to the unencrypted data, and they can fully de-dup immediately, and it supports all Dropbox features.

   Let F be an arbitrary file.
   Let N(F) be the name your client knows the file by.
   Let H(F) be a hash of the file that produces a 256 bit hash.
   Let AES(X,K) be X encrypted using AES with key K.
When you upload to the cloud, you upload AES(F,H(F)). In a local database, you store (N(F), H(F)). When you later retrieve the file from the cloud, you receive the encrypted data, and you can look up the key, H(F), in your local database.

Note that if two different users upload files with the same content, they pick the same encryption key (since the key comes from a hash of the content), and so the same data gets uploaded. The service can thus do de-duplication, even though it has no access to unencrypted data.
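Here is a minimal sketch of the upload/download path, to show why identical content produces identical ciphertext. Assumptions that are mine, not part of the design above: `keystream_xor` is a toy SHA-256 counter-mode stream cipher standing in for AES(X,K) so the example runs on the standard library alone (it is not AES and not for real use); the cloud is modeled as a dict; and the client remembers a `blob_id` (a hash of the ciphertext) next to H(F) so it can find the blob again.

```python
import hashlib


def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for AES(X, K): XOR with a SHA-256 counter keystream.

    NOT real AES. Applying it twice with the same key returns the input.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def upload(name: str, content: bytes, local_db: dict, cloud: dict) -> str:
    key = hashlib.sha256(content).digest()    # H(F), a 256-bit key
    ciphertext = keystream_xor(content, key)  # stands in for AES(F, H(F))
    # Assumption: the service addresses blobs by a hash of the ciphertext.
    blob_id = hashlib.sha256(ciphertext).hexdigest()
    cloud.setdefault(blob_id, ciphertext)     # identical blobs are stored once
    local_db[name] = (key, blob_id)           # the (N(F), H(F)) record
    return blob_id


def download(name: str, local_db: dict, cloud: dict) -> bytes:
    key, blob_id = local_db[name]
    return keystream_xor(cloud[blob_id], key)
```

Two users with separate local databases who upload the same bytes end up pointing at one stored blob, and the service only ever sees ciphertext.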

So far, all this provides is secure storage. What makes Dropbox useful is that a file uploaded on one computer can be downloaded on another, and that only works if the downloader knows H(F).

This is solved by also uploading a copy of that local database I mentioned, the one that stores the (N(F), H(F)) pairs. This can be encrypted with the account password.

Syncing between different devices on the same account is then a two-step process. First, the name/key database is synced, so that both devices have access to the keys; then the files themselves can be synced.
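A rough sketch of sealing the name/key database with the account password before uploading it. Everything specific here is my assumption rather than part of the design: the database is a dict of file name to hex H(F) serialized as JSON, the key is derived with PBKDF2-HMAC-SHA256 and a random salt, and `keystream_xor` is again a toy stand-in for AES. A real client would also want to authenticate the blob.

```python
import hashlib
import json
import os

PBKDF2_ROUNDS = 200_000  # assumed work factor, chosen only for illustration


def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for AES: XOR with a SHA-256 counter keystream. NOT real AES."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + counter).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def seal_db(db: dict, password: str) -> bytes:
    """Encrypt the name -> H(F) database under a password-derived key."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, PBKDF2_ROUNDS)
    plaintext = json.dumps(db, sort_keys=True).encode()
    return salt + keystream_xor(plaintext, key)  # salt is stored in the clear


def open_db(blob: bytes, password: str) -> dict:
    """Reverse seal_db on another device that knows the account password."""
    salt, body = blob[:16], blob[16:]
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, PBKDF2_ROUNDS)
    return json.loads(keystream_xor(body, key).decode())
```

A second device would fetch the sealed blob, call `open_db` with the account password, and then use the recovered H(F) values to decrypt individual files.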

I believe web access can be handled via this system. Dropbox's web interface requires Javascript, so it could have the browser retrieve the name/key database and decrypt it using the account password, which gives it access to the key needed to decrypt a given file.

For shared folders, you can use a public key system, where the keys for the shared files are encrypted with the public keys of each person you are sharing the folder with, and the encrypted key files are stored in the cloud. Anyone accessing the shared folder grabs the key file for the folder and uses their private key (which is protected by the account password) to get H(F), the key for the file.

I believe this covers everything Dropbox does, with the properties that:

1. They can't decrypt your files.

2. They can de-duplicate completely.

3. Your account password is the key for everything for you.

4. It satisfies all of their advertising claims for security.


That's an excellent solution. A couple of points:

1. It is usually not a good idea to derive your key as a function of the message. Here, you would require your hash function to have good min-entropy given inputs from whatever distribution the message M comes from. I believe SHA-256, as of today, will satisfy these needs.

2. Even if H is modeled as a perfect hash function (i.e., a random oracle, in crypto literature) you would require that AES itself does not use H in any particular special way. Think of AES', which is just like AES except that when k=H(m) for the message m being encrypted, it just outputs k (i.e., cheats). It would be nearly impossible to detect this behaviour of AES' under normal circumstances because H is pre-image resistant, but clearly, this would trivially void the security of the scheme.

The suggestion is exactly what came to my mind. The security argument should most definitely hold when instantiated with AES and SHA-256, but proving it in general will require some work.


There's still a big problem with de-duplication: Dropbox can still figure out which users have the same file, thus leaking information. That, combined with the fact that they'll already know the size of the file, gives them a lot of info.

For example, if the FBI seizes a computer and finds some illegal files, they can still request Dropbox to give a list of users that have the same file.
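To make that concrete under the hash-keyed scheme described upthread, here is a small sketch of the lookup. The names are hypothetical: `owners_by_blob` is an assumed server-side index from blob id to the users referencing it, blob ids are assumed to be a hash of the ciphertext, and `keystream_xor` is a toy stand-in for AES. The point is only that anyone holding the plaintext can recompute the key and the ciphertext, so no decryption ability is needed.

```python
import hashlib


def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for AES: XOR with a SHA-256 counter keystream. NOT real AES."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + counter).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def convergent_blob_id(content: bytes) -> str:
    """Recompute the id of the blob that a given plaintext would be stored under."""
    key = hashlib.sha256(content).digest()  # H(F) is derivable from F alone
    return hashlib.sha256(keystream_xor(content, key)).hexdigest()


def users_holding(known_file: bytes, owners_by_blob: dict) -> set:
    """Return the users whose accounts reference the blob for a known plaintext."""
    return owners_by_blob.get(convergent_blob_id(known_file), set())
```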


As has been mentioned elsewhere in the thread -- de-dupe isn't responsible. If Dropbox is storing your files -- then the TLA can always request Dropbox to give a list of users that have the same file. (Unless you have some form of independent crypto/hashing)


yes, thank you for rewording my comment.


at the expense of conveniences like web access, document previewing, simple sharing, etc. - sure :-). if your answer to the web access concern is: derive the key from the password, who's to say we wouldn't store the key and later use it to decrypt your data?

web access notwithstanding, you'd be making a leap of faith to believe that the client is 100% trustworthy and that encryption is actually happening. at some point you have to make a decision as to whether or not you trust the entity (dropbox, google, or anybody else). if you don't, you should use something like truecrypt between you and the service.

all arguments made against dropbox apply to your gmail attachments, gmail mail, google docs, etc.


> make a decision as to whether or not you trust the entity

If I don't trust the entity, how could I be installing any of its software on my machines? I have to trust what I am told if I am to use the software for its intended purpose.

If Dropbox claims what Miguel has quoted in his post, and then it turns out that those claims are (basically) not true, then it raises the question of integrity, i.e. what other assumptions that I have made were off? Say, that your .sys is not doubling as a key logger or your software is not scanning my disks at the government's request, etc.


it's unclear to me what statements we're making are 'not true'.

if you don't believe our statements, I'm not going to be able to convince you to trust us over the discourse in this thread :-)


If you publish a security spec and adhere to it in a way that allows independent verification of its implementation, then - yes, you will convince people that what was claimed is true.

Perhaps, the easier route for you would be to just drop the whole "encrypted" angle and simply state that you provide reasonable protection of files while in transit and in your possession. That would satisfy 99.9% of real users and it will not rub cryptographic pedants the wrong way. The issue at hand is not that you don't encrypt properly, but that you over-promised, and over-promised in a very sensitive area.

(correction) "over-promised" = "implied more than what was said", i.e. what Miguel referred to as "wishy-washy statement".


Tahoe-LAFS seems to fit the bill, except I don't know about #3. From what I understand it's rather stronger on #2, though.


This system already exists.

* Wuala (using encryption but somewhat insecure deduplication)

* SpiderOak (using more secure deduplication: https://spideroak.com/blog/20100827150530-why-spideroak-does...)


* AltDrive - client side deduplication.


If the service provider can ever decrypt the file, then you have to trust them, and then it isn't secure. When it comes to security, you can't trust anybody.

The only way to secure your files in a Dropbox style situation is to use your own client-side encryption that the service provider has no access to... IE the Truecrypt solution that keeps being suggested.


Check out http://camlistore.org/; it's by a few Googlers, including the guy who did memcached (Brad Fitz).


This should not be a service, this should be a protocol, with an RFC.


tarsnap


I'm guessing tarsnap doesn't satisfy the 3rd criterion, nor is tarsnap able to decrypt the user's files, even when the client is connected. Perhaps cperciva can comment.


The 3rd criterion can be considered a vulnerability: it may allow someone to learn that a certain user has a known file. It can be exploited to reveal information about the encrypted files.


Yes, I read that article too. That information leakage vulnerability can be eliminated by requiring the user always upload the file the first time they store it. Subsequent uploads of the same file for the same user could be skipped. De-duplication across users in storage is also possible without leaking information.
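A minimal sketch of that policy, with a hypothetical `Store` class and blobs addressed by a hash of whatever bytes the client sends (both are my assumptions). The "do I need to upload?" answer depends only on the asking user's own history, so it never reveals whether someone else already holds the file, while storage still keeps a single copy.

```python
import hashlib


class Store:
    """Toy server-side store: one copy per blob, per-user upload decisions."""

    def __init__(self) -> None:
        self.blobs = {}   # blob_id -> stored bytes
        self.owners = {}  # blob_id -> set of user ids referencing the blob

    def needs_upload(self, user: str, blob_id: str) -> bool:
        # Only this user's own earlier uploads can make the answer False.
        return user not in self.owners.get(blob_id, set())

    def put(self, user: str, data: bytes) -> str:
        blob_id = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(blob_id, data)              # de-duplicated in storage
        self.owners.setdefault(blob_id, set()).add(user)  # remember who holds it
        return blob_id
```

For example, after one user stores a blob, `needs_upload` still returns True for a second user with the same blob id, and returns False for that second user only after their own first `put`.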

What is not possible (AFAIK) is the combination of the two requirements: 1) de-duplication across users, and 2) service provider is not able to decrypt your files. The latter requires the encryption/decryption be done on the client only (service provider doesn't have the keys at all). The former requires access to the unencrypted file, or for the clients to share keys.


You can do (1) and (2) but it is not trivial to prove that it works. See: http://news.ycombinator.com/item?id=2461713


Clever. I like it.


Yes, exactly.



