Most people are arguing that a password shouldn't ever be recoverable and that "even root level access should not grant you passwords".
This feels like shaky logic though. Hashing is a good defence against DB harvesting but it doesn't stop a root level admin from listening to inbound unencrypted logins. Prolonged root access is therefore still a viable attack vector. The question is only how quickly you can harvest those passwords.
Other people are arguing that with sufficient decoupling and safeguards between the encryption key and the database there is an acceptable risk associated with storing a password.
Since services like Yodlee clearly do store passwords, this is something that companies do address. Could someone who really knows this area well please describe how this is done in a way that minimises risk, and how the risk compares to traditional one-way hashing?
What I find interesting about the debate regarding password storage ethics is the question, "What is a website's ethical responsibility in regards to a user's password?"
On the face of it, one could argue that the extent of a site's ethical responsibility ends at their IP address -- i.e. a site is only responsible for protecting one's password to the extent that it reasonably protects the user from harm resulting from someone else using their credentials on the site. For example, under this scenario, HN's responsibility would be to take reasonable steps to protect my password in accordance with the level of harm I would experience should my account be compromised. HN's "reasonable steps" are different from those of Bank of America, and emailing my password in plaintext would not shock me (even if unlikely).
However, there is a tendency for people to argue that the extent of a site's ethical responsibility is to protect the user from harm elsewhere on the internet - the logic of course being that many people use the same password for HN and BOA. In some sense, this argument is premised on websites having some level of responsibility for the general welfare of their users (i.e. a website's responsibility towards users extends across the entire internet to some degree).
A weakness of that argument is that once the general welfare of the user is the standard, plaintext storage of a password may promote the user's general welfare to a greater extent than more secure measures - security is just one criterion with regard to utility. A house with few windows is usually more secure but often less healthy for its occupants.
Although technical considerations are important, the issues surrounding password security methods for most sites are social: trivial passwords, password reuse, and "lost" passwords. Holding all sites to the standards which apply to sites with fiduciary responsibility such as banks or corporate IT centers is, in my opinion, somewhat asinine. Every web service does not need to be locked down, and good architecture will balance security with commodity and delight.
How do people try to justify storing passwords in recoverable form? So that they can remind users of the old password.
How do they actually send the reminder? By email.
But email is not secure; you should always assume that someone is eavesdropping on your email. Far too many users reuse a password from one service on others, so sending a password in email is a huge violation of trust.
Storing passwords in plain text (or using an encryption method) and sending password over email are two distinct issues.
I think storing passwords (and not only hashes of passwords) might be a good idea if it is harder for the attacker to recover the passwords than to gain root access to the http/database server. However, I would NEVER send passwords in emails (not even newly generated passwords). A simple solution is to send a one-time login link and have the password displayed in the browser over an https:// connection.
There is still a small possibility of stealing the password (someone intercepts the email and clicks on the link before the legitimate user does), but at least the user would become aware of it (his link wouldn't work any more).
One issue is that even if the password database is more secure than root access on the server, having the database at all means that it's possible to steal the entire thing in a small amount of time and in a way that may be difficult to detect. An attacker with root access on the server would have to keep the exploit open to continue gathering user credentials. And I still don't see what convenience having the password database offers.
One-time links are definitely the right way, though I think it's better to give the user a new password dialog than to tell them a randomly generated password.
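For what it's worth, here is a minimal sketch of the one-time link idea, assuming an in-memory dict stands in for whatever storage the site really uses. The names `issue_reset_token`, `redeem_reset_token` and `_pending` are made up for illustration, and the 15-minute lifetime is an arbitrary assumption. Only a digest of the token is kept, so a copy of the store does not yield usable links.

```python
import hashlib
import secrets
import time

# Hypothetical in-memory store: token digest -> (username, expiry timestamp).
_pending = {}

def issue_reset_token(username, ttl_seconds=900):
    """Create a single-use token to embed in the emailed link."""
    token = secrets.token_urlsafe(32)
    digest = hashlib.sha256(token.encode()).hexdigest()
    _pending[digest] = (username, time.time() + ttl_seconds)
    return token

def redeem_reset_token(token):
    """Return the username if the token is valid and unused, else None."""
    digest = hashlib.sha256(token.encode()).hexdigest()
    # pop() removes the entry, so a second click on the same link fails.
    entry = _pending.pop(digest, None)
    if entry is None:
        return None
    username, expires_at = entry
    if time.time() > expires_at:
        return None
    return username
```

If someone intercepts the email and redeems the token first, the legitimate user's later attempt returns `None`, which is the "his link wouldn't work any more" signal mentioned above.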
> Hashing is a good defence against DB harvesting but it doesn't stop a root level admin from listening to inbound unencrypted logins.
Forgive my ignorance of web authentication, but aren't passwords hashed in the browser before being sent to the server for authentication?
If not, why? It seems to me that it would be just as easy to hash on the browser side as on the server side, but passwords are less exposed if you do it on the browser side.
(Apologies for hijacking your thread, but I'm interested in the technical details here.)
Passwords are rarely hashed client side. It doesn't matter, though: if you have enough control over a server to listen to POSTs for unhashed passwords, you probably have enough to inject some script and capture them on the client side.
True, but then you have to mimic the form submission without alerting the user. It's certainly possible, but much less feasible compared to just passive logging of all POSTs that have a field called [password|pw|pass|passwd].
I guess if you're targeting a single site the difference is small.
It would be just slightly more complicated to inject a script that POSTs back anything typed into a password field.
Hashing passwords client-side would just provide a false sense of security. Surely if it's worth doing it, it would be worth the cost of an SSL certificate and the overhead of protecting (a minimum of) all pages and requests handling passwords?
Not even that. You simply have to delete the JavaScript that hashes them on the client side, and have the browser send them unhashed (which is the default behaviour of HTML forms).
The point of hashing comes from the difficulty of reversing it; generating a string that when hashed matches that value should be difficult. That way, even if someone has access to the hashes, he is unable to log in - he needs the password.
If you accept the hash from the client, anyone having the hash is able to login, making hashing pointless.
It's not hard to fix that. You have the server send the client a random salt each time. The client then sends back a hash computed with that salt, and the server doesn't accept the same salt more than once. Done right, the server never sees your plaintext password and replay attacks still fail.
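A rough sketch of that exchange, assuming the server stores a plain SHA-256 of the password (a simplification; a real site would want a salted, slow hash) and that the response is an HMAC keyed by the one-time salt. The `Server` class and `client_response` function are illustrative names, not anything from a real library.

```python
import hashlib
import hmac
import secrets

class Server:
    def __init__(self):
        self.stored = {}   # username -> sha256(password) digest
        self.pending = {}  # username -> outstanding one-time salt

    def register(self, username, password):
        # Simplification: the plaintext is seen once, at registration.
        self.stored[username] = hashlib.sha256(password.encode()).digest()

    def challenge(self, username):
        salt = secrets.token_bytes(16)
        self.pending[username] = salt
        return salt

    def verify(self, username, response):
        # pop() means each salt is accepted at most once.
        salt = self.pending.pop(username, None)
        if salt is None or username not in self.stored:
            return False
        expected = hmac.new(salt, self.stored[username], hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)

def client_response(password, salt):
    """What the browser-side code would compute from the typed password."""
    inner = hashlib.sha256(password.encode()).digest()
    return hmac.new(salt, inner, hashlib.sha256).digest()
```

A response sniffed off the wire is useless on the next attempt, because the server has already discarded the salt it was computed for.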
The difference between hmac(salt,hash(password)) and hash(password) is trivial when the salt is known. It doesn't matter that the server only accepts the salt once, if the server is willing to produce another salt for the next attempt. If the server limits the number of salts it will produce in a given timeframe, it might as well limit the number of attempts to begin with, and so there's no point in "double hashing."
The goal, I believe, is to prevent recovery of the plaintext password so that it can't be used to gain access to other accounts with the same password. The double hash prevents an attacker from ever obtaining that plaintext password.
The issue was, I believe, in preventing a remote attacker who knows the hash from using it to authenticate[1]. That is why passwords are transmitted in plaintext then hashed and validated on the server side. You should never trust the client to perform your cryptography for you, because you have no idea who--or what--the client is. No amount of obfuscation can alleviate that fact.
Your comment about not trusting the client confuses me. You don't have to "trust" the client to do cryptography for you. You simply define the cryptography it has to do, and if it doesn't do it, then it doesn't produce what you need to log in.
This is not an unusual practice, either. For example, any time you log into a remote computer using an ssh key, your ssh client is performing cryptography to authenticate you with the server. There is no problem in doing this, because the only way to perform the cryptography such that the authentication is successful is to have the right secret key. You could certainly write a custom ssh client that does some different cryptography, but that would be pointless, as the result would be an inability to connect.
There are, I think, two goals at play here. One is what you linked: preventing replay attacks. That is fairly easily solved by doing a double hash with a randomly generated salt. There is no problem in "trusting" the client to do this, because if they don't do it the way you specify, they don't produce the correct result. The only (feasible) way to generate a result that lets you log in is by combining the random salt with the correct password. This is important because otherwise an attacker could sniff your traffic and then impersonate you.
The second goal is in not having your password exposed if the site's database is compromised. Related, it would also be nice to not have your password exposed if the site is compromised and the attacker is watching incoming connections.
SSL solves the first goal, of preventing replay attacks. Not having your password exposed if the database is compromised is solved by hashing passwords. However, if you send the password in plaintext (or encrypted in such a way that the other end can retrieve plaintext, as with SSL) then the last, related goal fails: an attacker with total control can grab your password if you log in during the time he has control.
By doing hashing on the client, you can prevent that, and when implemented properly it can still avoid the rest.
The situation we were discussing is not analogous to the SSH situation you describe because the key, namely hash(password), is not secret (the attacker knowing it was the premise of the scenario). It doesn't matter what mumbo-jumbo (not an insult, just an illustrative phrasing) you require the client to do, an attacker can do it just as well. The system is still vulnerable to attacks (which are not exactly replay attacks by the strict definition, but are still trivially equivalent) because it depends upon the key being secret, which is not the case.
Hashing on the client does not satisfy the first goal and sending the password in plaintext does not satisfy the second goal. A different solution altogether would be necessary to satisfy both goals.
I'm still confused. With my proposed system, how do you perform a replay attack (or the trivial equivalent thereof) without access to the server's database?
There are two scenarios here, and I think we are each discussing a different one.
The first scenario is where the attacker obtained the hash from the wire, which is what I think you're assuming to be the case. Then yes, double hashing would protect against replay attacks.
The second scenario is where the attacker obtained the hash from the database, which is what I assumed to be the case. Double hashing would not protect against these attacks (which are not, strictly speaking, replay attacks).
I'm not interested in belaboring this point any further, because I think you came up with a better solution to both issues in another post[1].
When the attacker obtains the hash from the database, this does indeed let him carry out the rough equivalent of a replay attack, but it doesn't let him carry it out on any other site. The reason you don't want passwords stored (or transmitted) in plaintext is so that an attack doesn't compromise the user's accounts everywhere. This double hashing scheme avoids that, but yes, leaves your account on this one site compromised if the database is copied.
Glad you liked the other scheme. It's more complex, but seems better. I really have to try it out at some point.
IMO, what would be really ideal is the following goal
- An attacker who has access to the server database and also is watching the network traffic should not be able to impersonate the client(user) later.
The double hashing scheme doesn't achieve this. Since hash(password) is stored in the server DB, the attacker can just copy that and use it to login later. He just needs the hash, not the plain-text password to compute what the server asks for during authentication.
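To make that concrete, here is a small sketch under the assumption that the server stores SHA-256 of the password and expects an HMAC of that stored value keyed by a fresh salt. The helper names and the sample password are invented for the example.

```python
import hashlib
import hmac
import secrets

def stored_hash(password):
    # What the server keeps in its database (assumed: unsalted SHA-256).
    return hashlib.sha256(password.encode()).digest()

def expected_response(db_hash, salt):
    # What the server computes to check a login attempt.
    return hmac.new(salt, db_hash, hashlib.sha256).digest()

# Server side: one registered user and a fresh challenge salt.
db = {"alice": stored_hash("correct horse")}
salt = secrets.token_bytes(16)

# Attacker side: holds a copy of db["alice"], never the plaintext password.
stolen = db["alice"]
forged = hmac.new(salt, stolen, hashlib.sha256).digest()

# The forged response matches, so the attacker is logged in as alice.
login_ok = hmac.compare_digest(forged, expected_response(db["alice"], salt))
```

The attacker never learns "correct horse", so other sites are unaffected, but on this site the stored hash works as the password.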
The above goal can be achieved by public-key cryptography. It's just like SSL working in reverse - server authenticating the client/user based on user's private key. The practicality of assigning a private key to every user is a different matter though :)
Normally, private-key authentication doesn't work for most of these things because users expect to be able to log in with something they can remember, and any private key of decent strength is too long.
How about this for a login solution? For each user, you generate a private key and keep it on the server. But, you encrypt that private key with their password, and you don't keep that password or anything derived from it anywhere.
To log in, the server sends the encrypted private key and an authentication challenge to the client. The client then uses the password to decrypt the private key and respond to the challenge. Replay doesn't work, since the response is only good for that challenge. Watching the traffic on the server doesn't work, since you can only get the encrypted private key and the response. Snarfing the contents of the database doesn't even help, since you only get the encrypted private key.
For bonus points (I don't know if this is actually possible), make it so that it's impossible to tell whether a particular decryption of the private key is valid without using it to respond to an authentication challenge and sending that response to the server. This way it's impossible to brute-force the encrypted key offline, substantially mitigating problems with weak passwords.
Of course, I may have missed something obvious here....
Also, for your bonus question you can do something like this.
Enc-Priv-Key = Priv-Key XOR Trunc(Hash(password))
where Trunc = function that truncates its input to the length of Priv-Key.
So, every (Enc-Priv-Key,password) pair combination is valid - it gives you a valid Priv-Key. Also, if you break into the server, you have Enc-Priv-Key and Pub-Key. Enc-Priv-Key is a random number assuming your hash(password) is random (which isn't true if your password is short but let's say it is). So, having Enc-Priv-Key shouldn't give you any information about Priv-Key.
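A quick sketch of that masking step, assuming the private key is an opaque byte string no longer than a SHA-512 digest (64 bytes). The function names are mine, and whether the unmasked bytes form a usable key for a given algorithm is a separate question this does not answer.

```python
import hashlib
import secrets

def mask(password, length):
    """Trunc(Hash(password)): the first `length` bytes of SHA-512."""
    if length > hashlib.sha512().digest_size:
        raise ValueError("key longer than the hash output")
    return hashlib.sha512(password.encode()).digest()[:length]

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt_key(priv_key, password):
    return xor_bytes(priv_key, mask(password, len(priv_key)))

# XOR is its own inverse, so decryption is the same operation.
decrypt_key = encrypt_key

priv = secrets.token_bytes(32)          # stand-in for a real private key
enc = encrypt_key(priv, "right password")
good = decrypt_key(enc, "right password")
bad = decrypt_key(enc, "wrong password")  # still 32 plausible-looking bytes
```

Here `good == priv`, while `bad` is just a different 32-byte string with no padding or checksum to reveal that the guess was wrong.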
Is every arbitrary sequence of bits a valid private key, though? Something tells me that it may not be, but I can't remember my RSA quite well enough to say for sure.
I suppose that even if not every sequence is valid, enough incorrect sequences will still be valid private keys to make a brute force attack impractical without server cooperation.
I was actually thinking that the client would store the private key somewhere (e.g. in an HTML5 browser:)) but what you described is brilliant and definitely works.
Something like that is not used probably because the bigger problem is clients getting hijacked, not servers.
Right, and when servers do get hijacked it tends to be a quick in and out, so compromises based on transient connections aren't much of a concern. Sending passwords over SSL with a good password hash on the other end solves everything if you assume that the attackers won't stick around listening on incoming connections.
Now I really want to implement my scheme using JavaScript crypto. If only I had a web site that needed secure logins.
What you're describing is basically CRAM-MD5, which is nice if you're using plain HTTP and don't want the password to travel over the wire. But both sides need the plaintext password. Otherwise the hash of the password replaces the password and is just as good an authentication token as the password itself, i.e. I just need the hash to authenticate, not the password itself.
But at least you can't recover the password to log into other sites. I thought that the main goal here was to avoid the scenario where an attacker recovers your password on one site and can therefore log in as you all over the place if you made the mistake of reusing it (as most people do).
The server needs the plaintext password once, to generate a salted hash. From then on, it can send the salt and a challenge to the client. The client hashes the password, salt and then hashes that with the challenge and sends it back to the server. The server hashes the challenge with its salted hash and they should match.
This still has to be done over SSL because otherwise the JS doing this on the client side is subject to MITM.
Well, if you're using SSL then you could just as well send the password over the wire and have it properly stored with bcrypt on the server side. And if I control the application I can make the client emit a plaintext password by sending an empty salt. The result is a simple hash of the password, which can be reversed with a regular rainbow table. If the client uses bcrypt, I send a work factor of 1 and construct a proper rainbow table (that's rather simple). Or I could just have the JavaScript send the password in plain. So there's no gain here.
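For reference, the server-side half of "send it over SSL and store it properly" might look roughly like the sketch below. bcrypt is not in the Python standard library, so this swaps in PBKDF2-HMAC-SHA256 from `hashlib` as a stand-in; the iteration count and function names are assumptions for illustration only.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; tune for your own hardware

def hash_password(password):
    """Return (salt, digest) to store in the user's database row."""
    salt = secrets.token_bytes(16)  # per-user salt defeats rainbow tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, digest):
    """Recompute the slow hash and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

The salt and work factor are chosen by the server here, so a compromised page cannot talk the client into a weaker setting.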
For the vast majority of sites storing passwords in the clear is a bad idea.
However, sometimes you need to store sensitive data. Generally this is done with hardware security modules which rotate encryption keys on a periodic basis. Access to the sensitive data is also audited.
Risk can be minimized. It just comes down to whether it is worth it for your business. PCI DSS is just one example of minimizing risk.
Uh... the code in question is retrieving those passwords and emailing them over unencrypted SMTP. How is that design improved by the use of key rotation or "hardware security modules"?
By using an HSM you store the encryption key on a separate hardware device. I am already assuming that if you want to implement something like this you will encrypt the passwords. Thus the problem is knowing where to put the keys.