If you have any kind of staging/testing server, I'd highly recommend using your production backups to populate it on a regular basis. That way you test your new code releases with real data, and you know that your backups work.
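Here is a scaled-down sketch of the restore-and-verify idea. It assumes the production backup is a SQLite file; the thread doesn't say which database is involved, and with MySQL or PostgreSQL you'd use their own dump/restore tools instead. `restore_to_staging` and the file paths are hypothetical. The copy uses SQLite's online backup API through Python's `sqlite3.Connection.backup`, and the `PRAGMA integrity_check` step is what tells you the backup is actually usable.

```python
import sqlite3


def restore_to_staging(backup_path: str, staging_path: str) -> None:
    """Copy a SQLite backup into the staging database, then verify the copy."""
    src = sqlite3.connect(backup_path)
    dst = sqlite3.connect(staging_path)
    try:
        # Online backup API: copies every page of the source into the target,
        # replacing whatever the staging database held before.
        src.backup(dst)
        # integrity_check returns a single row ("ok",) when the file is sound.
        status = dst.execute("PRAGMA integrity_check").fetchone()[0]
        if status != "ok":
            raise RuntimeError(f"restored backup failed integrity check: {status}")
    finally:
        src.close()
        dst.close()
```

If you run this on every staging refresh, a corrupt or truncated backup shows up as a failed restore on a normal day rather than during an emergency.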
Quick cherry bomb to lob into this conversation: populating insecure test servers with sensitive production data is a classic web app company security failure. It probably doesn't matter for you, but be cognizant of it.
I agree. One of our big financial clients has an automated tool to scrub such data, but then they have social security numbers as well as lots of other juicy financial data. So they're worried about all sorts of stuff that most of us never ponder as a business risk.
One of the sanitizing steps is to replace all passwords with a set value, such as six or seven repetitions of a letter (like "A") or a number (e.g., "111111"). Another sanitizing step is to scramble names and addresses. Usually the first letter gets preserved, and the rest gets replaced with a hash (say, MD5 it, then base64-encode it and truncate it to the original length; that way it preserves maximum field lengths and the typical size of words).
example:
John Doe, 1313 Mockingbird Lane might get munged into
Jiqw Dyh, 1313 Masdfasdfas Lfds
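Roughly, that scrambling step could look like this in Python. It's a sketch of the approach described (keep the first letter, MD5 the word, base64 the digest, truncate to length), not the financial client's actual tool. The function names are made up, and dropping non-letters and lowercasing the filler are my own choices so the output still looks like a name. Only ASCII letters are scrambled; digits and punctuation pass through, so "1313" survives as in the example.

```python
import base64
import hashlib
import re


def scramble_word(word: str) -> str:
    """Keep the first letter; replace the rest with hash-derived filler of equal length."""
    if len(word) <= 1:
        return word
    digest = hashlib.md5(word.encode("utf-8")).digest()
    # 16-byte digest -> 24 base64 characters; keep only letters, lowercased.
    encoded = base64.b64encode(digest).decode("ascii")
    filler = "".join(c for c in encoded if c.isalpha()).lower() or "x"
    # Repeat the filler if the word is longer than it, then truncate to length.
    needed = len(word) - 1
    filler = (filler * (needed // len(filler) + 1))[:needed]
    return word[0] + filler


def scramble_text(text: str) -> str:
    """Scramble each run of ASCII letters; digits, spaces and punctuation pass through."""
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group()), text)
```

Because it's a plain MD5 of the input, the same name always scrambles the same way. That's handy for keeping tables consistent, but it's a cosmetic scrub rather than strong protection, since common names can simply be guessed and re-hashed.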
We just have username/password/address/phone, so all we do is set all passwords to a default value (all emails, if any, get set to mine), and munge up telephone numbers. Later this year I'll cobble up a better sanitizer. Our parent company has to worry about GLBA compliance, but our little apps don't "collect" enough information to worry about GLBA at this time.
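Here's a minimal sketch of the email and phone steps. It assumes each user row is a dict with `email` and `phone` keys (hypothetical field names) and that `dev_email` is the developer's own address. Phone numbers keep their length and punctuation, but every digit is replaced at random, so formatting and validation code still gets exercised.

```python
import random
import re


def munge_phone(phone: str, rng: random.Random) -> str:
    """Replace every digit with a random one, keeping length and separators."""
    return re.sub(r"\d", lambda _m: str(rng.randrange(10)), phone)


def sanitize_user(user: dict, rng: random.Random, dev_email: str) -> dict:
    """Return a scrubbed copy of one user row (the input dict is not modified)."""
    clean = dict(user)
    if clean.get("email"):
        clean["email"] = dev_email  # every address points at the developer
    if clean.get("phone"):
        clean["phone"] = munge_phone(clean["phone"], rng)
    return clean
```

Passing in a seeded `random.Random` makes a staging refresh reproducible. Drop the seed if you'd rather get different numbers each time.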
I don't understand this obsession with only storing hashes, as if that's the primary critical issue with site security. There are plenty of reasons to store the plaintext, and in a well-secured database I really don't think it is much of an issue. Or as I heard someone say once, "If you can break into my database, and show me how, I will quite literally give you a million dollars".
Off the top of my head, here are a couple of very good reasons to store plaintext:
- password recoverability: if the user knows they can recover the password, they're more likely to use a more complex one
- flexibility with authentication: to use something like HTTP Digest Auth, you need the plaintext to be able to hash it with a one-time nonce
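To make the Digest point concrete, here's a sketch of the RFC 2617 response calculation for `qop="auth"` (the function names are mine). The server has to compute the same value to check the client, so it needs the password on hand, or at least HA1 = MD5(username:realm:password), which is password-equivalent for that realm.

```python
import hashlib
import hmac


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str, method: str,
                    uri: str, nonce: str, nc: str, cnonce: str) -> str:
    """RFC 2617 request-digest for qop="auth"."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # needs the password (or a stored HA1)
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:auth:{ha2}")


def verify(client_response: str, username: str, realm: str, stored_password: str,
           method: str, uri: str, nonce: str, nc: str, cnonce: str) -> bool:
    """Server side: recompute from the stored secret and compare in constant time."""
    expected = digest_response(username, realm, stored_password,
                               method, uri, nonce, nc, cnonce)
    return hmac.compare_digest(expected.encode("utf-8"), client_response.encode("utf-8"))
```

RFC 2617 (section 3.5) walks through a worked example with user "Mufasa" and password "Circle Of Life" that you can check this against.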
And like many will no doubt point out, hashing it isn't all THAT secure anyway. If it's not a very strong hash, or there's enough information to reset it somehow, they can get what they want anyway. Not to mention that if your database has been cracked they probably have everything they want anyway - why even bother logging in?
I just don't get it. Sure, defence in depth is the best strategy and everyone should practise it wherever possible. But whether the password is stored hashed or not is not the linchpin security issue many make it out to be, IMO.
"obsession about only storing hashes, as if that's the primary critical issue with site security"
Can you point me to something I said that implies this is an obsession or that this is what I think the primary critical issue is with site security?
" password recoverability: if the user knows they can recover the password, they're more likely to use a more complex one"
Why would I as a user care at all if I could retrieve the actual value of a complex password -- and why would knowing I could recover it make me then choose a more complex one?
(The user should be given an option of resetting the password via a link sent by email. Sending passwords themselves over email is a great way to have it revealed for someone else to use later.)
"to use something like HTTP Digest Auth"
Good thing no one needs this mediocre authentication method if SSL is available.
The majority of people use the same passwords at different sites. So even if someone's cracked your database, it's still a good idea. Storing passwords in plaintext is a non-neighborly thing to do.
"Can you point me to something I said that implies this is an obsession or that this is what I think the primary critical issue is with site security?"
You asked for the financial institution's name so you could avoid them, based solely on the password storage issue. That counts as obsession to me. Oh and I forgot to write it before, but financial institutions often need to store in plaintext anyway, for telephone authentication.
"Why would I as a user care at all if I could retrieve the actual value of a complex password -- and why would knowing I could recover it make me then choose a more complex one?"
If people know they have to remember it, they tend to choose simpler passwords, or they write it down. If you tell users to set a hard password, and they can recover it later if necessary, they would hopefully tend to use better ones. I can't really back that up with a study, though, so it could just be my experience.
"The user should be given an option of resetting the password via a link sent by email. Sending passwords themselves over the email is a great way to have it revealed for someone else to use later."
This is veering off topic, but you either trust the email or you don't. What, pray tell, is the difference between sending the password and sending a link to reset the password, if an attacker has access to the victim's email?
"Good thing no one needs this mediocre authentication method if SSL is available."
Yeah, pity SSL is not an authentication method. You did know that, right?
Digest authentication is heavily used in APIs and other non-browser applications, where you need some authentication but the tunnel isn't necessary and you don't want to maintain heavy sessions. SSL, apart from NOT being an authentication method, is slow and heavy anyway and requires proper certs, so it's mainly used only for user-facing web sites. Not to mention intranets, devices, etc., where Digest also turns up.
Anyway, even if HTTP Digest Auth were in fact rare, trying to wave it away with "good thing no-one needs it" is ridiculous. I, personally, need it, and am very far from alone.
I'd like to mention that I do agree in principle, and am playing devil's advocate to some degree. My point is that password hashing is not a panacea, it is often not even possible, and I would certainly not avoid a site just because they store in plaintext if I otherwise had a good impression of their security practices.
I suspect that many companies you know, trust and use have a plaintext copy of your password with them, and you wouldn't even know it.
"financial institutions often need to store in plaintext anyway, for telephone authentication."
Mine doesn't. And yes, if they did, I would not be their customer. Just because I may not know exactly what happens behind the scenes somewhere doesn't mean I can't react to the red flags I can see.
"If you tell users to set a hard password, and they can recover it later if necessary, they would hopefully tend to use better ones"
How is that any different than if the user can reset the password?
"What, pray tell, is the difference between sending the password and sending a link to reset the password, if an attacker has access to the victim's email?"
There is a big difference. Anyone who has access to the text of the mail at any point in time now has your password. It's about mitigating the risks of the crappy vetting channel (email) with a time-limited method (a reset URL).
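A minimal sketch of a time-limited, single-use reset link, assuming a one-hour lifetime and using an in-memory dict as a stand-in for a database table (both assumptions, as are the function names). Only a SHA-256 of the token is stored, so someone reading the table can't use pending links, and redeeming a token deletes it.

```python
import hashlib
import secrets
import time
from typing import Dict, Optional, Tuple

RESET_TTL_SECONDS = 3600  # assumption: reset links expire after one hour

# Stand-in for a database table: sha256(token) -> (user_id, expiry timestamp).
_pending_resets: Dict[str, Tuple[str, float]] = {}


def _token_key(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def issue_reset_token(user_id: str, now: Optional[float] = None) -> str:
    """Create a random token to embed in the emailed reset URL."""
    token = secrets.token_urlsafe(32)
    issued = time.time() if now is None else now
    # Store only a hash, so reading the table doesn't yield usable links.
    _pending_resets[_token_key(token)] = (user_id, issued + RESET_TTL_SECONDS)
    return token


def redeem_reset_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user id if the token is valid and unexpired; each token works once."""
    entry = _pending_resets.pop(_token_key(token), None)  # pop makes it single-use
    if entry is None:
        return None
    user_id, expires = entry
    current = time.time() if now is None else now
    return user_id if current <= expires else None
```

The token goes into the emailed URL. Once it has been used or has expired, a copy of that email sitting in someone's mailbox is worthless, which is the difference from mailing the password itself.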
"Yeah, pity SSL is not an authentication method. You did know that, right?"
For password-based things, I am referring to the channel used to avoid the well-known problems with digest access authentication, such as man-in-the-middle attacks.
Besides what I was referring to: used with non-anonymous X509 client certs, yes SSL is in fact used for authentication. Entire infrastructures are built on it. All of the clusters I have access to only let me in by virtue of X509 client certificates over SSL.
""good thing no-one needs it" is ridiculous. I, personally, need it, and am very far from alone."
I said "good thing no one needs it if SSL is available", not that no one needs it...
I use it myself in software we release that runs behind a firewall, I'm well aware it's cheaper.
"I would certainly not avoid a site just because they store in plaintext"
I admit it's a little on the reactionary side for me to say that; it was a quick snarky comment.
Fair enough. I think we agree anyway, I'm just being difficult : )
"Mine doesn't. And yes, if they did, I would not be their customer."
Are you sure about that? However would you know? And how would they do telephone banking?
I wouldn't expect a bank to store plaintext either, I'd expect them to encrypt it and handle decryption at the terminal. But that's a whole different kettle of fish.
"Anyone who has access to the text of the mail at any point in time now has your password."
Yeah, there is no way I want my passwords going through email either. That argument was a bit flaky.
"avoid the well known problems with digest access authentication such as man in the middle attacks"
Your point is valid, but I wanted to respond by saying we're talking mainly about large-scale DB theft, 99 times out of 100 done by an insider. You seem to have experience inside a large organisation, so you will know that SSL often terminates at the load balancer, and a password form then passes from the balancer to the server in plaintext. If there's an attacker on the inside, he can sniff that to his heart's content. You could argue Digest is actually more secure in this setting.
It's a toss-up between more security on the user's LAN/WLAN (SSL) and more security inside the DC (Digest). OK, this is a bit whimsical.
"All of the clusters I have access to only work by virtue of X509 client certificates over SSL"
Me too, actually. But, sadly, that's not appropriate for the public at large.
Anyway, I agree it's a red flag, just trying to make a point that it's not as black and white as it seemed you were suggesting. There can be good reasons to store in plaintext, and if it's carefully implemented I don't have a problem with it. As long as it's an informed choice, and not just a naive default, and that goes the other way as well.
"First, the finserv organizations we've worked with tend not to store plaintext passwords."
I am talking about retail banking, specifically those with telephone service. If they don't store in a recoverable form (encrypted or plaintext) then I would love to know how the telephone operators verify passwords.
"Secondly, the difference between sending the password and the reset link in email is that the former compromises every other app the user uses."
Sorry, I don't understand the meaning of this sentence.
Anyway, I wasn't really serious with the "password recovery by email" argument, I was just trying to come up with a list of reasons an org might want to store plaintext, but that was probably a pretty flaky one. Any site that sent me my password in plaintext via unencrypted mail would lose me as a customer pretty damn quickly, too.
I meant encrypted for banking, of course. The key point being that the passwords are readable. Two-way, vs the one-way hash discussed before. Maybe I didn't explain myself properly.
Web site passwords might be one-way hashed, I don't know, but telephone banking passwords must be displayed on screen for the operator to read.
"Use of HTTP Auth --- digest or otherwise --- at all --- a doc-able finding."
Uh-huh. So, your photocopiers have SSL certs do they? More likely they have nothing at all. I wonder if that's a "doc-able finding", whatever that is, presumably something bad.
This obsession with HTTP Auth being "evil" is laughable. A lot of the time it's absolutely fine. Hell, a lot of the time it's overkill.
And that rule, if true, is a Dilbert-esque joke. You can't legislate security by banning arbitrary protocols like that. Yes SSL is more secure but other methodologies are still useful, used appropriately. It's like the army banning pistols because machine guns are "better".
A number of hacks involved downloading the database of some websites. Usually this involves defeating the operating system that runs the website (or a server in the same datacenter) and FTPing the database away. Now the thieves have every username and every password, and can log in and abuse the system as needed.
People tend to use the same username/password combination everywhere they go. In the early days of EverQuest, there were a number of websites set up just to harvest usernames and passwords, and about 5% of the EQ userbase used the same username/password on those sites as on their game accounts. Steal one forum database (which is probably the reason for the efforts to crack vBulletin and phpBB), or set up your own honeypot, and harvest a number of game accounts to loot and plunder. This applies to other games as well, not just evercrack.
The reason for the love of hashing is that it is one-way. There isn't any feasible method for reversing it. While things like rainbow tables can make crackers' lives easier, it still puts the burden on them.
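For reference, here's a minimal salted one-way hash using PBKDF2 from Python's standard library. The thread doesn't name an algorithm; bcrypt, scrypt or Argon2 are common alternatives, and the iteration count below is an assumption to tune for your hardware. The per-user random salt is what stops one precomputed rainbow table from covering every account, and the stored value can only be checked against a guess, never turned back into the password.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 600_000  # assumption: tune so one hash takes a noticeable fraction of a second


def hash_password(password: str, iterations: int = ITERATIONS) -> Tuple[bytes, bytes]:
    """Return (salt, digest); store both, never the password itself."""
    salt = os.urandom(16)  # fresh per user, so precomputed tables don't apply
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = ITERATIONS) -> bool:
    """Re-derive from the guess and compare; there is no way to 'decrypt' expected."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)
```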
Backup tapes get lost. Crooked employees have been known to sell access, account info, or whole dumps of DBs. External opponents aren't the only threats; you have malevolent and stupid internal threats to deal with as well.
These are all good points and I agree with you completely.
I think I haven't explained myself well. Firstly, I made the mistake of saying "plaintext" when what I really meant was "recoverable", i.e. encrypted but not hashed. I would never suggest that passwords be stored in plaintext, protected only by operating system and DB passwords. I didn't make that clear and I think I've deservedly gotten some heat for it.
All I am trying to say is that with enough effort, data can be secure, or secure enough. In my job I store customers' credit card information. This must be in a recoverable form. If it leaks I am dead and probably so is the company. Nothing is perfect but I have gone to a lot of trouble and I have reasonable confidence in my efforts.
Same goes for server private keys, financial records, etc. All of it is "ring 0" secure data and extraordinary efforts are made to keep it that way.
I do not actually store user passwords currently, but I know people who do. I have similar confidence in their precautions and skill. Obviously it can be done badly, just like hashed passwords can be implemented badly. But if done properly, I stand by my assertions that storing user passwords in a recoverable format can be no greater a risk than any other part of the system, and no easier or more likely an attack.
I have harped on about this enough, but I'd just like to point out two more things - one, if people enter the same login details to a honeypot then nothing can save them, and two, that crooked employees can circumvent the hashing anyway, either by sniffing inside SSL or just inserting a logging hook. Relying on hashes to thwart crooked employees is folly and could breed complacency.
Hashing passwords is a security layer, nothing less and nothing more. A site's security relies on the skill and care of its architect and staff, not on any single hot topic buzzword. Right, that's enough on this topic. Thanks for the thoughtful reply.
Because you are going to lose your entire database, and everyone's password along with it, to the first SQL Injection vulnerability you miss in your application.
I was referring to a careful implementation by competent people. An organisation opting to store in plaintext would have to have special precautions so that could never happen.
The implementation I have knowledge of is separate from the "main" DB, accessible only over an internal REST interface and heavily secured. There is no way a simple attack on or compromise of a web server could make it spit out the kind of "jackpot!" list you're talking about. The infrastructure is layered like a frickin' matryoshka doll and frankly you would have more luck just robbing a bank.
So it is possible to do well, with due care and attention and a competent paranoid admin. Not advisable or desirable for a "normal" system, I agree completely.
>> An organisation opting to store in plaintext would have to have special precautions so that could never happen.
What everyone is trying to say is there are no foolproof measures to securing data. Everyone who thinks their method is safe becomes the case study for the next generation.
Also, social engineering and disgruntled employees trump internal software architecture every time.
My point is that although password hashing is a very wise practice, there are situations in which the plaintext is necessary, and with careful design a plaintext password store can be made no weaker in security than the rest of the system.
This seems to me to be common sense and I have no idea why it's so controversial.
Well then, in your parallel universe, by all means store plaintext passwords. I'll go on busting up the real apps, and recommending they don't tempt fate by storing passwords.
I recommend that too but sometimes it is unavoidable. When, not if, but when you come across that necessity in your encounters with "real apps", I hope you rightfully feel like a bit of a douche for writing the above.
So you trust the developers of a web site's word that they hashed your password, because you don't trust them to not look at it? You trust someone's word that they're not doing something you don't trust them not to do?
Right. Anyway, for future reference, just remember that if you send your password to someone, and they want to look at it for some reason, they can, regardless of their (claimed) database/authentication design.
Who said they're storing passwords in plain text? They could be paranoid enough to remove the hashed passwords. If you know what goes into the hash, you can reproduce it. If it's a database with financial information, I can see crackers devoting time to this, building rainbow tables and using botnets or whatever to guess the passwords.
I guess I wasn't clear enough. The "user" tables get updated so that the password used to log in will be "111111" or such, and that means all the salts and hashes will be the same value.
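Roughly, that step might look like this against a SQLite copy. It assumes a `users` table with `salt` and `password_hash` BLOB columns and PBKDF2-SHA256 as the app's hashing scheme; all of that is hypothetical, and the hash must match whatever your login code actually checks. Every row ends up with the same salt and hash for "111111", so nothing derived from a real password survives into staging.

```python
import hashlib
import sqlite3

STAGING_PASSWORD = "111111"          # shared password every test account gets
STAGING_SALT = b"staging-only-salt"  # deliberately identical for every row


def reset_all_passwords(conn: sqlite3.Connection, iterations: int = 100_000) -> None:
    """Overwrite every user's salt and hash with the same known values."""
    digest = hashlib.pbkdf2_hmac("sha256", STAGING_PASSWORD.encode("utf-8"),
                                 STAGING_SALT, iterations)
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE users SET salt = ?, password_hash = ?",
                     (STAGING_SALT, digest))
```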
This is what I do. Using Amazon's EC2+EBS makes it dead simple. Every time we test a release, I set up a staging env (from scratch) and restore the latest EBS snapshot. That way I'm testing both the database backup and that I've kept all of the server images up to date as well. It doesn't take much time either (about 30 minutes) and is done weekly.