Hacker News | blr246's comments

Hi beforeolives—

I like your breakdown, and I've observed similar things in my experience as an engineering-focused data person! I've had many discussions with my colleagues about how to effectively manage these different blends of roles and skills.

I'm looking for someone for an engineering type of data role right now. Is there a way to get in touch with you about it?

Our product helps companies listen to their customers by unifying natural language feedback across channels, extracting signals with natural language modeling techniques, and aggregating them to help teams deliver better outcomes with more relevant information.

Hope to hear from you (brandon at frame.ai) :)

edit: forgot to say that I agree with your breakdown


>Instead, what kills cities is a long period in which their leaders fail to reckon honestly with ongoing, everyday problems—how workers are treated, whether infrastructure is repaired. Unsustainable, unresponsive governance in the face of long-term challenges may not look like a world-historical problem, but it’s the real threat that cities face.

This feels correct to me.

I lived in New York City for 15 years, until last year, and I've thought about this theme all year. Decades of policy supporting foreign investment and developer speculation gutted the chance for even affluent upper-middle-class New Yorkers to afford housing and set up a home base, and so many left. The situation has been incomparably more challenging for low-income residents.

I agree the urban collapse meme is much easier to spread than a thoughtful discussion about policy and priorities, and about how to balance the economic strength of a city's major players against the daily priorities of everyday citizens. I hope those who remain in New York shift priorities and initiate a different kind of prosperous era than the one I got to enjoy.


Massive investment and ever-increasing market value of abodes is generally the opposite problem of 'urban collapse'. I mean, unless the bridges are falling down, which I doubt.

NYC in the late 1960s and 1970s underwent a kind of urban collapse.

Detroit underwent urban collapse and never came back.

I think this is maybe what the author meant by 'we don't know what this means'.

If companies, middle class and power flee a city, there is a 100% chance of urban collapse due to the lost tax base.

A thriving city like NYC or SF that is a dysfunctional mishmash of 'barely effective' is another kind of problem, but it's not quite urban collapse.


We are agreeing that NYC is not at a moment of urban collapse. The processes that drive away the tax base include policies and social and market forces that erode the city's effectiveness as a sustaining economic and social hub.

The 1960s and 1970s crisis had a lot to do with the end of NYC's industrial epoch. Suburban development and globalization eliminated manufacturing and pulled workers and residents out of the city. The recovery of NYC was bringing high-value services, retail, and tourism back along with arts and culture.

In the time since, NYC has increasingly become a luxury experience, which is indeed part of its strength but also its weakness, since it accelerates decline when people without roots can up and leave.


NYC isn't going to collapse, but I believe it is going to degrade, primarily for middle- and lower-income families.

Median household income in NYC is actually below the US median, making the existing tax base small. The flight of the middle class was already happening; COVID only accelerated it.

We left NYC just a few months ago, and we are one of many who left.


Regular apartments were subject to rent control. Luxury were not.

So one declined, the other thrived.


Explain why they only build luxury apartments in the Midwest too. Your analysis is faulty.


Maybe it's not collapse yet, but it certainly isn't making the city an easy place to live, and it seems to fit the precursors mentioned in the article. The paper valuation of real estate and landlords don't make a city. Referencing their efficiency of extraction as a metric for city health works right up until the moment the burden becomes too great.



And this comment proves the point about why this stuff is hard: the paragraph about living in NYC for 15 years is entirely wrong.


Why is it wrong?


I had the same initial thought based on the title. Unfortunately, the answer is no.

The article discusses a low-dimensional KNN problem. The curse of dimensionality guides intuition that the methods here likely will not apply to extremely high-dimensional problems.

faiss actually comes with a lot of excellent documentation describing the problems unique to KNN on embedding vectors. In particular, for extremely large datasets, most of the tractable methods are approximations that use clustering, quantization, and centroid-difference tricks to make computation efficient.

See https://github.com/facebookresearch/faiss/wiki/Faiss-indexes and related links for more information.
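To make the clustering idea concrete, here's a toy pure-Python sketch of the inverted-file (IVF) approach that libraries like faiss implement far more efficiently; all names and parameters here are invented for illustration:

```python
import math
import random

random.seed(0)

def dist(a, b):
    return math.dist(a, b)

# Toy dataset: 2-D points stand in for embedding vectors (real ones have
# hundreds of dimensions, which is where brute force gets expensive).
points = [(random.random(), random.random()) for _ in range(2000)]

# Crude training: k-means-ish clustering of the dataset.
k = 20
centroids = random.sample(points, k)
for _ in range(5):
    buckets = [[] for _ in range(k)]
    for p in points:
        buckets[min(range(k), key=lambda i: dist(p, centroids[i]))].append(p)
    centroids = [
        tuple(sum(col) / len(b) for col in zip(*b)) if b else centroids[i]
        for i, b in enumerate(buckets)
    ]

# Inverted lists: centroid id -> member vectors (the "IVF" in faiss's IndexIVFFlat).
invlists = [[] for _ in range(k)]
for p in points:
    invlists[min(range(k), key=lambda i: dist(p, centroids[i]))].append(p)

def ann_search(q, nprobe=3):
    # Probe only the nprobe nearest clusters instead of scanning everything.
    near = sorted(range(k), key=lambda i: dist(q, centroids[i]))[:nprobe]
    candidates = [p for i in near for p in invlists[i]]
    return min(candidates, key=lambda p: dist(q, p))

query = (0.5, 0.5)
exact = min(points, key=lambda p: dist(query, p))   # brute-force ground truth
approx = ann_search(query)
```

The approximate answer can miss the true nearest neighbor when it falls in an unprobed cluster, which is exactly the latency/recall dial that nprobe controls.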


At Frame.ai, we are using both PostgreSQL and faiss (and other tools) in our stack to do several different kinds of inference tasks on semantic representations of text to help companies understand and act on customer chats, emails, and phone call transcripts.

We've frequently had the same dream of adding more native support for nearest-neighbor type queries, since that is the workhorse of so many useful techniques in the modern NLP stack.

Right now, we have lots of dense vectors stored in massive TOAST tables in PG. It's faster to fetch them than to recompute them, especially since a number of preprocessing steps limit what we pay attention to.

The discussion here about full text search versus semantic search is interesting. In our experience, both are highly relevant. Sometimes it's most useful for our customers to segment their conversation data by exact text matches, and other times semantic clustering is most effective. I think there's plenty of reason to offer both kinds of capabilities.


GMail seems to have opened a vector for amplifying dark patterns by placing action buttons on messages. My least favorite is the LinkedIn accept-invitation button, which I've now clicked several times by accident, because I spent years using GMail before it started taking actions like opening GitHub PRs and accepting LinkedIn invites.

I can't find a way to disable this feature. Does anybody know how?


Not sure this is a GMail feature. This would technically be possible with any HTML link styled like a button.


It's built into the inbox view, so GMail is extracting the action from the message content and placing a button on the row element. Sorry if that wasn't clear from my initial post.


One more reason to use basic HTML GMail (or to avoid GMail entirely).


For query plans using a sort node, there can be a major difference in performance depending on the row width.


It's worth mentioning that it's a bad idea to ever invalidate refresh_token grants during the lifetime of an authorization. I've seen APIs do this immediately upon sending the response to the token endpoint, which makes the system unusable: network transmission errors are frequent enough that you would regularly have to contact the resource owner to grant access again. Even an expiry after days or years is likely only to generate more support requests for the API maintainer without increasing security enough to justify it.

The reason this bad practice is common is that it is allowed by the spec in https://tools.ietf.org/html/rfc6749#section-6 as an optional action to take on refresh grants. Please, do not do this.
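A toy simulation of why immediate invalidation hurts (hypothetical server, not any specific API): if the response carrying the new refresh token is lost in transit, the client's only remaining token is already dead.

```python
import secrets

class TokenServer:
    """Toy token endpoint that invalidates a refresh token the moment it
    mints a replacement: the optional RFC 6749 section 6 behavior."""
    def __init__(self):
        self.current = secrets.token_hex(8)

    def refresh(self, token):
        if token != self.current:
            # invalid_grant: the only fix is to re-contact the resource owner
            raise PermissionError("invalid_grant")
        self.current = secrets.token_hex(8)   # old grant dies immediately
        return self.current

server = TokenServer()
old = server.current
server.refresh(old)          # response is lost on the network...
try:
    server.refresh(old)      # ...so the client retries with its only token
    recovered = True
except PermissionError:
    recovered = False        # locked out; support ticket time
```

A grace window in which the previous token keeps working until the new one is first used would let this retry succeed.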


I've always believed it should be the responsibility of the consumer (perhaps with the aid of client libraries) to properly handle the refresh lifecycle. I'm not a fan of password rotation, generally speaking, but it's because humans are terrible at remembering and creating complex passwords. Software processes don't have the same problem.


"sending the response to the token endpoint" Which response?

"makes the system unusable due to the frequency of network transmission errors that would result in having to contact the resource owner" What kind of network transmission errors are you getting? And at what quantity? This shouldn't be too difficult to do

I disagree and think refresh_token expirations do add to security and believe it is the client's job to handle any difficulties that can come with the expiration period


I don’t think I saw a support ticket in years around refresh tokens, and we definitely expired them, both to avoid storing an infinite history of the things and to quiet a nagging sense about increasing token leakage.

Tl;dr: I don’t know kids, seemed fine to me, world didn’t end


Appreciate the detail here. It's a great writeup. Wondering what folks think about one of the changes:

  5. Changing the SOP to do staged rollouts of rules in
     the same manner used for other software at Cloudflare
     while retaining the ability to do emergency global
     deployment for active attacks.
One concern I'd have is whether or not I'm exercising the global rollout procedure often enough to be confident it works when it's needed. Of the hundreds of WAF rule changes rolled out every month, how many are global emergencies?

It's a fact of managing process that branches are a liability, and the hot path is the thing that will have the highest reliability. I wonder if anyone there has concerns about diluting the rapid response path (the one carrying the highest associated risk) by making this process change.

edit: fix verbatim formatting


Yep, that's the exact bullet point I was writing a response on. Security and abuse are of course special little snowflakes, with configs that need to be pushed very fast, contrary to all best practices for safe deployments of globally distributed systems. An anti-abuse rule that takes three days to roll out might as well not exist.

The only way this makes sense is if they mean that there'll be a staged rollout of some sort, but it won't be the same process as for the rest of their software. I.e. for this purpose you need much faster staging just due to the problem domain, but even a 10 minute canary should provide meaningful push safety against this kind of catastrophic meltdown. And the emergency process is something you'll use once every five years.


Your response highlights a good idea to mitigate the risk I was trying to highlight in mine.

They want to have a rapid response path (little to no delay in staging envs) to respond to emergencies. The old SOP allowed all releases to use the emergency path. With the new SOP no longer exercising it, I'd be concerned that it would break silently from some other refactor or change.

Your notion is to maintain the emergency rollout as a relaxation of the new SOP such that the time in staging is reduced to almost nothing. That sounds like a good idea since it avoids maintaining two processes and having greater risk of breakage. So, same logic but using different thresholds versus two independent processes.
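That "same logic, different thresholds" idea might look something like this sketch (stage names and soak counts are made up, and this is not Cloudflare's actual SOP):

```python
def rollout(stages, deploy, is_healthy):
    """One code path for every release; an emergency just shrinks the
    soak windows. A sketch of the idea, not any real deployment system."""
    for name, soak_checks in stages:
        deploy(name)
        for _ in range(soak_checks):       # health checks during the soak
            if not is_healthy(name):
                raise RuntimeError(f"rolling back, {name} unhealthy")
    return "released"

NORMAL = [("canary", 10), ("region", 30), ("global", 0)]   # invented stages
EMERGENCY = [(n, min(c, 1)) for n, c in NORMAL]            # soak cut to one check

log = []
rollout(NORMAL, log.append, lambda stage: True)
rollout(EMERGENCY, log.append, lambda stage: True)
```

Because both release types run the exact same function, the emergency path gets exercised on every routine push and can't silently rot.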


Right. The emergency path is either something you end up using always, or something you use so rarely that it gets eaten by bit-rot before it gets ever used[0]. So I think we're in full agreement on your original point. This was just an attempt to parse a working policy out of that bullet point.

[0] My favorite example of this had somebody accidentally trigger an ancient emergency config push procedure. It worked, made a (pre-canned) global configuration change that broke everything. Since the change was made via this non-standard and obsolete method, rolling it back took ages. Now, in theory it should have been trivial. But in practice, in the years since the functionality had been written (and never used), somehow all humans had lost the rights to override the emergency system.


My personal rule is that any code which doesn't get exercised at least weekly is untrustworthy. I once inherited a codebase with a heavy, custom blue-green deploy system (it made sense for the original authors). While we deployed about once a week, we set up CI to test the deployment every day.

Cold code is dead code.


> Security and abuse are of course special little snowflakes, with configs that need to be pushed very fast, contrary to all best practices for safe deployments of globally distributed systems.

Once upon a time, I worked on a system where many values that would otherwise be statically defined were instead put into a database table. This particular system didn't have a proper testing and deployment pipeline set up, so whereas a normal system would just change the static value at some hard-coded point in the code and quickly roll it out, this system kept the values in the database so they would be changeable between manual deployments (months or even years apart). The ability to change a user-facing value by changing the database inflated the time it took to test a release, thus exacerbating the time it took to release a new version, but, well, it worked.

My point is that if security and abuse rules need to be rolled out quickly, then the system needs security and abuse systems where the entire range of security and abuse configurations (i.e. their types) are a testable part of the original pipeline. Then the configurations can safely be changed on the fly, so long as the changes type-check.

It's easy to understand why it's never been built though - you'd need both a security background and a Haskell-ish/type-theory kind of background. Best of luck finding people like that.


The main problem is that their Regex library doesn't have a recursion limit. I'm honestly amazed they've been able to scale Lua scripts to the point they can use them as a global WAF. Knowing this, it may be easy to create attacks against their filters.

My takeaway is that it's time to move to a custom solution using a more flexible language. A simple async watchdog on total rule execution time would have prevented this; when running tons of Regex rules, I'm amazed they didn't have one.


I am wondering why you are being downvoted. This outage could have been prevented with better deployment procedures too.

For example, my company (nowhere near the scale of Cloudflare) does progressive deployments. New code is deployed only to a handful of machines first, and then, as the hours pass and checks remain green, it propagates to the rest of the server fleet. Full deployment takes 24 hours. We haven't had code-breaking changes in production in the past 3 years, and before that, us breaking things was the most common cause of production issues. Of course, that's not the only thing we do: good test practices, code reviews, etc.

The second thing is separation of monitoring and production. If production going down takes your monitoring systems down too, you will have a very hard time figuring out what's wrong. Cloudflare says "We had difficulty accessing our own systems because of the outage". That sounds very bad.

I'd wager there are many wrong things at play here other than "regex is hard". But I guess HN loves Cloudflare way too much to ask the hard questions.


Yeah, they get some points for admitting WAF rule updates bypass canary deployments so that they can be applied ASAP. But still.

Recursion attacks against Regex are extremely well known. The only reason I can fathom for not having an execution time watchdog is that Nginx Lua runtime doesn't allow it. I assume the scripts run during a single request cycle on one thread due to Nginx async IO (one thread per core only).

That's still no excuse. They admit to running THOUSANDS of Regex rules in custom Lua scripts embedded in Nginx. This sounds like a bad idea to anyone that knows anything about software because it is.

My previous employer embedded way too much Lua script inside Nginx plugins for the same reasons (it's easy). Even at our "scale" (50 requests/second) we had constant issues. To think they run ~10% of internet traffic on such a Rube Goldberg machine is proof you can use just about anything in prod (until it inevitably explodes, at least).


Confused about what this response is trying to say. Did you read the whole post? They addressed exactly those two things and explained how they're fixing them. You're essentially just repeating part of the blog post, which is why I wonder if you finished reading it.


I'm interested in why they wouldn't use LPeg instead. Those seem a lot easier to compose, reason about and debug; plus they have restricted backtracking.


They still retain the global rollout for the other use cases detailed in the writeup, so it's generally tested, though not for this one use case, as you point out. I suspect the tradeoff is reasonable; however, having a short pre-stage deploy before global in all cases would be a more conservative option that would prevent an emergency push from becoming an even bigger emergency!


One way of dealing with this is regular drills. My employer has a cross-cutting rotation that exercises stuff like this weekly.


The other part of this story I did not see mentioned: I suspect that password expiration also makes organizations more vulnerable to social engineering, because legitimate users (I have done this) become locked out due to poorly managed password expiration and then have to call in to restore access. The use of insecure identity and authentication mechanisms like student IDs and security questions is a recipe for abuse.

Good riddance to password expiration.


Unfortunately we still have to have similar authentication methods for other password resets. Users have an alarming tendency to forget their passwords after a week or two of holiday.


Also not helped by the fact that passwords have to include every symbol and their mother, cannot include sequential digits, cannot include sequential letters, cannot include any letter of your name, and a bunch of other inane rules that could be replaced by simply requiring a minimum length of 12 instead of 8...


Always a joy when your generated password is refused:

    694*C73&4:Ekp>fy>SE&o![RC
(This is an example of what password-store generates.)

Not good enough, because it's too long. Nothing throws you back ten years in time like having to handcraft a password to comply with all the silly rules.


Not so long ago I registered on a website that allowed a comma (or was it a semicolon?) in the password during registration but refused to accept said password at login. Fun times.


I once spent 15 minutes trying to register on a local Domino's website which kept bugging me about the lack of a special character, even though I had one. It turned out the app truncates the entered password to the first 20 characters and only considers that part. Thankfully the special character was after the 20th position, so I noticed the error and fixed it; if it hadn't been, I'd be wondering the next time I logged in why it wouldn't accept a valid password.
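The bug is easy to reproduce in miniature. A sketch (sha256 stands in for a real password hash like bcrypt or argon2, just to keep it stdlib-only): signup truncates before hashing, login doesn't, so the full password never matches.

```python
import hashlib

def store(password):
    # the bug: silently truncate to 20 chars before hashing at signup
    # (sha256 stands in for a real KDF like bcrypt/argon2)
    return hashlib.sha256(password[:20].encode()).hexdigest()

def login(password, stored):
    # the login path hashes the FULL string, so the two paths disagree
    return hashlib.sha256(password.encode()).hexdigest() == stored

pw = "correct horse battery staple!"   # 29 chars, special char past position 20
record = store(pw)
full_ok = login(pw, record)            # the "valid" password is rejected
trunc_ok = login(pw[:20], record)      # only the first 20 chars ever counted
```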


Why do you need a secure account to order pizza?


How else are they going to run pepperoni-based big data analytics?


I've had the same problem with Verizon: in the past, the password field would only store the first 20 characters. Took me an hour or two to figure it out and fix the problem. I'm not sure if that's still the case; hopefully not.


My issue with Verizon is they lock an account after 3 bad attempts, and the "username" is the cell phone number for the account. Which seems to be slurped into some automated brute force engine.

Every single time I want to log in, I have to do a password reset first. Makes me wish I had the phone number of every manager in the company, so I could lock them all out every day.

Also, since having the phone is the only second factor for authentication, that's all you need to access an account.


You might not even have noticed if the login form truncated the input as well before hashing it. (You probably would have noticed after some update to their website removes the truncation though.)


Presumably it also truncates the password when doing sign-in?


I've had that with work passwords - using my password generator I give them a 64-character gibberish mess, but it turns out that they only accept 16 characters, and I rendered my account useless until they could reset it for me. How frustrating.


Hm, yes. I had that happen with a '#'. Presumably there was some nasty evaluation going on and the rest of the password was treated as a comment.

I wonder why I never pursued that.


You will usually get far better entropy by simply stitching together a random array of everyday words. Example: stitching better everyday words array entropy level. Anyway, as for the too-long problem, then I guess we're back to square one. :)


Strings of everyday words are better than the passwords most people choose, they're memorable, and they're often good enough from a practical perspective. But if you're using a password manager and don't have to remember passwords, you might as well use truly random passwords, which have more entropy.


I disagree. I use 1Password, and I used to do the "totally random string of letters, symbols and digits" thing, but I've dropped it for the "four or five random english words" alternative for all new passwords.

There are a couple of situations where having these symbol strings is really inconvenient. For instance, reading a password out loud to another person, or when logging in on a device where you can't (or don't want to) install your password manager on (e.g. a PS4 or an Apple TV). In those cases, "puncture-foible-irish-ducat-rejoice" is a lot easier to handle than "jh&6dQ#F]9.Z>u^t]6u+".

The "symbol" password has more entropy for sure, but the actual security benefit is essentially non-existent. No one's going to guess either password, and I'm never using the same password in two different places anyway. The extra convenience is totally worth it.

EDIT: as other people have pointed out in the thread, another example would be badly behaving sites that prevent "paste" or use other techniques to block password managers. Much easier to type in those words then.


... for the same length. But length doesn't really matter unless you're manually typing it in, in which case many people will be faster at typing words than random characters.


> [...] you might as well use truly random passwords, which have more entropy.

At what point is more entropy simply diminishing returns? Five random words gives you 64 bits, and six gives you 77 bits (each word = 12.9 bits):

* https://en.wikipedia.org/wiki/Diceware

* https://www.rempe.us/diceware/#eff


The primary benefit of Diceware over a "random" string of characters is that it is easy to remember and truly random. With a password manager you don't need to remember the password, and it will be generated truly randomly anyway. A string of 11 random alphanumeric characters has more entropy than a 5-word Diceware passphrase, with the added benefit that it is less to type if you ever need to enter it manually. But Diceware can be a good idea for the master password of your password manager, and if you use it for that, you should probably use a 10-word passphrase rather than 5.
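The arithmetic, for anyone who wants to check it (7776 is the EFF/Diceware long list size, 62 the alphanumeric alphabet):

```python
import math

# Entropy per symbol = log2(number of equally likely choices)
diceware_word = math.log2(7776)   # EFF long list: 6**5 = 7776 words
alnum_char = math.log2(62)        # [A-Za-z0-9]

five_words = 5 * diceware_word    # ~64.6 bits
six_words = 6 * diceware_word     # ~77.5 bits
eleven_chars = 11 * alnum_char    # ~65.5 bits: edges out five words
```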


For anyone keeping score at home, some handy-dandy tables with entropy per symbol:

* https://en.wikipedia.org/wiki/Password_strength#Random_passw...


> Far better entropy

> random array of everyday words

No, that's not how entropy works. Random characters still score better.


Good try, but that doesn't apply to password managers. Several everyday words contain more bits than a garbled single word, but a string of dozens of truly random characters beat both.

You might be referring to https://xkcd.com/936/


That's the one! ^^ And you're of course correct.


A couple of years ago one of my banks "upgraded" its web site, forcing me to change my password to comply with its revised password guidelines since my old password was no longer permitted.

The result was a password that was shorter, less varied, and less secure than the previous one.

Good job, Chase.


I had a similar problem with Lloyds: every time I wanted to transfer money using the mobile app, I had to type in the password manually, as they had disabled the "paste" option. Given my password was auto-generated and 16 characters long, and the password field was wiped every time I switched apps, I just gave up.



It's very disturbing to see that your worst passwords are for your bank accounts. Each bank I've worked with has some weird limitation like this. Not to mention that the only form of MFA most banks allow is SMS, assuming they even offer MFA.


Banks are probably still running on the old mainframe (old as in upgraded in 1998 when y2k forced it), with password storage that was state of the art in 1960 (plain text, but the file is protected well enough that hackers can't get it). That isn't to say better passwords cannot be used, just that they have never been enabled.


I don't understand that - I get that the system that holds the data is old, but when creating an online banking system shouldn't the piece that holds the data be a good half dozen steps removed from the website and authentication?


Not if you want single sign-on. Of course customers only use the web login, but internal people have to deal with all these different logins.


I have a Keepass app on my phone, Keepass2Android:

https://play.google.com/store/apps/details?id=keepass2androi...

It has a keyboard bundled into it that will ghost type in the currently opened username and password. Works awesome for stupid stuff like your story.


> my old password was no longer permitted.

But how did they know? They should just have the hash...


If they implemented it properly, they could have checked the current password against the revised guidelines on the next login. No need to store it in plain text.


The website can check the password during login without storing it in plaintext.


The login form usually sends the password in cleartext and it's then hashed on the server-side prior to comparing it to the hash stored in the database.

So they can simply determine the password's strength at the time the user logs in.
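A sketch of what that could look like (the policy rule and names are invented):

```python
import hashlib
import hmac
import os

def hash_pw(pw, salt):
    # server stores only the salt plus this derived hash
    return hashlib.pbkdf2_hmac("sha256", pw.encode(), salt, 100_000)

def meets_new_policy(pw):
    return len(pw) >= 12          # hypothetical revised guideline

def login(pw, salt, stored):
    if not hmac.compare_digest(hash_pw(pw, salt), stored):
        return "denied"
    # The plaintext is in hand right now, so the revised policy can be
    # evaluated at login time without the server ever persisting it.
    return "ok" if meets_new_policy(pw) else "ok_but_must_change"

salt = os.urandom(16)
stored = hash_pw("hunter2", salt)          # old, now-noncompliant password
result = login("hunter2", salt, stored)    # authenticates, flagged for change
```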


FirstDirect's "digital secure key" Android app, which allows me (or someone else who happens to get hold of my phone while it is unlocked) to transfer a few grand out pretty easily, limits passwords to "between 6 and 9 characters". And offers fingerprint based auth as an alternative, because we all know how infallibly secure that method is.


People were forgetting their passwords and using up valuable support time.

Since the responsibility for password storage is on the customer anyway, we might as well make our password a maximum of 6 characters!


Some never memorize their passwords at all, instead relying entirely on "forgot" emails and "Remember Me" features.


Which isn't even that bad of an idea. Some websites basically use this as the only way to log in.


Slack does this exceptionally well. If you forget which accounts you have, you can put in an email address and it will email you a list of your Slack accounts. If you forget your password, you can get a magic link that automatically signs in through a deep link into the app, no password needed.


It's such a cool idea. If you can reset your password using only your email, there's no security reason you can't just log in with it. It might even be better, since you can then add more annoying steps to the password reset strategy.
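One common way to build such a magic link is an HMAC-signed, expiring token; this is a hedged sketch with an invented format, not any particular site's implementation:

```python
import hashlib
import hmac
import time

SECRET = b"server-side-secret"   # hypothetical; would come from config

def make_login_token(email, now=None, ttl=900):
    """Mint an expiring, HMAC-signed token to embed in the emailed link."""
    exp = int(now if now is not None else time.time()) + ttl
    msg = f"{email}|{exp}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{email}|{exp}|{sig}"

def verify_token(token, now=None):
    """Return the email if the signature checks out and it hasn't expired."""
    email, exp, sig = token.rsplit("|", 2)
    msg = f"{email}|{exp}".encode()
    want = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, want):
        return False
    if int(exp) <= (now if now is not None else time.time()):
        return False
    return email
```

Since the token is self-authenticating and short-lived, the server needs no password table at all for this flow; the inbox is the credential.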


But Slack then must rely on the security of your email. If the site is dealing with sensitive information like credit cards, this could be a no-go.


Any site that has a "enter your email for a reset link" feature relies on your email security.


Almost every website in existence except the most security sensitive like bank websites will allow you to reset your password with email.


What email based log in that doesn't use 2FA doesn't ultimately rely on the security of your email?


Indeed. On sites where I have to register but know I won't visit frequently, I enter a random password I don't even write down, relying on the forgot-password feature if I ever need to come back later.


I have wondered if some web pages effectively have this as the main log in method. If you have a hurricane tracking page, everyone is going to forget their passwords in between hurricane seasons.


Steam has nearly done this for me.

Oh, it has a password. But if I remember my password I have to check my email and copy and paste a code from there. And if I forget my password I have to... check my email and copy and paste a code from there... really not much point to the password.


Bulb, an energy supplier in the UK, trialled this. They soon switched back due to complaints, although I didn't really mind it.

I assume it was due to the inconvenience of not being able to remember a password or stay signed in.


Yahoo Japan (not really related to the defunct original Yahoo and still very successful in Japan) recently abolished passwords for new accounts.

You can only login with reset emails or SMS codes, which is pretty annoying.


I wondered about this too and asked about it on the security stackexchange forum in case I was overlooking some glaringly obvious reason not to. Turns out that most thought it was reasonable too, though maybe too frustrating for some.

https://security.stackexchange.com/q/12828/8518


I've seen Blendle and a couple of other web sites do this.

You go to the login page, and your choices are federated login, standard login, or a one-time login e-mail.


medium.com is famous for email OTP authentication. They even blogged about it (search Hacker News for more information).


In India, most mobile apps use a phone number for the username and an OTP instead of a password. Makes perfect sense for mobile apps, except when the OTP doesn't arrive due to congested SMS networks, or when your account gets hacked via SIM takeover or SMS MitM (both currently unheard of in India).


I think you just jinxed it.


It is one way to go "passwordless", though you're piggybacking on the security your email system already has.

Shameless plug of old post that describes how to restrict login to only the initiator even if login is initiated via an email link - http://sriku.org/blog/2017/04/29/forget-password/


You're relying on your email security either way, since anyone can trigger the password reset email if they get access to your email account.


This breaks my workflow -- I almost never open the forgot-password email on the same machine I used to initiate the request. Usually I need to briefly access a personal account from somebody else's computer or my work computer, and when I'm told I need to check my personal email, I only want to open that on my phone.


Honestly this just seems like there needs to be a better way. Maybe some multi-factor system that requires like a physical key and either a secret and/or some identifying thing.


Kinesis is not necessarily well-suited to fan-out. It is very well suited for fan-in (multiple producers, single consumer).

Each shard allows at most 5 GetRecords operations per second. If you want to fan out to many consumers, you will hit those limits quickly and have to accept a significant latency/throughput tradeoff to make it work.

For API limits, see: https://docs.aws.amazon.com/kinesis/latest/APIReference/API_...
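The arithmetic behind that tradeoff, assuming consumers share a shard's documented 5 GetRecords calls per second evenly:

```python
# Documented per-shard read limit: 5 GetRecords calls/sec (shared by all readers)
CALLS_PER_SEC = 5

def min_poll_interval(consumers):
    """Seconds each consumer must wait between GetRecords calls so that
    all consumers together stay under the shared per-shard cap."""
    return consumers / CALLS_PER_SEC

single = min_poll_interval(1)    # one consumer can poll every 200 ms
fanout = min_poll_interval(10)   # ten consumers are each forced to 2 s polls
```

Each added consumer directly stretches everyone's polling interval, which is the latency/throughput tradeoff in question.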

