Cas9's comments

Cas9 · on Sept 2, 2021

CloudWalk | Remote | Full-time | https://www.cloudwalk.io | https://www.cloudwalk.io/self

All positions are full-time and 100% remote. Work remotely from anywhere.

We are creating a global payment network on blockchain.

The existing legacy payments infrastructure that connects banks, merchants and consumers is outdated and not ready for a future where humans and AIs are going to exchange value in decentralized ways, using a variety of different credit products and currencies.

First, we set out to fix the broken Brazilian payments ecosystem by providing extremely low fees in a highly concentrated and unjust market. Now we are going global, creating ways to process credit card, mobile wallet payments and lending in real time using smart contracts for issuing, authorization, and settlement of financial products.

Our mission is complex and we are looking for ambitious and talented people to help us shape the future of payments and global financial networks, while providing privacy and first-class security for consumers, merchants, credit issuers, and bank members.

We are hiring for multiple positions, including:

* Senior Software Engineer - Backend

* Senior Software Engineer - Frontend

* Senior Software Engineer - Mobile

* Software Engineer - Risk

* Data Scientist - Risk

See these and all other positions here: https://careers.cloudwalk.io

Tech Stack: Ruby, Solidity, Python, JS, Rust, Flutter/Dart, Go, React, Elm, Postgres, GCP, AWS.

Cas9 · on July 30, 2021

2016 · Cited by 6038

Cas9 · on July 15, 2021

Honest question: since AlphaFold doesn't really _solve_ the protein folding problem (it's NP-complete after all), but only _approximates_ solutions very well, what are the real impacts of this? Isn't a good approximation of a protein enough to cause unexpected problems? How do we know that an approximate structure will perform the same as the correct solution?

Ultimatt · on July 15, 2021

There is a lot of bias in the chat here toward a chemistry and pharma slant. If you set that aside, AlphaFold solves, in a very meaningful way, a problem that is blocking a lot of scientific investigation.

For comparative and evolutionary analysis structure is far more conserved than sequence. Especially in things like viruses or anything with a high rate of reproduction like bacteria. Just knowing the general fold or overall structure is enough to do structural alignment and tell if two genes are related on that basis, even if their genomic sequence is completely dissimilar. Large groups of researchers rely on sequence homology built from sequences of known structure.

But AlphaFold works well in new sequence space, to far more accuracy than is needed. If we had an AlphaFold prediction for every known sequence, suddenly the evolutionary relationships between all genes and even all species would be far clearer. This on its own unlocks a new foundation to reason about function and molecular interaction with a holistic systems view, without gaps in what we can know with some reasonable assurance.

For an analogy, think of having books in different languages describing objects. You know what some of the book in English might say, but you don't even know if the book in Spanish is talking about the same things. AlphaFold is like an AI that transforms all the books into picture books, and now we can use image similarity or have one person look at all the pictures.

haihaibye · on July 16, 2021

> even if their genomic sequence is completely dissimilar

I think you mean amino acid homology? (due to synonymous mutations)

I looked it up and you're right, protein structure/motifs are much more highly conserved than amino acid sequence https://humgenomics.biomedcentral.com/articles/10.1186/1479-...

thxg · on July 15, 2021

> (it's NP-complete after all)

Protein folding is a physical/biological phenomenon. AFAIK we don't currently have a proper exact mathematical formulation of the problem that would let one determine its complexity.

You may be referring to this paper [1]. It only claims that one particular optimization problem, believed to give a solution to protein folding problems, is NP-hard. So, even if a suitable exact formulation exists, it is not yet proven that protein folding is hard, although it for sure seems plausible.

By the way, it is perfectly possible today to solve some very large-scale NP-hard problems (think millions of variables and constraints) in reasonable amounts of time (think minutes or hours). Examples are knapsack problems, SAT problems [2], the Traveling Salesman Problem [3] or more generally Mixed Integer Programming [4].

[1] "Complexity of protein folding", 1993, by Aviezri S. Fraenkel

[2] http://www.satcompetition.org

[3] http://www.math.uwaterloo.ca/tsp/

[4] http://plato.asu.edu/bench.html
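As a toy illustration of the point that NP-hard does not mean intractable in practice, here is a minimal Python sketch of an exact 0/1 knapsack solver using dynamic programming over capacities. The function name `knapsack` and the sample numbers are made up for illustration. It runs in time proportional to (number of items × capacity), which is pseudo-polynomial, so it does not contradict NP-hardness. The large-scale solvers behind the links above rely on far more sophisticated techniques than this.

```python
def knapsack(values, weights, capacity):
    """Exact 0/1 knapsack: best total value with total weight <= capacity.

    Assumes non-negative integer weights and an integer capacity.
    """
    # best[c] = best value achievable with capacity c using the items seen so far
    best = [0] * (capacity + 1)
    for value, weight in zip(values, weights):
        # Iterate capacities downwards so each item is used at most once.
        for cap in range(capacity, weight - 1, -1):
            best[cap] = max(best[cap], best[cap - weight] + value)
    return best[capacity]


# Illustrative instance: three items, capacity 50.
print(knapsack([60, 100, 120], [10, 20, 30], 50))
```

The instance has 2^3 possible subsets, but the table only needs 3 × 51 updates. That gap between worst-case theory and practical structure is what makes many real instances solvable.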

dekhn · on July 15, 2021

The protein folding problem is not NP-complete. The "formal" protein folding problem, as posed (find the set of dihedral angles whose resulting structure has the lowest energy) might be, but that bears only a distant resemblance to how people "solve" the problem today. At the very least, the statement is incorrect because many proteins don't actually fold to their energy minimum: they get stuck in kinetic traps, and the formal PF definition never accommodated that idea.

ashtonbaker · on July 15, 2021

Not really an answer to your question, but is the problem really NP-complete, or just combinatorially difficult? For example how is this condition of NP-completeness satisfied?

> it is a problem for which the correctness of each solution can be verified quickly [0]

[0] https://en.wikipedia.org/wiki/NP-completeness

Cas9 · on July 15, 2021

According to this answer [0] it seems it's actually NP-hard, my bad. Haven't seen the proof though, and I'm not an expert.

[0] https://cs.stackexchange.com/questions/128493/is-protein-fol...

bawolff · on July 15, 2021

I don't know much about protein folding, but for most things in life, exact solutions to NPC problems usually aren't needed for non-contrived problems. In many cases, approximations are good enough.

Besides, this is real life - if predictions and real life match, that's great. If they don't, well you know you went wrong somewhere.

jerven · on July 16, 2021

I would upvote this twice if I could. Life science problems are quite often NP-hard; still, approximate results are extremely useful.

A joke, which I think is from Sean Eddy (HMMER):

A bioinformatician approaches a computer scientist for help with a hard problem. The CS agrees to help. A year later the CS comes back very excitedly: "Your problem is not hard, it is NP-hard!" The bioinformatician nods, says "I still got to solve it", and continues finding ever faster and better approximations ;)

Also, the problem space is both bounded (you don't have infinite-length proteins) and f'd up in reality, e.g. protein hijacking and re-conformation in the face of an infectious agent.

wpasc · on July 15, 2021

A very non-expert opinion: if an approach approximates it pretty well and can be improved upon, then it could end up being quite useful. Given that biology exists on a real, tangible scale, perfection in the fold prediction isn't necessary; an approximation that is sufficiently good to be functionally useful is enough.

^ That sounds like word-salad BS but I think there's some truth to it. I know protein folding has been postulated to be useful in terms of understanding basic biology, understanding disease pathology, and drug prediction. While a wide range of approximations are functionally useless, perhaps the AlphaFold approach or some improved version of it surpasses the functionally useful threshold.

At least I hope so

radus · on July 15, 2021

Yes, it is still useful. Even structures obtained through traditional means (e.g. x-ray crystallography) are approximations to an extent, since there are limits to the resolution that you can obtain and oftentimes regions of proteins are "disordered". Additionally, these structures are only snapshots of a protein in a particular state, which may not completely reflect the dynamics of the protein in its native environment.

whimsicalism · on July 15, 2021

You want to find a protein that has X structure (since structure determines function to a degree).

If AlphaFold is substantially more accurate at solving proteins, it can mean that drug discovery is faster, assays are faster, etc. etc.

The "unexpected problems" would be caught in the assay stage.

radus · on July 15, 2021

Kind of disagree with this. Solving protein structures is not the rate-limiting step in drug discovery or in biochemical assays, not by a long shot. See this excellent comment by @dekhn on a related submission: https://news.ycombinator.com/item?id=27849046

hobofan · on July 15, 2021

I would expect that once AlphaFold has helped you identify a potential protein (e.g. as a drug) out of a bigger set of potential proteins, there will still be a manual step of traditional cryoEM, NMR, etc. to get an accurate high-resolution structure.

t_serpico · on July 15, 2021

To me, the interesting thing is not the specific results but rather that you can accurately predict crystal structures from sequence alone. This raises the question: what other physical biological properties can we predict?

mrfusion · on July 15, 2021

Is it really NP-complete? If so, we could map other NP-complete problems onto it and let biology solve them for us.

nmca · on July 15, 2021

NP completeness tells you about the hardest cases, not the most useful cases.

saithound · on July 15, 2021

AlphaFold is not about solving any kind of NP-complete problem.

Proteins consist of chains of amino acids which spontaneously fold up to form a structure. Understanding how the amino acid chain determines the protein structure is highly challenging, and this is called the "protein folding problem".

People use mathematical models to predict how proteins fold in nature. Many such mathematical models are stated in terms such as "proteins fold into a configuration that minimizes a certain energy function". Even the simplest such models [1] give rise to NP-hard decision problems, which are also known (somewhat confusingly) as "protein folding problems". To make this a bit less confusing, I will call the mathematical decision problems PFPs.

Like all mathematical models, our protein folding models don't correspond exactly to reality. Even if you are somehow able to determine the exact mathematical solution to a mathematical PFP, that _still_ doesn't guarantee that the real protein that you were trying to model behaves like the mathematical solution would indicate. E.g. the protein may fold in such a way that it gets stuck in a local optimum of the energy function you were using.

How do we detect this? We make inferences about how the protein should behave, given the mathematical solution to the Protein Folding Problem, and then we perform experiments, and find out (empirically) that the protein behaves in a manner that is inconsistent with the inferences drawn from the mathematical model. Scientists _do_ do this. And they would have to do it even if they had a fast, exact way to solve NP-complete problems, because the NP-complete problems are still just part of a mathematical model, and need not correspond to reality in any way.

The success of AlphaFold is not measured by how well it solves (or approximates) mathematical PFPs. The success of AlphaFold is measured by making successful predictions about how certain proteins will fold. And this is exactly how it was tested [2]: they threw it at a bunch of problems for which scientists have empirically determined how certain amino acid chains fold, but didn't release the results. And then they compared the solutions predicted by AlphaFold, and found that most of the predictions were consistent with what they knew to be the case.*

[1] https://en.wikipedia.org/wiki/Lattice_protein

[2] https://predictioncenter.org/casp14/index.cgi

* That's an understatement. The solutions were really very good, much better than those produced by any other submission to CASP14.
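To make the "simplest such models" concrete, here is a minimal Python sketch of the 2D HP lattice model described in [1]: each residue is hydrophobic (H) or polar (P), the chain is a self-avoiding walk on a square grid, and the energy is -1 for every pair of H residues that are lattice neighbours without being chain neighbours. The names `energy` and `fold` are made up for illustration, and the brute-force search over all 4^(n-1) walks is only feasible for very short chains, which hints at why the general problem is hard. This is a toy for the mathematical PFP only and says nothing about how AlphaFold works.

```python
from itertools import product

MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def energy(sequence, coords):
    """HP-model energy: -1 per H-H lattice contact between non-consecutive residues."""
    index_at = {xy: i for i, xy in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each neighbouring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts


def fold(sequence):
    """Exhaustively search self-avoiding walks; return (lowest energy, coordinates)."""
    best_energy, best_coords = None, None
    for path in product("UDLR", repeat=len(sequence) - 1):
        coords = [(0, 0)]
        for step in path:
            dx, dy = MOVES[step]
            x, y = coords[-1]
            coords.append((x + dx, y + dy))
        if len(set(coords)) != len(coords):
            continue  # the walk crosses itself, so it is not a valid conformation
        e = energy(sequence, coords)
        if best_energy is None or e < best_energy:
            best_energy, best_coords = e, coords
    return best_energy, best_coords


# Illustrative four-residue chain; prints the lowest energy and one conformation.
print(fold("HPPH"))
```

Even an exact minimum found this way is only a statement about the model. Whether a real chain reaches the corresponding state is still an empirical question, as argued above.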

Cas9 · on July 16, 2021

Thanks a lot for the detailed explanation :-)