The Nature Methods Method of the Year for 2021 is protein structure prediction. Here is my feature on this. And I did a series of podcasts.
Here is episode 4, about AlphaFold and the impact it is having on junior scientists. I spoke with a group of researchers from different labs at the Max Planck Institute of Biochemistry.
I spoke with Dr Isabell Bludau, a postdoctoral fellow and computational biologist in the lab of Dr Matthias Mann; Dr. Bastian Bräuning, a postdoctoral fellow and project group leader in the Department of Brenda Schulman; and Juan Restrepo, a PhD student in the lab of Dr Jürgen Cox.
You can listen to the podcast right here, a transcript is pasted below.
Here is episode 1, a chat with Dame Janet Thornton from the European Bioinformatics Institute and Dr. David Jones from University College London who talk about what AlphaFold can and cannot do, at least not yet, and the future of the field.
Here is episode 2, a conversation with Dr Helen Berman, co-founder of the Protein Data Bank, director emerita and, at least in my observation, the co-architect of the next phase of the Protein Data Bank. It's rather under wraps, but she does share a tidbit about what's next.
Here is episode 3, a chat with some members of the Rost lab at the Technical University of Munich: Dr. Maria Littmann, postdoctoral fellow, PhD students Konstantin Weissenow and Michael Heinzinger, and Dr Burkhard Rost, principal investigator.
Transcript of podcast episode 4
Note: These podcasts are produced to be heard. If you can, please tune in. Transcripts are generated using speech recognition software and there’s a human editor. But a transcript may contain errors. Please check the corresponding audio before quoting.
And, of course, as a computational biologist, it's also super exciting to see that, like machine learning or computational novelty can actually really revolutionize biology, which I think hasn't been seen to this extent before. And this is very fascinating and motivating for me.
That’s Dr. Isabell Bludau, a postdoctoral fellow and computational biologist in the lab of Dr Matthias Mann at the Max Planck Institute of Biochemistry near Munich, Germany.
Hi and welcome to Conversations with Scientists, I'm Vivien Marx. This podcast is with Dr Bludau and two other scientists at the Max Planck Institute whom I asked about the role of AI in biology. We talked about AlphaFold and the way it can predict the structure of proteins. We talked about how they view these new developments, about their careers and about how AlphaFold is influencing their work.
Just briefly, before we get to that, about this podcast series. In my reporting, I speak with scientists around the world, and this podcast is a way to share more of what I find out. It takes you into the science, it's about the people doing the science.
You can find some of my work, for example, in Nature journals. That's where you find studies by working scientists. A number of these journals offer science journalism. These are pieces by science journalists like me. This podcast episode is one of several I'm producing about the Nature Methods method of the year 2021, which is protein structure prediction.
It was chosen because of AlphaFold, a computational approach from DeepMind Technologies that has changed the way and the speed at which protein structures can be predicted.
You can find a bundle of commentaries on the Nature Methods site about how deep learning, especially AlphaFold, is shaping protein structure prediction, structural biology more generally and maybe even biology itself.
And I have a story there in that package for which I spoke to a number of scientists including this team at the Max Planck Institute.
I asked the scientists to introduce themselves, to teach me how to pronounce their names and to talk about how COVID19 has been and still is affecting their work.
Juan, do you want to go first? You're hardest, I think.
I knew maybe you were a little bit confused with my name. Actually, this is not my full name. I have another one. So I actually have four names. So my full name is Juan Luis Restrepo Lopez. But of course, no one called me my full name. People call me Juan. Juan Restrepo will be like my full name, like what people will normally use.
Right. And so when your mother said, come to the table and eat, it wasn't Juan Restrepo and all the other names.
It was just Juan.
Got it. Okay. Cool. I'll try, but I don't have that nice R that you have.
Just call me Juan. It's fine. Juan is fine.
My name is very easy. Isabell Bludau. Very straightforward.
Isabell Bludau, okay.
Ok, so I'm a postdoctoral fellow as a bioinformatician in the lab of Matthias Mann. And the general focus of our department is proteomics, where the goal is to identify and quantify all of the proteins in a given biological sample. And I'm working in the computational team. We develop software for analyzing large scale proteomics experiments, and this ranges from raw data processing all the way to systematic analysis of biological patterns from a systems biology perspective. So since I work full-time on the computer, I could fairly easily work from home during the COVID crisis.
Fairly easily being said that this means I can do, in theory, all my work. But the biggest challenge for me was to really keep up the enthusiasm over an extended period of time, because really, the interaction with the other colleagues and everything that makes science a lot of fun was limited down to a minimum, I would say. And I'm very happy that we can now go back to the lab more often as well.
And my name is Bastian Bräuning. It's like in English. I think you would say Sebastian, but just without the se at the
Got it. Bastian Bräuning and you have a bit of an echo going on.
Oh, I do?
So let me just shift real quick. There's another room available that I can go to.
Okeee tick tock tick tock just showing the passage of time in a podcast here. As Bastian switches rooms. Ok he’s ready and has no more echo.
Hi, my name is Bastian Bräuning and I'm a postdoc and project group leader in the Department of Brenda Schulman at the Max Planck Institute of Biochemistry. I'm originally from Munich and have been at the Institute for about three years now. Our department studies the protein machinery inside the cell that recognizes and targets proteins for degradation. To this end, we assemble them inside the test tube and study their reactions. We also determine structural snapshots of these protein complexes in action, using mainly cryoEM. As for COVID. So I would say a pure experimental
So I really have to be at the bench to do my work. And during the first and second lockdowns that we had in Germany, for at least a year, I had limited access to the lab and to the bench. So it was difficult, and I had to coordinate very well with colleagues and try to be very organized about having access to the lab for a limited amount of time, but still be productive in that time.
Then we circled back to Juan Restrepo. When I spoke with him, he had recently finished his master’s degree at Technical University of Munich. Here, he talks about the impact of COVID on his work.
My name is Juan Restrepo. I'm originally Colombian. My background is in physics. I started my Bachelor in theoretical physics and then moved to applied physics. I did my master's here in Munich in that topic at the TUM. That's where I started working in data science, applied to biomedical data science. And then I met Jürgen Cox, and I'm working in his lab now. I wrote my master thesis with him in machine learning, applied to DIA, data independent analysis in mass spec data.
After I finished my master, I joined his lab as a PhD student, where I'm currently developing more on my master project and also other projects related to machine learning and its connection with biomedical data science, in particular mass spec data. So that's the current state of my career. Regarding COVID, it was more or less similar to Isabell. I'm also a computational scientist, so I still had all the tools I needed to work from home. But the biggest challenges were that you were isolated from colleagues, so it was hard to have discussions. Also, the PI was in his office or at his place. So it was also a little bit harder to get some guidance, especially during my master thesis, but somehow it worked out.
And now we are back in the lab. And yeah, now it's more fun again. And now we can really connect with people again.
Next, Juan Restrepo talked about the science that captivates his interest and Bastian Bräuning and Isabell Bludau riff on that from their perspectives. They all come from different sub-specialties within science so it’s fun to hear the spectrum of views. Well, ok, I think it’s fun, you decide if you think so.
Juan Restrepo [8:40]
I guess in my case, I see myself as an applied physicist working on biomedical data science. For me, it's more about the problem, honestly, it's like as a physicist, as a mathematician, I don't know. Even as a computer scientist, you can just tackle different problems during your career. And I guess at the end sometimes they add up to something because you have so many like you took so many paths. Let's say, for me, it's more like if there is an interesting problem.
In the other one, you can PCR, like cloning, you can clone your molecules and then expand it and amplify them. And in the other one, you can't. So you have to do some tricks because you can't replicate your data. So all of those things kind of allow you to see a bigger picture. Let's say.
I asked the scientists about the tasks they do in their studies and about how they define themselves. Here’s Isabell Bludau and Bastian Bräuning. Isabell Bludau doesn’t like labels.
Very difficult to put a label on it. I would say, like I'm a computational biologist and I specialize in. I studied molecular biotechnology, actually, and focused on bioinformatics fairly early in my studies. And my big goal kind of during the work that I've been doing throughout my bachelor, master's, PhD and now my postdoc has always been to find cool biological questions that can be solved computationally. Ranging from method development to data analysis to answer some challenging, cool questions. And usually for me, this has been in context of systems biology.
So I first worked in genomics, and now I focus during my PhD and postdoc in proteomics. But in general, I think computational biology or bioinformatics is like the more appropriate term if you want to nail it down.
I guess I was trained throughout my PhD and a good part of my postdoc as a structural biologist. But really, since my time here at the institute, which is very interdisciplinary and also through collaboration outside of the Institute, I more and more see the importance of doing cell biology, biochemistry together with structural biology. So actually, right now, since a few months, it almost coincides with when AlphaFold came out, funny enough, but we're joking in the lab that actually I'm not doing any or barely any structural biology these days.
I'm going down some other paths. Cell biology is something I do a lot now, and it's great to learn this sort of. I wasn't trained as a cell biologist.
But I realized that it's very important and I have a lot of fun with it. So I guess I'm lucky that within my postdoc, I have the freedom to explore other fields, methods, and I hope that this will benefit me after my postdoc to be able to just have a more comprehensive outlook and perspective on doing protein science as a whole.
AlphaFold is making waves with its computational approach. It uses machine learning and has been trained on the datasets in the Protein Data Bank, the PDB. And there are other platforms that also handle protein structure prediction with machine learning, such as RoseTTAFold.
The Max Planck scientists talked about their first impressions of AlphaFold and explained when they first heard about it. For some that was when AlphaFold2 did really well at the Critical Assessment of Protein Structure Prediction (CASP). That’s a competition in which scientists test how well their methods do in computationally predicting protein structures.
DeepMind Technologies developed AlphaFold. DeepMind is a company that was bought by Google.
The AlphaFold team has begun generating protein structures that are filling up, in a rather big way, a database called the European Bioinformatics Institute-AlphaFold Protein Structure Database. And there was a paper that presented most of the human proteome predicted by AlphaFold.
Here’s Bastian Bräuning about his first impressions of AlphaFold.
Bastian Bräuning [14:20]
I'm a structural biologist. So determining structures of proteins and their complexes is really what I do. And I have to be honest, I was not so aware of the AlphaFold before the database came online. To be honest, maybe much less aware than maybe some others, now hearing the podcast. But yeah, when the database came online, I looked through it and I looked at proteins that we were interested in. And I think I was just really very surprised at how well it was predicting even small details of protein structures and
So at that point, I think for at least a week, I was really struck by it. I spent a lot of time on the database. I spent a lot of time looking at things that I'm interested in now, things I used to be interested in. And yeah, I was just blown away by how much it was getting right.
Isabell Bludau talks about how she first heard about AlphaFold. Yes, of course scientists take in social media.
Isabell Bludau [15:25]
Maybe I can go next. Basically, I just first read about AlphaFold actually on Twitter, I think, and I noticed that a lot of people got extremely excited, and I first really couldn't believe how good the results were. But then I have to say that the real impact, maybe similar to Bastian, that AlphaFold had on my personal work happened when EBI, in collaboration with DeepMind, published the structures of all human proteins. And this suddenly expanded available structures, structural information, from a few thousand proteins to basically the entire proteome.
And this information can now be easily integrated into any systems biology analysis that I do. So since I'm doing a lot of these kind of global analyses of patterns in proteomics data, this is super interesting because we can now complement the information about the presence and quantity of proteins with the structural information. And this is, of course, enabling us to draw more complete picture. And this is, for me, probably the most exciting part of it. And, of course, as a computational biologist, it's also super exciting to see that, like machine learning or computational novelty can actually really revolutionize biology, which I think hasn't been seen to this extent before. And this is very fascinating and motivating for me.
I wondered if Juan Restrepo, as a computer scientist and physicist, had maybe heard about AlphaFold earlier than most.
Juan Restrepo [17:10]
Well, I didn't know about it all along, of course. But the first time I heard about AlphaFold2 was in a course I was taking at the university called protein prediction, actually, where the teaching assistants were actually discussing models they have for predicting structure based on some fancy machine learning algorithms. And they were like, 'there is this super cool model that just came out that is basically the best we can do so far, and it's really revolutionizing the way people are doing machine learning in computational biology. The paper is still not out, but the results look really cool.'
This is probably AlphaFold1 then, right?
AlphaFold two. Yeah. Exactly.
I just finished my master, like, a few months ago. So that's why.
Thank you. So after that, I started looking or going into it, like seeing which kind of information there was available. There wasn't much at the moment. There was only a blog post by DeepMind, and there was already the paper of AlphaFold1 out. So I read that. I saw what they were doing. I saw they also won CASP13. And I thought it was pretty cool what they were doing because they were really using machine learning for solving really important problems that not even the physics-based models could address in a meaningful way.
So I thought it was really interesting that you could extract, like, only from the data a different way of tackling the problem. Let's say that there was a different way of tackling the problem.
AlphaFold trained on the experimentally determined protein structures in the Protein Data Bank, the PDB. And I have a separate podcast in which I spoke with Dr. Helen Berman, a co-founder of the Protein Data Bank and current co-architect of the next chapter of the PDB. She shares a little bit about what the next chapter of the PDB will look like.
For now there is the PDB and there is the EBI-AlphaFold Protein Structure Database. This duality is likely not going to remain. It’s all hush-hush about what exactly will unfold but she did say a few things in that podcast. So please tune into that if you have a chance.
Here, I am just going to insert one passage about how the PDB got its start. Here’s Helen Berman who mentions a crystallographer by the name of Walter Hamilton.
Helen Berman [20:20]
Remember the PDB was started by postdocs and trainees and graduate students. That's who was agitating for it way back.
I wasn't the only one. There were a few of us back then. This was the 60s. We were very young. We talked a lot. We were so excited by looking at the structures, and we thought, what can we do with all this? And I remember we had these meetings and we wrote petitions and we did all kinds of things to see if we could get the data out there.
It was the kids who did it. And then we had to convince this elderly guy, the 40 year old guy. We knew that somebody important had to make it happen, that we couldn't make it happen. But we had to convince him to do it. And we did. But the initial people involved were all very young.
Among the people who were involved among many people. But it was Gerson Cohen. Unfortunately passed away. Edgar Meyer passed away, myself, were people who were very active and collaborated sort of. And we had to do it all by snail mail. There was no email. And we would have these meetings about how do we make this happen? And we were all very young people, just beginning in our careers. And then we went to this Cold Spring Harbor meeting in 1971 and Walter had driven down from Brookhaven, and we kind of assaulted him and said, you know, we really need someone to do this.
And we knew we had enough sense to know that on our own, we couldn't do it. We needed somebody who had credentials.
We were writing letters and telling people what we thought had to happen. I think Edgar, Edgar, and Gerson were both, like beginning in their independent careers. Or one of them might have been a postdoc or research associate when this all began. I met Edgar when he was a postdoc. So we were young and I was a student. So that's how things really happened.
Yeah. That's absolutely the way it happened. And then we had the, as they say in my language, the chutzpah to go and say, this is what should happen. And that's what happened.
Back to the junior scientists at the Max Planck Institute of Biochemistry. Here they talk about what they would like to see in the data resource that will emerge and how they use these resources.
Isabell Bludau [22:30]
I would say generally, it's always nice if you have one platform where you can get the most information possible in a combined way. Because generally I'm struggling with this a lot because I'm usually going to different places and trying to integrate data from different databases. And it's a lot of pain, also, when different institutes or databases use different formats, everyone has their different conventions for how to supply the information. I think already now the structures are basically in the same format as the current PDB structure. So this is already nice. But generally, I think having things on a combined shared platform is usually beneficial from my perspective.
Yeah, I think I would also agree. To some extent AlphaFold and PDB data have been already combined. For instance in UniProt, which is really a go-to website. If you want to check sequence, if you want to check proteins and regulated organisms, if you go to one of these protein sites, you will find these days for a lot of things that there's the available PDB structures as well as AlphaFold models available. It's from one website, essentially.
Just some short comment because I don't really know a lot about biological databases, honestly. My projects are like, after getting the data, I don't have that much information about what would be an ideal database. I’m not the person who will go to different databases and will fetch all the data. In one project in which I'm using AlphaFold, we downloaded the database that they have available, and we had a problem because not all the proteins were there. So we tried to look where else they were. And then we found the compatibility problem. Like you don't always have all the resources in one place and that's, of course, very painful. And that's not ideal. And then you have to spend a lot of time just trying to match everything.
An issue that has come up in my interviews is confidence. What does it take for users of these protein structure models to have confidence in them?
Isabell Bludau [25:00]
Maybe one comment from my side directly. So regarding the AlphaFold quality, it actually provides a per-residue quality score. So each individual amino acid has a confidence metric to it. And this gives you some information about how certain you are that the amino acid is actually in the correct position. And in general, in terms of confidence on protein structures, like looking at the whole thing from a systems biology perspective, I'm more happy that there are less certain structures for a lot of proteins than super-high confidence ones for a much smaller subset.
And like one specific interest of mine, for example, is post-translational modifications, which are like small chemical, let's say, decorations on the protein. And these are often on unstructured regions of the protein. And it's still very good to know where they are approximately in three-dimensional space, not only knowing where they are in the one-dimensional sequence, but in 3D. And if you look in the PDB structures, they are often not covered. And this is now super useful to have also these lower confidence regions available. And so I'm actually very happy that the information for the low confidence regions is provided by AlphaFold.
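For readers who want to look at this per-residue score themselves, here is a minimal sketch in Python. It assumes a model file in PDB format from the AlphaFold Protein Structure Database, where the pLDDT value is stored in the B-factor column of each atom record. The function name plddt_per_residue is made up for this illustration and is not part of any AlphaFold tool.

```python
def plddt_per_residue(pdb_text):
    """Return {(chain_id, residue_number): pLDDT} from PDB-format text.

    Assumption: the file is an AlphaFold model, so the B-factor
    column (columns 61-66) carries the per-residue pLDDT score.
    """
    scores = {}
    for line in pdb_text.splitlines():
        # PDB is a fixed-column format; the atom name sits in columns 13-16.
        # One alpha carbon (CA) per residue is enough to read the score.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain_id = line[21]
            residue_number = int(line[22:26])
            scores[(chain_id, residue_number)] = float(line[60:66])
    return scores
```

With the text of a downloaded model file passed in, the result can be used to check how confident the model is at, say, the position of a post-translational modification.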
Just a small comment on this. Structural biologists, when we build models, typically, we've always used structure prediction to help us model. When you're building a model, residue by residue, you will rarely do this without having in front of you secondary structure predictions of that protein. There's a lot of prediction that exists even before AlphaFold, and everyone knows or everyone in structural who works with this knows that it's never very entirely confident. Or some regions are more confident than others, but it's very useful data to have, and you're never quite entirely in the dark, even when you're building something from scratch. I think people know how to deal with confidence or low confidence when they need to.
I asked Bastian Bräuning and Juan Restrepo a bit more about these varying confidence levels of protein regions.
Bastian Bräuning [27:55]
So this becomes important, I think along different parts of the project of a structural biology or biochemistry project. So when you're actually building the model and let's say you have an electron microscopy map, the peripheries of this map will be of lower resolution usually. And modeling becomes less confident or accurate in those regions. And you have a core of this protein or complex where the data is better. And you can model with more confidence. This is important for building the model, and you want to deposit a model that is closest.
Let's say you don't want to model, or let's say good practice would be to not model, maybe parts of the structure that just have low confidence data to back them up. But the lack of confidence or the lack of resolution in one part of the model can be a very good source to follow up, either to generate hypotheses, this might be a more flexible or dynamic domain. And these regions can also be targeted for follow-up experiments biochemically, because as a mentor of mine said, often the most dynamic and least confident and most movable parts are the most interesting parts. That's where the action happens. So, it's not a positive or negative thing, really, whether something is low or high confidence. But it's a matter of how you interpret it, what you use this kind of data for, even the low confidence ones. I think that's what's important.
Yeah. I think that their approach, which they follow to actually come out with this confidence value that is called the pLDDT, the confidence is called the pLDDT. The approach they follow is really interesting because of what they did. So let's remember that this is actually a prediction for every atom at a scale of nanometers. So this is extremely accurate. So what they did was they didn't predict a specific, let's say, position for the atom, but they actually predicted a probability distribution that the atom will be in a specific place.
So according to how, let's say, wide or narrow the probability distribution is, they will see how confident the model itself thinks that the prediction is correct, which is very useful. Because it's actually representing how spread the confidence of the machine learning algorithm is. And how do I see that integrated in, like, a standard machine learning workflow or a standard way of solving a problem or like in my specific projects? I think it's extremely important because one of the things you're never sure is if you're doing everything correct. As a scientist, right, you try something out, you see the results, you know that it makes sense.
So you think you are correct, but you cannot ever be 100% sure. So to have as many tests as possible for a specific problem is, of course, something good. And I think that to build in, in the model is something that we should take as a lesson and implement it ourselves as much as we can. For example, using this confidence value, you could choose which points are better for you to choose for training. So if you want to train a new algorithm, if you want to include as you said, something in MaxQuant, then what we will do is only to take the regions where the confidence value is larger than 90. That is what they say larger than 90 for them is almost sure that is correct.
So those are the points of the region we will take for training. So between 70 and 90 will be something acceptable for prediction, but not for training. So this confidence value actually allows us to differentiate between something relatively good and something they think is right. And they have good reason to think is right, which is, of course, extremely useful.
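The cutoffs Juan Restrepo describes can be written down as a short Python sketch. The values 90 and 70 come from his description. The function name, the bucket labels and the treatment of a score of exactly 90 as "prediction only" are assumptions made for this illustration, not a description of his lab's code.

```python
def bucket_by_confidence(plddt, train_cutoff=90.0, predict_cutoff=70.0):
    """Sort residues into buckets by their pLDDT score.

    plddt maps a residue identifier to a score between 0 and 100.
    Scores above train_cutoff go to "train"; scores from
    predict_cutoff up to train_cutoff go to "predict_only";
    everything else goes to "low".
    """
    buckets = {"train": [], "predict_only": [], "low": []}
    for residue, score in plddt.items():
        if score > train_cutoff:
            buckets["train"].append(residue)
        elif score >= predict_cutoff:
            buckets["predict_only"].append(residue)
        else:
            buckets["low"].append(residue)
    return buckets
```

The "low" bucket is kept instead of discarded, since, as the scientists point out, low-confidence regions can still carry useful information.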
Incorporating confidence levels and metrics will be important in many types of protein-based studies. Even when there are lower confidence levels about parts of a structure, they are helpful. Here’s Juan Restrepo and Isabell Bludau.
Juan Restrepo [32:30]
It’s essential, right.
I would second that the most important thing is that you do have this information. So I think having the full structure, but not knowing exactly like the confidence for each of the amino acids on an atomic resolution would be not so great but since this information is available, this really provides us a lot of grounds to also base our analyses on considerations that take the confidence into account.
How does AlphaFold do what it does computationally? AlphaFold was developed by DeepMind Technologies, a company that Google bought in 2014. So DeepMind is part of Alphabet. That gave the AlphaFold developers access to Google compute resources. Which is something people in academia do not have.
And some of my interviewees discussed whether academia would have achieved this result of a fast protein structure prediction method on their own. Google uses TPUs not GPUs and that is what DeepMind had access to. TPUs.
Juan Restrepo [33:45]
First of all, they have great hardware, like they have all the resources in the world, so you can run even the models you don't think will converge. You can run any type of models. You can go crazy and run 200,000 experiments because you have unlimited resources. So that's, of course, something very nice and something people in academia don't have. No one can answer if academia would have done the same with such hardware. But what is for sure is that AlphaFold also uses ideas that were there already.
Some of the ideas were there already. So at the beginning, they do something called the multiple sequence alignment. And that came from academia. They also use transformers; like the core, the machine learning algorithm of this is a transformer. And that was developed by Google as well. And that was trained on a TPU. So AlphaFold2 was inconceivable without the hardware, for sure. AlphaFold1 might be conceivable without that hardware because the architecture was much simpler.
I guess that will be something to note, they kind of got with AlphaFold1 up to a place where they improved over the other teams. But they didn't really solve the problem. But to really solve the problem, they had to go very deep. And for going that deep, they needed very fancy models with very fancy hardware that not everyone has.
An issue that has come up is the degree to which AlphaFold might elevate the role of structural biology. To some in biology, tho, proteins are just blobs. Blobs that stick to other blobs.
Here’s Bastian Bräuning and then Isabell Bludau on this aspect.
Bastian Bräuning [35:45]
So I think if you go back and think about all the important. When I was in high school and I was in biology class, everything was drawn on the board as blobs. Ribosomes, really the basic process of life, just because you draw them as blobs or maybe teach them as blobs to some extent, it doesn't take away from how important this is. So I think it depends on what you need all this extreme resolution for, when a lot of things, especially in teaching and for bringing people to science, are really well explained and demonstrated by quote 'blob'. I don't think it has to be a bad term, depending on what you want to get across or what process you want to teach. And that's my personal take on this.
I fully agree with Bastian, depends on what you want to communicate. And a blob might be nice for some aspects, but totally miss the point in other directions.
AlphaFold is going to influence ongoing and future science in many ways. And there are still many unanswered questions about proteins that AlphaFold does not have an answer for. The scientists talk about what they are keen on understanding about proteins.
Isabell Bludau [37:05]
One very general big challenge in proteomics, and I think basically in any kind of biomedical research in general, is that we often detect a lot of interesting changes, for example in terms of protein abundance or the presence or absence of specific post-translational modifications, if we look at comparisons, for example, between healthy people and patients with a certain disease. But then the question is to identify which of these tons of observed changes actually have any functional
And, for example, it's known that post-translational modifications are relevant for the formation of protein-protein interactions. But for this they, of course, need to be somehow available on the accessible area of the protein. And now, basically with AlphaFold there, this makes a lot of things easier, because we can investigate where our modifications are and the follow-up steps. I guess most of you have seen that there's already an AlphaFold-Multimer extension available as a preprint. With this, you can investigate protein complex formation, and this, of course, now opens up a lot of opportunities for us in the proteomics field to really look at which of the targets or hits that we get are most likely to be functionally relevant, and to narrow down our hits. And this will be, of course, extremely relevant for any kind of applied research, also if you want to go to more clinical applications, et cetera.
So maybe, like, this would be one of these general points. Right. I think when AlphaFold2 first came out, everyone was like, okay, now maybe protein complexes will come next, but next probably being next year. But next was in a couple of months. And now it's, of course, super exciting what comes next. I think small-molecule binding will have a huge impact if you can do this even better than what's possible right now. And this will facilitate drug screens immensely. But then also what you mentioned already, like modeling protein dynamics, because this is, of course, again, another interesting aspect, to actually see which proteins are likely to have multiple conformations. And I think Bastian already mentioned that the regions that are currently predicted with less confidence might be the most interesting ones, because they are actually doing interesting biological things. And then finally, again, going back to the post-translational modifications and the different alternative proteoforms that a protein can have.
So I was always working with this, also during my PhD, and if future models can really predict the impact of specific modifications or truncated protein versions on their structure, this will really facilitate our understanding of whether these modifications have any impact on function, which I think are all, like, super interesting avenues. And this is probably one thing that I find most fascinating about AlphaFold in general, that this really not only solves a big challenge that was there in the field, but it also just opens up even more, like, avenues that have a massive impact themselves.
Maybe I want to comment on the work that I did myself here in the lab. So about a year ago, I was working on solving the structure of a protein complex in the cell that has nine subunits and that assists other proteins in entering the membranes inside of cells. Basically, it helps other proteins fold inside membranes.
Human or other organisms?
It was a human structure that I myself was working on. And had I had AlphaFold back then, I think assembling, putting together the individual proteins of this complex, would have been much easier for sure. It would have cut down the work. It took me something like six months. It would have for sure cut it down to a month or less. But I think what we really learned from this study was that, as gratifying as it is to look at the finished structure of the complex, you still really don't know what parts of the proteins are doing, or how they work together in doing their biological function.
So even if we had had AlphaFold back then, this would have been work that we would still have had to do. And in the end, we still had to dig deep. After we got the structure, we had to dig very deep to sort of find dynamics in our data and, on top of that, follow up with a lot of biochemistry and mutagenesis on our complex to really bring the structure to life and to really start to see functionalities on the structure, which AlphaFold can't replace at this point, even if AlphaFold back then could have predicted protein-protein interfaces or interactions.
And even if AlphaFold would have built the whole structure by itself, really, our story that we published could not have been published without everything else that came with the structural biology, biochemistry, all of it. So that was just one comment, I guess.
Maybe I could comment from another perspective. I don't have the background to comment on, like, how it will affect biology, biochemistry, systems biology. But from the machine learning point of view, I think this is arguably the most important machine learning model of the decade. It actually solves a 50-year-old problem. And it's actually very similar to how the first wave of machine learning started ten years ago. Ten years ago, it actually started when AlexNet was published. They actually won an image processing competition called ImageNet, and they outperformed their competitors by a lot.
So people kind of started to ask themselves, hey, what is this new technology? What can it do for us? And it's very similar to what is happening now. This new architecture solved a 50-year-old problem that is actually non-trivial to solve, of course. So it will only, like, open many doors, right?
And that will bring more people into the community. And increase the awareness and increase, let's say, the energy and the resources that are going to the field because it already proved that it c
In 2017, Google Brain scientists presented at the Conference on Neural Information Processing Systems and they published their approach in a paper called ‘Attention is all you need.’ Attention plays a role in the big jump that AlphaFold2 took at CASP14.
Juan Restrepo [45:10]
So this is exactly what they use. 'Attention is all you need' was the paper where Google actually introduced transformers. And transformers are the core of the AlphaFold2 models. In AlphaFold1, they used convolutional neural networks. And in AlphaFold2, they got rid of the CNNs and included transformers. And this was not, like, DeepMind's idea. So transformers are actually revolutionizing. They come from natural language processing, but now they have started to go to other fields in machine learning. And this is one of the applications of it.
So they are particularly interesting because they can detect long-range correlations. Maybe this is too technical.
But when you're processing a sequence, you normally use a recurrent neural network, and then you sort of unroll the sequence and then process it somehow. With this new architecture, you can process the whole sequence in parallel. And because you are processing in parallel, you get information from left and right. So you're actually seeing the whole sequence every time. That allows the models to actually figure out long-term dependencies very well.
And this is what happens in proteins, because proteins fold. Two residues, two amino acids, that are very far away in sequence are actually very close when they fold. So you need to somehow introduce this to the models. And the best model to do that is a transformer, which was introduced in that paper.
AlphaFold might have a negative effect on some scientific fields and specialties. And one of those areas is X-ray crystallography, which has been an important way to experimentally determine protein structures.
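For readers who would like to see the idea in code: the following is a minimal sketch of scaled dot-product self-attention, the core operation of the transformer described above. It is an illustration under simplifying assumptions, not AlphaFold2's actual code. The function names, the toy two-dimensional embeddings, and the omission of the learned query, key and value projections are all assumptions made for brevity.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention over a whole sequence.

    Simplification (assumption): queries, keys and values are the input
    embeddings themselves; real transformers apply learned projections first.
    Returns (outputs, weights), where weights[i][j] is how much position i
    attends to position j.
    """
    dim = len(embeddings[0])
    outputs, weights = [], []
    # Each position is handled independently of the others, which is why
    # real implementations compute all positions in parallel.
    for query in embeddings:
        # Score this position against every position, left and right,
        # no matter how far apart they are in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in embeddings
        ]
        row = softmax(scores)
        weights.append(row)
        # The output is the attention-weighted average of all positions.
        outputs.append(
            [sum(w * value[d] for w, value in zip(row, embeddings))
             for d in range(dim)]
        )
    return outputs, weights


# Hypothetical toy "sequence" of five positions: the first and last have
# similar embeddings even though they are far apart in sequence.
sequence = [
    [2.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
    [0.0, 1.0],
    [2.0, 0.1],
]
outputs, weights = self_attention(sequence)
print([round(w, 2) for w in weights[0]])
```

In this toy input, the first position puts far more weight on the distant last position than on its immediate neighbor, which is the kind of long-range relationship between residues that the speaker describes.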
Juan Restrepo and Bastian Bräuning have some comments on that.
Juan Restrepo [47:10]
I was honestly thinking maybe Bastian. I don't know if Bastian would like to do it. It's just a question, because Bastian actually did crystallography for his PhD, and AlphaFold actually is built on crystallography data, let's say. What took a crystallographer one year or two years to solve, AlphaFold can now solve in, I don't know, a few hours. So I guess maybe Bastian would be interested in explaining how the data was obtained before. And why is this important? I don't know if that's important at all.
Right. Yeah. I did my PhD in protein crystallography. I dabbled a bit in cryo-EM through a collaboration, and ever since I became a postdoc after that, I've really started to appreciate, coming as a crystallographer, how enabling cryo-EM was for structural biology. And the next thing, of course, is now AlphaFold. And I see around me, in my own lab or from others on Twitter, how AlphaFold is now also helping efforts in cryo-electron tomography, for instance, which is also a next big thing, I think, in structural biology, and which on average typically produces lower-confidence data than single-particle cryo-EM.
Because we have better and better predictions for parts of bigger complexes, this will really enable cryo-electron tomography, too. What I'm saying is, I've gone from one revolution to the next between my PhD and my postdoc. It's an interesting time to be in. As big a surprise, and maybe a shock, as it was to structural biologists, I think once that's settled and you really start looking at the opportunities it gives you, it becomes less worrying or something. There's still so much to be done, and not one method or one revolution is going to solve everything.
So I think if you keep this in mind in structural biology, I don't think you have to be super worried. And frankly, I think you were talking about ligands before, Juan. I think solving accurate structures of proteins bound to ligands, small ligands, still, to a large extent, requires crystallographic data. Because crystallography, now at the modern synchrotrons, is becoming a very high-throughput method to screen ligands in a way that cryo-EM can't deliver yet. So these are technologies that are not going away. On the contrary, they're important in this time.
I think just very recently, even today or yesterday, I saw on Twitter better and better tomography data, cryo-EM data, on the nuclear pore complex, which is really the largest protein complex in the cell. This is really the realm of tomography, because it has so many subunits, and luckily one knows the subunits, mostly, to the largest extent. So having something like AlphaFold in this instance is incredible, because you can take pretty accurate models of the parts and fit them into your map, and do this really in the context of the largest complex there is in the cell.
So I think cryo-ET people are very grateful for AlphaFold, I would think. A few years ago, there were papers on the nuclear pore complex, and if you look at these papers, they are really tour de force studies of crystallography, actually, where just individual parts of this pore complex were determined by crystallography. And now, years later, you have a method in cryo-ET where you can really start plopping in these structures. So I don't know, everything has helped along the way to reach this point today. Really, there's room for every method. Really, there has to be.
That was Conversations with scientists. Today's episode was with scientists at the Max Planck Institute of Biochemistry near Munich, Germany. I spoke with Dr. Isabell Bludau, a postdoctoral fellow and computational biologist in the lab of Dr. Matthias Mann; Dr. Bastian Bräuning, a postdoctoral fellow and project group leader in the Department of Brenda Schulman; and Juan Restrepo, a PhD student in the lab of Dr. Jürgen Cox.
And I’d like to give a shout out here to Dr. Christiane Menzfeld at the Max Planck Institute of Biochemistry who helped find these participants.
And I just wanted to say, because there's confusion about these things sometimes, the Max Planck Institute of Biochemistry did not pay to be in this podcast. This is independent journalism produced by me in my living room. I'm Vivien Marx. Thanks for listening.