Part 17
Article: Biotechnology
My Blog Title: The world, from the past to the present, retold from the timelines.
2023: Notable innovations: a large language model (ProGen) that could generate protein sequences with a predictable function, using input tags that specified protein properties. Deep-learning language models had shown promise in various biotechnological applications, including protein design and engineering. Generative modeling for protein engineering was key to solving fundamental problems in synthetic biology, medicine, and materials science. The researchers posed protein engineering as an unsupervised sequence generation problem in order to leverage the exponentially growing set of proteins that lacked costly structural annotations.
They trained a 1.2-billion-parameter language model, ProGen, on nearly 280 million protein sequences conditioned on taxonomic and keyword tags such as molecular function and cellular component. (Taxonomy is the science of naming, describing, and classifying organisms, including all plants, animals, and microorganisms of the world.) This gave ProGen an unprecedented range of evolutionary sequence diversity and allowed it to generate sequences with fine-grained control, as demonstrated by metrics based on primary sequence similarity, secondary structure accuracy, and conformational energy. (Conformational energy is the energy associated with the different spatial arrangements, or conformations, that a molecule can adopt through rotation around single bonds.)
ProGen could generate protein sequences with a predictable function across large protein families, akin to generating grammatically and semantically correct natural-language sentences on diverse topics. The training set of 280 million protein sequences came from more than 19,000 families and was augmented with control tags specifying protein properties.
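To make the idea of control tags more concrete, here is a minimal sketch of how tags might be placed in front of an amino-acid sequence so that a language model sees the desired properties before the residues. The tag names, the `<...>` token layout, the `<eos>` marker, and the function `build_conditioned_input` are all assumptions made for illustration; they are not ProGen's actual vocabulary or tokenizer, and the sequence is just a short arbitrary fragment.

```python
# Illustrative only: the tag format and token layout are hypothetical,
# not the real ProGen tokenization.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues


def build_conditioned_input(tags, sequence):
    """Return one token list: control tags first, then residues, then an end marker."""
    sequence = sequence.upper()
    unknown = set(sequence) - AMINO_ACIDS
    if unknown:
        raise ValueError(f"unexpected residue symbols: {sorted(unknown)}")
    tag_tokens = [f"<{tag}>" for tag in tags]      # e.g. taxonomy or keyword tags
    return tag_tokens + list(sequence) + ["<eos>"]  # one token per residue


# A taxonomy tag and a keyword tag, followed by a short arbitrary fragment.
tokens = build_conditioned_input(
    ["taxonomy:Gallus gallus", "keyword:hydrolase"],
    "KVFGRCELAA",
)
print(tokens)
```

At generation time, a model trained on inputs of this shape could in principle be given only the tags and asked to continue with residues, which is the sense in which the tags steer the output.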
ProGen could be further fine-tuned on curated sequences and tags to improve the controllable generation of proteins from families with sufficient homologous samples. (Homology refers to biological features, including genes and their products, that are descended from a feature present in a common ancestor.)
Large amounts of lysozyme can be found in egg white. Based on different characteristics (e.g., structure, catalysis and immunization), lysozymes are divided into three dominant families: chicken-type (c-type), goose-type (g-type) and invertebrate-type (i-type). In addition, several other types of lysozymes, including phage-type, bacterial-type and plant-type lysozyme, have also been recognized (Callewaert & Michiels, 2010; Cao et al., 2015).
The artificial designs performed much better than designs that were inspired by the evolutionary process, said James Fraser, PhD, professor of bioengineering and therapeutic sciences at the UCSF School of Pharmacy and an author of the work, which was published Jan. 26, 2023, in Nature Biotechnology. To create the model, scientists simply fed the amino acid sequences of 280 million different proteins of all kinds into the machine learning model and let it digest the information for a couple of weeks. Then, they fine-tuned the model by priming it with 56,000 sequences from five lysozyme families, along with some contextual information about these proteins.
The model quickly generated a million sequences, and the research team selected 100 to test based on how closely they resembled the sequences of natural proteins, as well as how naturalistic the AI proteins’ underlying amino acid “grammar” and “semantics” were. Out of this first batch of 100 proteins, which were screened in vitro by Tierra Biosciences, the team made five artificial proteins to test in cells and compared their activity to an enzyme found in the whites of chicken eggs, known as hen egg white lysozyme (HEWL).
Similar lysozymes are found in human tears, saliva, and milk, where they defend against bacteria and fungi. The first documented structure of a functional artificial protein fully designed by AI was a lysozyme generated by ProGen with at most 69% sequence identity to any known natural protein. Two of the artificial enzymes were able to break down the cell walls of bacteria with activity comparable to HEWL, yet their sequences were only about 18% identical to one another. Compared with their closest known natural proteins, the two sequences were about 90% and 70% identical. Just one mutation in a natural protein can make it stop working, but in a different round of screening, the team found that the AI-generated enzymes showed activity even when as little as 31.4% of their sequence resembled any known natural protein.
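The percentages above are sequence identities. As a rough sketch of what such a number means, the snippet below counts matching positions between two sequences that have already been aligned to the same length, with `-` marking gaps. This is a simplification under stated assumptions: the function names `percent_identity` and `max_identity` are made up for this example, the toy sequences are not real lysozymes, and published studies use dedicated alignment tools and may define the denominator differently.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of compared columns (no gap in either sequence) with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns (one of several possible conventions)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def max_identity(candidate, references):
    """Highest identity of a candidate against a set of pre-aligned reference sequences."""
    return max(percent_identity(candidate, ref) for ref in references)


# Toy example: two of the ten positions differ.
print(percent_identity("ACDEFGHIKL", "ACDEYGHIRL"))  # 80.0
```

A figure such as "69% identical to any known natural protein" corresponds to the `max_identity` idea: the designed sequence is compared against many natural sequences and the highest value is reported.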
Resources APP Composition
[Appstore Playstore]
Video Maker
PowerDirector
Picture Maker
Social Media Post Maker stylish app world Art & Design
In-text voice
[aiReader: AI Text to Speech]
[TTS Reader - Text To Speech withtheflow01]
MP3 volume-increase conversion
[MP3 Audio Gain and Equalizer]
[Super Sound Editor: Music Audio Editor, MP3 Cutter]
Music Sources and Titles: Pixabay
[The sound and music used on the “In-Brief Archives Facebook Page” and on my blogger page “www.ilovemytimeoranothertimeofyours.blogspot.com” do not represent the picture, video and text contents.] [Where the music volume differs from the original files, it has been increased.]
[an-epic-cinematic-short-version-2-243783]
[quirky-amp-fun-165272]
[corporate-swing-221471]
[for-short-video-stories-era-of-cyborgs-60-seconds-199953]
[nhattan-background-orchestral-hip-hop-music-36sec-234800]
[hip-hop-just-in-my-style-141059]
Picture sources: Peakpx.com, Pexels and Pixabay in PowerDirector, and other websites:
2:https://m.media-amazon.com/images/I/61r96V2e8lL._AC_UF1000,1000_QL80_.jpg
3:https://m.media-amazon.com/images/I/61+NtYUWq4L._AC_UF1000,1000_QL80_FMwebp_.jpg
5:https://centuryofbio.com/p/designing-dna-with-ai
7:https://pbs.twimg.com/media/GNlWy4vX0AAPIgV?format=jpg&name=4096x4096
8:https://avatars.mds.yandex.net/i?id=38429d715d4afdeea40ec98bd3e4dd99_l-4433927-images-thumbs&n=13
9:https://m.media-amazon.com/images/I/61lqjPHNxQL._AC_UF1000,1000_QL80_.jpg
11:https://www.biorxiv.org/content/10.1101/2020.03.07.982272v2.full
12:https://www.biorxiv.org/content/10.1101/2020.03.07.982272v2.full
14:https://www.biorxiv.org/content/10.1101/2020.03.07.982272v2.full
15:https://www.biorxiv.org/content/10.1101/2020.03.07.982272v2.full
16:https://www.biorxiv.org/content/10.1101/2020.03.07.982272v2.full
17:https://www.biorxiv.org/content/10.1101/2020.03.07.982272v2.full
18:https://www.biorxiv.org/content/10.1101/2020.03.07.982272v2.full
19:https://www.biorxiv.org/content/10.1101/2020.03.07.982272v2.full
20:https://cdn1.byjus.com/wp-content/uploads/2018/11/taxonomic-hierarchy.png
22:https://www.amazon.in/Cell-Molecular-Biology-S-C-Rastogi/dp/9395161868
23:https://www.saraspublication.com/books/cell-biology-molecular-biology/#iLightbox[]/0
24:https://pubs.acs.org/cms/10.1021/acs.jpclett.1c00778/asset/images/medium/jz1c00778_0008.gif
26:https://upload.wikimedia.org/wikipedia/commons/d/da/DNA_RNA_structure_%28full%29.png
31:https://media.geeksforgeeks.org/wp-content/uploads/20230601122106/Binominal-Nomenclature.webp
32:https://amphibiaweb.org/images/phylo-primer/Fig5_primer.png
33:https://www.pinterest.com/pin/288652657337869051#imgViewer
35:https://es.pinterest.com/pin/419468152769725182/
36:https://es.pinterest.com/pin/380202393524539722/
37:https://es.pinterest.com/pin/3940718419318541/
38:https://es.pinterest.com/pin/257690409922324916/
39:https://www.agatfilms-exnihilo.com/app/uploads/2020/07/Especes-despeces-%C2%A9-Ex-Nihilo-768x432.jpg
40:https://www.biorxiv.org/content/10.1101/2020.03.07.982272v1.full
43:https://www.scielo.org.mx/img/revistas/rmfi/v38n3//2007-8080-rmfi-38-03-360-gf2.jpg
49:https://i.pinimg.com/736x/65/dd/4c/65dd4c5cfe30d8efdf4ef9b5292fe589.jpg
53:https://www.chemborun.com/upfiles/pic1676960000.jpg
54:https://www.chemborun.com/upfiles/pic1676959971.jpg
55:https://www.chemborun.com/upfiles/pic1676959895.jpg
60:https://pubs.rsc.org/image/article/2024/FB/d4fb00155a/d4fb00155a-f1_hi-res.gif
61:https://biosciences.lbl.gov/wp-content/uploads/2023/02/41587_2022_1618_Fig1cd.png
62:https://biosciences.lbl.gov/wp-content/uploads/2023/02/41587_2022_1618_Fig3e.png
65:https://www.ucsf.edu/sites/default/files/2023-01/lysozeme.png
Video Sources: Pexels and Pixabay in PowerDirector and other websites:
68:https://www.pond5.com/stock-footage/item/94456147-dna-double-helix-strand-4k
69:https://www.pond5.com/stock-footage/item/39911871-dna-strand-double-helix-model
75:https://www.pond5.com/stock-footage/item/83653423-ribosome-rotating-model
77:https://www.pond5.com/stock-footage/item/103669608-taxonomy-tag-cloud-animated-isolated-white
80:https://www.pond5.com/stock-footage/item/74465647-cellular-level-molecular-science-animation
95:https://www.pond5.com/stock-footage/item/47377463-wild-animals-collage-montage-sequence-4k
99:https://www.pond5.com/stock-footage/item/44188346-dna-sequence-animation
101:https://www.pond5.com/stock-footage/item/146833924-medical-technology-concept-dna-gene-therapy
102:https://www.pond5.com/stock-footage/item/118284382-dna-concept-genetic-engineering-gene-therapy
103:https://www.pond5.com/stock-footage/item/103563869-lysozyme-wordcloud-animated-isolated
104:https://www.pond5.com/stock-footage/item/305639345-lysozyme-1lyz-spin-structure
106:https://www.pond5.com/stock-footage/item/284119831-bacteriophages
108:https://www.pond5.com/stock-footage/item/123863312-bacteriophage-dna-rna-fluid-3d-video-render
114:https://www.pond5.com/stock-footage/item/105647572-dna-encoding-proteins
115:https://www.tierrabiosciences.com/
116:https://www.pond5.com/stock-footage/item/156633730-protein-or-enzymes-or-hormones-cell-membrane
117:https://www.pond5.com/stock-footage/item/306536749-protein-structure-human-form
119:https://www.pond5.com/stock-footage/item/265747624-biotechnology-concept
120:https://www.pond5.com/stock-footage/item/105490745-3d-illustration-protein-or-enzyme
121:https://www.pond5.com/stock-footage/item/95153758-universal-blood-red-blood-cells-enzyme-motion
122:https://www.pond5.com/stock-footage/item/92864617-protein-or-enzyme-motion
Consulted References:
Refer to Part 3 for all consolidated references for all parts.

