1
$\begingroup$

I am actually a pure mathematician who stumbled over the paper «Protein-Folding Analysis Using Features Obtained by Persistent Homology» by Ichinomiya et al. (Biophys. J. 118, 2020, 2926-2937; link), and then read some introductions to protein folding, as well as articles on DeepMind's progress on the issue.

Assume we are given a protein $p$ in primary structure, that is, $p = [a_1, \dots, a_n]$ is a finite, nonempty list of amino acids $a_i$. A protein $p' = [b_1, \dots, b_m]$ is called a subprotein of $p$ if there is some offset $l \in \{0, \dots, n-m\}$ such that $a_{l+i} = b_i$ for all $i \in \{1, \dots, m\}$.
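As a minimal sketch of this definition, assuming proteins are written as strings of one-letter amino acid codes: the function below lists every offset $l$ (counted from 0) at which $p'$ occurs as a contiguous block inside $p$. The name `subprotein_offsets` and the example sequences are made up for illustration.

```python
def subprotein_offsets(p: str, q: str) -> list[int]:
    """Return every 0-based offset l such that q equals p[l : l + len(q)]."""
    n, m = len(p), len(q)
    if m == 0 or m > n:
        # The definition asks for nonempty lists, and q cannot be longer than p.
        return []
    return [l for l in range(n - m + 1) if p[l:l + m] == q]


# Hypothetical toy sequences, not taken from any database.
p = "MKTAYIAKQR"
q = "TAYIA"
print(subprotein_offsets(p, q))  # q is a subprotein of p iff this list is nonempty
```

Returning all offsets rather than a yes/no answer keeps track of where the subprotein sits in $p$, which is needed when comparing the folded structures position by position.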

Let $Q(p)$ denote the quaternary protein structure of $p$. It can be viewed as a set of vertices (and edges, if one needs them) in $\mathbb{R}^3$.

Question 1: Are there known examples (or even a database) of proteins $p$, given in primary structure, for which a subprotein $p'$ actually exists as a protein in its own right?

Question 2: Assume we are given $p$ and a subprotein $p'$. Are there known examples for which $Q(p')$ is a subset of $Q(p)$? Note that we do not just need some injection, but a distance- and angle-preserving injection!
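One way to make the condition in Question 2 testable, sketched under assumptions: each residue is represented by a single point in $\mathbb{R}^3$ (for instance its C-alpha atom), the two point lists are compared position by position starting at a given offset, and the names `is_isometric_embedding` and `tol` are hypothetical. The check relies on the fact that two ordered point sets are related by a distance-preserving map exactly when all pairwise distances agree; angles are then preserved automatically. Mirror images are not excluded by this test.

```python
import math
from itertools import combinations


def is_isometric_embedding(sub_coords, full_coords, offset, tol=1e-6):
    """Check that sub_coords[i] and full_coords[offset + i] have the same
    pairwise distances, up to the absolute tolerance tol."""
    m = len(sub_coords)
    window = full_coords[offset:offset + m]
    if len(window) != m:
        # The subprotein would run past the end of the larger protein.
        return False
    return all(
        abs(math.dist(sub_coords[i], sub_coords[j])
            - math.dist(window[i], window[j])) <= tol
        for i, j in combinations(range(m), 2)
    )


# Toy coordinates (assumed, one point per residue).
full = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
# The last three points of `full`, shifted by (5, 5, 5).
sub = [(6, 5, 5), (6, 6, 5), (6, 6, 6)]
print(is_isometric_embedding(sub, full, offset=1))
```

For measured structures the distances will never match exactly, so the tolerance would have to be chosen to reflect experimental error.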

Do not forget that I am a noob. So these questions may be anything from trivial to impossible to answer. I cannot guess the outcome; otherwise I would not ask.

For the interested reader: By a new result of Yuri Manin, persistent homology is linked to Nori-motives from algebraic geometry. Since the paper cited above did some calculations on proteins via persistent homology, it may be possible to somehow "pull back" to the Nori category and derive a motivic decomposition of the folded protein, and also of the protein in primary structure embedded in a plane. If question 2 has a "yes" for at least some cases, I expect some isomorphic motivic summands to occur in both decompositions, of $p$ and of $p'$. If this is the case, then there could even be some common summand in the motives of $p$ and $Q(p)$.

$\endgroup$
9
  • 2
    $\begingroup$ Sorry, but I have a feeling that a question phrased the way you did won't be useful or clear to pretty much anyone :/ If you just made up this "subprotein" term, then that's even more likely. That's the reality: without proper jargon and careful wording in general, one can hardly ask a good question. $\endgroup$
    – Mithoron
    Commented Jun 10, 2021 at 18:23
  • $\begingroup$ en.wikipedia.org/wiki/Protein_domain $\endgroup$
    – Mithoron
    Commented Jun 10, 2021 at 18:41
  • 1
    $\begingroup$ If Question 1 asks about the primary structure of the proteins, i.e., the mere sequence, then entries from a database like uniprot.org may be used for this task. E.g., BioPython contains functions to identify such similarities. The concept isn't limited to proteins; you may equally apply this similarity check to nucleic acids. Whether a forensic lab (e.g., for a paternity test) or Craig Venter's human genome project use(d) an interpreted or a compiled language for performance is perhaps a detail, though. $\endgroup$
    – Buttonwood
    Commented Jun 10, 2021 at 19:14
  • 1
    $\begingroup$ I think you mean tertiary rather than quaternary structure for Q? Most proteins do not have quaternary structure, as they are free monomers. $\endgroup$
    – Andrew
    Commented Jun 10, 2021 at 19:39
  • 1
    $\begingroup$ If you want to apply math to the folding of macromolecules, you should also investigate (single-stranded) RNA folding. In some ways it is more amenable to a theoretical approach than protein folding, because its folding is hierarchical: You can treat the creation of the secondary structure (base pairing) and tertiary structure (how the molecule arranges itself in 3D space) as separate problems. A lot of mathematical models have been created for RNA folding for this reason. Protein folding, by contrast, is less tractable, because the secondary and tertiary structures form cooperatively. $\endgroup$
    – theorist
    Commented Jun 23, 2021 at 23:30

2 Answers

5
$\begingroup$

To question (1), the short answer is yes, such scenarios do exist, but they are rare for $m$ greater than 10 or so, due to the large number of possible sequences and the high frequency of substitutions even in very closely related proteins. Those that do exist are often encoded by the same gene, and the deletion of parts of one polypeptide (producing a "subprotein" by your definition) results from either posttranscriptional processing (deletion of parts of the encoding mRNA) or posttranslational processing (deletion of part of the synthesized polypeptide by proteolytic cleavage or intein excision).

For question (2), I'm going to assume you meant tertiary structure rather than quaternary, and your question, simply stated, is whether a subprotein has the same 3D folded structure alone as it does when within a larger protein. There are some cases where this is true, such as when short signal sequences are cleaved off. You can also have cleavage in a linker region between two folded domains, such that the folds of the domains are not affected. This result, however, cannot be universally assumed, as there are also many cases where deletion even of short segments of the primary sequence results in changes to the folded structure.

A well-known example is trypsinogen, from which a short N-terminal activation peptide (six amino acids in bovine trypsinogen) is cleaved to produce the subprotein trypsin. The trypsin rearranges so that its overall structure is slightly different from the structure it had when part of the trypsinogen molecule. The new fold is catalytically active, whereas trypsinogen is not. This type of activation by proteolytic cleavage is a common theme for enzymes, and the larger inactive forms are called zymogens.

Hopefully this is enough to point you in the right direction to read up more on this topic.

$\endgroup$
1
  • $\begingroup$ That's a great answer. I have no idea which databases and software to consult (there are some mentioned in the paper, but figuring out how much of the whole picture they present will be very hard for me), but Biopython sounds like a start. I didn't even know that it existed. $\endgroup$
    – Guestlee
    Commented Jun 10, 2021 at 23:20
2
$\begingroup$

The protein structural terminology in your question needs some work. For example "protein in primary structure embedded in a plane" - the primary structure is the sequence of the protein, and it is not clear how that would be embedded in the plane. Perhaps you mean embedding the graph corresponding to the atomic structure?

Pulling back, however, the larger question is "Can abstract mathematics (such as algebraic geometry) be applied to protein structural analysis?". The answer is: only if you can translate from the mathematics back to biology.

Consider this quote from the paper you link to:

Although PH is a highly effective tool for the analysis of nonlocal structures, it has several inherent limitations. First, PH results are sometimes difficult to interpret. In the original PH analysis, we obtain two values called “birth” and “death” for each loop or cavity and make decisions based on the distribution of these values. Frequently, these two values are insufficient to understand the physical relevance provided by PH.

They go on to describe how to fix the problem of this analysis being incomprehensible, but I think you see the point. A mathematically 'neat' description of something like protein structure might be elegant and amenable to powerful tools to analyse and transform that description. However, it may be difficult to 'translate' back to conclusions about the actual biology or chemistry.
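To make the "birth" and "death" values in the quote concrete, here is a minimal sketch. It is not the method used in the paper: it handles only connected components (degree-0 persistent homology) rather than loops or cavities, and it assumes a filtration in which two points become connected once the scale parameter reaches their Euclidean distance. The function name `h0_persistence` and the toy points are hypothetical.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return (birth, death) pairs for connected components of a point cloud.

    Every component is born at scale 0; it dies at the length of the edge
    that merges it into another component (Kruskal-style union-find).
    """
    n = len(points)
    if n == 0:
        return []
    parent = list(range(n))

    def find(x):
        # Follow parent links to the representative, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairwise edges, sorted by length (the scale at which they appear).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((0.0, length))  # one component dies at this scale
    pairs.append((0.0, math.inf))  # the final component never dies
    return pairs


points = [(0, 0, 0), (1, 0, 0), (5, 0, 0)]
print(h0_persistence(points))
```

Even in this simplest case the output is just a list of number pairs, which illustrates the interpretability issue: the pairs record at which scales features appear and vanish, but not which atoms or residues are responsible.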

So using Nori-motives to say something about protein structure might 'work' in some sense, but the important thing would be what question in protein structural analysis it would answer. Also important is whether it would answer it better than simpler techniques that might be easier to understand and extend.

$\endgroup$
1
  • $\begingroup$ Yes, you are right. I consider the 3D model of the protein before folding. I quietly assume that a protein starts folding only after it is completely generated. I fear that this is wrong and the bending starts while the generating process is still going on. Interpreting the result is another story of course, that's true. But since I am a pure mathematician, that isn't my biggest concern :) Establishing some correspondence between some categories or objects is good enough for me. $\endgroup$
    – Guestlee
    Commented Jun 10, 2021 at 23:24
