191
$\begingroup$

The SARS-CoV-2 coronavirus's genome was released, and is now available on GenBank. Looking at it...

    1 attaaaggtt tataccttcc caggtaacaa accaaccaac tttcgatctc ttgtagatct
   61 gttctctaaa cgaactttaa aatctgtgtg gctgtcactc ggctgcatgc ttagtgcact
  121 cacgcagtat aattaataac taattactgt cgttgacagg acacgagtaa ctcgtctatc
  ...
29761 acagtgaaca atgctaggga gagctgccta tatggaagag ccctaatgtg taaaattaat
29821 tttagtagtg ctatccccat gtgattttaa tagcttctta ggagaatgac aaaaaaaaaa
29881 aaaaaaaaaa aaaaaaaaaa aaa

Wuhan seafood market pneumonia virus isolate Wuhan-Hu-1, complete genome, Genbank

Geez, that's a lot of A nucleotides. I don't think that's just random. I would guess that it's either an artifact of the sequencing process, or there is some underlying biological reason.

Question: Why does the SARS-CoV-2 coronavirus genome end in 33 A's?
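For anyone who wants to check the count rather than eyeball it, here is a minimal sketch. It assumes only the last two ORIGIN lines quoted above, pasted in as a string; the helper names `origin_to_sequence` and `trailing_run_length` are made up for this illustration and do not come from any library.

```python
import re

# The last two ORIGIN lines from the GenBank record, copied from the excerpt above.
ORIGIN_TAIL = """\
29821 tttagtagtg ctatccccat gtgattttaa tagcttctta ggagaatgac aaaaaaaaaa
29881 aaaaaaaaaa aaaaaaaaaa aaa
"""


def origin_to_sequence(origin_text):
    """Drop position numbers and whitespace from GenBank ORIGIN lines."""
    return re.sub(r"[^a-z]", "", origin_text.lower())


def trailing_run_length(seq, base="a"):
    """Length of the uninterrupted run of `base` at the 3' end of `seq`."""
    return len(seq) - len(seq.rstrip(base))


seq = origin_to_sequence(ORIGIN_TAIL)
print(trailing_run_length(seq))
```

On the two lines quoted here this reports a run of 33, matching the count in the question. The same two helpers should work on the full ORIGIN block if you paste it in.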

$\endgroup$
0

4 Answers

163
$\begingroup$

Good observation! The 3' poly(A) tail is actually a very common feature of positive-strand RNA viruses, including coronaviruses and picornaviruses.

For coronaviruses in particular, we know that the poly(A) tail is required for replication, functioning in conjunction with the 3' untranslated region (UTR) as a cis-acting signal for negative strand synthesis and attachment to the ribosome during translation. Mutants lacking the poly(A) tail are severely compromised in replication. Jeannie Spagnolo and Brenda Hogue report:

The 3′ poly (A) tail plays an important, but as yet undefined role in Coronavirus genome replication. To further examine the requirement for the Coronavirus poly(A) tail, we created truncated poly(A) mutant defective interfering (DI) RNAs and observed the effects on replication. Bovine Coronavirus (BCV) and mouse hepatitis Coronavirus A59 (MHV-A59) DI RNAs with tails of 5 or 10 A residues were replicated, albeit at delayed kinetics as compared to DI RNAs with wild type tail lengths (>50 A residues). A BCV DI RNA lacking a poly(A) tail was unable to replicate; however, a MHV DI lacking a tail did replicate following multiple virus passages. Poly(A) tail extension/repair was concurrent with robust replication of the tail mutants. Binding of the host factor poly(A)- binding protein (PABP) appeared to correlate with the ability of DI RNAs to be replicated. Poly(A) tail mutants that were compromised for replication, or that were unable to replicate at all exhibited less in vitro PABP interaction. The data support the importance of the poly(A) tail in Coronavirus replication and further delineate the minimal requirements for viral genome propagation.

Spagnolo J.F., Hogue B.G. (2001) Requirement of the Poly(A) Tail in Coronavirus Genome Replication. In: Lavi E., Weiss S.R., Hingley S.T. (eds) The Nidoviruses. Advances in Experimental Medicine and Biology, vol 494. Springer, Boston, MA

Yu-Hui Peng et al. also report that the length of the poly(A) tail is regulated during infection:

Similar to eukaryotic mRNA, the positive-strand coronavirus genome of ~30 kilobases is 5’-capped and 3’-polyadenylated. It has been demonstrated that the length of the coronaviral poly(A) tail is not static but regulated during infection; however, little is known regarding the factors involved in coronaviral polyadenylation and its regulation. Here, we show that during infection, the level of coronavirus poly(A) tail lengthening depends on the initial length upon infection and that the minimum length to initiate lengthening may lie between 5 and 9 nucleotides. By mutagenesis analysis, it was found that (i) the hexamer AGUAAA and poly(A) tail are two important elements responsible for synthesis of the coronavirus poly(A) tail and may function in concert to accomplish polyadenylation and (ii) the function of the hexamer AGUAAA in coronaviral polyadenylation is position dependent. Based on these findings, we propose a process for how the coronaviral poly(A) tail is synthesized and undergoes variation. Our results provide the first genetic evidence to gain insight into coronaviral polyadenylation.

Peng Y-H, Lin C-H, Lin C-N, Lo C-Y, Tsai T-L, Wu H-Y (2016) Characterization of the Role of Hexamer AGUAAA and Poly(A) Tail in Coronavirus Polyadenylation. PLoS ONE 11(10): e0165077

This builds upon prior work by Hung-Yi Wu et al., which showed that the coronaviral 3' poly(A) tail is approximately 65 nucleotides in length in both genomic and sgmRNAs at peak viral RNA synthesis, and also observed that the precise length varied throughout infection. Most interestingly, they report:

Functional analyses of poly(A) tail length on specific viral RNA species, furthermore, revealed that translation, in vivo, of RNAs with the longer poly(A) tail was enhanced over those with the shorter poly(A). Although the mechanisms by which the tail lengths vary is unknown, experimental results together suggest that the length of the poly(A) and poly(U) tails is regulated. One potential function of regulated poly(A) tail length might be that for the coronavirus genome a longer poly(A) favors translation. The regulation of coronavirus translation by poly(A) tail length resembles that during embryonal development suggesting there may be mechanistic parallels.

Wu HY, Ke TY, Liao WY, Chang NY. Regulation of coronaviral poly(A) tail length during infection. PLoS One. 2013;8(7):e70548. Published 2013 Jul 29. doi:10.1371/journal.pone.0070548

It's also worth pointing out that poly(A) tails at the 3' end of RNA are not unique to viruses. Eukaryotic mRNA almost always carries a poly(A) tail, which is added post-transcriptionally in a process known as polyadenylation. It should therefore not be surprising that positive-strand RNA viruses, whose genomes have to act as mRNA inside the host cell, have poly(A) tails as well. In eukaryotic mRNA, the central sequence motif that marks a polyadenylation site is AAUAAA, identified back in the 1970s, with more recent research confirming its ubiquity. Proudfoot 2011 is a nice review article on poly(A) signals in eukaryotic mRNA.
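To make the idea of a polyadenylation signal concrete, here is a small sketch that looks for AAUAAA upstream of a terminal A run. The sequence `TOY_BODY` is invented purely for illustration (it is not from any real transcript or from the coronavirus genome), and the function names are hypothetical.

```python
def find_motif_positions(rna, motif="AAUAAA"):
    """Return 0-based start indices of every (possibly overlapping) motif hit."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    positions = []
    start = rna.find(motif)
    while start != -1:
        positions.append(start)
        start = rna.find(motif, start + 1)
    return positions


def poly_a_start(rna):
    """Index at which the terminal run of A's begins."""
    return len(rna.upper().rstrip("A"))


# Invented toy 3' end: a short body followed by a 20-nt poly(A) tail.
TOY_BODY = "GGCUACGAAUAAAGCUUGCAUCGGAUCCGUAC"
toy_mrna = TOY_BODY + "A" * 20

for pos in find_motif_positions(toy_mrna):
    gap = poly_a_start(toy_mrna) - (pos + len("AAUAAA"))
    print(f"AAUAAA at index {pos}, {gap} nt upstream of the poly(A) run")
```

Passing `motif="AGUAAA"` searches for the coronaviral hexamer described by Peng et al. instead. This is plain string matching, so it says nothing about whether a given hit is functional.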

$\endgroup$
7
  • 1
    $\begingroup$ Maybe worth noting that the poly(A) tail in eukaryotic mRNAs serves as a binding region for Poly(A)-binding protein which helps take mRNA out of the nucleus. $\endgroup$
    – svavil
    Commented Jan 26, 2020 at 10:20
  • 7
    $\begingroup$ User NONONO mentioned in comments a NOP sled, a trick used in computer viruses to raise the probability of a piece of code (randomly injected into the host) actually being run. In that context, it's always a starting point. Is it possible that this tail-of-As, as described here, is actually not a "tail" (end of data, last piece translated, etc), but a "head" instead? A "head" that is "caught" or "detected" by something and then it is where the process begins and proceeds with reading/processing/replication, or fails if the A-chain is too short and the process "loses its grasp" on it? $\endgroup$ Commented Jan 26, 2020 at 22:41
  • 1
    $\begingroup$ Knowing that these entities all share a similar 3' poly(A) tail and that it helps in replication, is it possible to essentially bind off the end with a synthetic molecule, or would that interfere with regular host operation? $\endgroup$
    – HouseCat
    Commented Jan 27, 2020 at 15:19
  • 3
    $\begingroup$ @JPhi1618 3-prime $\endgroup$
    – user170231
    Commented Jan 27, 2020 at 17:36
  • 2
    $\begingroup$ @quetzalcoatl This is getting off topic, but yes, polyA tails are important to promote translation because the mRNA gets put head to tail in a circle. Here's a paper on this "circularization". pubmed.ncbi.nlm.nih.gov/9702200 $\endgroup$ Commented Jul 20, 2021 at 21:03
32
$\begingroup$

This question is quite general, so I'm going to attempt to tie it back to bioinformatics.

Background: The tree for the current coronavirus is here, showing it is closely related to bat coronaviruses and in particular SARS.

Question: The bioinformatics question for the current coronavirus is why this virus appears to be able to infect humans and transmit between humans.

Genome size: Firstly, you said that 30 kb was large. This is a standard size for a coronavirus genome, although the family Coronaviridae is unusual in having the largest genomes for a single-stranded RNA virus; flaviviruses, for example, are about 10 kb. All coronaviruses are approximately 30 kb. Some coronaviruses don't infect humans (zero symptoms), some cause very mild symptoms, and others are MERS and SARS, with 40-60% and 10% mortality rates, respectively. So genome size is of little bioinformatics interest, in my opinion.

Polyadenylation: Polyadenylation and capping (5' methylation) enable the RNA to be trafficked and translated by ribosomes, and the mechanism is widely used by viruses. Methylation would also prevent the innate immune response from shredding the vRNA. Koonin and Moss (2010) interpreted a given capping mechanism as being common to the Mononegavirales, a viral order that includes measles, mumps and Ebolavirus. It's a big statement, but regardless, poly-A and capping simply mimic the host mRNA, which is something a lot of viruses do. Poly-A and capping per se are not really interesting.

Evolution and SARS: A more detailed examination of the evolution of 2019-nCoV and its epidemiology in relation to SARS can be found here

Conclusion: The bioinformatics questions are these. Is the genome size weird? No, it's standard for a coronavirus. Is the poly-A weird? No, it's generic amongst lots of viruses, as is capping. Is the length of the poly-A (33 As) excessive? It looks odd, but a human geneticist/bioinformatician needs to answer that. So is it (potentially) linked with its epidemiology/clinical symptoms?

I don't think the 33 As are linked with anything bioinformatically interesting, because the length will likely vary dramatically between genomes (not simply epidemic vs. non-epidemic strains). I don't know the mechanism for polyadenylation, but I think slippage is a likely mutation, resulting in large variations between individual genomes, particularly for poly-A, which is notorious for slippage.

So ultimately, could poly-As be linked with the ability of the new coronavirus to infect/transmit, and could we therefore explore that bioinformatically? I personally think slippage mutations would prevent a clonal lineage emerging, i.e. the size of the poly-A is not stable between genomes, but that would assume a given mechanism of polyadenylation. Thus, as a bioinformatics question, I wouldn't pursue it, because I don't think there is sufficient biological rationale. I agree weird stuff should be questioned and that bit of the genome jumps out ... but I doubt it would go anywhere.

Slippage: The definition of a slippage mutation is here, but basically it means that this genome has 33 As in its poly-A tail, whereas another isolate from the same epidemic could have, say, 30 (just an example), another might have 25, and so on.
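To illustrate why slippage would blur any lineage signal, here is a toy simulation. Everything in it is an assumption made for illustration: the one-nucleotide step size, the `slip_prob` value, the number of generations and the function name are all invented, and it is not a model of the real polyadenylation mechanism.

```python
import random


def simulate_tail_lengths(start_len=33, generations=50, n_lineages=5,
                          slip_prob=0.2, min_len=0, seed=0):
    """Toy random walk: each generation the tail may gain or lose one A.

    All parameters are hypothetical and chosen only to show the idea.
    """
    rng = random.Random(seed)  # fixed seed so reruns give the same numbers
    final_lengths = []
    for _ in range(n_lineages):
        length = start_len
        for _ in range(generations):
            if rng.random() < slip_prob:
                # A slippage event adds or removes a single A, never below min_len.
                length = max(min_len, length + rng.choice((-1, 1)))
        final_lengths.append(length)
    return final_lengths


print(simulate_tail_lengths())
```

Even in this crude model, lineages that all start from the same 33-nt tail drift to different lengths, which is the reason I would not expect the exact count to track a clonal lineage.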

Just my 2 cents

$\endgroup$
2
  • $\begingroup$ Can anybody give me some pointers on understanding that relationship tree? It looks like the current virus split from a bat virus. And that the same ancestral virus is also ancestral to a number of other viruses in both humans and bats. Is that correct? $\endgroup$
    – puppetsock
    Commented Jan 27, 2020 at 16:28
  • 2
    $\begingroup$ @puppetsock this is a separate question, and the rule of BioSE is one question per post; otherwise it will be moderated. If you post the question I will provide you with a detailed answer, with additional detail about the shared SARS receptor $\endgroup$
    – M__
    Commented Jan 27, 2020 at 17:05
21
$\begingroup$

Some of the other answers here seem quite good; at the same time I think the core answer to the OP's question is maybe a bit hard to tease out of them, so I'd like to try to state it more plainly. It's worth noting that a truly complete answer to this question seems to be beyond current research, but any kind of "Why?" is inevitably a hard or even impossible sort of question to answer fully in biology. We have some ideas about it though.

mRNA is used as a template for protein synthesis within a cell. A single mRNA is used repeatedly, but is eventually "used up" and taken apart. In eukaryotes, poly(A) tails are almost always found on mRNAs produced in the nucleus. The poly(A) tail is gradually shortened over the mRNA's lifetime, and this shortening contributes to the mRNA being degraded. (See here for more.)

Coronaviruses also have a poly(A) tail, similarly to eukaryote mRNA. The precise mechanical functions of this poly(A) tail and the means of its synthesis are objects of ongoing research, but research has shown that its presence greatly increases the degree to which Coronavirus RNA is replicated in the host cell. Research has also shown that longer tails increase replication compared to shorter tails. It's quite likely that the presence of the tail assists in recruiting the cell's protein synthesis machinery and allows the RNA to last longer within the host cell, just as it does in the cell's own mRNA.

Interestingly, the pattern in which Coronavirus poly(A) tail length is regulated during infection, in which it starts out shorter, gets longer, then gets much shorter, resembles poly(A) tail length regulation of mRNA during eukaryote embryogenesis, suggesting parallels (see the paper in the "longer tails" link for more on this as well). Longer poly(A) tail length is closely tied to greater translational efficiency in that context.

There has been some speculation in the comments as to whether or not the Coronavirus poly(A) tail resembles a NOP sled in computer programming. I think the resemblance is mostly coincidental. NOP sleds are used in exploits because a processor, encountering a NOP, moves to the next instruction without taking any other actions. A long chain of NOPs, if entered by the processor at any point within it, will lead it to the instructions at the "bottom," after the NOPs. This is advantageous to use if you can't get the processor to go exactly where you want but you know it will end up somewhere close by, because it increases your chances of having your payload executed.
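For readers who haven't met a NOP sled before, here is a minimal sketch of the idea using a toy "memory" made of three invented instruction names. It is not real machine code, and the layout and sizes are arbitrary assumptions.

```python
NOP, PAYLOAD, HALT = "NOP", "PAYLOAD", "HALT"


def run(memory, entry):
    """Execute the toy instruction list from index `entry`.

    Returns True if execution reaches the payload, False otherwise.
    """
    pc = entry
    while pc < len(memory):
        op = memory[pc]
        if op == PAYLOAD:
            return True
        if op == HALT:
            return False
        pc += 1  # NOP: do nothing and fall through to the next instruction
    return False


def hit_rate(memory):
    """Fraction of all possible entry points that end up running the payload."""
    hits = sum(run(memory, entry) for entry in range(len(memory)))
    return hits / len(memory)


no_sled = [HALT] * 15 + [PAYLOAD]
with_sled = [HALT] * 5 + [NOP] * 10 + [PAYLOAD]

print(hit_rate(no_sled))    # only a jump straight onto the payload works
print(hit_rate(with_sled))  # any jump into the sled slides down to the payload
```

In this toy layout the sled raises the fraction of successful entry points from 1/16 to 11/16. That is all a sled does: it widens the landing zone in front of the payload.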

It's unusual to see a lengthy NOP sled in legitimate code, to the point that people writing them usually have to disguise their function in order to avoid automatic detection. (see pg. 183 here) In contrast, a poly(A) tail is almost universally found on nuclear eukaryote mRNA (and on some mRNAs of almost all organisms to some capacity, even mitochondria). Furthermore, the functions of the poly(A) tail are complex enough that it's still an object of ongoing research decades after its initial discovery, whereas a NOP sled does one very mechanical thing. Since the environment inside a cell is so different from the environment of a processor interacting with memory, I think it's hard to make comparisons that are so granular as to deal with a specific set of machine instructions, at least in this kind of context—a processor is a very straightforward kind of machine compared to a cell.

$\endgroup$
6
  • 2
    $\begingroup$ Hi Zoe, you stated "the pattern in which Coronavirus poly(A) tail length is regulated during infection, in which it starts out shorter, gets longer, then gets much shorter..". The analogy you used was Drosophila development from egg to the maturation of the oocyte. Interesting idea. Could you cite the variation of poly-A for coronavirus please? The context isn't clear whether it is within a life-cycle within a cell, within an infection (patient) or between patients (transmission). I suspect it is an in vitro observation and represents the infection of an established cell line. $\endgroup$
    – M__
    Commented Jan 27, 2020 at 13:15
  • 2
    $\begingroup$ It's in this paper. Specifically: "In this study we report that the coronaviral 3’-terminal poly(A) tail length in total viral RNA, sgmRNA7, and DI RNA is relatively short (~26-45 nt) in infected cells at 0-2 hpi, increases to peak length (~65 nt) at ~6-10 hpi, and gradually decreases in size (~30-45 nt) after ~10 h of infection." They also draw the analogy to embryo development in their abstract. You're right about the context—it's in vitro using confluent human adenocarcinoma (HRT-18) cells. $\endgroup$ Commented Jan 27, 2020 at 13:52
  • 1
    $\begingroup$ Thanks, it's an in vitro time series over ~5 days, where the poly-A tail peaks in size quite early in the time series, albeit the decay in poly-A size thereafter is IMO not a great analysis of their northern blot and would have been better analysed by a data scientist. The key message is clear: the poly-A tail length is dynamic for CoV in the in vitro model and therefore possibly during the course of a single patient infection. $\endgroup$
    – M__
    Commented Jan 27, 2020 at 16:13
  • $\begingroup$ @ZoëSparks, I'm interested in your feedback regarding this comment, specifically 'Is it possible that this tail-of-As, as described here, is actually not a "tail" (end of data, last piece translated, etc), but a "head" instead?'. Let's interpret 'head' as "earlier, causality-wise", since molecular biology doesn't have any 'top' or 'bottom'. $\endgroup$
    – daveloyall
    Commented Jan 31, 2020 at 23:45
  • $\begingroup$ @daveloyall Well, in this case, there is a top and bottom in the sense of where translation starts and stops on the mRNA; it begins past the 5' cap and ends before the poly(A) tail, hence the terms (see this diagram for a nice illustration). As far as the tail being "caught or detected" to begin synthesis goes, it is part of what causes mRNA to be taken up, but not the only factor—the cap plays a significant role as well, for instance (see here for more). $\endgroup$ Commented Feb 1, 2020 at 0:22
13
$\begingroup$

Not an expert, but some searching on eukaryotic positive-strand RNA viruses seems to show that polyadenylation is not uncommon. For example, Steil, et al., 2010.

$\endgroup$
