Virus Origins Lite: see this article in Scientific American
Virus Origins Serious: see this article in ViroBlogy
The probably multiple origins of viruses are lost in a sea of conjecture and speculation, which results mostly from their nature: no-one has ever detected a fossil virus as a particle; they are too small and probably too fragile to have withstood the kinds of processes that led to fossilisation, or even to preservation of short stretches of nucleic acid sequences in leaf tissues or insects in amber.
As a result, we are limited to studying viruses that are isolated in the present, or from material that is at most a few decades old. The new science (or art) of virus molecular systematics is, however, shedding a great deal of light on the distant relationships of many important groups of viruses and, in some cases, on their presumed origins. This is a result of the sequencing of all or part of the genomes of representatives of many of the known varieties of viruses, including the largest (pox- and herpesviruses) and the smallest (gemini- and other ssDNA viruses). If viral genomes are compared with each other and with cellular sequences, presumed patterns of evolution and divergence of the genomes can be reconstructed.
Geminiviruses, for example, are a diverse group of viruses, with different genera having different numbers of genes and genome components, that presumably have a common origin. That origin may be traceable back to beyond 200 Myr BP, if one takes into account geographical diversity and the genetic divergence of vectors and of plant hosts (see Rybicki, 1994). Divergence of these viruses is illustrated here.
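The kind of comparison described above can be sketched very simply. The snippet below is a minimal illustration only, not the method used in any of the studies mentioned: it computes the proportion of differing sites (the "p-distance") between pre-aligned sequences and reports the most similar pair. The sequences and the names `virus_A`, `virus_B`, `virus_C` and `host_gene` are hypothetical toy data invented for the example; real analyses use proper alignment and substitution models.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip sites where either sequence has a gap
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared


def distance_matrix(alignment: dict) -> dict:
    """Pairwise p-distances, keyed by alphabetically ordered name pairs."""
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


def closest_pair(matrix: dict) -> tuple:
    """The pair of sequences with the smallest distance."""
    return min(matrix, key=matrix.get)


# Hypothetical toy alignment (assumption: invented purely for illustration).
TOY_ALIGNMENT = {
    "virus_A": "ATGGCTAAGC",
    "virus_B": "ATGGCTAAGT",
    "virus_C": "ATGACTCAGT",
    "host_gene": "TTGACCCAG-",
}

if __name__ == "__main__":
    matrix = distance_matrix(TOY_ALIGNMENT)
    for pair, dist in sorted(matrix.items(), key=lambda item: item[1]):
        print(pair, round(dist, 3))
    print("closest pair:", closest_pair(matrix))
```

A distance matrix like this is only the raw material: tree-building methods then turn it into a hypothesis about the order of divergence.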
Potyviruses are also a putatively ancient family of viruses, with genera that have different numbers of genome components and whose gene pool is not shared by all members: a representative coat protein tree is shown here. Bymoviruses have two genome components and are fungus-transmitted; all the rest have a single genome component. Potyviruses are aphid-transmitted; rymo- and tritimoviruses are mite-transmitted, and ipomoviruses are whitefly-transmitted. The family is supposed to descend from a fungal virus, probably over 200 Myr ago (AJ Gibbs, pers. comm.).
If one were to go far back into evolutionary time, a case could be made for descent from a single ancestor of at least the replicase-associated functions of all viruses with positive-sense and negative-sense single-strand RNA genomes; likewise, large DNA viruses like pox- and herpesviruses and phycodnaviruses could be presumed to have "degenerated" (if one believes viruses to be degenerate organisms, which I for one do not...) from cellular organisms, given that their enzymes share more sequence similarity with sequences from cells than with other viruses or anything else.
Retroviruses, pararetroviruses, retrotransposons and retroposons all probably share a common origin of the reverse transcription function, which in turn may be a living relic of the enzyme that enabled the switch from a presumably RNA-based genetics to DNA-based heredity.
The main message to come from "deep" phylogenetic studies is that early virus evolution was almost certainly modular: that is, that certain "core modules" that proved to be successful - like the retrovirus pol gene, and the picornavirus-like protease-VPg-polymerase module - appear in a number of different contexts. Thus, certain animal viruses - like picornaviruses and alphaviruses - have relatives among plant viruses which do not necessarily share the same morphology, number of genome components, or even genome organisation or number of genes.
For example, small spherical picornaviruses (ssRNA, 1 component, infect animals) are related to comoviruses (small spherical, 2 component, plant) and Potyviridae (filamentous, 1 or 2 genome components, plant), as part of a "Picorna-like Supergroup". Similarly, Sindbis virus (ssRNA, enveloped spherical, single RNA, animals) is related to Bromoviridae (naked spherical, 3 component, plant) and Tobamoviruses (naked rod, 1 component, plant), in a "Sindbis (or Alpha-) virus-like Supergroup". This is covered well here. A diagram illustrating one way of looking at this phenomenon is shown here: it overlays the phylogenetic relationships of different components of some well-known viruses, illustrating how different components of a group of viruses may have different "gene trees". In this case, it shows how individual viruses in two different "supergroups" of viruses - defined in terms of polymerase affinities - have different gene trees when capsid protein and polymerase relationships are both taken into account. Thus, for ss(+)RNA viruses at least, going back beyond a certain level results in a severe blurring of perceived relationships, as well-defined families of viruses appear to share essential core components with other well-defined families / orders (see also here), while having nothing else in common.
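The idea that one set of viruses can have different "gene trees" can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not a reconstruction of the diagram referred to above: the two topologies and the names `virus_A` to `virus_D` are hypothetical, and the comparison is a deliberately simple one, counting the groupings (clades) that appear in one rooted tree but not in the other.

```python
def clades(tree) -> set:
    """Collect the set of leaf names under every internal node.

    A tree is a nested tuple of leaf-name strings, e.g. (("a", "b"), "c").
    """
    found = set()

    def leaves(node) -> frozenset:
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)  # record the grouping defined by this node
        return members

    leaves(tree)
    return found


def conflicting_clades(tree_1, tree_2) -> set:
    """Groupings present in exactly one of the two trees."""
    return clades(tree_1) ^ clades(tree_2)


# Hypothetical topologies (assumption: invented for illustration only).
# The polymerase gene groups A with B, while the capsid gene groups A with C.
POLYMERASE_TREE = (("virus_A", "virus_B"), ("virus_C", "virus_D"))
CAPSID_TREE = (("virus_A", "virus_C"), ("virus_B", "virus_D"))

if __name__ == "__main__":
    for group in sorted(sorted(g) for g in conflicting_clades(POLYMERASE_TREE, CAPSID_TREE)):
        print("supported by only one gene tree:", group)
```

If the two genes had shared a single history, the set of conflicting groupings would be empty; any non-empty result is the signature of the modular, mix-and-match evolution described in the text.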
It is very quickly apparent from sequence studies that there can have been no single origin of viruses as organisms. For instance, there is no obvious way one can relate viruses of the size and complexity of the Poxviridae [double-stranded linear DNA, 130-375 kb, 150-300 genes] with viruses like the tobamoviruses [ss linear RNA, 6-7 kb, 4 genes], or either of these with the Geminiviridae [ss circular DNA, 2.7-5.4 kb, 3-7 genes]. Thus, there can be no simple "family tree" for viruses; rather, their evolutionary descent must resemble a number of scattered "bushes". Viruses as a class of organism must therefore be considered to be polyphyletic in origin: that is, having a number of independent origins, almost certainly at different times, usually from cellular organisms.
What they have in common is a role as the ultimate "stripped-down" parasites:
organisms which can only undergo a life cycle inside the cells of a host organism, using at the very least the metabolic enzymes and pathways and ribosomes of that host to produce virion components which get assembled into infectious particles.
One major assumption has to be made, if one is ever to make any sense of the sorts of relationships emerging from molecular phylogenetic studies. This is:
THAT VIRUSES CO-EVOLVE WITH THEIR HOSTS, LIKE ANY GOOD PARASITE.
Fortunately, there appears to be quite a lot of justification for it, especially from studies of viruses such as papillomaviruses, endogenous retrovirus-like sequences in animal genomes, and herpesviruses. For example, the divergence of primates, and of birds related to chickens, has been traced by comparing the types and sequences of retroviral-derived sequences in their genomes. It has also been repeatedly shown that the closest relatives of human papillomavirus types infecting particular tissue types (e.g. cutaneous wart types, genital mucosal types) are those viruses infecting similar tissue types in other primates, indicating that these tissue preferences were well established before the divergence of hominoid apes from the primate line.
It is quite useful here to consider the timeline of evolution of life from its beginnings in water, as well as the timeline of colonisation of dry land by organisms, seeing as our knowledge of viruses is limited very largely to ones infecting terrestrial organisms. This process went much as follows (see also here):
Viruses of nearly all the major classes of organisms - animals, plants, fungi and bacteria / archaea - probably evolved with their hosts in the seas, given that most of the evolution of life on this planet has occurred there. This means that viruses also probably emerged from the waters with their different hosts, during the successive waves of colonisation of the terrestrial environment.
This would explain why viruses in the different broad types of hosts are generally so different: they have had aeons to adapt to each "life niche" since divergence from the respective common ancestor. Thus, bacteria / archaea and eukaryotes share no virus types, as they diverged so long ago. However, there are some marked similarities between viruses infecting plants and those infecting vertebrates (see here), and even more between those infecting arthropods and those in vertebrates. A useful reference point for these sorts of comparisons is the Tree of Life Home Page; in particular their 16S ribosomal RNA gene tree, or alternatively the simpler tree here. Thus, as it can be seen that Metazoa (all animals) are far more similar to one another than they are to Viridiplantae (all green plants), it is reasonable to assume that their viruses would be more similar to one another as well. It is also instructive to note that the Tree of Life includes no virus trees / bushes yet...!
A complicating factor in the picture of viruses co-evolving with their hosts over millennia is the fact that viruses apparently can - and obviously do - make big jumps in hosts every now and then. It seems obvious, for example, that arthropods are almost certainly the original source for a number of virus families infecting insects and mammals - such as the Flaviviridae - and probably also of viruses infecting insects and other animals and plants - such as the Rhabdoviridae and Reoviridae - as well (see also here). For example, picornaviruses of mammals are very similar structurally and genetically to a large number of small RNA viruses of insects and to at least two plant viruses, and - as the insect viruses are more diverse than the mammalian viruses - probably had their origin in some insect that adapted to feed on mammals (or plants) at some distant point in evolutionary time.
Clues as to how this happens are given by the fact that flock house virus - a small naked isometric ssRNA virus of insects - can replicate in plants when introduced into leaves, but only at the site of infection. This gives it a survival advantage over other similar viruses which cannot replicate in plants, as for flock house virus the plants are no longer a passive, but an active reservoir for the re-infection of insects. Leafhopper A virus and Rhopalosiphum padi [aphid] virus, for example, do not replicate in plants, but both are stable enough, and are injected into plants at sufficient concentration by their hosts, to make the plants into "circulative non-propagative" vectors of the viruses. The addition of a movement function - a "module" peculiar to plant viruses - would make flock house virus into a fully-fledged plant virus as well; loss of ability to infect insects would then limit it to plants.
It is obvious that a number of such cross-overs have occurred: there are reoviruses which only infect plants; flaviviruses which only infect humans; picornaviruses which only infect animals or plants, and so on. An interesting "emerging virus" is tomato spotted wilt virus - genus Tospovirus, family Bunyaviridae - which is postulated to have relatively recently acquired both the ability to infect plants in addition to its thrips vector insect and a movement function, and is now a major pathogen of a wide range of plant species.
Another corollary of considering cellular organismal evolution as a marker of virus evolution is that the terrestrial virus gene pool must constitute only a subset of the virus lineages currently present in the oceans: if all of the major lines of terrestrial organisms are descended from subsets of parallel-developing lineages in the oceans, then the terrestrial ancestral gene pool is smaller than the oceanic. Assuming virus diversity parallels host diversity, the ancestral terrestrial virus gene pools are therefore also smaller than their oceanic equivalents.
Recent findings that an astonishing diversity (as well as quantity) of viruses exists in seawater may support this conclusion: it has been proposed that viruses may well be one of the prime limiting factors to microbial and planktonic growth in seawater, given each millilitre contains so many viral particles. The finding that only about 10% of bacteria found in seawater (as represented by clonal populations of PCR-generated 16S ribosomal RNA gene fragments) actually resemble any known taxon of bacteria, also bolsters the view that we may only have uncovered a small proportion of the viruses on this planet. If one considers that green plants represent only a small proportion of the eukaryotic algal gene pool, and that we know of hundreds of terrestrial plant viruses, whereas there is only one properly characterised group of viruses from algae (Phycodnaviridae), then the potential diversity still to be uncovered must be huge.
See below for further discussion on this topic.
Copyright Ed Rybicki, November 1997, October 2000, April 2008