Ebola and Marburg Virus

Genomic Structure, Comparative and Molecular Biology


Provided by John Crowley (B.S.) and Ted Crusberg (Ph.D.) crusberg@wpi.edu. Dept. of Biology & Biotechnology, Worcester Polytechnic Institue, Worcester MA 01609

Ebola is a member of the negative-stranded RNA virus family Filoviridae. These filoviruses (Ebola, Marburg and Reston) are very similar in morphology, density and sodium dodecyl sulfate - polyacrylamide gel electrophoresis (SDS-PAGE) profile (Klenk, 1994). The particles are pleomorphic, meaning they can exist in many shapes.

Their basic structure is long and filamentious, essentially bacilliform, but the viruses often takes on a "U" shape, and the particles can be up to 14,000 nm in length and average 80 nm in diameter. The virus consists of a nucleocapsid, surrounded by a cross-striated helical capsid. There is an axial channel in the nucleocapsid, and the whole virion is surrounded by a lipoprotein unit derived from the host cell. In addition, there are 7 nm spikes placed 10 nm apart visible on the surface of the virion.

The genome consists of a single negative strand of RNA that is non-infectious itself, non-polyadenylated, with a linear arrangement of genes, with some occurrence of overlap. The order is:

Once inside the cell (mechanism not yet known) the virus transcribes its RNA and replicates in the cytoplasm of the infected cell. Replication is mediated by the synthesis of an antisense positive RNA strand what will serve as template for additional viral genomes. As the infection progresses the cytoplasm of the infected cell develops "prominent inclusion bodies" that contain the viral nucelocapsid, which will become highly structured. The virus then assembles, and buds off the host cell, attaining its lipoprotein coat from the infected cell's outer membrane.

The transcriptional start site was determined to be at base 54 (3'-UACUCCUUCUAAUU-). The stop site was identified due to its sequence homology with the Sendai virus polyadentlation (transcriptional signaling) site and its position after the open reading frame for the nuceloprotein gene: (3'-UAAUUCUUUUUU). The location of these sequences determined that there are long non-coding sequences within the nucleoprotein gene itself, 417 bp at the 5'-end and 341 bp at the 3'-end. The coding region begins with two AUG codons and ends with a UGA stop codon. The first protein is predicted to have 739 amino acids, and 83.3 KDa molecular weight, lower than that observed by PAGE.

A Kyte-Doolittle analysis of the predicted amino acid sequence yields a definitive hydrophobic N-terminal region, and a hydrophilic C-terminal end. Transcripts from the cloned gene run against natural viral mRNA on acid-urea-agarose gels were identical. Translated proteins from wild type and cloned genes were also identical on SDS-PAGE. Sanchez et al (1993) has published the sequence of the complete genome of Ebola virus and determined the gene order to be 3'-NP-VP35-VP40-GP-VP30-VP24-L.

Three areas of overlap occur in the genome, that average 18 bp in length. The first overlap is between the VP35 and VP40 genes, the second between GP and VP30 and the third between VP24 and L. These overlaps are limited to the conserved sequences determined for the transcriptional signals. In addition there are three non-coding sequences between VP30 and VP24. Except for the start site the L gene (RNA dependent RNA polymerase) has yet to be completely sequenced. For the Ebola (+) leader RNA sequence a potential stem-loop structure is possible, and may play a role in gene expression (perhaps by altering ribosome binding). For example the hairpin shape of the (+) leader strand may be conducive to nucleoprotein binding, and the subseqeuent conformational change may either promote replication of transcription. Norther blot analysis performed on all transcripts yielded appropriate length mRNAs except for the glycoprotein gene (GP). It was found that the GP gene produces both a full-length transcript and a shorter mRNA derived from an atypical transcriptional stop sequence located in the middle coding region of the gene. The protein produced by this transcript would appear to have no particular purpose, and may not be tolerated within infected cells.

Aligning the gene sequences of the Ebola and Marburg viruses along side one another and comparing protein products from the two viruses show similarities although no immunological cross reactivitiy exists:

  1. Ebola and Marburg genomes are both very large containing 3' and 5' non-coding regions.
  2. Both genomes contain overlaps that consist of the transcriptional start and stop signals, although Ebola has three and Marburg has only one. Their positions in relation to the intergenic regions also differ.
  3. The intergeneic regions are not conserved in either genome, and both contain one unusually lengthy region (>94 bp)
  4. All the polyadenylation sites contain (3'-UAAUU) excpet for the VP40 gene of the Marburg virus
  5. Both viruses produce mRNA that can form stem-loop structures
  6. Ebola GP gene produces two transcripts but Marburg produces only one

The Marburg virus does not contain the polyadenylation sequence that is found in the Ebola GP gene. The proteins of Ebola and Marburg are likewise similar. The C-terminal end of the GP proteins of both viruses are 80.9% homologous and are comparable to the same protein found in oncogeneic retroviruses. This may provide some insight into the pathogenicitity of this family of viruses, as synthetic peptides of this gene yield a highly compormised immune system, leading to an inhibition of blastogenesis of lymphhocytes, a reduced chemotactiic ability of monocytes and macrophages and inhibition of natural killer (NK) cells. Other similarities in the GP protein include a variable hydrophilic central region flanked by hydrophobic regions that contain most of the cycfteins used in disulfide bridges.

The sequences of the VP40 and VP24 proteins show that they are primarily hydrophobic, and may be membrane-associated. VP4>


Transfer interrupted!