dcyphr | Phylogenetic network analysis of SARS-CoV-2 genomes


Researchers conducted a phylogenetic network analysis of 160 human severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes. They found three major variants of the virus: A, B, and C. Type A is the ancestral type, as determined by using the bat coronavirus as the outgroup. Type B is the most common type in East Asia. However, its ancestral genome appears not to have spread outside East Asia without first mutating into derived B types. This points to founder effects, or to immunological or environmental resistance against this type outside Asia. Types A and C are found in significant proportions outside East Asia, among Europeans and Americans. Phylogenetic networks can be used to trace the path of infections and identify undocumented sources. Identifying those sources can help prevent further infections.


The researchers aim to demonstrate how phylogenetic network analysis can be used to study the evolution of a virus and to infer its ancestral genome.


The researchers used phylogenetic network analysis. They studied coronavirus genomes from humans, pangolins, and bats. The mutations were confirmed by multiple laboratories and sequencing platforms, so they can be considered reliable.


Researchers have previously used phylogenetic networks to reconstruct hypotheses about prehistoric human populations. Here, they apply the same approach to virus evolution. Using genomes from the GISAID database, they created a phylogenetic network of SARS-CoV-2 (Figure 1). The bat coronavirus served as the outgroup for this network. Its sequence is 96.2% similar to that of the human coronavirus.
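A similarity figure such as the 96.2% above is usually a percent identity between aligned sequences. The summary does not say how the value was computed, so the following is only a minimal sketch under one common definition: the share of aligned, gap-free positions at which two sequences carry the same base. The function name and the toy fragments are assumptions made for illustration and are not taken from the study.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of gap-free aligned positions where both sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip alignment columns where either sequence has a gap character.
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no comparable positions")
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


# Toy aligned fragments, invented for illustration only (not real genome data).
fragment_1 = "ATGGCTAGCTAA"
fragment_2 = "ATGGCTAGTTAA"
print(round(percent_identity(fragment_1, fragment_2), 1))
```

Other definitions (for example, counting gaps as mismatches) give slightly different values, so a reported percentage is only comparable when the definition is the same.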

Type A has two subclusters, the T-allele subcluster and the C-allele subcluster, which are distinguished by the mutation T29095C. Nearly half of the types in the C-allele subcluster are found outside of Asia.

Of the 93 type B genomes, 74 were found in Asia. Type B differs from type A by the mutations T8782C and C28144T. The C28144T mutation is nonsynonymous, meaning that it changes an amino acid. Specifically, a serine replaced a leucine. The ancestral B genome is found only in East Asians. All 19 type B genomes found outside of Asia carry additional mutations. These derived types have long mutational branches. This pattern does not appear to be due to the time lag before the virus spread, or to mutations that accumulated during that time. Possible explanations include a founder scenario, or that the virus had to mutate to overcome immunological or environmental resistance outside East Asia. Type C differs from type B by the mutation G26144T, in which a valine replaces a glycine. Type C is mainly found in Europeans.
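The type definitions above can be read as a small lookup: type A carries the ancestral bases, type B adds T8782C and C28144T, and type C adds G26144T on top of B. The following is a minimal sketch of that reading, assuming a genome is represented as a mapping from reference position to base. The names `TYPE_DEFINING_ALLELES` and `classify_type` are hypothetical, and real genomes carry further mutations that this sketch ignores.

```python
from typing import Mapping

# Bases at the three positions named in the summary (assumed reference coordinates).
# A is ancestral; B = A + T8782C + C28144T; C = B + G26144T.
TYPE_DEFINING_ALLELES = {
    "A": {8782: "T", 28144: "C", 26144: "G"},
    "B": {8782: "C", 28144: "T", 26144: "G"},
    "C": {8782: "C", 28144: "T", 26144: "T"},
}


def classify_type(alleles: Mapping[int, str]) -> str:
    """Return 'A', 'B', or 'C' when all three defining bases match, else 'unclassified'."""
    for type_name, defining in TYPE_DEFINING_ALLELES.items():
        if all(alleles.get(pos) == base for pos, base in defining.items()):
            return type_name
    return "unclassified"


# Hypothetical genome that carries the B mutations plus G26144T.
print(classify_type({8782: "C", 28144: "T", 26144: "T"}))
```

A genome with an unexpected combination, such as only one of the two B mutations, falls through to "unclassified" rather than being forced into a type.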

Researchers can use phylogenetic networks to trace the route of an infection. For example, type C genomes from Brazil and Italy are joined by a mutational link. This link reflects a Brazilian patient who contracted the virus in Italy. The network is a snapshot of early cases, before the picture is complicated by later migration and mutation.
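One simple way to think about a mutational link is to count how many mutations separate two genomes. The sketch below assumes each genome is described by the set of mutations it carries relative to the ancestral type, and it finds the candidate with the fewest differences from a query genome. This is a plain pairwise comparison, not the network algorithm used in the study, and the sample names and the "private" mutation are invented for illustration.

```python
from typing import Dict, Set


def mutational_steps(mutations_a: Set[str], mutations_b: Set[str]) -> int:
    """Number of mutations present in one genome but not the other."""
    return len(mutations_a ^ mutations_b)


def closest_genome(query: Set[str], candidates: Dict[str, Set[str]]) -> str:
    """Name of the candidate genome separated from the query by the fewest steps."""
    return min(candidates, key=lambda name: mutational_steps(query, candidates[name]))


# Hypothetical samples: mutation sets relative to the ancestral type A.
candidates = {
    "sample_1": set(),                               # ancestral type A
    "sample_2": {"T8782C", "C28144T"},               # type B
    "sample_3": {"T8782C", "C28144T", "G26144T"},    # type C
}
# Hypothetical query: a type C genome with one extra, made-up private mutation.
query = {"T8782C", "C28144T", "G26144T", "hypothetical_private_mutation"}

print(closest_genome(query, candidates))
```

A short distance between two genomes is consistent with a direct transmission link but does not prove one, which is why documented travel histories matter when interpreting the network.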

It remains an open question whether researchers should use the oldest genome as the root. The patterns in the phylogenetic network are shaped by migrations, founder effects, and sample size. The different variants may cause different clinical manifestations. Phylogenetic networks may be used to understand the epidemiology, spread, and prevention of the disease. They may be especially helpful in developing treatments and vaccines.