dcyphr | Emergence of SARS-CoV-2 through recombination and strong purifying selection


Understanding the origin of SARS-CoV-2 is important for drug development, vaccine development, and future virus prevention. This study shows high sequence conservation around the receptor binding motif (RBM) in the spike (S) gene of human, bat, and pangolin coronaviruses. Understanding the recombination that led to SARS-CoV-2 could explain how new human coronaviruses emerge.


COVID-19 has spread since December 2019 and has been declared a global pandemic by the World Health Organization. SARS-CoV-2, the virus that causes COVID-19, was identified as a betacoronavirus. SARS-CoV and MERS-CoV are also betacoronaviruses, but SARS-CoV-2 is most similar to RaTG13, a coronavirus found in a bat. However, other similar coronaviruses, called Pan_SL-CoV_GD and Pan_SL-CoV_GX, were found in Malayan pangolins. Recombination can drive viral evolution, and understanding recombination events could help us understand and prevent other viral outbreaks. SARS-CoV and MERS-CoV had nearly identical relatives in civets and camels, respectively, but no such closely matching virus has been identified for SARS-CoV-2.

Materials and Methods

Genome sequences for the analysis were obtained from GenBank and GISAID. For recombination analysis, the researchers used SimPlot 3.5.15 and the LANL database tool RIP (Recombinant Identification Program). For selection analysis, they used the LANL database tool SNAP (Synonymous Non-synonymous Analysis Program). To model receptor binding structurally, they used several software tools and selected the best model by its confidence score.
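SimPlot-style analyses plot how similar a query genome is to a reference genome along a sliding window; a drop or swap in similarity along the genome can hint at recombination. The Python below is a simplified stand-in, not SimPlot's own code: it assumes the sequences are already aligned to equal length, uses raw percent identity rather than any distance model SimPlot may apply, and the function name `window_identity` and its default window and step sizes are hypothetical choices for illustration.

```python
def window_identity(query: str, ref: str, window: int = 200, step: int = 20) -> list[tuple[int, float]]:
    """Percent identity between two pre-aligned sequences in sliding windows.

    Returns (window_midpoint, identity) pairs, where identity is in [0, 1].
    Columns where both sequences have a gap ('-') are ignored.
    """
    if len(query) != len(ref):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        r = ref[start:start + window]
        # Keep only columns where at least one sequence has a base.
        columns = [(a, b) for a, b in zip(q, r) if not (a == "-" and b == "-")]
        if not columns:
            continue
        matches = sum(a == b for a, b in columns)
        results.append((start + window // 2, matches / len(columns)))
    return results


if __name__ == "__main__":
    # Toy aligned sequences; real input would be whole genomes from an alignment.
    print(window_identity("ACGTACGTAC", "ACGTACGTTT", window=5, step=5))  # [(2, 1.0), (7, 0.6)]
```

Running this once per reference (for example RaTG13 and each pangolin virus) and plotting the curves together gives the kind of comparison in which a region such as the RBM stands out.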


Acquisition of receptor binding motif through recombination

The researchers compared 43 complete genome sequences from 3 different clades; overall, RaTG13 is the most similar to SARS-CoV-2, followed by the pangolin virus Pan_SL-CoV_GD and then Pan_SL-CoV_GX. The first SARS-CoV-2 sequence identified was named Wuhan-Hu-1. When the researchers compared Wuhan-Hu-1 to the bat viruses, SARS-CoV, and the pangolin sequences, RaTG13 was again the closest match. Using phylogenetic analysis, the researchers found that the genome's evolutionary relationships change distinctly before and after the ACE2 binding site, a pattern consistent with recombination. This means that bat and pangolin viruses probably recombined during the emergence of SARS-CoV-2. Because some pangolin virus genes are very different from SARS-CoV-2, we can look to viruses from other animals to see whether recombination occurred at those sites. The RBM in the S gene, however, is very similar in the pangolin virus Pan_SL-CoV_GD and SARS-CoV-2. The S gene encodes the spike protein, which SARS-CoV-2 uses to enter human cells. The S genes of SARS-CoV and SARS-CoV-2 both encode receptor binding motifs that bind human ACE2, allowing both viruses to enter human cells, but RaTG13 lacks this similarity. If a RaTG13-like bat virus had recombined with a pangolin virus, the resulting hybrid could have been more likely to infect humans.

Strong purifying selection among SARS-CoV-2 and closely related viruses

SARS-CoV-2, RaTG13, and the pangolin viruses all share identical or nearly identical sites located before and after the RBM and after the furin cleavage site. These sites are likely conserved because they enable binding to ACE2 and allow the virus to fuse with the host cell membrane. This conservation reflects strong purifying selection: mutations at these sites are likely harmful to the virus and are removed from the population. Although hundreds of new SARS-CoV-2 sequences are added to the database daily, only eight sequences in the database had a mutation at these sites.
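Finding sites that are identical or nearly identical across related viruses, and then checking how many new sequences carry a change at those sites, can be sketched in Python as follows. This is an illustration only, assuming pre-aligned sequences of equal length; the functions `conserved_columns` and `sequences_mutated_at` and the `max_differing` threshold are hypothetical, and the study's selection analysis itself used SNAP rather than this kind of column count.

```python
from collections import Counter


def conserved_columns(alignment: dict[str, str], max_differing: int = 0) -> list[int]:
    """Return 0-based alignment columns that are identical or nearly identical.

    A column counts as conserved when its most common character is not a gap
    and at most `max_differing` sequences carry a different character.
    """
    if not alignment:
        return []
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must be aligned to the same length")
    conserved = []
    for i in range(length):
        column = [s[i] for s in seqs]
        residue, count = Counter(column).most_common(1)[0]
        if residue != "-" and len(column) - count <= max_differing:
            conserved.append(i)
    return conserved


def sequences_mutated_at(sites: list[int], reference: str, queries: dict[str, str]) -> list[str]:
    """Names of query sequences (aligned to `reference`) that differ from it at any given site."""
    return [name for name, seq in queries.items() if any(seq[i] != reference[i] for i in sites)]


if __name__ == "__main__":
    related = {"virus_a": "ACGT", "virus_b": "ACGA", "virus_c": "ACTT"}  # toy alignment
    sites = conserved_columns(related)
    print(sites)  # [0, 1]
    new_genomes = {"new_1": "ACGT", "new_2": "AGGT"}  # hypothetical new sequences
    print(sequences_mutated_at(sites, related["virus_a"], new_genomes))  # ['new_2']
```

Setting `max_differing` above zero corresponds to the "nearly identical" case, where a site may tolerate a difference in one or a few of the related viruses.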

Frequent recombination between SARS-CoVs and bat_SL-CoVs

A previous study suggested that SARS-CoV-2 arose through many recombination events involving several different bat coronaviruses. This is supported by the observation that different small portions of the SARS-CoV-2 genome match different bat coronaviruses. Four significant breakpoints have been found, suggesting that the lineage leading to SARS-CoV-2 went through multiple recombination events. This study has shown that SARS-CoV-2 shares a recombinant history with at least three different bat coronaviruses. Recombination may have enabled cross-species transmission of SARS-CoV-2 by allowing it to acquire a motif that binds human ACE2. The ORF8 gene is highly variable among many of the coronaviruses studied, so this region could be a site of frequent recombination.
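Recombination breakpoints of the kind described above can be illustrated by asking, window by window, which background virus best matches the query and noting where that best match switches. This is similar in spirit to sliding-window tools such as RIP, but the Python below is a hypothetical, simplified sketch: it assumes aligned sequences, uses raw percent identity, applies no significance test, and resolves ties by taking the first background listed. The names `best_match_per_window` and `switch_points` are illustrative only.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching columns, ignoring columns where both have a gap."""
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    return sum(x == y for x, y in columns) / len(columns) if columns else 0.0


def best_match_per_window(query: str, backgrounds: dict[str, str],
                          window: int, step: int) -> list[tuple[int, str]]:
    """For each window start, the name of the background most similar to the query.

    Ties go to the background listed first, because max() keeps the first maximum.
    """
    calls = []
    for start in range(0, len(query) - window + 1, step):
        segment = query[start:start + window]
        scores = {name: identity(segment, seq[start:start + window])
                  for name, seq in backgrounds.items()}
        calls.append((start, max(scores, key=scores.get)))
    return calls


def switch_points(calls: list[tuple[int, str]]) -> list[int]:
    """Window starts where the best-matching background changes from the previous window."""
    return [start for (_, prev), (start, cur) in zip(calls, calls[1:]) if cur != prev]


if __name__ == "__main__":
    query = "AAAACCCC"
    parents = {"bat_like": "AAAAGGGG", "pangolin_like": "TTTTCCCC"}  # toy "parents"
    calls = best_match_per_window(query, parents, window=4, step=4)
    print(calls)                 # [(0, 'bat_like'), (4, 'pangolin_like')]
    print(switch_points(calls))  # [4]
```

A reported switch only locates a breakpoint to within about one window length; real tools add statistical tests before calling a breakpoint significant.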


Three important aspects of betacoronavirus evolution should be carefully considered when building a phylogenetic tree:
1) Building a traditional phylogenetic tree is difficult because recombination between these viruses is frequent.
2) Distantly related viruses can acquire the same mutation independently, which makes it difficult to tell whether a shared mutation reflects close relatedness or a coincidental match between distant relatives.
3) Different lineages can be under different selective pressures, which can affect their recombination.

The pangolin viruses appear too divergent to be the direct ancestors of SARS-CoV-2, but their similar RBM means they can most likely bind human ACE2. RaTG13 is the most similar to SARS-CoV-2 overall but does not have a similar RBM. It is likely that a RaTG13-like virus recombined with a pangolin virus to give SARS-CoV-2 its S gene, but there are other possibilities: the RBM could have arisen through many mutations, or through recombination with viruses that have not yet been identified. Either way, recombination likely played a key role in the virus's jump between species. Reducing direct human contact with wild animals will help prevent new zoonotic viruses in the future.