Researchers develop a website to identify SARS-CoV-2 regional clusters in real-time

In a recent study posted to the medRxiv* pre-print server, a team of researchers developed a phylogenetics-based website to identify new severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains quickly and efficiently in a region.

Study: Identifying SARS-CoV-2 regional introductions and transmission clusters in real time. Image Credit: Dotted Yeti/Shutterstock

The lack of advanced phylogenetic and analytical tools has held back global SARS-CoV-2 sequencing efforts. Existing methods for phylogenetic analysis could handle only small, static datasets. They were also too computationally expensive to identify clusters of closely related samples in the ever-expanding datasets of densely sampled pathogens, including SARS-CoV-2.

Even when results were available, they were not readily interpretable for an efficient public health response because intuitive visualization and data exploration tools were lacking. Overall, there is an unmet need for high-throughput tools that can quickly interpret the available data and let public health officers take well-informed action.

About the study

The regional index (C) was the core of the phylogenetically informed summary heuristic developed for the study. It is a weighted summary of the composition of the descendants of a node in a phylogenetic tree, roughly corresponding to whether the virus represented by that node was inside or outside a specific area.

When a descendant leaf is genetically identical to the internal node, C equals one if that leaf is inside the specific region and zero otherwise. The researchers applied additional rules to handle cases where C was undefined. The index calculation is not applicable to leaf nodes, for which accurate geographic location metadata is not available.
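To make the idea concrete, here is a minimal Python sketch of an index of this kind. It is not the formula from the study: the inverse-distance weighting, the `Leaf` class, and the `regional_index` function are assumptions made purely for illustration. Only the special case for genetically identical leaves follows the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    """A descendant sample of an internal node (hypothetical structure)."""
    in_region: bool  # True if the sample was collected inside the region
    distance: int    # mutations separating the leaf from the internal node


def regional_index(leaves: List[Leaf]) -> Optional[float]:
    """Illustrative index in [0, 1]; values near 1 suggest the node was in-region.

    Assumption: each leaf is weighted by 1 / distance, so closer descendants
    count more. The study's actual weighting may differ.
    """
    if not leaves:
        return None  # undefined without descendants

    # Special case described in the article: a leaf identical to the node
    # decides the index outright.
    identical = [leaf for leaf in leaves if leaf.distance == 0]
    if identical:
        return 1.0 if any(leaf.in_region for leaf in identical) else 0.0

    weight_in = sum(1 / leaf.distance for leaf in leaves if leaf.in_region)
    weight_out = sum(1 / leaf.distance for leaf in leaves if not leaf.in_region)
    return weight_in / (weight_in + weight_out)


if __name__ == "__main__":
    node_leaves = [Leaf(True, 1), Leaf(False, 2), Leaf(False, 2)]
    print(regional_index(node_leaves))
```

In this toy version, one close in-region descendant balances two more distant out-of-region descendants, which shows why a weighted summary can differ from a simple head count.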

Using this method, the researchers traced SARS-CoV-2 transmission clusters in 102 countries using the global parsimony phylogenetic tree, built from the 5,563,847 SARS-CoV-2 sequences available on GISAID, GenBank, and COG-UK on 28 November 2021. Cluster size appeared highly skewed, with ~20% of distinct regional clusters containing 89% of samples, suggesting that novel viral introductions do not necessarily lead to the establishment of a new locally circulating strain.
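A skew statistic of this kind can be computed from a list of cluster sizes. The short Python sketch below shows one way to do it; the function name `share_in_largest` and the cluster sizes are invented for illustration and are not data from the study.

```python
from typing import Sequence


def share_in_largest(cluster_sizes: Sequence[int], top_fraction: float = 0.2) -> float:
    """Fraction of all samples that fall in the largest `top_fraction` of clusters."""
    if not cluster_sizes:
        raise ValueError("cluster_sizes must not be empty")
    ordered = sorted(cluster_sizes, reverse=True)
    # Number of clusters counted as "largest"; always at least one.
    top_count = max(1, round(top_fraction * len(ordered)))
    return sum(ordered[:top_count]) / sum(ordered)


if __name__ == "__main__":
    # Toy sizes: two large clusters and eight small ones (not study data).
    toy_sizes = [50, 39, 3, 2, 1, 1, 1, 1, 1, 1]
    print(share_in_largest(toy_sizes))
```

A value close to 1 means a few large clusters hold most samples, while most introductions remain small.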


Over 50% of the samples in the genome sequence repositories originated from the USA or the UK, which substantially restricted the global transmission analysis because inferring a cluster’s origin depends on the robustness of sequencing at the origin. Therefore, the researchers focused on the US data, where sequencing across each state was relatively comprehensive and robust, and detailed state-level metadata was available for most samples.

As of November 2021, over 300,000 distinct state-level SARS-CoV-2 infection clusters had been identified in the USA since the beginning of the pandemic. Of these, 84% of clusters had an assigned origin, and 7% of clusters had an international origin, with the majority reflecting transmission within the USA. As expected, Mexico and Canada were among the most common international origin regions, given their long land borders. England was also relatively common because it is well-sampled. These findings suggested that the sequencing effort in a given region biases the identification of the origin of new clusters.

The most significant achievement of this work was the development of Cluster-Tracker, an open-source, daily updated website. This website assisted the exploration and prioritization of the latest genome sequences from across the USA, quickly identifying the clusters most likely to be of interest for public health action. Any user could use this website and its flexible backend pipeline to construct a similar site for any set of regions (e.g. country-level), allowing people to explore SARS-CoV-2 phylogenetic data.


The open-source tools, methodologies, and software package described in the study could prove immensely useful for researchers worldwide. Researchers could quickly draw inferences from vast sequence datasets and explore geographic structure in the spread of SARS-CoV-2, or of other densely sampled pathogens, in specific areas within the global phylogeny. In addition, this analytical approach performed well on simulated data and was congruent with a more sophisticated analysis performed during the pandemic.

More importantly, the researchers presented an accessible open-source interactive interface for their results, which could automatically compute and display introductions and clusters with each update to the global phylogenetic tree.

To summarize, this work will empower public health officers to explore the spread of SARS-CoV-2 across the USA and even support public health groups globally to quickly understand and apply insights obtained from the most recent genomic data.

*Important notice

medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.

Journal reference:
  • Jakob McBroome, Jennifer Martin, Adriano de Bernardi Schneider, Yatish Turakhia, Russell Corbett-Detig. (2022). Identifying SARS-CoV-2 regional introductions and transmission clusters in real time. medRxiv. doi:

Written by

Neha Mathur

