Closed: What would be the most plausible explanation for why USA WA (Washington State) has more SARS-CoV-2 diversity than any other place on Earth?
Posted 4.1 years ago by 5heikki (11k)

My methods, briefly:

  1. Download all the complete genomes from GISAID
  2. Remove trailing Ns
  3. Remove genomes with more than 10 Ns remaining after step 2
  4. Remove a few poor-quality outlier genomes, all the pangolin genomes, and the bat genome
  5. Build a distance matrix from Mash distances (k = 17, s = 5000; s gets exhausted). Since all the genomes have the same orientation, I'm using the -n option (strand-preserving, non-canonical k-mers) here
  6. Cluster with affinity propagation (AP), adjusting q to reduce the number of clusters: sm_ap <- apcluster(negDistMat(r=2), sm, q=0.05)
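Steps 2 and 3 amount to a trim-then-filter pass over the FASTA file. Here is a minimal Python sketch, assuming "trailing" means the 3' end only (stripping leading Ns too would be `seq.strip("Nn")`) and that only the letter N counts toward the 10-N limit. The function names are hypothetical, not from the post.

```python
MAX_NS = 10  # step 3 threshold from the post

def read_fasta(lines):
    # Minimal FASTA reader: yields (header, sequence) pairs from an iterable of lines.
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def trim_trailing_ns(seq: str) -> str:
    # Step 2: drop the run of Ns (either case) at the 3' end.
    return seq.rstrip("Nn")

def passes_n_filter(seq: str, max_ns: int = MAX_NS) -> bool:
    # Step 3: keep a genome only if at most max_ns Ns remain after trimming.
    return seq.upper().count("N") <= max_ns

def filter_genomes(records, max_ns: int = MAX_NS):
    # Apply step 2 then step 3 to every (header, sequence) record.
    kept = []
    for header, seq in records:
        trimmed = trim_trailing_ns(seq)
        if passes_n_filter(trimmed, max_ns):
            kept.append((header, trimmed))
    return kept

demo = [">good", "ACGTACGTNNNN", ">bad", "AC" + "N" * 12 + "GT"]
print(filter_genomes(read_fasta(demo)))  # [('good', 'ACGTACGT')]
```

The "bad" record has no trailing Ns to trim, so all 12 internal Ns remain and it is dropped.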
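Step 5's Mash distance comes from bottom-s MinHash sketches: the Jaccard index j is estimated from the s smallest hashes of the union of two sketches, and the distance is D = -(1/k) ln(2j / (1 + j)). The following pure-Python sketch illustrates that calculation with the post's k = 17 and s = 5000 and forward-strand k-mers only (the -n behaviour). It is not Mash's code: the BLAKE2b hash is an assumption standing in for Mash's MurmurHash3, so individual values will differ from `mash dist` output.

```python
import hashlib
import heapq
import math

def kmer_hash(kmer: str) -> int:
    # 64-bit stand-in hash (BLAKE2b); Mash itself uses MurmurHash3.
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")

def sketch(seq: str, k: int = 17, s: int = 5000) -> list:
    # Bottom-s MinHash sketch of forward-strand k-mers only (no canonical
    # k-mers), mirroring the strand-preserving -n option.
    hashes = {kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)}
    return heapq.nsmallest(s, hashes)

def mash_distance(a, b, k: int = 17, s: int = 5000) -> float:
    # Jaccard estimate: among the s smallest hashes of the union of both
    # sketches, the fraction present in both sketches.
    set_a, set_b = set(a), set(b)
    union_bottom = heapq.nsmallest(s, set_a | set_b)
    if not union_bottom:
        return 1.0
    shared = sum(1 for h in union_bottom if h in set_a and h in set_b)
    j = shared / len(union_bottom)
    if j == 0:
        return 1.0  # nothing shared: report the maximum distance
    # D = -(1/k) * ln(2j / (1 + j)), written with a positive logarithm.
    return math.log((1 + j) / (2 * j)) / k

def distance_matrix(seqs, k: int = 17, s: int = 5000):
    # Symmetric all-vs-all matrix of Mash-style distances.
    sketches = [sketch(x, k, s) for x in seqs]
    n = len(sketches)
    dist = [[0.0] * n for _ in range(n)]
    for p in range(n):
        for q in range(p + 1, n):
            dist[p][q] = dist[q][p] = mash_distance(sketches[p], sketches[q], k, s)
    return dist
```

Identical genomes give j = 1 and D = 0; genomes sharing no sampled k-mers get D = 1.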
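Step 6 relies on R's apcluster package. To show what affinity propagation is doing, here is a minimal NumPy sketch of the standard responsibility/availability message passing (Frey and Dueck). Assumptions, all mine: `sm` is the Mash distance matrix from step 5, so negDistMat(r=2) gives negative squared Euclidean distances between its rows; the preference is the q-quantile of the off-diagonal similarities, which is how I understand apcluster's `q`; the damping, iteration limits, and tie-breaking noise are illustrative choices. Results need not match apcluster exactly. scikit-learn's `AffinityPropagation(affinity="precomputed")` is an off-the-shelf alternative.

```python
import numpy as np

def similarity_from_distances(D: np.ndarray) -> np.ndarray:
    # s(i, k) = -||D[i] - D[k]||^2 between rows of the distance matrix,
    # meant to mirror negDistMat(r=2) applied to `sm` (an assumption).
    sq = (D ** 2).sum(axis=1)
    return -(sq[:, None] + sq[None, :] - 2.0 * D @ D.T)

def affinity_propagation(S, q=0.05, damping=0.9, max_iter=1000, conv_iter=100, seed=0):
    S = np.array(S, dtype=float)
    n = S.shape[0]
    # Preference = q-quantile of off-diagonal similarities; lower q -> fewer clusters.
    off_diag = ~np.eye(n, dtype=bool)
    np.fill_diagonal(S, np.quantile(S[off_diag], q))
    # Tiny seeded noise to break ties between identical genomes.
    rng = np.random.default_rng(seed)
    S += (np.finfo(float).eps * S + np.finfo(float).tiny * 100) * rng.standard_normal((n, n))
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    rows = np.arange(n)
    previous, stable = None, 0
    exemplars = np.array([], dtype=int)
    for _ in range(max_iter):
        # Responsibilities: r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # Availabilities: a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))),
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = np.diag(A_new).copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1 - damping) * A_new
        # Exemplars: points with positive self-responsibility + self-availability.
        exemplars = np.flatnonzero(np.diag(A) + np.diag(R) > 0)
        key = tuple(exemplars)
        stable = stable + 1 if key == previous else 0
        previous = key
        if stable >= conv_iter and exemplars.size > 0:
            break
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    # Assign every point to its most similar exemplar.
    labels = np.argmax(S[:, exemplars], axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels
```

Raising q raises every preference, which makes more points willing to act as exemplars and so yields more clusters; that is why lowering q to 0.05 shrinks the cluster count.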

So there are 570 genomes here, grouped into 12 clusters. Somehow 8 of the 12 clusters contain genomes from USA WA (49 genomes altogether in my dataset). I'm also a little baffled that the Australia NSW lineage is so different from everything else.
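The "8 of 12 clusters" figure is a per-location tally over the cluster labels. A minimal sketch, assuming one cluster label and one location string per genome (for example parsed from the GISAID virus names; that parsing is not shown, and the location strings below are hypothetical):

```python
from collections import Counter, defaultdict

def cluster_spread(labels, locations):
    """Return (location, clusters containing it, genomes from it), widest spread first."""
    clusters_hit = defaultdict(set)
    for label, loc in zip(labels, locations):
        clusters_hit[loc].add(label)
    genomes = Counter(locations)
    rows = [(loc, len(hit), genomes[loc]) for loc, hit in clusters_hit.items()]
    # Sort by number of clusters hit, then by genome count, then by name.
    return sorted(rows, key=lambda r: (-r[1], -r[2], r[0]))

labels = [0, 1, 2, 0, 3, 3]
locations = ["USA/WA", "USA/WA", "USA/WA", "China/Hubei", "China/Hubei", "Australia/NSW"]
for row in cluster_spread(labels, locations):
    print(row)
```

Listing every location this way makes it easy to compare USA WA's spread and genome count against the other regions in the same run.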

Here's a PDF in case anyone wants to Ctrl+F for something:

570 genomes

Edit: Here is one without the NSW lineage. Its sister lineage (maybe originally from Shandong?) is still a big outlier and has members from multiple countries. This is the main reason I believe the NSW lineage is real as well. This one has 13 clusters instead of the 12 in the original post.

Tags: covid-19, SARS-CoV-2