Validating uniqueness of NCBI genomes
2.3 years ago
Afreen • 0

Hi, I have genome sequences for 133 L. plantarum strains (from the NCBI database), and for my work I want to double-check that these are 133 "unique" strains rather than the same sequence submitted by different groups. What would be the best approach for this kind of cross-check? Can anyone help?

genomes strains • 911 views
2.3 years ago
GenoMax 154k

This may be a futile exercise to an extent. The "completeness" of these assemblies is unlikely to be equal, so it would likely be difficult to determine with certainty whether the strains are unique.

That said you can try @Mensur's suggestion for potential programs here: Measuring sequence similarity between draft genomes

2.3 years ago
Mensur Dlakic ★ 30k

A quick test for general sequence identity can be done using the link provided by GenoMax in the earlier post. Yet if you are looking to eliminate only 100% base-for-base identical sequences, I don't think that would work. If two labs sequenced the same strain, I don't think they would get 100% base-for-base sequence identity, because of random sequencing errors and assembly limitations. Even if one group sequenced the same strain twice a month apart, I doubt they would get a completely identical sequence.

I think you need to adopt some kind of threshold (say 99.5% identity) and declare two strains identical if they are above that threshold and their genome sizes differ by no more than 10 bp. You may want to try the programs listed below for whole genome alignments after using FastANI and pyani to calculate global genome identity.
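
To make the thresholding step concrete, here is a minimal sketch in Python. It assumes you have already run FastANI all-vs-all (for example with `--ql genomes.txt --rl genomes.txt -o ani.tsv`) and that the output is the usual tab-separated table with query, reference and ANI in the first three columns. The function name `candidate_duplicates`, the `genome_sizes` dictionary and the default cut-offs are placeholders of mine (the cut-offs are simply the 99.5% and 10 bp figures suggested above), so treat them as a starting point and adjust as needed.

```python
import csv
import io


def candidate_duplicates(fastani_tsv, genome_sizes, min_ani=99.5, max_size_diff=10):
    """Return (genome_a, genome_b, ani, size_diff) for pairs passing both cut-offs.

    fastani_tsv  : text of a FastANI output table (query, reference, ANI, ...)
    genome_sizes : dict mapping each genome name to its total length in bp
                   (hypothetical input; compute it however you prefer)
    """
    best = {}
    for row in csv.reader(io.StringIO(fastani_tsv), delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        query, ref, ani = row[0], row[1], float(row[2])
        if query == ref:
            continue  # ignore self-comparisons
        pair = tuple(sorted((query, ref)))
        # A-vs-B and B-vs-A are reported separately; keep the lower value
        # so that a pair is only flagged when both directions agree.
        best[pair] = min(ani, best.get(pair, ani))

    hits = []
    for (a, b), ani in sorted(best.items()):
        size_diff = abs(genome_sizes[a] - genome_sizes[b])
        if ani >= min_ani and size_diff <= max_size_diff:
            hits.append((a, b, ani, size_diff))
    return hits
```

Pairs returned by this function are only candidates for being the same strain; I would still inspect them with a whole genome alignment before discarding anything.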
