Question: How can I query NCBI SRA for the purpose of strain typing/profiling?
4 months ago
I read that one can search the SRA database to retrieve sequence data and compare it with one's own sequencing reads. So if I have found a strain of pathogenic E. coli and want to know whether it is a totally new strain, then in addition to the initial BLAST searches and mapping to reference genomes, I could search SRA to see if anyone else might be working on it, or retrieve related data for comparison.

What I understand is that SRA is an archive of NGS data from ongoing projects which includes raw reads and draft assemblies. But I don't know how to use it, or whether what I have in mind is an appropriate use of it.

  1. How do I compare my strain's sequence reads with those deposited in SRA, to see if there is a similar strain in there?

  2. How do I even choose an experiment set to BLAST my sequence(s) against to begin with? Do I have to do a literature search to find relevant papers for an SRA Experiment set (SRX) first?

Appreciate any direction on this topic.

4 months ago
There are 18,000+ assemblies for E. coli in NCBI's genome database, so it is unlikely that you have a totally new strain. More likely, your strain has some differences compared to something that is already there.

Take a look at this page to see how you can run a BLAST search against SRA. I suggest that you limit your search using the taxID for E. coli, which is 562.
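
To find candidate experiments before running BLAST, one option is to query the SRA database through NCBI's E-utilities and restrict the search by taxID. The following is only a minimal sketch under some assumptions: the function names (`build_sra_search_url`, `search_sra`) and the example keyword are made up for illustration, the `txid562[Organism:exp]` term uses the standard Entrez taxonomy syntax, and the script only lists matching SRA records. It does not run BLAST itself.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_search_url(taxid=562, extra_terms=(), retmax=20):
    """Build an ESearch URL for SRA records limited to one taxon.

    taxid       -- NCBI Taxonomy ID (562 is E. coli)
    extra_terms -- optional extra search terms, ANDed with the taxon filter
                   (hypothetical examples: a serotype or a keyword)
    retmax      -- maximum number of record IDs to return
    """
    # [Organism:exp] "explodes" the taxon so that sub-taxa are included.
    term = f"txid{taxid}[Organism:exp]"
    for extra in extra_terms:
        term += f" AND {extra}"
    params = {"db": "sra", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def search_sra(url):
    """Fetch the ESearch result and return the list of matching Entrez UIDs.

    Requires network access. The UIDs are internal Entrez identifiers,
    not SRX/SRR accessions; those need a follow-up lookup.
    """
    with urlopen(url) as response:
        data = json.load(response)
    return data["esearchresult"]["idlist"]


# Example usage (needs network access, so it is left commented out):
# url = build_sra_search_url(taxid=562, extra_terms=("O157:H7",))
# print(search_sra(url))
```

The returned IDs would still have to be mapped to SRX/SRR accessions (for example on the SRA website) before selecting them as BLAST targets. NCBI also asks that automated E-utilities requests be kept to a modest rate.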

Thank you for the link to that page. I'll look through the different search strategies listed within.

