Question

How to identify the strain in a reads file if the genus and species are known?

4.7 years ago

muhammad_elhossary • 0

Hi folks,

I am relatively new to bioinformatics. I have some read files (FASTQ) to analyze. They come from a short-read sequencing run of a single sample that contained a pool of multiple bacterial species (roughly 10), all sequenced together. The species all belong to the same biological family, and I have the list of genus and species names that were in the sample, but I don't know the specific strain of each.

I need to accurately identify the strain/substrain of each species, regardless of the computational power needed to do that. For example: I know that there is an E. coli in the sample, but I don't know which strain. Is it K12, O157, or something else?

How can I do that? Which approach should I take, and with which tools?

Thanks, and best regards

sequencing rna-seq genome next-gen alignment • 683 views

If you know the 10 genomes you are working with, you can try bbsplit.sh to bin your reads into genome-specific pools. bbsplit.sh is part of the BBMap suite.

4.7 years ago by GenoMax
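
For readers who want to try the binning step suggested above, here is a minimal sketch of how a bbsplit.sh call could be assembled from Python. The file names (pooled_reads.fq, ecoli_K12.fa, ecoli_O157.fa, and the output names) and the helper function are hypothetical placeholders, not something from the original post. The `in=`, `ref=`, `basename=`, `outu=`, and `refstats=` parameters are used as I understand them from the BBSplit usage text, so please check them against `bbsplit.sh --help` for your installed version before relying on this.

```python
def build_bbsplit_command(reads, references,
                          out_pattern="binned_%.fq",
                          unmapped="unmapped.fq",
                          stats="refstats.txt"):
    """Assemble (but do not run) a bbsplit.sh command line.

    reads       -- path to the pooled FASTQ file (hypothetical name)
    references  -- list of candidate reference FASTA files
    out_pattern -- output name pattern; bbsplit.sh replaces '%' with
                   the name of each reference
    unmapped    -- file for reads that match none of the references
    stats       -- per-reference summary of assigned reads
    """
    if not references:
        raise ValueError("at least one reference genome is required")
    if "%" not in out_pattern:
        raise ValueError("out_pattern must contain '%' as a placeholder")
    return [
        "bbsplit.sh",
        f"in={reads}",
        "ref=" + ",".join(references),
        f"basename={out_pattern}",
        f"outu={unmapped}",
        f"refstats={stats}",
    ]


# Example with hypothetical file names: two candidate E. coli references.
cmd = build_bbsplit_command("pooled_reads.fq",
                            ["ecoli_K12.fa", "ecoli_O157.fa"])
print(" ".join(cmd))

# To actually run it (requires BBMap on your PATH and real input files):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

The sketch only builds and prints the command, so it can be inspected before anything is executed. The per-reference statistics file may give a rough idea of how many reads were assigned to each reference, but that alone is not a reliable strain call when the references are closely related.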

Thanks for your fast reply. As I see it (correct me if I am wrong), this solution could be a second step, to split the read pool by species. If I ran it as a first step against all the strains of each species in the sample, I would end up with numerous FASTQ files and still would not know which strain is the best match.

What I need is, for example: I know that there is an E. coli in the sample, but I don't know which strain. Is it K12, O157, or something else? And that is only for one species. Thanks again

4.7 years ago by muhammad_elhossary

Strain-level identification with short reads like Illumina is always going to be a challenge. If you had started this experiment with a defined set of genomes, it would be fine. If you did not know whether a particular strain was K12 or O157 to begin with, then the best you may be able to do is to say it is E. coli.

4.7 years ago by GenoMax