Question: Comparing a whole-genome-sequenced isolate against other isolates of the same genus in public databases
bioinforesearchquestions250 wrote:

Hi

Our team recently sequenced the whole genome (WG) of a bacterial isolate and performed de novo assembly on it. I am now planning to do the following:

1) Compare the WG-sequenced bacterial isolate to other publicly available Escherichia sequences, or to any other sequences deposited in NCBI. I am thinking of creating a custom BLAST database of all Escherichia isolates and then BLASTing my WG-sequenced isolate against it. Is this approach correct?

2) I received a WG-sequenced isolate of a similar strain from our collaborator. I would like to compare our team's isolate (isolate A) with our collaborator's isolate (isolate B) and identify SNPs between the two. I have used GATK for SNP identification in human samples, but what methods or tools are used for bacterial sequences?

I am open to any other recommendations for requirements 1 and 2 above. Any suggestions?

modified 8 hours ago by bioguy40 • written 7 days ago by bioinforesearchquestions250

Have you looked at Mauve (http://darlinglab.org/mauve/mauve.html)? You may also want to look at aligners designed to align chromosome-sized sequences, such as LAST or LASTZ. BLAST is a local aligner and is not entirely appropriate here.

For #2, you could use callvariants.sh from the BBMap suite after aligning to a reference, since bacterial genomes are simple and haploid.

modified 1 day ago • written 1 day ago by genomax70k

Yes, I checked it. According to their website, it is currently unsuitable for datasets with more than 50 bacterial genomes, and I suspect task 1 might involve more than 50.

For task 2, I need to compare isolate A and isolate B and identify variants. Do you mean I need to treat one of the isolates as the reference?

written 8 hours ago by bioinforesearchquestions250

It is useful to include this type of information (>50 genomes) in the original question. If you are doing smaller comparisons, Mauve would still give you a bird's-eye view of overall rearrangements across these strains (which should be largely similar).

If you want to compare the SNPs present in the strains, you will need to use one particular strain as the reference (which can be one in GenBank) and then compare the rest to it. You can then compare the resulting VCF files.
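
As a rough illustration of that last step, here is a minimal Python sketch of comparing two VCF files. It assumes both isolates were called against the same reference and that the files are plain tab-separated VCF text; the function names and the example records are hypothetical and made up for illustration, not output from any real run.

```python
def read_variants(vcf_text):
    """Collect (CHROM, POS, REF, ALT) keys from VCF text, skipping headers."""
    variants = set()
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line, meta line (##) or column header (#CHROM)
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        for allele in alt.split(","):  # one key per alternate allele
            variants.add((chrom, int(pos), ref, allele))
    return variants


def compare_variants(vcf_a, vcf_b):
    """Split variants into shared, unique to A, and unique to B."""
    a, b = read_variants(vcf_a), read_variants(vcf_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Hypothetical example records (not real data).
HEADER = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
ISOLATE_A_VCF = HEADER + "chr1\t100\t.\tA\tG\t50\tPASS\t.\nchr1\t250\t.\tC\tT\t60\tPASS\t.\n"
ISOLATE_B_VCF = HEADER + "chr1\t100\t.\tA\tG\t55\tPASS\t.\nchr1\t400\t.\tG\tA\t70\tPASS\t.\n"

result = compare_variants(ISOLATE_A_VCF, ISOLATE_B_VCF)
print(sorted(result["only_a"]), sorted(result["only_b"]))
```

Variants that land in only_a or only_b are the candidate differences between the two isolates. This toy comparison matches records exactly and ignores quality filters and indel normalization, which a dedicated VCF comparison tool would handle.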

Mash/sourmash mentioned below would also be good programs to try.

modified 2 hours ago • written 2 hours ago by genomax70k
bioguy40 wrote:

If you're interested in simply comparing overall genome similarity, Mash (which uses MinHash to compare genomic content) has become a gold standard of sorts for what you're describing (https://github.com/marbl/mash). Mauve would also be appropriate if you want information that only an alignment can give you (e.g. genome rearrangements), but it is less computationally efficient and perhaps overkill.
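
To give a feel for the MinHash idea, here is a toy Python sketch of a bottom-k sketch and the Jaccard estimate derived from it. This is not Mash's implementation: the hash function, k-mer size, and sketch size are arbitrary choices for illustration, and it does not use canonical (strand-independent) k-mers as Mash does. The distance function follows the formula published for Mash, D = -(1/k) ln(2j / (1 + j)).

```python
import hashlib
import math


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer):
    """Deterministic 64-bit hash of a k-mer (first 8 bytes of SHA-1)."""
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def sketch(seq, k=5, size=50):
    """Bottom-k sketch: the `size` smallest distinct k-mer hashes, sorted."""
    return sorted({hash_kmer(km) for km in kmers(seq, k)})[:size]


def jaccard_estimate(sketch_a, sketch_b, size=50):
    """Estimate Jaccard similarity from the smallest hashes of the union."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0  # no k-mers at all (sequences shorter than k)
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(j, k=5):
    """Convert a Jaccard estimate j into a Mash-style distance."""
    if j <= 0.0:
        return 1.0
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


seq_a = "ACGTACGTTGCAAGCTTAGGCTAACGT"
seq_b = seq_a[:-6] + "TTTTTT"  # same sequence with an altered 3' end
j = jaccard_estimate(sketch(seq_a), sketch(seq_b))
print(j, mash_distance(j))
```

Because only a fixed number of hashes is kept per genome, comparing two sketches costs the same regardless of genome size, which is why this approach scales to many isolates.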

For SNPs, you could alternatively inspect candidates in IGV, which displays mismatches between a given isolate's aligned reads and the reference genome (http://software.broadinstitute.org/software/igv/book/export/html/6).

Anvi'o may also be helpful for visualizing your results (or for the underlying analysis itself): http://merenlab.org/software/anvio/

written 8 hours ago by bioguy40