Hi,
I am working with whole-genome sequences of several bacteria and want to identify the SNPs in each genome. What would be the best way to identify these SNPs so that I can compare the variation between the genomes?
Thanks a lot.
Sujan
You can use the NCBI dbSNP database to gather the SNP data for your bacterium of interest: http://www.ncbi.nlm.nih.gov/SNP/
The SNP data for the various organisms in this database are available at the following FTP location: ftp://ftp.ncbi.nih.gov/snp/organisms/
You can first call the variants in your sequence data with the tools mentioned by Frederic Bigey, then use the dbSNP data to identify which SNPs of the bacterium are already known and which are novel.
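To make the known-versus-novel step concrete, here is a minimal Python sketch. It assumes you have already parsed both your called variants and the database records into (contig, position, reference base, alternate base) tuples that use the same reference coordinates. The `Variant` type, the `split_known_novel` function, and the example records are all hypothetical and only there for illustration.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Variant(NamedTuple):
    """One SNP, keyed by contig, 1-based position, and the two alleles."""
    contig: str
    pos: int
    ref: str
    alt: str


def split_known_novel(
    called: Iterable[Variant], known: Iterable[Variant]
) -> Tuple[List[Variant], List[Variant]]:
    """Partition called SNPs into those present in the known set and the rest."""
    known_set = set(known)  # set lookup keeps the comparison fast
    known_hits: List[Variant] = []
    novel: List[Variant] = []
    for variant in called:
        if variant in known_set:
            known_hits.append(variant)
        else:
            novel.append(variant)
    return known_hits, novel


# Made-up records, for illustration only.
called = [Variant("contig1", 101, "A", "G"), Variant("contig1", 250, "C", "T")]
known = [Variant("contig1", 101, "A", "G")]

known_hits, novel = split_known_novel(called, known)
print("known:", known_hits)
print("novel:", novel)
```

Matching on the full tuple means a SNP at a known position but with a different alternate base is reported as novel; whether that is what you want depends on your analysis.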
First, do you have a reference genome? If yes:
.
If you are working with deep sequencing data, then I recommend following the short-read aligner suggestion described by Frederic Bigey.
If you are working with whole genome sequences (from NCBI Nucleotide, for example), I think base-by-base might be able to help you call the SNPs (which you can then interpret as GANI has suggested). I've used it to compare viral genome sequences. I don't know what the maximum allowed genome size is, but you can check it out:
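Separately from that tool, and only as a rough illustration of what calling SNPs base by base means, here is a small Python sketch. It assumes the two genomes have already been aligned by some other program so that both strings have the same length and use `-` for gaps. The `find_snps` function and the toy sequences are hypothetical, and this is not how any particular tool is implemented.

```python
from typing import List, Tuple


def find_snps(ref_aln: str, query_aln: str, gap: str = "-") -> List[Tuple[int, str, str]]:
    """Return (reference position, reference base, query base) for each mismatch.

    Positions are 1-based and count only non-gap reference bases.
    Columns with a gap or an ambiguous base in either sequence are skipped.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have the same length")

    snps: List[Tuple[int, str, str]] = []
    ref_pos = 0
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r != gap:
            ref_pos += 1  # advance only when the reference has a base here
        if r == gap or q == gap:
            continue  # indel column, not a SNP
        if r in "ACGT" and q in "ACGT" and r != q:
            snps.append((ref_pos, r, q))
    return snps


# Toy alignment, for illustration only.
snps = find_snps("ACGT-ACGT", "ACCTTAC-T")
print(snps)
```

For whole bacterial genomes you would still need a proper genome aligner to produce the aligned strings first; the loop above only covers the comparison step.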
I am working with Xanthomonas bacteria and the database has no SNP data.