Question

Single Nucleotide Polymorphisms

11.7 years ago

nepgorkhey ▴ 130

Hi,

I am working with whole-genome sequences of bacteria and want to identify the SNPs in each genome. What would be the best way to identify these SNPs so that I can compare the variation between the genomes?

Thanks a lot.

Sujan

snp • 3.1k views

Written 11.7 years ago by nepgorkhey ▴ 130 · last updated 2.7 years ago by Ram 45k

score 2 · Answer 1 · 2014-02-14

11.7 years ago

GANI ▴ 230

You can use the NCBI dbSNP database to gather SNP data for your bacterium of interest: http://www.ncbi.nlm.nih.gov/SNP/

The SNP data for the various organisms in this database are available via FTP: ftp://ftp.ncbi.nih.gov/snp/organisms/

You can first call the variants in your sequence data with the tools mentioned by Frédéric Bigey, then use the dbSNP data to separate the known SNPs of the bacterium from the novel ones.
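The known/novel comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a dbSNP tool: it assumes both your own calls and the known SNPs are available as VCF text lines produced against the same reference genome with the same chromosome names. A site counts as "known" only if chromosome, position, REF and ALT all match. The function names are hypothetical.

```python
def parse_vcf_sites(lines):
    """Yield (chrom, pos, ref, alt) for every ALT allele in VCF data lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header/meta lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # A multi-allelic record lists several ALT alleles separated by commas.
        for allele in alt.split(","):
            yield chrom, int(pos), ref.upper(), allele.upper()


def split_known_novel(called_lines, known_lines):
    """Split your called sites into those present in the known set and the rest."""
    known = set(parse_vcf_sites(known_lines))
    called = list(parse_vcf_sites(called_lines))
    known_hits = [site for site in called if site in known]
    novel = [site for site in called if site not in known]
    return known_hits, novel


if __name__ == "__main__":
    calls = ["#CHROM\tPOS\tID\tREF\tALT", "chr\t10\t.\tA\tG", "chr\t25\t.\tC\tT,A"]
    known = ["chr\t10\trs1\tA\tG", "chr\t25\trs2\tC\tA"]
    hits, new = split_known_novel(calls, known)
    print("known:", hits)
    print("novel:", new)
```

Matching on the full (chrom, pos, ref, alt) key is stricter than matching on position alone; a different allele at a known position is reported as novel.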

I am working with Xanthomonas, and the database has no SNP data for it.

Reply by nepgorkhey ▴ 130 · 11.7 years ago

score 2 · Answer 2 · 2014-02-14

11.7 years ago

Frédéric Bigey ▴ 320

First, do you have a reference genome? If so:

map the reads to the reference genome (e.g. with BWA)
call the variants with a variant-calling tool (e.g. SAMtools or GATK)
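The two steps above can be sketched as a small Python helper that prints the shell commands for one sample. This is an illustrative sketch only, assuming BWA, SAMtools and BCFtools are installed and the reads are paired-end FASTQ files; all file names are placeholders, and `--ploidy 1` is my own choice because bacterial genomes are haploid. GATK, the other option mentioned, is not shown.

```python
import shlex


def build_snp_pipeline(ref, reads1, reads2, prefix="sample"):
    """Return shell commands for a minimal map-then-call SNP workflow.

    Assumes bwa, samtools and bcftools are on PATH; file names are placeholders.
    """
    q = shlex.quote
    bam = q(f"{prefix}.sorted.bam")
    raw_vcf = q(f"{prefix}.calls.vcf")
    snp_vcf = q(f"{prefix}.snps.vcf")
    ref, reads1, reads2 = q(ref), q(reads1), q(reads2)
    return [
        # 1. Index the reference for BWA and for samtools/bcftools.
        f"bwa index {ref}",
        f"samtools faidx {ref}",
        # 2. Map paired-end reads, then sort the alignments by coordinate.
        f"bwa mem {ref} {reads1} {reads2} | samtools sort -o {bam} -",
        f"samtools index {bam}",
        # 3. Call variant sites only (-v); --ploidy 1 treats the genome as haploid.
        f"bcftools mpileup -f {ref} {bam} | bcftools call --ploidy 1 -mv -o {raw_vcf}",
        # 4. Keep SNPs only (drop indels).
        f"bcftools view -v snps -o {snp_vcf} {raw_vcf}",
    ]


if __name__ == "__main__":
    for cmd in build_snp_pipeline("ref.fa", "reads_1.fq", "reads_2.fq"):
        print(cmd)
```

Running the same commands for each genome against one reference gives SNP files that can be compared site by site.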

What is BWA? I'm trying to use SMALT to map the reads but am looking for another option...

Reply by Lucia · 11.0 years ago

score 1 · Answer 3 · 2014-02-14

If you are working with raw reads from deep sequencing, I recommend the short-read alignment approach described by Frédéric Bigey.

If you are working with assembled whole-genome sequences (from NCBI Nucleotide, for example), Base-By-Base may help you call the SNPs, which you can then interpret as GANI suggested. I've used it to compare viral genome sequences. I don't know its maximum allowed genome size, but you can check it out:

http://athena.bioc.uvic.ca/virology-ca-tools/base-by-base/
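The idea behind calling SNPs from assembled genomes can be sketched in Python. This is not Base-By-Base's implementation, just the underlying principle: once two genomes are aligned, SNPs are the alignment columns where both sequences have a base and the bases differ. The sketch assumes you already have a pairwise alignment as two equal-length gapped strings, from any aligner; gap columns and ambiguous `N` bases are skipped, and positions are reported in ungapped reference coordinates. The function name is hypothetical.

```python
def snps_from_alignment(ref_aln, query_aln):
    """Return (ref_position, ref_base, query_base) for each substitution.

    Positions are 1-based in the ungapped reference sequence. Columns with a
    gap ('-') in either sequence, or an 'N' in either, are not reported.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    snps = []
    ref_pos = 0
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r != "-":
            ref_pos += 1  # advance only on real reference bases
        if r == "-" or q == "-" or "N" in (r, q):
            continue  # indel or ambiguous column: not a SNP
        if r != q:
            snps.append((ref_pos, r, q))
    return snps


if __name__ == "__main__":
    print(snps_from_alignment("ACGT-ACGT", "ACTTAAG-T"))
```

For this toy alignment the output is `[(3, 'G', 'T'), (6, 'C', 'G')]`: the inserted `A` and the deleted `G` are gap columns, so they are treated as indels rather than SNPs.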