Question: Average Snp Position For Reads With Predicted Snp
0
gravatar for miaochao1987
7.7 years ago by
miaochao19870 wrote:

Hi,

I have a human sequencing data, and reads have been mapped with bwa. Any suggestions on what is the best way to extract following statistics? Thanks

For each predicted SNP, I would like to extract information about 1) #the number of reads with predicted SNP on + and - strand respectively 2) #average SNP location for reads with predicted SNP

calling snp • 2.1k views
ADD COMMENTlink written 7.7 years ago by miaochao19870

What do you mean with predicted SNP? Maybe you mean called (single nucleotide) variant (also called de novo SNPs). People talk about SNPs when a base is polymorphic in a few percent of a population. When you have just aligned your reads to a reference genome you end up with an alignment, not "predicted SNPs". Maybe you are looking for a starting point for detecting de novo SNPs and INDELS with NGS data

ADD REPLYlink written 7.7 years ago by Irsan7.2k

yes, "called variant" would be more accurate. thanks for noticing.

ADD REPLYlink written 7.7 years ago by miaochao19870
1

Ok, so you are looking for snp detection. One of the commonly used approaches is to do variant-calling on the alignment-file (probably .bam, if you have sam convert it to bam for storage purposes) with samtools mpileup. Then convert mpileup output with bcftools to vcf. You can also useGATK for variant calling on, it will also give you vcf as output. Then use summarize_annovar.pl from the annovar package to annotate the detected variants to get the impact of the the variant on amino acid, estimation whether variant is disease-causing, filter variants for those who are in dbSNP, ESP, 1000g ... You might also want to start reading some literature about variant calling with NGS data variant calling

ADD REPLYlink modified 7.7 years ago • written 7.7 years ago by Irsan7.2k
0
gravatar for Josh Herr
7.7 years ago by
Josh Herr5.7k
University of Nebraska
Josh Herr5.7k wrote:

I use Stacks for measuring SNP data across haplotypes using the pstacks script. I've only used Bowtie to map my sequencing data from within the Stacks pipeline, but I'm fairly certain Stacks can take BWA mapped reads.

ADD COMMENTlink modified 7.7 years ago • written 7.7 years ago by Josh Herr5.7k
0
gravatar for ernfrid
7.7 years ago by
ernfrid390
Saint Louis
ernfrid390 wrote:

Depending on your variant caller, you may have this information available with your SNP calls. For samtools' mpileup + bcftools view commands, for instance, values close to what you want are reported in the I16 tag of the VCF output (http://samtools.sourceforge.net/mpileup.shtml) with some minor calculations. Specifically, the sum of the tail distances is reported and dividing by the number of reads would give you the average distance. The number of non-reference reads on each strand is also reported (I realize this is not exactly what you are looking for).

That said, we use the bam-readcount program (fair disclosure, I am one of the authors) to calculate these values for lists of SNPs and use the downstream results for filtering. For each position you request, it will report various summary metrics for each base as a colon delimited list. If you know your SNP base then you can grab the appropriate values. It is not well documented on the web yet, but if you clone and compile it (from https://github.com/genome/bam-readcount) you can get usage by running it with no options.

The output format is as so:

chrpositionreferencebasebase:count:avgmappingquality:avgbasequality:avgsemappingquality:numplusstrand:numminusstrand:avgposasfraction:avgnummismatchesasfraction:avgsummismatchqualitiest:numq2containingreads:avgdistancetoq2startinq2reads:avgclippedlength:avgdistancetoeffective3pend...

An example of some output (at a homozygous variant position):

$ bam-readcount -q 30 -b 7 -f reference.fa some.bam 19:301238-301238 19301238C25=:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00A:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00C:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00G:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00T:25:53.56:35.28:36.44:14:11:0.40:0.01:35.88:17:0.45:97.00:0.48N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00

ADD COMMENTlink written 7.7 years ago by ernfrid390
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1020 users visited in the last hour