Question: SNP Number From Whole-Genome Sequencing
9.4 years ago by
Junfeng330 wrote:


I have whole-genome sequencing data from Illumina. The numbers of SNPs I called using samtools/pileup, samtools/mpileup, and SNVMix are 3,752,858, 3,959,353, and 5,836,045, respectively, for the tumor sample. The corresponding numbers for the blood sample are 3,512,896, 3,686,117, and 5,456,739.

It has been said that samtools/mpileup is better than samtools/pileup (for single-sample SNP calling, they differ little), and that SNVMix is especially suitable for cancer sequencing. So I expected the number of SNPs from SNVMix to be lower than from samtools/mpileup, and the number from samtools/mpileup to be lower than from samtools/pileup.

Why is the number of SNPs here ordered samtools/pileup < samtools/mpileup < SNVMix? Is there a problem with these SNP counts? Thanks.

Tags: next-gen, samtools, snp, sequencing
modified 9.1 years ago by Nina380 • written 9.4 years ago by Junfeng330

Try plotting the SNP call quality histograms. It may also help to take a quick look at Ti/Tv ratios. Reference: Ti/Tv Ratio Confirms Snp Discovery. Is This A General Rule?
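As a rough illustration of these two checks, here is a minimal Python sketch. It assumes you have already extracted (ref, alt) base pairs and per-call quality scores from each caller's output; the function names and the input shape are hypothetical, not part of samtools or SNVMix. A Ti/Tv ratio of roughly 2 is commonly cited for human whole-genome SNP sets, while random errors tend toward 0.5, so a call set with a much lower ratio probably contains many false positives.

```python
from collections import Counter

BASES = set("ACGT")
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for A<->G or C<->T; any other single-base change is a transversion."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(snps):
    """Compute transitions / transversions over an iterable of (ref, alt) pairs.

    Anything that is not a single-base A/C/G/T substitution is skipped.
    """
    ti = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")


def quality_histogram(quals, bin_width=10):
    """Count calls per quality bin; keys are the lower edge of each bin."""
    counts = Counter((int(q) // bin_width) * bin_width for q in quals)
    return dict(sorted(counts.items()))
```

Running both functions on each of the three call sets, and again after filtering, should show whether the extra calls pile up in the low-quality bins and drag the Ti/Tv ratio down.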

modified 11 months ago by RamRS28k • written 9.3 years ago by Pablo1.9k
9.1 years ago by
Vancouver, BC, Canada
Nina380 wrote:

Generally, SNP callers work in two passes: the first pass identifies every position with at least one mismatching base, and the second pass filters these results to generate the list of SNPs you think are "real".

I get the sense that you have only done the first pass.

You need to decide how to set the parameters for the second pass appropriately depending on how you want to balance sensitivity vs specificity.

SNVMix results are filtered using -t to set the probability threshold. The resulting file contains only positions where p(bb)+p(ab)>=T for your specified value of T.
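To make the thresholding rule concrete, here is a small Python sketch of the p(bb)+p(ab)>=T test. It does not parse real SNVMix output; the dictionary keys `p_ab` and `p_bb` are hypothetical names for the heterozygous and homozygous-variant genotype probabilities of each candidate position.

```python
def passes_threshold(call, threshold):
    """Keep a position when p(ab) + p(bb) reaches the chosen threshold T."""
    return call["p_ab"] + call["p_bb"] >= threshold


def filter_calls(calls, threshold):
    """Return only the candidate positions that pass the threshold."""
    return [c for c in calls if passes_threshold(c, threshold)]


def count_by_threshold(calls, thresholds):
    """Show how the number of reported SNPs changes with T."""
    return {t: len(filter_calls(calls, t)) for t in thresholds}
```

Tabulating the counts for a few values of T shows how strongly the reported SNP number depends on the threshold, which is one reason raw counts from different callers are hard to compare directly.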

samtools pileup results are filtered with varFilter, which offers a wide variety of filtering options, including the Phred-scaled SNP quality, depth, distance to the closest indel, and more.

samtools mpileup results are filtered with bcftools/varFilter, which again offers a wide variety of filtering options.
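The second pass described above can be sketched generically in Python. This is not a reimplementation of varFilter: the `Candidate` record, its field names, and the default cutoffs are all assumptions chosen for illustration, and the real tools apply more criteria (for example, distance to the closest indel).

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """One first-pass candidate position (hypothetical record layout)."""
    chrom: str
    pos: int
    qual: float  # Phred-scaled SNP quality
    depth: int   # read depth at the position


def second_pass(candidates, min_qual=20.0, min_depth=3, max_depth=100):
    """Keep candidates with sufficient quality and a plausible read depth.

    The default cutoffs are illustrative only, not any tool's defaults.
    """
    return [
        c for c in candidates
        if c.qual >= min_qual and min_depth <= c.depth <= max_depth
    ]
```

Raising `min_qual` or narrowing the depth window trades sensitivity for specificity; applying comparable cutoffs to each caller's output makes the resulting SNP counts more comparable.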

modified 11 months ago by RamRS28k • written 9.1 years ago by Nina380
9.4 years ago by
Kevin640 wrote:

What is your sequencing coverage? Did you do any quality filtering on the SNPs to arrive at the numbers posted here? Generally, I think this is a sensitivity vs. specificity issue.

written 9.4 years ago by Kevin640

This should be posted as a comment on the question.

written 9.4 years ago by Doctoroots790

Thanks. The sequencing coverage is at least 30X. I did quality filtering before calling SNPs, such as removing repeats, duplicates, and reads that fail the platform/vendor quality check. Do you have any suggestions on how to choose a threshold for the sensitivity vs. specificity trade-off here?

written 9.4 years ago by Junfeng330