SNP Counts From Whole-Genome Sequencing
10.1 years ago
Junfeng ▴ 330

Hi,

I have whole-genome sequencing data from Illumina. For the tumor sample, the numbers of SNPs I called using samtools/pileup, samtools/mpileup and SNVMix are 3,752,858, 3,959,353 and 5,836,045, respectively. The corresponding numbers for the blood sample are 3,512,896, 3,686,117 and 5,456,739.

It has been said that samtools/mpileup is better than samtools/pileup (for single-sample SNP calling, they differ little), and that SNVMix is especially suitable for cancer sequencing. So I expected SNVMix to report fewer SNPs than samtools/mpileup, and samtools/mpileup to report fewer than samtools/pileup.

Why, then, are the counts here ordered samtools/pileup < samtools/mpileup < SNVMix? Is there a problem with these SNP counts? Thanks.

Try plotting histograms of the SNP call qualities. It may also help to take a quick look at the Ti/Tv ratios. Reference: "Ti/Tv Ratio Confirms SNP Discovery. Is This A General Rule?"
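
A minimal sketch of both checks, assuming the calls are available as tab-separated VCF lines. The helper names (`parse_snvs`, `ti_tv_ratio`, `qual_histogram`) are hypothetical and not part of samtools or SNVMix. Transitions are A<->G and C<->T changes; for human whole-genome data a Ti/Tv of roughly 2.0 to 2.1 is often quoted, so a much lower value hints at many false-positive calls.

```python
from collections import Counter

# Transitions: purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def parse_snvs(vcf_lines):
    """Yield (ref, alt, qual) for single-base substitutions in VCF text lines."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alt, qual = fields[3].upper(), fields[4].upper(), fields[5]
        # Skip indels, multi-allelic sites, non-ACGT bases and missing QUAL.
        if len(ref) != 1 or len(alt) != 1 or qual == ".":
            continue
        if ref not in "ACGT" or alt not in "ACGT":
            continue
        yield ref, alt, float(qual)


def ti_tv_ratio(snvs):
    """Ratio of transitions to transversions (inf if there are no transversions)."""
    ti = tv = 0
    for ref, alt, _qual in snvs:
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")


def qual_histogram(snvs, bin_width=10):
    """Count calls per QUAL bin; each key is the lower edge of its bin."""
    return Counter(int(qual // bin_width) * bin_width for _r, _a, qual in snvs)


example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=30",
    "chr1\t200\t.\tC\tT\t12\tPASS\tDP=8",
    "chr1\t300\t.\tA\tC\t45\tPASS\tDP=31",
    "chr1\t400\t.\tAT\tA\t60\tPASS\tDP=29",  # indel, skipped
]
snvs = list(parse_snvs(example))
print(ti_tv_ratio(snvs))                          # 2.0
print(dict(sorted(qual_histogram(snvs).items())))  # {10: 1, 40: 1, 50: 1}
```

Running the same functions on each caller's output makes it easy to see whether the extra SNVMix calls sit in the low-quality bins and drag the Ti/Tv ratio down.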

9.8 years ago
Nina ▴ 380

Generally, SNP callers work in two passes: the first pass identifies every position with at least one mismatching base, and the second pass filters these results to produce the list of SNPs you consider "real".
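
A toy sketch of the two passes, assuming per-position base counts have already been extracted from the pileup. `PileupColumn`, `first_pass`, `second_pass` and the default cutoffs are hypothetical illustrations; real callers score genotypes with statistical models rather than a raw non-reference fraction.

```python
from dataclasses import dataclass


@dataclass
class PileupColumn:
    """Hypothetical per-position summary: reference base and observed base counts."""
    chrom: str
    pos: int
    ref: str
    counts: dict  # e.g. {"A": 28, "G": 2}


def first_pass(columns):
    """Pass 1: keep every position with at least one base differing from the reference."""
    return [c for c in columns
            if any(base != c.ref and n > 0 for base, n in c.counts.items())]


def second_pass(candidates, min_depth=10, min_alt_fraction=0.2):
    """Pass 2: keep candidates with enough depth and non-reference support."""
    kept = []
    for c in candidates:
        depth = sum(c.counts.values())
        alt = depth - c.counts.get(c.ref, 0)
        if depth > 0 and depth >= min_depth and alt / depth >= min_alt_fraction:
            kept.append(c)
    return kept


columns = [
    PileupColumn("chr1", 100, "A", {"A": 29, "G": 1}),   # one mismatch, likely an error
    PileupColumn("chr1", 200, "C", {"C": 15, "T": 14}),  # plausible heterozygous SNP
    PileupColumn("chr1", 300, "G", {"G": 30}),           # no mismatch at all
    PileupColumn("chr1", 400, "T", {"T": 2, "C": 3}),    # too shallow
]
candidates = first_pass(columns)
print(len(candidates), [c.pos for c in second_pass(candidates)])  # 3 [200]
```

Counting the output of `first_pass` gives a much larger number than counting the output of `second_pass`, which is the kind of gap described below.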

I get the sense that you have only done the first pass.

You need to choose the second-pass parameters according to how you want to balance sensitivity against specificity.
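
One way to make that trade-off visible is to sweep a cutoff and watch how the number of calls and the Ti/Tv ratio move together. A minimal sketch, assuming each call has been reduced to a hypothetical `(quality, is_transition)` pair; the function name and the numbers are made up for illustration.

```python
def sweep_thresholds(calls, thresholds):
    """For each cutoff, return (cutoff, number of passing calls, Ti/Tv of those calls).

    calls: iterable of (quality, is_transition) pairs (an assumed, pre-parsed form).
    """
    calls = list(calls)
    rows = []
    for t in thresholds:
        passing = [is_ti for qual, is_ti in calls if qual >= t]
        ti = sum(passing)            # True counts as 1
        tv = len(passing) - ti
        rows.append((t, len(passing), ti / tv if tv else float("inf")))
    return rows


calls = [(50, True), (45, False), (40, True), (12, True), (8, False), (5, False)]
for row in sweep_thresholds(calls, [0, 10, 30]):
    print(row)
# (0, 6, 1.0)
# (10, 4, 3.0)
# (30, 3, 2.0)
```

Raising the cutoff trades calls (sensitivity) for cleaner calls (specificity); on real data you would look for the range where the count stops dropping sharply and the Ti/Tv ratio settles near the expected value.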

SNVMix results are filtered with snvmix2summary.pl using -t to set the probability threshold. The resulting file only contains positions where p(bb)+p(ab)>=T for your specified value of T.
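
The `-t` rule amounts to the comparison below. This sketch assumes the per-site genotype probabilities have already been parsed into a hypothetical `SnvmixSite` tuple; it does not read the actual SNVMix output format, and `keep_variant_sites` is not part of SNVMix.

```python
from typing import NamedTuple


class SnvmixSite(NamedTuple):
    """Hypothetical parsed site: p(aa), p(ab), p(bb) for homozygous-ref, het, homozygous-alt."""
    site: str
    p_aa: float
    p_ab: float
    p_bb: float


def keep_variant_sites(sites, threshold):
    """Keep sites where p(ab) + p(bb) >= threshold, i.e. probably non-reference."""
    return [s for s in sites if s.p_ab + s.p_bb >= threshold]


sites = [
    SnvmixSite("chr1:100", 0.98, 0.02, 0.00),
    SnvmixSite("chr1:200", 0.10, 0.85, 0.05),
    SnvmixSite("chr1:300", 0.40, 0.35, 0.25),
    SnvmixSite("chr1:400", 0.01, 0.04, 0.95),
]
for t in (0.5, 0.8):
    print(t, [s.site for s in keep_variant_sites(sites, t)])
# 0.5 ['chr1:200', 'chr1:300', 'chr1:400']
# 0.8 ['chr1:200', 'chr1:400']
```

A higher T keeps fewer, more confident sites; leaving the unfiltered SNVMix output in place is one way to end up with the largest count of the three callers.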

samtools pileup results are filtered with samtools.pl varFilter, which offers a wide variety of filtering options, including the Phred-scaled SNP quality, depth, distance to the closest indel and more.
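
To illustrate what such criteria look like, here is a hypothetical varFilter-like pass that checks quality, a depth range and the distance to the nearest indel. It is not samtools.pl's code, and the function names, input layout and default cutoffs are assumptions rather than varFilter's actual options or defaults.

```python
import bisect


def near_indel(pos, indel_positions, window):
    """True if any indel position lies within `window` bp of pos (list must be sorted)."""
    i = bisect.bisect_left(indel_positions, pos - window)
    return i < len(indel_positions) and indel_positions[i] <= pos + window


def var_filter_sketch(snvs, indels, min_qual=20, min_depth=3, max_depth=100, window=10):
    """Keep SNVs passing quality, depth-range and indel-distance checks.

    snvs: list of (chrom, pos, qual, depth); indels: dict chrom -> list of positions.
    """
    sorted_indels = {chrom: sorted(p) for chrom, p in indels.items()}
    kept = []
    for chrom, pos, qual, depth in snvs:
        if qual < min_qual or not (min_depth <= depth <= max_depth):
            continue  # low quality, too shallow, or suspiciously deep (e.g. repeats)
        if near_indel(pos, sorted_indels.get(chrom, []), window):
            continue  # mismatches next to indels are often misalignment artefacts
        kept.append((chrom, pos, qual, depth))
    return kept


snvs = [("chr1", 100, 50, 30), ("chr1", 205, 60, 28), ("chr1", 400, 15, 25),
        ("chr1", 500, 40, 300), ("chr2", 50, 35, 20)]
indels = {"chr1": [210]}
print([(c, p) for c, p, _q, _d in var_filter_sketch(snvs, indels)])
# [('chr1', 100), ('chr2', 50)]
```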

samtools mpileup results are filtered with bcftools/vcfutils.pl varFilter which again uses a wide variety of filtering options.
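
The mpileup route produces VCF, so a comparable filter reads QUAL and the INFO field directly. A minimal sketch, assuming each record's INFO carries a `DP=<depth>` key (as bcftools output typically does); `parse_info`, `filter_vcf_lines` and the cutoffs are hypothetical, not vcfutils.pl's options.

```python
def parse_info(info):
    """Turn an INFO string like 'DP=30;MQ=60;INDEL' into a dict (flags map to True)."""
    out = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            out[key] = value
        elif item:
            out[item] = True
    return out


def filter_vcf_lines(lines, min_qual=20.0, min_dp=5, max_dp=100):
    """Yield header lines unchanged and data lines whose QUAL and INFO/DP pass the cutoffs."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]
        dp = int(parse_info(fields[7]).get("DP", 0))
        if qual != "." and float(qual) >= min_qual and min_dp <= dp <= max_dp:
            yield line


lines = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\t.\tDP=30;MQ=60",
    "chr1\t200\t.\tC\tT\t8\t.\tDP=25",          # fails QUAL
    "chr1\t300\t.\tG\tA\t70\t.\tDP=400",        # fails max depth
    "chr1\t400\t.\tT\tC\t45\t.\tINDEL;DP=12",
]
kept = list(filter_vcf_lines(lines))
print([l.split("\t")[1] for l in kept if not l.startswith("#")])  # ['100', '400']
```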

10.1 years ago
Kevin ▴ 640

What is your sequencing coverage? Did you do any quality filtering on the SNPs to get the numbers posted here? Generally, I think this is a sensitivity vs. specificity issue.

This should be posted as a comment on the question.

Thanks. The sequencing coverage is at least 30X. Before calling SNPs I did quality filtering, such as removing repeats, duplicates and reads that fail the platform/vendor quality check. Do you have any suggestions on how to choose the thresholds to balance sensitivity and specificity here?