Single Read Resulting In Heterozygous Snp Call In Samtools
1
1
Entering edit mode
10.7 years ago
danrdanny ▴ 70

Hi all,

I'm confused by a samtools behavior. Using the following code:

$ samtools mpileup -u -r X -f my.fasta my.bam | bcftools view -bvcg - > var.raw.bcf 
$ bcftools view var.raw.bcf | vcfutils.pl varFilter -D500 > my-snps.vcf

I get a list of SNPs and InDels. But, I end up with a few cases of samtools calling heterozygous SNPs when there may be 50 reads supporting one SNP (the ref SNP) and one read giving a different SNP. This strikes me as an obvious case in which samtools should ignore the one read.

I can't seem to figure out an easy way around this. Am I missing something simple? Any suggestions?

So, just to re-iterate: at location X I may have 50 reads that are C, and one read which gives T. My .vcf file reports C,T SNP here. I'd just like it to report C.

Thanks in advance.

samtools • 4.0k views
ADD COMMENT
2
Entering edit mode

Are you sure it is calling at heterozygous SNP? Take a look at the GT field in your file. It should look something like 0/0 0/1 or 1/1 corresponding to homozygous reference, heterozygous, and homozygous alternate. I believe that you can have an alternate allele observed but not actually a heterozygous call. Posting the actual data from your file would help.

ADD REPLY
0
Entering edit mode

Matt Shirley is right, probably not a het call.

ADD REPLY
1
Entering edit mode
10.7 years ago
BruceB ▴ 340

The ALT column (where you see C,T) reports all the possible alternative alleles that are present.

You need to look at the GT field which tells you whether they are heterozygous or not, e.g. a GT of 0/1 means it is heterozygous for the reference and the first alternative allele (in your case C). If you see 1/1, then it is homozygous for the first alternative allele (i.e. C/C).

ADD COMMENT
0
Entering edit mode

Ah, that makes sense. As an example, here's one of the SNPs I was interested in:

chrX 433121 . A G,C 141 . DP=73;VDB=0.0392;AF1=1;AC1=2;DP4=0,0,35,33;MQ=60;FQ=-229 GT:PL:GQ 1/1:174,202,0,171,181,168:99

The 1/1 in GT tells me to use the G from G,C. Which makes sense when looking at the raw reads.

Thanks!

ADD REPLY

Login before adding your answer.

Traffic: 3120 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6