Question: How To Make Samtools Mpileup Produce Genotype Information At Every Base?
4
gravatar for komal.rathi
5.4 years ago by
komal.rathi3.4k
Children's Hospital of Philadelphia, Philadelphia, PA
komal.rathi3.4k wrote:

Hi everyone,

I have a bam file and a snps file for chr1 (hg19) which contains the chromosome number & position for a list of known snps.

$head chr1.snps

chr1 1211292
chr1 2069172
chr1 2069681
chr1 2513216
chr1 2553624

Now, I want to compute the genotype information (just like it is given in a vcf format) at all the positions given in my snps file, but I do not want to "call" any snps because these are already known variants. And I think VCF format is a good standard for getting genotype information so I tried using:

samtools mpileup -u -l chr1.snps -f hg19.fa sample1_filter_sort.bam | bcftools view - > test1.vcf

So here I feed my .snps file and .bam file to samtools mpileup and create an uncompressed .bcf file which I then pipe into bcftools to get a .vcf file. Here I got 245 entries in my .vcf file, so it found 245/513 snps. But as it is not calling any snps, the output is a bit weird:

$head test1.vcf

chr1    1211292    .    C    X    0    .    DP=0;I16=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;QS=0.000000,0.000000,0.000000,0.000000    PL    0,0,0
chr1    2069172    .    C    X    0    .    DP=0;I16=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;QS=0.000000,0.000000,0.000000,0.000000    PL    0,0,0
chr1    2069681    .    C    X    0    .    DP=0;I16=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;QS=0.000000,0.000000,0.000000,0.000000    PL    0,0,0
chr1    2513216    .    C    T,X    0    .    DP=1;I16=0,0,1,0,0,0,34,1156,0,0,39,1521,0,0,4,16;QS=0.000000,1.000000,0.000000,0.000000    PL    34,3,0,34,3,34
chr1    2553624    .    T    X    0    .    DP=0;I16=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;QS=-nan,0.000000,0.000000,0.000000    PL    0,0,0

In another test I use options to call the variants:

samtools mpileup -u -l chr1.snps -f hg19.fa sample1_filter_sort.bam | bcftools view -v -c -g -l chr1.snps - > test2.vcf

Here, I feed my .snps and .bam file to samtools mpileup and create an uncompressed .bcf file which I then pipe into bcftools to get a .vcf file but this time I am calling the variants using -vcg options. Here, I got 16 entries in my .vcf file so it "called" 16/513 snps. The output here is just like I want it to be:

 $head test2.vcf

chr1    2513216    .    C    T    5.46    .    DP=1;QS=0.000000,1.000000,0.000000,0.000000;AF1=1;AC1=2;DP4=0,0,1,0;MQ=39;FQ=-30    GT:PL:GQ    1/1:34,3,0:3
chr1    29519778    .    T    G    187    .    DP=8;QS=0.000000,1.000000,0.000000,0.000000;VDB=4.580661e-02;AF1=1;AC1=2;DP4=0,0,4,3;MQ=40;FQ=-48    GT:PL:GQ    1/1:220,21,0:39
chr1    41204569    .    C    T    222    .    DP=91;QS=0.000000,1.000000,0.000000,0.000000;VDB=4.337502e-01;AF1=1;AC1=2;DP4=0,0,47,34;MQ=40;FQ=-271    GT:PL:GQ    1/1:255,244,0:99
chr1    42880516    .    T    C    222    .    DP=11;QS=0.000000,1.000000,0.000000,0.000000;VDB=1.273245e-01;AF1=1;AC1=2;DP4=0,0,5,5;MQ=40;FQ=-57    GT:PL:GQ    1/1:255,30,0:57
chr1    55319902    .    A    G    222    .    DP=20;QS=0.000000,1.000000,0.000000,0.000000;VDB=1.803282e-01;AF1=1;AC1=2;DP4=0,0,12,7;MQ=40;FQ=-84    GT:PL:GQ    1/1:255,57,0:99

So, essentially you can see that there is no genotype information for the snps detected in test1 but there is genotype information in the last column of test2 (because we used variant calling). What I want is that the program should detect the 245 snps just like test1 (I dont want it to call it because I already have the list) but should report the output in a format like test2. How can that be done?

samtools bcftools mpileup • 6.6k views
ADD COMMENTlink modified 5.4 years ago by Istvan Albert ♦♦ 80k • written 5.4 years ago by komal.rathi3.4k
1
gravatar for Istvan Albert
5.4 years ago by
Istvan Albert ♦♦ 80k
University Park, USA
Istvan Albert ♦♦ 80k wrote:

The question is whether your bam file supports or not the SNPs that you expect to be there.

As it happens the answer is rarely that you find 100%. It all depends on coverage, alignment and the quality of sample.

ADD COMMENTlink written 5.4 years ago by Istvan Albert ♦♦ 80k

Thanks you are right, the question is that but I was also doubtful about my approach. So nothing is wrong with my approach of extracting genotype information for a list of snps from a bam file?

ADD REPLYlink written 5.4 years ago by komal.rathi3.4k

yes, it should work just fine, most of the time we call snps not even knowing where they are, if you knew them to be true getting the genotype is "easy"

ADD REPLYlink written 5.4 years ago by Istvan Albert ♦♦ 80k

I have edited my question so that it makes sense a bit

ADD REPLYlink modified 5.4 years ago • written 5.4 years ago by komal.rathi3.4k
1

I have changed the title of your post your question does not really depend on the existance of the SNP file.

ADD REPLYlink modified 5.4 years ago • written 5.4 years ago by Istvan Albert ♦♦ 80k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1206 users visited in the last hour