Question: Filtering for homozygous reference alleles
0
gravatar for AGE
14 days ago by
AGE20
AGE20 wrote:

I would like to obtain a gvcf file for non-variant sites with a minimum coverage of 5x (homozygous reference alleles that I can at least minimally trust). The non-variant sites i am getting seem to be of low coverage. For example a non-variant site with DP=51 but DP4=1,0,0,0. The bam file was filtered with samtools for a minimum mapping quality of 20 (-q 20), but the discrepancy seems to be too high. E.g. Here is my output for homozygous ref sites

 bcftools  mpileup -f genomic.fa file.bam -a "AD,ADF,ADR,DP,SP,INFO/AD,INFO/ADF,INFO/ADR" | bcftools call --gvcf 5 -m > gvcf.vcf && awk '$5 == "." {print $0}' gvcf.vcf | grep -v "##" | shuf -n 25

    NW_006532855.1  244614  .       T       .       .       .       END=244634;MinDP=31     GT:DP   0/0:31
    NW_006533971.1  93708   .       A       .       29.5864 .       DP=21;ADF=0;ADR=0;AD=0;MQ0F=0;AN=0;DP4=0,0,0,0;MQ=.     GT:DP:SP:ADF:ADR:AD     ./.:0:0:0:0:0
    NW_006536084.1  51107   .       G       .       67      .       DP=1;ADF=0;ADR=1;AD=1;MQ0F=0;AN=2;DP4=0,1,0,0;MQ=60;MinDP=1     GT:DP:SP:ADF:ADR:AD     0/0:1:0:0:1:1
    NW_006532401.1  390421  .       T       .       45.5868 .       DP=2;ADF=0;ADR=1;AD=1;MQ0F=0;AN=2;DP4=0,1,0,0;MQ=60;MinDP=1     GT:DP:SP:ADF:ADR:AD     0/0:1:0:0:1:1
    NW_006535812.1  52470   .       A       .       .       .       END=52587;MinDP=55      GT:DP   0/0:55
    NW_006536288.1  53      .       T       .       29.5864 .       DP=11;ADF=0;ADR=0;AD=0;MQ0F=0;AN=0;DP4=0,0,0,0;MQ=.     GT:DP:SP:ADF:ADR:AD     ./.:0:0:0:0:0
    NW_006532139.1  349467  .       T       .       .       .       END=349484;MinDP=39     GT:DP   0/0:39
    NW_006534925.1  53705   .       T       .       142     .       DP=3;ADF=1;ADR=2;AD=3;MQSB=1;MQ0F=0;AN=2;DP4=1,2,0,0;MQ=60;MinDP=3      GT:DP:SP:ADF:ADR:AD     0/0:3:0:1:2:3
    NW_006532792.1  285159  .       A       .       .       .       END=285176;MinDP=53     GT:DP   0/0:53
    NW_006535919.1  76893   .       T       .       .       .       END=76931;MinDP=55      GT:DP   0/0:55
    NW_006533107.1  102827  .       A       .       55      .       DP=51;ADF=1;ADR=0;AD=1;MQ0F=0;AN=2;DP4=1,0,0,0;MQ=60;MinDP=1    GT:DP:SP:ADF:ADR:AD     0/0:1:0:1:0:1
    NW_006535205.1  85235   .       A       .       29.5864 .       DP=0;ADF=0;ADR=0;AD=0;MQ0F=0;AN=0;DP4=0,0,0,0;MQ=.      GT:DP:SP:ADF:ADR:AD     ./.:0:0:0:0:0
    NW_006532576.1  44906   .       T       .       .       .       END=44910;MinDP=8       GT:DP   0/0:8
    NW_006535228.1  57467   .       C       .       90      .       DP=2;ADF=1;ADR=0;AD=1;MQ0F=0;AN=2;DP4=1,0,0,0;MQ=60;MinDP=1     GT:DP:SP:ADF:ADR:AD     0/0:1:0:1:0:1
    NW_006535321.1  94195   .       G       .       .       .       END=94209;MinDP=64      GT:DP   0/0:64
    NW_006532278.1  304697  .       A       .       .       .       END=304714;MinDP=62     GT:DP   0/0:62
    NW_006532169.1  456198  .       C       .       51      .       DP=1;ADF=0;ADR=1;AD=1;MQ0F=0;AN=2;DP4=0,1,0,0;MQ=21;MinDP=1     GT:DP:SP:ADF:ADR:AD     0/0:1:0:0:1:1
    NW_006533733.1  85892   .       G       .       29.5864 .       DP=0;ADF=0;ADR=0;AD=0;MQ0F=0;AN=0;DP4=0,0,0,0;MQ=.      GT:DP:SP:ADF:ADR:AD     ./.:0:0:0:0:0
    NW_006534011.1  208144  .       A       .       .       .       END=208164;MinDP=68     GT:DP   0/0:68
    NW_006535039.1  100639  .       T       .       66      .       DP=48;ADF=1;ADR=0;AD=1;MQ0F=0;AN=2;DP4=1,0,0,0;MQ=60;MinDP=1    GT:DP:SP:ADF:ADR:AD     0/0:1:0:1:0:1
    NW_006533379.1  65322   .       A       .       .       .       END=65390;MinDP=30      GT:DP   0/0:30
    NW_006533431.1  132332  .       G       .       29.5864 .       DP=1;ADF=0;ADR=0;AD=0;MQ0F=0;AN=0;DP4=0,0,0,0;MQ=.      GT:DP:SP:ADF:ADR:AD     ./.:0:0:0:0:0
    NW_006536654.1  66636   .       C       .       .       .       END=66638;MinDP=15      GT:DP   0/0:15
    NW_006532666.1  234796  .       T       .       .       .       END=234798;MinDP=48     GT:DP   0/0:48
    NW_006533913.1  124175  .       A       .       51      .       DP=1;ADF=0;ADR=1;AD=1;MQ0F=0;AN=2;DP4=0,1,0,0;MQ=44;MinDP=1     GT:DP:SP:ADF:ADR:AD     0/0:1:0:0:1:1

The AD also does not go above 4. eg:

awk '$5 == "." {print $0}' gvcf.vcf | grep ",0,0;" | grep "AD=1" | wc -l
5689457
awk '$5 == "." {print $0}' gvcf.vcf | grep ",0,0;" | grep "AD=2" | wc -l
4036127
awk '$5 == "." {print $0}' gvcf.vcf | grep ",0,0;" | grep "AD=3" | wc -l
3455244
awk '$5 == "." {print $0}' gvcf.vcf | grep ",0,0;" | grep "AD=4" | wc -l
3082479
awk '$5 == "." {print $0}' gvcf.vcf | grep ",0,0;" | grep "AD=5" | wc -l
0

Is it the --gvcf 5 the reason why the coverage is low?

ADD COMMENTlink modified 11 days ago by Biostar ♦♦ 20 • written 14 days ago by AGE20

Hello,

a drop-off of the DP4 value is usually happens due to low mapping or base quality. If you are running bcftools call without any additional parameter the default for mapping quality is 0 and for base quality is 13. So I would guess in your case you have bad base qualities at the given positions. Check it by viewing some with the igv browser.

fin swimmer

ADD REPLYlink written 11 days ago by finswimmer12k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 863 users visited in the last hour