Entering edit mode
6.1 years ago
AGE
▴
30
I would like to obtain a gvcf file for non-variant sites with a minimum coverage of 5x (homozygous reference alleles that I can at least minimally trust). The non-variant sites i am getting seem to be of low coverage. For example a non-variant site with DP=51 but DP4=1,0,0,0. The bam file was filtered with samtools for a minimum mapping quality of 20 (-q 20), but the discrepancy seems to be too high.
E.g.
Here is my output for homozygous ref sites
bcftools mpileup -f genomic.fa file.bam -a "AD,ADF,ADR,DP,SP,INFO/AD,INFO/ADF,INFO/ADR" | bcftools call --gvcf 5 -m > gvcf.vcf && awk '$5 == "." {print $0}' gvcf.vcf | grep -v "##" | shuf -n 25
NW_006532855.1 244614 . T . . . END=244634;MinDP=31 GT:DP 0/0:31
NW_006533971.1 93708 . A . 29.5864 . DP=21;ADF=0;ADR=0;AD=0;MQ0F=0;AN=0;DP4=0,0,0,0;MQ=. GT:DP:SP:ADF:ADR:AD ./.:0:0:0:0:0
NW_006536084.1 51107 . G . 67 . DP=1;ADF=0;ADR=1;AD=1;MQ0F=0;AN=2;DP4=0,1,0,0;MQ=60;MinDP=1 GT:DP:SP:ADF:ADR:AD 0/0:1:0:0:1:1
NW_006532401.1 390421 . T . 45.5868 . DP=2;ADF=0;ADR=1;AD=1;MQ0F=0;AN=2;DP4=0,1,0,0;MQ=60;MinDP=1 GT:DP:SP:ADF:ADR:AD 0/0:1:0:0:1:1
NW_006535812.1 52470 . A . . . END=52587;MinDP=55 GT:DP 0/0:55
NW_006536288.1 53 . T . 29.5864 . DP=11;ADF=0;ADR=0;AD=0;MQ0F=0;AN=0;DP4=0,0,0,0;MQ=. GT:DP:SP:ADF:ADR:AD ./.:0:0:0:0:0
NW_006532139.1 349467 . T . . . END=349484;MinDP=39 GT:DP 0/0:39
NW_006534925.1 53705 . T . 142 . DP=3;ADF=1;ADR=2;AD=3;MQSB=1;MQ0F=0;AN=2;DP4=1,2,0,0;MQ=60;MinDP=3 GT:DP:SP:ADF:ADR:AD 0/0:3:0:1:2:3
NW_006532792.1 285159 . A . . . END=285176;MinDP=53 GT:DP 0/0:53
NW_006535919.1 76893 . T . . . END=76931;MinDP=55 GT:DP 0/0:55
NW_006533107.1 102827 . A . 55 . DP=51;ADF=1;ADR=0;AD=1;MQ0F=0;AN=2;DP4=1,0,0,0;MQ=60;MinDP=1 GT:DP:SP:ADF:ADR:AD 0/0:1:0:1:0:1
NW_006535205.1 85235 . A . 29.5864 . DP=0;ADF=0;ADR=0;AD=0;MQ0F=0;AN=0;DP4=0,0,0,0;MQ=. GT:DP:SP:ADF:ADR:AD ./.:0:0:0:0:0
NW_006532576.1 44906 . T . . . END=44910;MinDP=8 GT:DP 0/0:8
NW_006535228.1 57467 . C . 90 . DP=2;ADF=1;ADR=0;AD=1;MQ0F=0;AN=2;DP4=1,0,0,0;MQ=60;MinDP=1 GT:DP:SP:ADF:ADR:AD 0/0:1:0:1:0:1
NW_006535321.1 94195 . G . . . END=94209;MinDP=64 GT:DP 0/0:64
NW_006532278.1 304697 . A . . . END=304714;MinDP=62 GT:DP 0/0:62
NW_006532169.1 456198 . C . 51 . DP=1;ADF=0;ADR=1;AD=1;MQ0F=0;AN=2;DP4=0,1,0,0;MQ=21;MinDP=1 GT:DP:SP:ADF:ADR:AD 0/0:1:0:0:1:1
NW_006533733.1 85892 . G . 29.5864 . DP=0;ADF=0;ADR=0;AD=0;MQ0F=0;AN=0;DP4=0,0,0,0;MQ=. GT:DP:SP:ADF:ADR:AD ./.:0:0:0:0:0
NW_006534011.1 208144 . A . . . END=208164;MinDP=68 GT:DP 0/0:68
NW_006535039.1 100639 . T . 66 . DP=48;ADF=1;ADR=0;AD=1;MQ0F=0;AN=2;DP4=1,0,0,0;MQ=60;MinDP=1 GT:DP:SP:ADF:ADR:AD 0/0:1:0:1:0:1
NW_006533379.1 65322 . A . . . END=65390;MinDP=30 GT:DP 0/0:30
NW_006533431.1 132332 . G . 29.5864 . DP=1;ADF=0;ADR=0;AD=0;MQ0F=0;AN=0;DP4=0,0,0,0;MQ=. GT:DP:SP:ADF:ADR:AD ./.:0:0:0:0:0
NW_006536654.1 66636 . C . . . END=66638;MinDP=15 GT:DP 0/0:15
NW_006532666.1 234796 . T . . . END=234798;MinDP=48 GT:DP 0/0:48
NW_006533913.1 124175 . A . 51 . DP=1;ADF=0;ADR=1;AD=1;MQ0F=0;AN=2;DP4=0,1,0,0;MQ=44;MinDP=1 GT:DP:SP:ADF:ADR:AD 0/0:1:0:0:1:1
The AD also does not go above 4. eg:
awk '$5 == "." {print $0}' gvcf.vcf | grep ",0,0;" | grep "AD=1" | wc -l
5689457
awk '$5 == "." {print $0}' gvcf.vcf | grep ",0,0;" | grep "AD=2" | wc -l
4036127
awk '$5 == "." {print $0}' gvcf.vcf | grep ",0,0;" | grep "AD=3" | wc -l
3455244
awk '$5 == "." {print $0}' gvcf.vcf | grep ",0,0;" | grep "AD=4" | wc -l
3082479
awk '$5 == "." {print $0}' gvcf.vcf | grep ",0,0;" | grep "AD=5" | wc -l
0
Is it the --gvcf 5 the reason why the coverage is low?
Hello,
a drop-off of the DP4 value is usually happens due to low mapping or base quality. If you are running
bcftools callwithout any additional parameter the default for mapping quality is 0 and for base quality is 13. So I would guess in your case you have bad base qualities at the given positions. Check it by viewing some with the igv browser.fin swimmer