3.3 years ago by
Bam-readcount is ideal for this - https://github.com/genome/bam-readcount. Apart from raw read depth of every base present it gives you many useful information, inluding average quality mapping, average base quality etc. Its output is also used for variant filtering in fpfilter script - https://github.com/ckandoth/variant-filter.
In order to obtain information for every alleles present, you'll have to provide script with reference file in fasta format. The command looks like this:
bam-readcount -q1 -l sites_list -f reference.fasta your_file.bam > your_file.readcount
-q1 option means minimum mapping quality of 1.
Sites list is supposed to look like this (tab delimited):
chr1 56777 56777.
Creators of the fpfilter script mentioned above provided a nice one-liner to convert your vcf file into sites_list file:
perl -ane 'print join("\t",@F[0,1,1])."\n" unless(m/^#/)' your.vcf > sites_list
I tested it on human and mice data and it worked fine in both cases. Sample output from mice data looks like this:
chr1 8816113 T 146 =:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 A:3:4.00:26.00:4.00:0:3:0.71:0.01:26.00:0:0.00:121.33:0.65 C:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 G:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 T:143:63.80:24.38:63.80:79:64:0.57:0.01:21.76:79:0.62:120.83:0.50 N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00
chr1 8816178 G 121 =:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 A:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 C:120:71.14:26.27:71.14:68:52:0.44:0.02:28.25:68:0.24:122.28:0.48 G:1:73.00:22.00:73.00:0:1:0.31:0.03:8.00:0:0.00:116.00:0.84 T:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 .
modified 3.3 years ago
3.3 years ago by
mkulecka • 300