Question

Get multiple snps coverage (allelic depth) from bam files

2

Entering edit mode

8.7 years ago

MAPK ★ 2.1k

I have more than 500 samples (or thier bam files). I also have more than 1000 snps in snp table as shown below. I need to get the number of reads (allelic depth) for reference allele and alternate allele from all the bam files. Is there any program to get that directly from bam files given the snp information below? I know pile up package in R samtools does that, but for some reason I often miss either positive strand read or - strand counts.

snp table:

chr    position    REF    ALT    TYPE
3      23322333    T      C      snp
2      22322223    A      T      snp

bam sequencing genotype • 4.3k views

ADD COMMENT • link updated 23 months ago by Ram 44k • written 8.7 years ago by MAPK ★ 2.1k

Ram · Answer 1 · 2015-11-17

1

Entering edit mode

8.7 years ago

GouthamAtla 12k

This tool provides a script ComputePileupFreqs.pl which can be used for counting ref and alternate allele from pileup output.

Not tested.

ADD COMMENT • link updated 4.7 years ago by Ram 44k • written 8.7 years ago by GouthamAtla 12k

Ram · Answer 2 · 2015-11-17

1

Entering edit mode

8.7 years ago

mkulecka ▴ 360

Bam-readcount is ideal for this. Apart from raw read depth of every base present it gives you many useful information, inluding average quality mapping, average base quality etc. Its output is also used for variant filtering in fpfilter script - https://github.com/ckandoth/variant-filter.

In order to obtain information for every alleles present, you'll have to provide script with reference file in fasta format. The command looks like this:

bam-readcount -q1 -l sites_list -f reference.fasta your_file.bam > your_file.readcount

-q1 option means minimum mapping quality of 1.

Sites list is supposed to look like this (tab delimited):

chr1 56777 56777

Creators of the fpfilter script mentioned above provided a nice one-liner to convert your vcf file into sites_list file:

perl -ane 'print join("\t",@F[0,1,1])."\n" unless(m/^#/)' your.vcf > sites_list

I tested it on human and mice data and it worked fine in both cases. Sample output from mice data looks like this:

chr1    8816113    T    146    =:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00    A:3:4.00:26.00:4.00:0:3:0.71:0.01:26.00:0:0.00:121.33:0.65    C:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00    G:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00    T:143:63.80:24.38:63.80:79:64:0.57:0.01:21.76:79:0.62:120.83:0.50    N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00
chr1    8816178    G    121    =:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00    A:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00    C:120:71.14:26.27:71.14:68:52:0.44:0.02:28.25:68:0.24:122.28:0.48    G:1:73.00:22.00:73.00:0:1:0.31:0.03:8.00:0:0.00:116.00:0.84    T:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00    N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 .

ADD COMMENT • link updated 4.7 years ago by Ram 44k • written 8.7 years ago by mkulecka ▴ 360

1

Entering edit mode

Does it also gives the number of reads for each reference and alternate allele over a SNP ?

ADD REPLY • link 8.7 years ago by GouthamAtla 12k

0

Entering edit mode

Yes, exactly.

ADD REPLY • link updated 4.7 years ago by Ram 44k • written 8.7 years ago by mkulecka ▴ 360

0

Entering edit mode

what would be the command if i have to calculate the reference and alternate allele reads counts for a bam file in.bam and a vcf file in.vcf ?

ADD REPLY • link 8.7 years ago by GouthamAtla 12k

0

Entering edit mode

Yes what would be the command for that and If there is anythig to pipeline or loop through multiple bam files for multiple snp? I also could not get ALT base count using mock data. I am using their mock data and with the following command(which doesn't seem to work):

./bam-readcount -q 0 -l /Downloads/site_list.txt /Downloads/test.bam

ADD REPLY • link updated 4.7 years ago by Ram 44k • written 8.7 years ago by MAPK ★ 2.1k

0

Entering edit mode

Well, I forgot to mention you have to have a reference for that. I'll put that in my response.

ADD REPLY • link updated 4.7 years ago by Ram 44k • written 8.7 years ago by mkulecka ▴ 360

0

Entering edit mode

Thanks!

ADD REPLY • link updated 4.7 years ago by Ram 44k • written 8.7 years ago by MAPK ★ 2.1k