Question: bam-readcount empty output
0
gravatar for alexander.charney
20 months ago by
alexander.charney0 wrote:

I have encountered similar questions elsewhere, but none exactly replicate my problem.

I ran bam-readcount for about 1000 sites (across all chromosomes) for two samples from the same individual. I know both bam files have reads at the sites in question, as these sites were called as variant by multiple somatic callers (mutect, varscan, etc). Yet, for one sample I get an empty output from bam-readcount, while for the other I get the expected counts. To illustrate the problem with 1 site, see below.

First, I make the input site list:

echo "1 884091 884091" | tr ' ' '\t' > test.bed

Second, I run bam-readcount:

bam-readcount --reference-fasta ref.fa --site-list test.bed sample1.bam > test1

bam-readcount --reference-fasta ref.fa --site-list test.bed sample2.bam > test2

When I do this, we see test1 is an empty file, whereas test2 has counts for the site.

Third, as a sanity chec, I use samtools to subset these bams using the same site list. This is to verify this site is indeed covered in both bam files:

samtools view -L test.bed sample1.bam > test3

samtools view -L test.bed sample2.bam > test4

When we do this, we see there is output for both files:

head -1 test3 test4

==> test3 <== HWI-D00310:198:CALR0ANXX:5:1116:18839:21292 99 1 883969 60 125M = 883971 127 GACCTGCTGGAACATCTGCCCCAAGGGCCGTGTCAGGCTCTCTCGGCCCCATGCCTGGTCACCCTGGCTTCACCCTGGCTGCACCCTGGTCCCCCTGGTCCCTTTGGCCCTGCACCTGGCTGCAC >??ADEEEGECCDDCEEGEDDDDCFEEED@BGBEDFEEEEEEEE@EEDDDDCGECEGEBEDEDDEGEEECEDEDDEFEEEGEDECDEGEBEDDDDEGCBDDDECCGEEDDEGEDEDEGEDCB?>@ MC:Z:125M BD:Z:KKMLKNNNMMLMMLMNKLLKIHJJKLHJJIJJILKMLJKJJJJJKJJJHHJLLKJHKKJLKJKHHKKJKIKKJKHHKKJKKKLJKHHKKJLKHHHILLKMLIIJEKLKKIILLMKLJMMLNLLMK MD:Z:125 RG:Z:BCALR0ANXX.5.TCCTGAGC-ACTGCATA BI:Z:PPSRUWUVVSTRROSSRTSQMMSPQPMRPPROOSRTPSTQRQRQTQSQMMSSTSQRTRSTSPRMRTRSTQRSPRMRTRSTTSSPRMRTRSTQNNNSUSTURNSRKTSTROTVUURTTVTUVTSSP NM:i:0 BQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ MQ:i:60 AS:i:125 XS:i:19

==> test4 <== HWI-D00310:198:CALR0ANXX:4:2308:5529:20058 99 1 883970 60 125M = 883992 147 ACCTGCTGGAACATCTGCCCCAAGGGCCGTGTCAGGCTCTCTCGGCCCCATGCATGGTCACCCTGGCTTCACCCTGGCTGCACCCTGGTCCCCCTGGTCCCTTTGGCCCTGCACCTGGCTGCACC AA@12DE1@/ADCDFFFA-.AC=E/DED>AFAECGACFA0CDD?CD7?CDDGBABFEBFCBCEFGCFEB/CB.?DGEFFCF.DAEDGE-B/AEEDD<9EABFCBGDE//=0?0DDD0DCDD?>@@ MC:Z:125M BD:Z:KKLJPPNOMLMMJMNKLLKHHJJJLGIJHJIHKKMLJKJJJJJKKJKHHJLLLLLLKJLKJKHILKJKIKKJKHILKJKLLLJKHILKJLKHHHILKJLKIJJEKLKLIJMMMKLJNMLMOMMKL MD:Z:53C71 RG:Z:BCALR0ANXX.4.TCCTGAGC-TATCCTCT BI:Z:PPQRUUVTSSQQORSPRRPLLQNQPMRPPROORQRPRSOPOPORPRPLLQQRRRQRQRRQNPLQRQSTPRROQLRSRSTSSSOQMRSRSSQMMMRSRSSQMRPJSSTRNSTTTPSTUTUVUSSOQ NM:i:1 BQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ MQ:i:60 AS:i:120 XS:i:20

If anyone has seen similar behavior and has any suggestions, would be greatly appreciated.

bam-readcount • 801 views
ADD COMMENTlink modified 20 months ago • written 20 months ago by alexander.charney0

https://github.com/genome/bam-readcount

The list of regions should be formatted as chromosome start and end. Each field should be tab separated and coordinates should be 1-based.

Could that be a problem that your list of regions is in bed-format?

ADD REPLYlink written 20 months ago by Santosh Anand4.6k

The file is formatted as you describe. Furthermore, I don't believe it is a problem with the region list file because this file is working for sample2.bam

ADD REPLYlink written 20 months ago by alexander.charney0
1 884091 884091

That's in bed-format!

Furthermore, I don't believe it is a problem with the region list file because this file is working for sample2.bam

you never know the internal working mechanisms of such programs, so it is better to follow exactly what is being said in the manual

ADD REPLYlink written 20 months ago by Santosh Anand4.6k

Again, the test.bed file in my example is formatted as the manual instructs: three tab-separated columns (chr, start, end). The start and end having the same position does not matter for bam-readcount. To be sure, I have run this using different iterations of the region list file and the same issue occurs. Its very strange

ADD REPLYlink written 20 months ago by alexander.charney0

Please use ADD COMMENT or ADD REPLY to answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your post but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

ADD REPLYlink written 20 months ago by WouterDeCoster37k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1957 users visited in the last hour