0
0
Entering edit mode
4.2 years ago

I have encountered similar questions elsewhere, but none exactly replicate my problem.

I ran bam-readcount for about 1000 sites (across all chromosomes) for two samples from the same individual. I know both bam files have reads at the sites in question, as these sites were called as variant by multiple somatic callers (mutect, varscan, etc). Yet, for one sample I get an empty output from bam-readcount, while for the other I get the expected counts. To illustrate the problem with 1 site, see below.

First, I make the input site list:

echo "1 884091 884091" | tr ' ' '\t' > test.bed

bam-readcount --reference-fasta ref.fa --site-list test.bed sample1.bam > test1

bam-readcount --reference-fasta ref.fa --site-list test.bed sample2.bam > test2

When I do this, we see test1 is an empty file, whereas test2 has counts for the site.

Third, as a sanity chec, I use samtools to subset these bams using the same site list. This is to verify this site is indeed covered in both bam files:

samtools view -L test.bed sample1.bam > test3

samtools view -L test.bed sample2.bam > test4

When we do this, we see there is output for both files:

head -1 test3 test4

==> test3 <== HWI-D00310:198:CALR0ANXX:5:1116:18839:21292 99 1 883969 60 125M = 883971 127 GACCTGCTGGAACATCTGCCCCAAGGGCCGTGTCAGGCTCTCTCGGCCCCATGCCTGGTCACCCTGGCTTCACCCTGGCTGCACCCTGGTCCCCCTGGTCCCTTTGGCCCTGCACCTGGCTGCAC >??ADEEEGECCDDCEEGEDDDDCFEEED@BGBEDFEEEEEEEE@EEDDDDCGECEGEBEDEDDEGEEECEDEDDEFEEEGEDECDEGEBEDDDDEGCBDDDECCGEEDDEGEDEDEGEDCB?>@ MC:Z:125M BD:Z:KKMLKNNNMMLMMLMNKLLKIHJJKLHJJIJJILKMLJKJJJJJKJJJHHJLLKJHKKJLKJKHHKKJKIKKJKHHKKJKKKLJKHHKKJLKHHHILLKMLIIJEKLKKIILLMKLJMMLNLLMK MD:Z:125 RG:Z:BCALR0ANXX.5.TCCTGAGC-ACTGCATA BI:Z:PPSRUWUVVSTRROSSRTSQMMSPQPMRPPROOSRTPSTQRQRQTQSQMMSSTSQRTRSTSPRMRTRSTQRSPRMRTRSTTSSPRMRTRSTQNNNSUSTURNSRKTSTROTVUURTTVTUVTSSP NM:i:0 BQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ MQ:i:60 AS:i:125 XS:i:19

==> test4 <== HWI-D00310:198:CALR0ANXX:4:2308:5529:20058 99 1 883970 60 125M = 883992 147 ACCTGCTGGAACATCTGCCCCAAGGGCCGTGTCAGGCTCTCTCGGCCCCATGCATGGTCACCCTGGCTTCACCCTGGCTGCACCCTGGTCCCCCTGGTCCCTTTGGCCCTGCACCTGGCTGCACC AA@12DE1@/ADCDFFFA-.AC=E/DED>AFAECGACFA0CDD?CD7?CDDGBABFEBFCBCEFGCFEB/CB.?DGEFFCF.DAEDGE-B/AEEDD<9EABFCBGDE//=0?0DDD0DCDD?>@@ MC:Z:125M BD:Z:KKLJPPNOMLMMJMNKLLKHHJJJLGIJHJIHKKMLJKJJJJJKKJKHHJLLLLLLKJLKJKHILKJKIKKJKHILKJKLLLJKHILKJLKHHHILKJLKIJJEKLKLIJMMMKLJNMLMOMMKL MD:Z:53C71 RG:Z:BCALR0ANXX.4.TCCTGAGC-TATCCTCT BI:Z:PPQRUUVTSSQQORSPRRPLLQNQPMRPPROORQRPRSOPOPORPRPLLQQRRRQRQRRQNPLQRQSTPRROQLRSRSTSSSOQMRSRSSQMMMRSRSSQMRPJSSTRNSTTTPSTUTUVUSSOQ NM:i:1 BQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ MQ:i:60 AS:i:120 XS:i:20

If anyone has seen similar behavior and has any suggestions, would be greatly appreciated.

0
Entering edit mode

The list of regions should be formatted as chromosome start and end. Each field should be tab separated and coordinates should be 1-based.

Could that be a problem that your list of regions is in bed-format?

0
Entering edit mode

The file is formatted as you describe. Furthermore, I don't believe it is a problem with the region list file because this file is working for sample2.bam

0
Entering edit mode
1 884091 884091


That's in bed-format!

Furthermore, I don't believe it is a problem with the region list file because this file is working for sample2.bam

you never know the internal working mechanisms of such programs, so it is better to follow exactly what is being said in the manual

0
Entering edit mode

Again, the test.bed file in my example is formatted as the manual instructs: three tab-separated columns (chr, start, end). The start and end having the same position does not matter for bam-readcount. To be sure, I have run this using different iterations of the region list file and the same issue occurs. Its very strange

0
Entering edit mode

Please use ADD COMMENT or ADD REPLY to answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your post but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.