Question: bam-readcount not finding reference in bam file
0
gravatar for obuzko
2.3 years ago by
obuzko0
obuzko0 wrote:

Hi, I'm trying to run bam-readcount with region information (as a workaround to the 8000 read limitation). I use version 0.8 cloned from the repository.

The reference FASTA file (reference_1.fa) has this header:

>reference_1:1-35911

It's indexed: bwa index reference_1.fa

I run the alignment:

bwa mem -t 8 reference_1.fa -p input.fastq | samtools sort -@8 -O BAM -o alignment_reference_1.bam -

Output is indexed:

samtools index alignment_reference_1.bam

At this point, I need to get base counts at each position in the reference. I created a separate file with region information (reference_1_regions.txt) that contains one line:

reference_1       1       35911 (tab-delimited)

The actual command:

bam-readcount -w 1 -d 1000000 -l reference_1_regions.txt -f reference_1.fa alignment_reference_1.bam > readcount_reference_1.txt

The output:

reference_1 not found in bam file. Region reference_1 1 35911 skipped.

I checked the bam file header with samtools view -H and here's the output:

@HD VN:1.5  SO:coordinate
@SQ SN:reference_1:1-35911  LN:35911
@PG ID:bwa  PN:bwa  VN:0.7.17-r1188 CL:bwa mem -t 8 reference_1.fa -p input.fastq

If anyone has any suggestions about the reason for this error, please let me know. Is this a bug in bam-readcount or am I missing something?

Thank you in advance

Sasha

bam-readcount bam • 821 views
ADD COMMENTlink modified 2.3 years ago by WouterDeCoster44k • written 2.3 years ago by obuzko0

Thank you! The funny thing is that it worked as some kind of a hack. I did try just using >reference_1 in the FASTA header and then "reference_1 1 35911" in the regions file, and it didn't work. Bam-readcount kept saying that reference_1 not found in FASTA file with a core dump.

However, when FASTA file contains ">reference_1:1-35911" and the regions file has "reference_1:1-35911 1 35911", it works. Go figure... In any event, thanks for the hint.

ADD REPLYlink written 2.3 years ago by obuzko0

Please use ADD COMMENT or ADD REPLY to answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your reaction but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.
Upvote|Bookmark|Accept

ADD REPLYlink written 2.3 years ago by WouterDeCoster44k
2
gravatar for ATpoint
2.3 years ago by
ATpoint40k
Germany
ATpoint40k wrote:

@SQ SN:reference_1:1-35911 LN:35911 Blockquote

As you can see from this line, the chromosome name is reference_1:1-35911, not reference_1. Change that in your reference_1_regions.txt. Are you trying to get an output for an entire BAM file? What do you want to do with it. The only use of the output I know of is together with the fpfilter of VarScan2, but I might be wrong.

ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by ATpoint40k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1021 users visited in the last hour