Question: bwa and bowtie2 bamfile format
nkinney06 wrote (2.9 years ago):

I'm using a couple of programs to look at repetitive DNA in mapped reads (BAM files): genotan and repeatseq. Both programs have been published and are designed to work on BAM files.

The problem is that I'm getting segmentation faults from both programs on some of the BAM files I would like to analyze. It's hard to tell for sure why some BAM files run successfully while others seg fault, but there appears to be a much greater tendency to seg fault on any BAM file created with bwa. I'm wondering whether there are any formatting differences between bwa- and bowtie2-mapped BAM files, and whether there is a way to repair the files that fail without remapping. Perhaps it's a whitespace or special-character issue. Debugging the programs myself is probably infeasible, so I'm looking for any other solution. Thanks

Tags: bwa, bamfiles, bowtie2

Ron replied:

Try running your program with more memory; segmentation faults can be caused by insufficient memory.

d-cameron (Australia) wrote (2.9 years ago):

The SAM file format specification (which also defines the binary BAM equivalent) is quite flexible, so it is quite easy to write a SAM/BAM file that is valid according to the specification but that the program processing it considers invalid input. Some examples include:

  • Writing multiple records for the same read (e.g. bwa by default performs split-read alignment, writing the extra parts as supplementary records with flag 0x800; if the downstream program expects one record per read, this might only show up as a crash when both split alignments land in the same repeat).

  • Reads that align before the start or past the end of a chromosome

  • Different interpretations of SAM flags (e.g. bwa sets "0x2 each segment properly aligned according to the aligner" for paired reads that are aligned to the same chromosome in the correct orientation, regardless of how far apart they align)

  • bwa hard-clips split reads, whereas bowtie2 does not use the hard-clipping (H) CIGAR operator

  • The additional SAM tags written by bwa and bowtie2 are different.
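One way to test this idea without remapping is to filter out some of these cases and rerun the programs. Below is a minimal sketch of such a filter in Python, assuming the BAM is first converted to SAM text with `samtools view -h`. The names `filter_sam_lines`, `reference_span` and `filter_sam.py` are hypothetical, not part of any tool, and the choice of what to drop (secondary/supplementary records, hard-clipped records, records running past either end of their chromosome) is an assumption you would adjust.

```python
import re
import sys
from typing import Dict, Iterable, Iterator

# CIGAR operations that consume reference bases, per the SAM spec: M, D, N, =, X
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

SECONDARY = 0x100      # "secondary alignment" flag bit
SUPPLEMENTARY = 0x800  # "supplementary alignment" flag bit (e.g. split-read parts)


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by an alignment, from its CIGAR."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)


def filter_sam_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM header lines plus the alignment records that pass the filters.

    Hypothetical filter: drops secondary/supplementary records, hard-clipped
    records, and records that run outside their reference sequence (lengths
    taken from the @SQ LN fields of the header).
    """
    ref_lengths: Dict[str, int] = {}
    for line in lines:
        if not line.strip():
            continue  # ignore stray blank lines
        if line.startswith("@"):
            if line.startswith("@SQ\t"):
                tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
                ref_lengths[tags["SN"]] = int(tags["LN"])
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        flag, rname, pos, cigar = int(cols[1]), cols[2], int(cols[3]), cols[5]
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # keep only the primary record for each read
        if "H" in cigar:
            continue  # hard-clipped record
        if rname != "*" and cigar != "*":
            end = pos + reference_span(cigar) - 1  # 1-based, inclusive
            # References missing from the header are not length-checked.
            if pos < 1 or end > ref_lengths.get(rname, end):
                continue  # alignment falls outside the chromosome
        yield line


if __name__ == "__main__" and len(sys.argv) == 2:
    # "-" reads SAM text from stdin, e.g. piped from `samtools view -h`.
    source = sys.stdin if sys.argv[1] == "-" else open(sys.argv[1])
    with source:
        sys.stdout.writelines(filter_sam_lines(source))
```

Assuming the script is saved as `filter_sam.py`, it could be run as `samtools view -h in.bam | python filter_sam.py - | samtools view -b -o filtered.bam -`. Note that the primary records that remain may still carry tags (such as bwa's `SA` tag) that refer to the removed supplementary records.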

In short, just because something is a valid SAM/BAM file doesn't mean the downstream tools know what to do with it. My guess is that your programs are crashing on a particular input edge case that they weren't designed to handle.


nkinney06 replied:

This is potentially a very helpful lead. Do you happen to know of any filtering program? I'd like to simply remove some of the cases you mention from my BAM files and see if I have better luck. I may try one of the available Perl or Python BAM readers to write something quickly from scratch, but I'm not sure those tools will be sufficient, and writing a filter in C++ could be quite an undertaking. Thanks

harold.smith.tarheel replied:

Have you eliminated the other possible explanations (malformed BAMs or memory limits)? Filtering edge cases will not solve either of those problems.

nkinney06 replied:

I don't think it's a memory limitation, and I would have trouble adjusting that anyway because the C source code is rather challenging. I could look into BAM validation with the tools you mentioned, but it seems like the programs run until they "hit" one of the problem reads that causes a seg fault. It may not be too difficult to write a Perl script to filter out a couple of the special cases mentioned here. There are also cases of BAMs where one program works but the other fails. It's a little frustrating since these programs are published, but at least I have some directions now.

harold.smith.tarheel (United States) wrote (2.9 years ago):

Probably a memory issue per @Ron, but you can run BamUtilities 'validate' or Picard's ValidateSamFile to assess your BAMs.
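If you want a quick look before (or alongside) running those validators, a few structural checks are easy to script. The following is a rough Python sketch, not a substitute for ValidateSamFile: `check_record` is a hypothetical helper that inspects one SAM text line (for example from `samtools view in.bam`) and reports field-count, integer-field, CIGAR/SEQ and SEQ/QUAL length problems.

```python
import re
from typing import List

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
INT_RE = re.compile(r"[0-9]+")
# CIGAR operations that consume query (SEQ) bases, per the SAM spec: M, I, S, =, X
QUERY_CONSUMING = set("MIS=X")


def check_record(line: str) -> List[str]:
    """Return the problems found in one SAM alignment line (empty list if none).

    Covers only a few structural checks; it is not a full validator.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        return [f"expected at least 11 fields, found {len(cols)}"]
    flag, pos, mapq, cigar = cols[1], cols[3], cols[4], cols[5]
    seq, qual = cols[9], cols[10]
    problems: List[str] = []
    for name, value in (("FLAG", flag), ("POS", pos), ("MAPQ", mapq)):
        if not INT_RE.fullmatch(value):
            problems.append(f"{name} is not a non-negative integer: {value!r}")
    if INT_RE.fullmatch(mapq) and int(mapq) > 255:
        problems.append(f"MAPQ out of range: {mapq}")
    if cigar != "*":
        ops = CIGAR_RE.findall(cigar)
        # Re-joining the parsed operations must reproduce the whole CIGAR string.
        if "".join(n + op for n, op in ops) != cigar:
            problems.append(f"malformed CIGAR: {cigar!r}")
        elif seq != "*":
            query_len = sum(int(n) for n, op in ops if op in QUERY_CONSUMING)
            if query_len != len(seq):
                problems.append(
                    f"CIGAR implies {query_len} query bases but SEQ has {len(seq)}"
                )
    if seq != "*" and qual != "*" and len(qual) != len(seq):
        problems.append("QUAL length differs from SEQ length")
    return problems
```

Printing the line numbers for which `check_record` returns a non-empty list may help narrow down whether the records near the point where a program crashes are structurally unusual.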
