Question: Samtools segmentation fault when extracting exons from bam file.
0
gravatar for jstevenson2256
6 months ago by
jstevenson225610 wrote:

I have a bam file of RNAseq alignments generated by STAR (alignment.bam) and a list of possible exon positions in a txt file (exons.txt). I'm trying to extract these exons from the alignment file and to do so I'm using a samtools view loop:

#!/usr/bin/env
for line in `cat exons.txt`; do
            samtools view -@ 8 -b -h alignment.bam $line >> exon_alignments.bam
done;

The process has an output status of 0 (zero) but when looking at the stderr file it flags multiple segmentation faults, e.g.:

/cm/local/apps/torque/var/spool/mom_priv/jobs/2676979.master.cm.cluster.SC: line 13: 23232 Segmentation fault      samtools view -@ 8 -b -h alignment.bam $line >> exon_alignments.bam

/cm/local/apps/torque/var/spool/mom_priv/jobs/2676979.master.cm.cluster.SC: line 13: 30357 Segmentation fault      samtools view -@ 8 -b -h alignment.bam $line >> exon_alignments.bam

/cm/local/apps/torque/var/spool/mom_priv/jobs/2676979.master.cm.cluster.SC: line 13:  1273 Segmentation fault      samtools view -@ 8 -b -h alignment.bam $line >> exon_alignments.bam

Exons.txt contains one line per exon in the format of contig1:1-100. The samtools command works fine when run manually on any one of the exon coordinates in exons.txt. When the loop runs, output is normal for the first few lines and then the process fails with segmentation faults. This failure reliably happens as soon as the loop hits a coordinate indicating a new contig in the list. I've tested this by editing exons.txt and the behaviour is the same:

exons.1.txt:

contig1:1-100
contig1:200-300
contig2:1-100        <- segfault
contig2:200-300    <- segfault
contig3:1-100        <- segfault
contig3:200-300    <- segfault

exons.2.txt:

contig2:1-100        
contig2:200-300   
contig3:1-100        <- segfault
contig3:200-300    <- segfault

I've already indexed the bam file using samtools/1.9 in Linux. I cannot go through it one by one as there are over 2300 exons I'm scanning for.

Any help would be greatly appreciated.

ADD COMMENTlink modified 6 months ago by Damian Kao15k • written 6 months ago by jstevenson225610
2

You have indexed the bam file but was it sorted before indexing?

Note: I am not sure if you can append to an existing BAM file like what you are trying to do. That may be your problem.

You should convert your intervals to bed format file and then feed that to samtools with -L option.

-L FILE  only include reads overlapping this BED FILE
ADD REPLYlink modified 6 months ago • written 6 months ago by genomax73k

Excellent, thank you! Converting to a .bed file and using the -L command worked fabulously. Final script looks like:

samtools view -@ 8 -b -h -L exons.bed alignment.bam > exon_alignments.bam
ADD REPLYlink written 6 months ago by jstevenson225610
5
gravatar for Damian Kao
6 months ago by
Damian Kao15k
USA
Damian Kao15k wrote:

There is a -L flag that you can use with samtools to output alignments overlapping intervals defined by a .bed file. That's probably your best option. You'll have to convert your exons.txt to a bed file.

ADD COMMENTlink written 6 months ago by Damian Kao15k
1

Agreed. Using the loop in the toplevel post you basically append several BAM files to each other including the header sections etc. That is probably a bad idea.

ADD REPLYlink written 6 months ago by ATpoint24k

For future reference, if you want to append to a BAM file, you should use samtools merge.

ADD REPLYlink written 6 months ago by i.sudbery6.0k
1

Strictly speaking, appending would be samtools cat while merge would be combining while maintaining the coordinate-sorted order of the final file ;-)

ADD REPLYlink written 6 months ago by ATpoint24k

Thank you! It worked wonderfully.

ADD REPLYlink written 6 months ago by jstevenson225610
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 3321 users visited in the last hour