Question: Extract only RNA-seq reads in a BAM file that extend into an intron
gravatar for foxdie
8 weeks ago by
foxdie10 wrote:

I have a BAM file containing many amplicon RNA-seq reads mapped to a small genomic region.

Most of these reads appear to be correctly spliced (see before/after), but some contain bases that map to the intronic region. I want to extract (into a separate BAM file) only those reads who have bases that map to the intronic region.

I have searched for hours and haven't been able to find a solution.

Everything I've tried has been some version of:

samtools view \
-hb \
-o out.misspliced.bam \
in.bam \
chr12:132676220 # an intronic position

However, this command still extracts reads that are correctly spliced and don't have bases that cross into the intron (see "after" visual in posted image). When I pass the coordinates to samtools view, it seems to still include reads that are correctly splice but stil span the intron, even if they don't have bases that map to the intron.

I'm new to working with SAMtools and I'm unfamiliar with BEDtools (as of now).

Can anyone help me extract only the RNA-seq reads that have bases mapping to intronic positions between two exons to a new BAM file?


I eventually solved this by using the -split option with bedtools intersect. This is the option that successfully makes bedtools view a spliced read as not covering an intronic region if it doesn't actually have bases mapping to that intronic region. First, I made a one-line BED file for the intron I wanted reads from:

> echo "chr12   132676204   132676545" > intron-region.bed

Then I used the following command to retrieve only reads that actually have bases mapping into the intron and save them to a new BAM:

bedtools intersect \
-a in.bam \
-b intron-region.bed \
-wa -split > misspliced-all.bam

Now to find how to do this just using samtools, because there must be a way...

I hope this helps anyone with a similar problem.

ADD COMMENTlink modified 7 weeks ago • written 8 weeks ago by foxdie10

If you were you able to do this, can you advise what you plan on doing next? I ask because if you next intend to tabulate how many reads extend into each end of each intron, then you might profitably skip extracting them and directly count them through creative use of RSubread's FeatureCounts: A General-Purpose Read Summarization Function, or the Subread package, which it wraps.

ADD REPLYlink written 8 weeks ago by Malcolm.Cook1.1k

Thanks, I will look into how to do this with Subread/FeatureCounts! We managed to do it with a lot of grepping, but I knew there had to be a tool already out there.

ADD REPLYlink modified 7 weeks ago • written 7 weeks ago by foxdie10
gravatar for swbarnes2
8 weeks ago by
United States
swbarnes26.5k wrote:

You could take those reads, and try filtering with bedtools to get all reads that partially intersect exons

ADD COMMENTlink modified 8 weeks ago • written 8 weeks ago by swbarnes26.5k

Thanks for the reply. I'm not familiar with bedtools yet so I will look into that. Thanks!

ADD REPLYlink written 7 weeks ago by foxdie10

I solved my problem by using the -split option with bedtools intersect. I don't know why this was so hard to figure out. The -split option is the thing that seems to make it treat exons/introns as separate features within the read, which is what I needed.

ADD REPLYlink written 7 weeks ago by foxdie10
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1528 users visited in the last hour