Question

Extract only RNA-seq reads in a BAM file that extend into an intron

0

Entering edit mode

4.8 years ago

foxdie ▴ 20

I have a BAM file containing many amplicon RNA-seq reads mapped to a small genomic region.

Most of these reads appear to be correctly spliced (see before/after ), but some contain bases that map to the intronic region. I want to extract (into a separate BAM file) only those reads who have bases that map to the intronic region.

I have searched for hours and haven't been able to find a solution.

Everything I've tried has been some version of:

samtools view \
-hb \
-o out.misspliced.bam \
in.bam \
chr12:132676220 # an intronic position

However, this command still extracts reads that are correctly spliced and don't have bases that cross into the intron (see "after" visual in posted image). When I pass the coordinates to samtools view, it seems to still include reads that are correctly splice but stil span the intron, even if they don't have bases that map to the intron.

I'm new to working with SAMtools and I'm unfamiliar with BEDtools (as of now).

Can anyone help me extract only the RNA-seq reads that have bases mapping to intronic positions between two exons to a new BAM file?

EDIT [SOLVED]:

I eventually solved this by using the -split option with bedtools intersect. This is the option that successfully makes bedtools view a spliced read as not covering an intronic region if it doesn't actually have bases mapping to that intronic region. First, I made a one-line BED file for the intron I wanted reads from:

> echo "chr12   132676204   132676545" > intron-region.bed

Then I used the following command to retrieve only reads that actually have bases mapping into the intron and save them to a new BAM:

bedtools intersect \
-a in.bam \
-b intron-region.bed \
-wa -split > misspliced-all.bam

Now to find how to do this just using samtools, because there must be a way...

I hope this helps anyone with a similar problem.

RNA-Seq alignment samtools bam bedtools • 2.5k views

ADD COMMENT • link 4.8 years ago by foxdie ▴ 20

1

Entering edit mode

If you were you able to do this, can you advise what you plan on doing next? I ask because if you next intend to tabulate how many reads extend into each end of each intron, then you might profitably skip extracting them and directly count them through creative use of RSubread's FeatureCounts: A General-Purpose Read Summarization Function, or the Subread package, which it wraps.

ADD REPLY • link 4.8 years ago by Malcolm.Cook ★ 1.5k

0

Entering edit mode

Thanks, I will look into how to do this with Subread/FeatureCounts! We managed to do it with a lot of grepping, but I knew there had to be a tool already out there.

ADD REPLY • link 4.8 years ago by foxdie ▴ 20

score 1 · Answer 1 · 2019-07-24

1

Entering edit mode

4.8 years ago

swbarnes2 14k

You could take those reads, and try filtering with bedtools to get all reads that partially intersect exons

ADD COMMENT • link 4.8 years ago by swbarnes2 14k

0

Entering edit mode

Thanks for the reply. I'm not familiar with bedtools yet so I will look into that. Thanks!

ADD REPLY • link 4.8 years ago by foxdie ▴ 20

0

Entering edit mode

I solved my problem by using the -split option with bedtools intersect. I don't know why this was so hard to figure out. The -split option is the thing that seems to make it treat exons/introns as separate features within the read, which is what I needed.

ADD REPLY • link 4.8 years ago by foxdie ▴ 20