Extract only RNA-seq reads in a BAM file that extend into an intron
1
0
Entering edit mode
4.8 years ago
foxdie ▴ 20

I have a BAM file containing many amplicon RNA-seq reads mapped to a small genomic region.

Most of these reads appear to be correctly spliced (see before/after), but some contain bases that map to the intronic region. I want to extract (into a separate BAM file) only those reads who have bases that map to the intronic region.

I have searched for hours and haven't been able to find a solution.

Everything I've tried has been some version of:

samtools view \
-hb \
-o out.misspliced.bam \
in.bam \
chr12:132676220 # an intronic position

However, this command still extracts reads that are correctly spliced and don't have bases that cross into the intron (see "after" visual in posted image). When I pass the coordinates to samtools view, it seems to still include reads that are correctly splice but stil span the intron, even if they don't have bases that map to the intron.

I'm new to working with SAMtools and I'm unfamiliar with BEDtools (as of now).

Can anyone help me extract only the RNA-seq reads that have bases mapping to intronic positions between two exons to a new BAM file?

EDIT [SOLVED]:

I eventually solved this by using the -split option with bedtools intersect. This is the option that successfully makes bedtools view a spliced read as not covering an intronic region if it doesn't actually have bases mapping to that intronic region. First, I made a one-line BED file for the intron I wanted reads from:

> echo "chr12   132676204   132676545" > intron-region.bed

Then I used the following command to retrieve only reads that actually have bases mapping into the intron and save them to a new BAM:

bedtools intersect \
-a in.bam \
-b intron-region.bed \
-wa -split > misspliced-all.bam

Now to find how to do this just using samtools, because there must be a way...

I hope this helps anyone with a similar problem.

RNA-Seq alignment samtools bam bedtools • 2.5k views
ADD COMMENT
1
Entering edit mode

If you were you able to do this, can you advise what you plan on doing next? I ask because if you next intend to tabulate how many reads extend into each end of each intron, then you might profitably skip extracting them and directly count them through creative use of RSubread's FeatureCounts: A General-Purpose Read Summarization Function, or the Subread package, which it wraps.

ADD REPLY
0
Entering edit mode

Thanks, I will look into how to do this with Subread/FeatureCounts! We managed to do it with a lot of grepping, but I knew there had to be a tool already out there.

ADD REPLY
1
Entering edit mode
4.8 years ago

You could take those reads, and try filtering with bedtools to get all reads that partially intersect exons

ADD COMMENT
0
Entering edit mode

Thanks for the reply. I'm not familiar with bedtools yet so I will look into that. Thanks!

ADD REPLY
0
Entering edit mode

I solved my problem by using the -split option with bedtools intersect. I don't know why this was so hard to figure out. The -split option is the thing that seems to make it treat exons/introns as separate features within the read, which is what I needed.

ADD REPLY

Login before adding your answer.

Traffic: 2352 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6