I have a BAM file containing many amplicon RNA-seq reads mapped to a small genomic region.
Most of these reads appear to be correctly spliced (see ), but some contain bases that map to the intronic region. I want to extract (into a separate BAM file) only those reads who have bases that map to the intronic region.
I have searched for hours and haven't been able to find a solution.
Everything I've tried has been some version of:
samtools view \ -hb \ -o out.misspliced.bam \ in.bam \ chr12:132676220 # an intronic position
However, this command still extracts reads that are correctly spliced and don't have bases that cross into the intron (see "after" visual in posted image). When I pass the coordinates to
samtools view, it seems to still include reads that are correctly splice but stil span the intron, even if they don't have bases that map to the intron.
I'm new to working with SAMtools and I'm unfamiliar with BEDtools (as of now).
Can anyone help me extract only the RNA-seq reads that have bases mapping to intronic positions between two exons to a new BAM file?
I eventually solved this by using the
-split option with
bedtools intersect. This is the option that successfully makes bedtools view a spliced read as not covering an intronic region if it doesn't actually have bases mapping to that intronic region. First, I made a one-line BED file for the intron I wanted reads from:
> echo "chr12 132676204 132676545" > intron-region.bed
Then I used the following command to retrieve only reads that actually have bases mapping into the intron and save them to a new BAM:
bedtools intersect \ -a in.bam \ -b intron-region.bed \ -wa -split > misspliced-all.bam
Now to find how to do this just using samtools, because there must be a way...
I hope this helps anyone with a similar problem.