Question: How To Extract Spliced RNA-seq Reads
6.4 years ago, Chirag Nepal (2.2k) wrote:

Hey all, from acceptedhits.bam, I want to count only those reads that are spliced across two exons. How do we extract this information from a BAM file?

I think one way would be to use the "split" option of coverageBed from bedtools, though I am not 100% sure.

Cheers, Chirag

6.4 years ago, Ashutosh Pandey (12k) wrote:

The following command will give you the read IDs of the spliced reads:

    samtools view acceptedhits.bam | awk '($6 ~ /N/)' | cut -f1

The 'N' operation in the CIGAR string (the sixth column of a BAM record) represents a region skipped from the reference. So if a read does not align contiguously because a large stretch of the reference is skipped, that stretch is recorded as 'N' in the sixth column. This does not guarantee that both portions of the read are aligned to exons; they could also cover exon-intron, intron-exon or intron-intron regions. But since these are RNA-seq reads, most of them should be exon-exon. It would be pretty easy to check this against the GTF annotation file you have.
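The same check can be sketched in Python, which also makes it easy to count each read name only once (the one-liner above prints a name once per spliced alignment, so paired or multi-mapped reads can appear repeatedly). This is a minimal sketch, assuming SAM text lines as produced by `samtools view`; the function names `is_spliced` and `spliced_read_names` are made up for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_spliced(cigar):
    """Return True if the CIGAR string contains an N (skipped reference) operation."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))


def spliced_read_names(sam_lines):
    """Return the set of read names that have at least one spliced alignment.

    sam_lines is any iterable of SAM text lines, for example sys.stdin
    when fed from `samtools view`.
    """
    names = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines if present
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:  # not a complete alignment record
            continue
        if is_spliced(fields[5]):  # column 6 is the CIGAR string
            names.add(fields[0])  # column 1 is the read name
    return names
```

Taking `len(spliced_read_names(sys.stdin))` in a small script fed by `samtools view acceptedhits.bam` would give the number of distinct spliced read names. Mates of a pair share one name, so a pair counts once here.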


Is it possible to count the spliced reads over each intron listed in intron.bed?

Reply written 19 months ago by thindmarsmission (0)
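One possible approach to the per-intron question, as a rough sketch rather than a tested pipeline: derive the skipped interval of every 'N' operation from the alignment start and CIGAR string, then count the intervals that exactly match an intron in the BED file. It assumes SAM text lines from `samtools view` and a BED file whose first three columns are chromosome, 0-based start and end; the names `junctions` and `count_reads_per_intron` are hypothetical.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def junctions(pos, cigar):
    """Yield (start, end) for each N gap as 0-based, half-open intervals.

    pos is the 1-based leftmost position from column 4 of the SAM record.
    """
    ref = pos - 1  # convert to 0-based to match BED coordinates
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            yield (ref, ref + length)
        if op in REF_CONSUMING:
            ref += length


def count_reads_per_intron(sam_lines, bed_lines):
    """Count alignments whose skipped region exactly matches a BED intron."""
    introns = set()
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        introns.add((chrom, int(start), int(end)))

    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines if present
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 6 or f[5] == "*":  # incomplete or unaligned record
            continue
        for start, end in junctions(int(f[3]), f[5]):
            key = (f[2], start, end)
            if key in introns:
                counts[key] += 1
    return counts
```

This counts alignments rather than distinct reads, and only exact matches to the intron boundaries are counted; a gap that is off by one base from the annotation is ignored.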
6.4 years ago, Charles Warden (7.7k), Duarte, CA, wrote:

There are lots of ways to do this. I think the easiest solution is to use knowledge of known exon junctions.

1) TopHat writes a junctions output file (junctions.bed) that records the number of alignments spanning each junction. Providing a .gtf file gives it a reference list of transcript locations to work from, although as far as I know the file is populated even without one.

2) Use software that predicts splicing events. This will give you a prioritized list (in addition to providing counts for all relevant splicing junctions). MATS is my favorite tool for this, and MISO is another popular option.



3) If it is relevant, there are also gene fusion programs. I have a slight preference for chimerascan, but I have tried all of the following programs:




I'm sure there are also options for generically splicing the .bed file for split reads, but I would typically focus on looking for software that also assists with the downstream analysis (for whatever specific application I am interested in). So, I don't really have recommendations on this end.
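For option 1, the TopHat junctions file can also be read directly. The sketch below assumes the BED12 layout described in the TopHat manual, where each junction is two blocks (the maximal read overhang on either side) and the score column is the number of alignments spanning the junction. The function name `parse_junctions_bed` is made up for illustration, and the intron coordinates are derived by trimming the two blocks off the ends of the feature.

```python
def parse_junctions_bed(lines):
    """Map (chrom, intron_start, intron_end, strand) to the junction score.

    Coordinates are 0-based, half-open, as in BED.
    """
    result = {}
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:  # not a full BED12 record
            continue
        chrom, start, end = f[0], int(f[1]), int(f[2])
        score, strand = int(f[4]), f[5]
        # Column 11 holds the block sizes; a trailing comma is allowed in BED.
        block_sizes = [int(x) for x in f[10].split(",") if x]
        intron_start = start + block_sizes[0]  # end of the left overhang
        intron_end = end - block_sizes[-1]  # start of the right overhang
        result[(chrom, intron_start, intron_end, strand)] = score
    return result
```

Summing the values of the returned dictionary would give the total number of junction-spanning alignments reported by TopHat, under the same assumptions about the file layout.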


After RNA-seq alignment with TopHat or STAR, only one BAM file is output, but MATS needs two BAM files as input. Is it necessary to separate the BAM file into two BAM files according to first and second reads? Or is there a better way to deal with this problem?

Reply written 4.9 years ago by zju.whw (40)

MATS detects differential alternative splicing events between two conditions, for example before and after treatment, or normal vs tumor samples. That is why it requires two BAM files (one for each condition). It has nothing to do with the first and second read of a pair.

Reply written 4.9 years ago by Ashutosh Pandey (12k)

Thank you very much. Your comment is very helpful. I mistook "sample_1" and "sample_2" for "read_1" and "read_2".

Reply written 4.9 years ago by zju.whw (40)