Selecting and de-selecting exon-exon junctions based on depth or coverage from RNA-seq mapping
7.5 years ago
mjoyraj ▴ 80

When mapping RNA-seq reads with TopHat, exon-exon junctions are reported in the junctions.bed file. This file contains both known junctions and novel junctions inferred from reads that span two exons. In other words, it stores exon pairs that are already known to be spliced together as well as previously unknown exon-exon associations suggested by the mapping. Known junctions need little justification, but before claiming a novel exon-exon junction it is important to assess its depth of coverage. Choosing a single threshold for filtering out false-positive junctions is complicated by the fact that coverage varies greatly across an RNA-seq data set. Genes expressed at a low level are covered by fewer reads, and so are their junctions. A novel junction in such a gene will therefore also have few supporting reads. It would normally be discarded for low coverage, even though the low coverage only reflects the low expression of the gene. So to defend novel exon-exon junctions from RNA-seq, I think a relative approach is needed. I would like to hear comments on this topic from experts. An answer will greatly help novices in this field.
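To make the idea of a relative approach concrete, here is a minimal Python sketch, not an established method. It assumes the usual TopHat junctions.bed layout (BED12, column 5 holds the number of reads spanning the junction, and the two blocks are the read overhangs on either side of the intron). The function names and the choice of "junctions sharing a splice site" as the reference are assumptions made for illustration.

```python
from collections import defaultdict

def parse_junctions(lines):
    """Yield (chrom, intron_start, intron_end, strand, reads) per junction line."""
    for line in lines:
        if not line.strip() or line.startswith("track"):
            continue  # skip blank lines and the BED track header
        f = line.rstrip("\n").split("\t")
        chrom, start, reads, strand = f[0], int(f[1]), int(f[4]), f[5]
        block_sizes = [int(x) for x in f[10].rstrip(",").split(",")]
        block_starts = [int(x) for x in f[11].rstrip(",").split(",")]
        # The intron lies between the two overhang blocks.
        yield chrom, start + block_sizes[0], start + block_starts[1], strand, reads

def relative_support(junctions):
    """Reads of each junction divided by the best-supported junction
    that shares its left or right splice site (1.0 if it shares none)."""
    junctions = list(junctions)
    best = defaultdict(int)
    for chrom, s, e, strand, reads in junctions:
        for site in ((chrom, strand, "L", s), (chrom, strand, "R", e)):
            best[site] = max(best[site], reads)
    out = {}
    for chrom, s, e, strand, reads in junctions:
        local_max = max(best[(chrom, strand, "L", s)], best[(chrom, strand, "R", e)])
        out[(chrom, s, e, strand)] = reads / local_max if local_max else 0.0
    return out
```

A novel junction with 5 reads next to a known junction with 50 reads at the same splice site gets a relative support of 0.1, whatever the absolute expression level of the gene. A junction that shares no splice site with any other junction always scores 1.0 here, so it would still need a gene-level comparison of some kind.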

RNA-Seq • 2.7k views

I am assuming that you have two different conditions. When there is an exon-exon junction, a transcript is formed as a result. You can check how many transcripts are significantly expressed, and how many of those arise from novel exon-exon splicing events. In effect, you would be looking at novel transcripts that are differentially expressed.

If you do not have different conditions, you may need to look at public data, if any exist. Otherwise, if you have biological replicates, the signal should be consistent across the replicates, which adds confidence to the novel sites.
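The replicate check can be sketched as follows. This is only an illustration under stated assumptions: each replicate is represented as a dictionary mapping a junction key (for example a chrom/start/end/strand tuple) to its supporting read count, and the function name and both default thresholds are hypothetical, not recommended values.

```python
def reproducible_junctions(replicates, min_reads=2, min_replicates=2):
    """Return junction keys supported by at least min_reads reads
    in at least min_replicates of the given replicates."""
    seen = {}
    for counts in replicates:
        for key, reads in counts.items():
            if reads >= min_reads:
                seen[key] = seen.get(key, 0) + 1
    return {key for key, n in seen.items() if n >= min_replicates}
```

With three replicates per condition, calling this once per condition keeps a lowly covered novel junction as long as it shows up repeatedly, instead of discarding it on absolute depth alone.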


I have RNA-seq data from five conditions, each with three replicates. So consistent differential expression of the novel transcripts across replicates would support the existence of those transcripts. Thanks for the information.

7.5 years ago

If you have biological replicates and different conditions: when there is an exon-exon junction, a transcript is formed as a result. You can check how many transcripts are significantly expressed, and how many of those arise from novel exon-exon splicing events. In effect, you would be looking at novel transcripts that are differentially expressed.

Just posting this as an answer so that the question does not remain open.


Okay... got it...


One more simple question: is it better to assemble the mapped reads with or without a GTF file? Cufflinks has the -g option, which uses the reference annotation as a guide and assembles transcripts from known as well as novel junctions. So I thought of assembling the reads with the following Cufflinks command:

cufflinks -o "cufflinks_output" -p 12 -g "GTF-file" -b "Genome" --max-bundle-frags 1000000000000 --multi-read-correct "tophat_output_sorted.bam"