Question: Selecting and de-selecting exon-exon junctions based on depth or coverage from RNA-seq mapping
4.8 years ago by
mjoyraj50
Taiwan
mjoyraj50 wrote:

In RNA-seq mapping with TopHat, exon-exon junctions are reported in the junctions.bed file. This file includes both known junctions and novel junctions inferred from reads that map across two exons. Known junctions are already well supported, but before claiming a novel exon-exon junction it is very important to assess its depth or coverage. Choosing a single threshold to filter out false-positive junctions is complicated by the fact that depth varies greatly across an RNA-seq data set. Genes expressed at a low level are covered by fewer reads, and so are their junctions. A novel junction in such a gene will also be covered by few reads. Normally it would be discarded because its coverage is low, even though the low coverage simply reflects the low expression of the gene. So, to defend novel exon-exon junctions found with RNA-seq, I think a relative approach is needed. I would like to hear comments on this topic from experts. An answer would greatly help novices in this field.
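To make the idea of a relative approach concrete, here is a minimal Python sketch. It assumes that the score column (column 5) of junctions.bed holds the number of reads spanning the junction and that the two BED blocks are the read overhangs on either side of the intron. The function names and the `gene_of` lookup (junction to gene ID, which would have to come from an annotation overlap step not shown here) are hypothetical and only for illustration.

```python
from collections import defaultdict

def parse_junction(line):
    """Turn one junctions.bed record into ((chrom, intron_start, intron_end, strand), reads)."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    reads, strand = int(fields[4]), fields[5]
    # Column 11 lists the two block sizes (overhangs); the intron lies between them.
    sizes = [int(s) for s in fields[10].rstrip(",").split(",")]
    return (chrom, start + sizes[0], end - sizes[-1], strand), reads

def read_junctions(lines):
    """Parse an iterable of BED lines, skipping the track header and blank lines."""
    counts = {}
    for line in lines:
        if not line.strip() or line.startswith("track"):
            continue
        junction, reads = parse_junction(line)
        counts[junction] = reads
    return counts

def relative_support(counts, gene_of):
    """Divide each junction's read count by the best-supported junction of the same gene.

    gene_of is a hypothetical dict mapping junction -> gene ID.
    """
    top = defaultdict(int)
    for junction, reads in counts.items():
        gene = gene_of[junction]
        top[gene] = max(top[gene], reads)
    return {j: reads / top[gene_of[j]]
            for j, reads in counts.items() if top[gene_of[j]] > 0}
```

With this kind of score, a junction with 3 reads in a gene whose best junction has 3 reads scores 1.0, while a junction with 5 reads in a gene whose best junction has 50 reads scores 0.1. Any cutoff on the ratio would still be a judgment call and probably needs a small absolute minimum alongside it.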

rna-seq • 2.0k views
ADD COMMENTlink modified 4.8 years ago by geek_y9.9k • written 4.8 years ago by mjoyraj50

I am assuming that you have two different conditions. When there is an exon-exon junction, a transcript will be formed as a result. You could check how many transcripts are significantly expressed and how many of those arise from novel exon-exon splicing events. In effect, you would be looking at novel transcripts that are differentially expressed.

If you do not have different conditions, you may need to look at public data, if any exist. Alternatively, if you have biological replicates, the novel junctions should be observed consistently across the replicates, which adds support for the novel sites.
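The replicate check could be sketched roughly as follows in Python. This assumes each replicate has already been reduced to a dict of junction to read count; the function name and the default cutoffs (`min_reads=2`, `min_replicates=2`) are hypothetical placeholders, not recommended values.

```python
def reproducible_junctions(replicates, min_reads=2, min_replicates=2):
    """Return junctions seen with at least min_reads in at least min_replicates replicates.

    replicates: list of dicts, one per biological replicate, mapping a junction
    key (for example (chrom, intron_start, intron_end, strand)) to its read count.
    """
    support = {}
    for counts in replicates:
        for junction, reads in counts.items():
            if reads >= min_reads:
                support[junction] = support.get(junction, 0) + 1
    return {j for j, n in support.items() if n >= min_replicates}
```

A junction supported by only a few reads but present in all replicates of a condition would pass this filter, whereas one with many reads in a single replicate would not.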

 

ADD REPLYlink written 4.8 years ago by geek_y9.9k

I have RNA-seq data from five conditions, each with three replicates. So consistent differential expression of the novel transcripts across replicates would support the existence of those transcripts. Thanks for the information.

ADD REPLYlink written 4.8 years ago by mjoyraj50
4.8 years ago by
geek_y9.9k
Barcelona
geek_y9.9k wrote:

If you have biological replicates and different conditions: when there is an exon-exon junction, a transcript will be formed as a result. You could check how many transcripts are significantly expressed and how many of those arise from novel exon-exon splicing events. In effect, you would be looking at novel transcripts that are differentially expressed.

Just posting this as an answer so that the question does not remain open.

 

ADD COMMENTlink written 4.8 years ago by geek_y9.9k

Okay... got it...

ADD REPLYlink written 4.8 years ago by mjoyraj50

Thanks for sharing your knowledge....

ADD REPLYlink written 4.8 years ago by mjoyraj50

One more simple question: is it better to assemble the mapped reads with or without a GTF file? Cufflinks has the '-g' option, which uses the reference annotation as a guide and so allows assembly of reads from known as well as novel junctions. So I thought of assembling the reads with the following Cufflinks command:

cufflinks -o "cufflinks_output" -p 12 -g "GTF-file" -b "Genome" --max-bundle-frags 1000000000000 --multi-read-correct "tophat_output_sorted.bam"

ADD REPLYlink written 4.8 years ago by mjoyraj50