Question: RNA-Seq Transcript Aligns On The Wrong Strand
disco (30) wrote 6.3 years ago:

Hello,

I'm analysing RNA-seq data from the ENCODE CSHL long RNA-seq set to look at differential expression between two genes sharing a chromosomal locus. I am not at all familiar with bioinformatics; I'm a wet-bench researcher through and through. Somehow I managed to get going on a Linux platform, started with a single sample, analysed it with Cufflinks, and then viewed the result against the reference genome in IGV. What I see is that the Cufflinks transcripts for the two genes are on the same strand in IGV, whereas in reality they are on opposite strands, running away from each other. I'm pretty convinced that it's a technical mistake stemming from my inexperience with these analyses. If anybody could point out how it is done properly, or what could have gone wrong, I would be really grateful.

Many thanks, Vaish


I know you've probably thought of this, but I'd suggest finding a local resource to go through this with you. There are MANY details in an analysis that you will want to learn, I'm sure, and having someone you can run ideas past can be the most effective way to do that.

Reply written 6.3 years ago by Sean Davis (25k)

Yeah, I tried my best but couldn't find anyone who would sit down and go through the whole thing. Some people were kind enough to make suggestions and point me in the right direction, but we don't really have a bioinformatician in my group.

Reply written 6.3 years ago by disco (30)

Also see older question: Transcript Specific Expression Data

Reply written 6.3 years ago (modified 6.3 years ago) by Josh Herr (5.6k)

Hello Josh, do you have an idea what could be wrong with this analysis method?

Reply written 6.3 years ago by disco (30)

Can you clarify a bit? I'm not sure exactly what you mean about seeing "the transcripts from cufflinks in IGV" ... are you somehow loading a gtf (gff) generated by cufflinks in IGV? Or are you looking at the read alignments (the accepted_hits.bam) in IGV and something looks weird to you?

Reply written 6.3 years ago by Steve Lianoglou (5.0k)

I loaded the GTF file generated by Cufflinks and am viewing it in IGV. I could post a screenshot if that would be helpful.

Reply written 6.3 years ago by disco (30)

This is what it looks like in IGV:

http://s2.postimage.org/fbhntvzmh/Screenshot_from_2012_12_21_16_21_55.png

Reply written 6.3 years ago (modified 6.3 years ago) by disco (30)
Michael Dondrup (46k), Bergen, Norway, wrote 6.3 years ago:

That is most likely not a mistake. Most RNA-seq protocols are not strand specific. I would check with the sequencing lab, and until it is stated explicitly, assume that there is no valid strand information in the data.
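One way to see what strand information actually ended up in the Cufflinks output is to tally the strand column of the GTF file. The following is a minimal sketch, not a definitive check: it assumes a standard tab-separated GTF in which the 7th field holds the strand (`+`, `-`, or `.` for none assigned) and the 3rd field is `transcript` for transcript records. The function name `count_strands` and the example records are hypothetical and only there for illustration.

```python
from collections import Counter


def count_strands(gtf_lines):
    """Tally the strand field (7th column) of transcript records in GTF text."""
    counts = Counter()
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated fields; only count transcript rows.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        counts[fields[6]] += 1
    return counts


# Hypothetical records, made up for this example (not from the poster's data).
example = [
    'chr1\tCufflinks\ttranscript\t100\t900\t1000\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
    'chr1\tCufflinks\texon\t100\t900\t1000\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
    'chr1\tCufflinks\ttranscript\t1200\t1500\t1000\t.\t.\tgene_id "CUFF.2"; transcript_id "CUFF.2.1";',
]

strand_counts = count_strands(example)
print(dict(strand_counts))
```

To run it on a real file, pass an open file handle instead of the example list, e.g. `count_strands(open("transcripts.gtf"))`, where the file name is whatever your Cufflinks run produced. A large share of `.` entries would be consistent with data that carries no strand information.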


Thanks for the response. All I'm doing now is to see differential expression of these two genes across different samples. So, this shouldn't be a problem, right?

Reply written 6.3 years ago by disco (30)

I would check the aligned reads for uniqueness, and to be on the safe side when drawing conclusions, discard reads which have multiple matches to the reference (on whatever strand) from the data for DE analysis.
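As a rough illustration of that filtering step, the sketch below keeps only reads reported as aligning once. It assumes the alignments are available as SAM text (for example via `samtools view -h`) and that the aligner writes the optional `NH:i:` tag giving the number of reported alignments for a read; TopHat does this, but the tag is not mandatory in SAM, so check your own file first. The function names and the example records are hypothetical.

```python
def alignment_count(sam_line):
    """Return the value of the NH tag for one SAM record, or None if it is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 fields are mandatory; optional tags follow.
    for tag in fields[11:]:
        if tag.startswith("NH:i:"):
            return int(tag[len("NH:i:"):])
    return None


def keep_unique(sam_lines):
    """Keep header lines and records whose NH tag equals 1."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)  # header lines pass through unchanged
        elif alignment_count(line) == 1:
            kept.append(line)  # records without an NH tag are dropped here
    return kept


# Hypothetical SAM records, made up for this example.
example_sam = [
    "@HD\tVN:1.4\tSO:coordinate",
    "read1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
    "read2\t0\tchr1\t200\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
    "read2\t256\tchr2\t300\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
]

unique_lines = keep_unique(example_sam)
print(len(unique_lines))
```

Note the design choice that a record with no `NH` tag is discarded rather than kept; if your aligner does not emit the tag, a different criterion (such as mapping quality) would be needed.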

Reply written 6.3 years ago (modified 6.3 years ago) by Michael Dondrup (46k)

That makes a lot of sense, thanks a lot! I'm going to try that.

Reply written 6.3 years ago by disco (30)