Question

RNA-Seq Transcript Aligns on the Wrong Strand

11.3 years ago

disco ▴ 30

Hello,

I'm analysing RNA-seq data from ENCODE CSHL long RNA-seq to see differential expression between two genes sharing a chromosomal locus. I am really not familiar with bioinformatics at all; I'm a wet-bench researcher through and through. Somehow I managed to get going on a Linux platform and started with a single sample, which I analysed with Cufflinks and then viewed against the reference genome in IGV. What I see is that the Cufflinks transcripts for the two genes are on the same strand in IGV, whereas in reality they are on opposite strands, going away from each other. I'm pretty convinced that it's a technical mistake stemming from my inexperience with these analyses. But if anybody could point out how it is done properly or what could have gone wrong, I would be really grateful.

Many thanks, Vaish

strand • 3.2k views

ADD COMMENT • link updated 11.3 years ago by Michael 54k • written 11.3 years ago by disco ▴ 30

I know you've probably thought of this, but I'd suggest finding a local resource to go through this with you. There are MANY details in an analysis that you will want to learn, I'm sure, and having someone you can run ideas past can be the most effective way to do that.

ADD REPLY • link 11.3 years ago by Sean Davis 26k

Yeah, I tried my best but couldn't find anyone who would sit down and go through the whole thing with me. Some people were kind enough to offer suggestions and point me in the right direction, but we don't really have a bioinformatician in my group.

ADD REPLY • link 11.3 years ago by disco ▴ 30

Also see older question: Transcript Specific Expression Data

ADD REPLY • link 11.3 years ago by Josh Herr 5.8k

Hello Josh, do you have an idea what could be wrong with this analysis method?

ADD REPLY • link 11.3 years ago by disco ▴ 30

Can you clarify a bit? I'm not sure exactly what you mean about seeing "the transcripts from cufflinks in IGV" ... are you somehow loading a gtf (gff) generated by cufflinks in IGV? Or are you looking at the read alignments (the accepted_hits.bam) in IGV and something looks weird to you?

ADD REPLY • link 11.3 years ago by Steve Lianoglou 5.2k

I loaded the GTF file generated by Cufflinks and am viewing it in IGV. I could post a screenshot if it would be helpful.

ADD REPLY • link 11.3 years ago by disco ▴ 30

This is what it looks like in IGV:

http://s2.postimage.org/fbhntvzmh/Screenshot_from_2012_12_21_16_21_55.png

ADD REPLY • link 11.3 years ago by disco ▴ 30

score 3 · Answer 1 · 2012-12-21

11.3 years ago

Michael 54k

That is most likely not a mistake. Most RNA-seq protocols are not strand specific. I would check with the sequencing lab, and until it is stated explicitly, assume that there is no valid strand information in the data.
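One quick way to see what strand information actually ended up in the assembly is to tally the strand column of the Cufflinks GTF. The sketch below is only an illustration: it assumes a standard tab-separated GTF in which column 7 holds the strand (`+`, `-`, or `.` for unassigned), and the function name and example records are made up for the demonstration.

```python
from collections import Counter


def strand_counts(gtf_lines):
    """Tally the strand column (7th field) over 'transcript' records of a GTF."""
    counts = Counter()
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated fields; field 3 is the feature type.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        counts[fields[6]] += 1
    return counts


# Hypothetical records, for illustration only.
example = [
    'chr1\tCufflinks\ttranscript\t100\t900\t1000\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
    'chr1\tCufflinks\texon\t100\t400\t1000\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
    'chr1\tCufflinks\ttranscript\t1500\t2100\t1000\t+\t.\tgene_id "CUFF.2"; transcript_id "CUFF.2.1";',
    'chr1\tCufflinks\ttranscript\t3000\t3300\t1000\t.\t.\tgene_id "CUFF.3"; transcript_id "CUFF.3.1";',
]

counts = strand_counts(example)
print(dict(counts))
```

To run it on a real file, pass an open file handle instead of the list, for example `strand_counts(open("transcripts.gtf"))`, assuming that is the name of your Cufflinks output. Many `.` entries, or strands that contradict the known annotation, would be consistent with a library that carries no reliable strand information.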

ADD COMMENT • link 11.3 years ago by Michael 54k

Thanks for the response. All I'm doing now is looking at differential expression of these two genes across different samples. So this shouldn't be a problem, right?

ADD REPLY • link 11.3 years ago by disco ▴ 30

I would check the aligned reads for uniqueness, and to be on the safe side when drawing conclusions, discard reads which have multiple matches to the reference (on whatever strand) from the data for DE analysis.
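As a rough sketch of that filtering step, the snippet below keeps only alignments whose `NH:i:` tag equals 1. It assumes the alignments are available as SAM text and that the aligner writes the optional `NH` tag (number of reported alignments for the read); alignments without the tag are dropped here as a conservative choice. The function names and example reads are hypothetical.

```python
def is_unique(sam_line):
    """Return True if the alignment carries NH:i:1, i.e. a single reported hit."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start after the 11 mandatory SAM fields.
    for tag in fields[11:]:
        if tag.startswith("NH:i:"):
            return int(tag[5:]) == 1
    # No NH tag: uniqueness is unknown, so drop the read (assumption).
    return False


def unique_alignments(sam_lines):
    """Yield header lines unchanged and only uniquely mapped alignment lines."""
    for line in sam_lines:
        if line.startswith("@") or is_unique(line):
            yield line


# Hypothetical SAM records, for illustration only.
example = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
    "r2\t0\tchr1\t200\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
    "r2\t256\tchr2\t500\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
    "r3\t16\tchr1\t300\t50\t4M\t*\t0\t0\tACGT\tIIII",
]

kept = list(unique_alignments(example))
print(len(kept))
```

A BAM file would first need to be converted to SAM text (or read with a BAM library) before being passed through a filter like this.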