Hello,
I'm analysing RNA-seq data from the ENCODE CSHL long RNA-seq set to look at differential expression between two genes that share a chromosomal locus. I am not at all familiar with bioinformatics; I'm a wet-bench researcher through and through. Somehow I managed to get going on a Linux platform and started by analysing a single sample with Cufflinks, then viewed the output against the reference genome in IGV. What I see is that the Cufflinks transcripts for the two genes are on the same strand in IGV, whereas in reality they are on opposite strands, transcribed away from each other. I'm fairly convinced it's a technical mistake on my part, since I'm not well versed in these analyses. If anybody could point out how it is done properly, or what could have gone wrong, I would be really grateful.
Many thanks, Vaish
I know you've probably thought of this, but I'd suggest finding a local resource to go through this with you. There are MANY details in an analysis that you will want to learn, I'm sure, and having someone you can run ideas past can be the most effective way to do that.
Yeah, I tried my best but couldn't find anyone who would sit down and go through the whole thing. Some people were kind enough to offer suggestions and point me in the right direction, but we don't really have a bioinformatician in my group.
Also see older question: Transcript Specific Expression Data
Hello Josh, do you have an idea what could be wrong with this analysis method?
Can you clarify a bit? I'm not sure exactly what you mean about seeing "the transcripts from cufflinks in IGV". Are you somehow loading a GTF (GFF) generated by Cufflinks in IGV? Or are you looking at the read alignments (the accepted_hits.bam) in IGV and something looks weird to you?
I loaded the GTF file generated by Cufflinks and am viewing it in IGV. I could post a screenshot if it would be helpful.
This is what it looks like in IGV:
http://s2.postimage.org/fbhntvzmh/Screenshot_from_2012_12_21_16_21_55.png
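One way to check the strand independently of how IGV draws the track is to read it straight from the GTF: column 7 of each record holds the strand (+, -, or . when it is undetermined). Below is a minimal Python sketch of that check, not something taken from this thread. The function name strands_by_gene and the example records are made up for illustration, and it assumes the file contains "transcript" feature lines carrying a gene_id attribute.

```python
import re
from collections import defaultdict


def strands_by_gene(gtf_lines):
    """Map each gene_id to the set of strand symbols seen on its 'transcript' records."""
    result = defaultdict(set)
    for line in gtf_lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A GTF record has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        match = re.search(r'gene_id "([^"]+)"', fields[8])
        if match:
            # Column 7 (index 6) is the strand: "+", "-" or ".".
            result[match.group(1)].add(fields[6])
    return dict(result)


# Hypothetical records, for illustration only (not real ENCODE output).
EXAMPLE_GTF = "\n".join([
    "\t".join(["chr1", "Cufflinks", "transcript", "100", "900", "1000", "+", ".",
               'gene_id "CUFF.1"; transcript_id "CUFF.1.1";']),
    "\t".join(["chr1", "Cufflinks", "exon", "100", "900", "1000", "+", ".",
               'gene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "1";']),
    "\t".join(["chr1", "Cufflinks", "transcript", "1500", "2400", "1000", "-", ".",
               'gene_id "CUFF.2"; transcript_id "CUFF.2.1";']),
])

if __name__ == "__main__":
    for gene, strands in sorted(strands_by_gene(EXAMPLE_GTF.splitlines()).items()):
        print(gene, ",".join(sorted(strands)))
```

To run it on real output, pass an open file handle instead of the example lines, for instance open("transcripts.gtf") (file name assumed here). If both genes report the same symbol, the strand was assigned that way in the assembly itself rather than by IGV; if they report ".", the GTF carries no strand for them, which is worth comparing against how the library was prepared and which library-type setting was used.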