Question: Help with identifying novel transcripts
martyflores10 (United States) wrote, 5.3 years ago:

I am trying to identify novel transcripts across two developmental stages, which we'll call dev1 and dev2. I have RNA-seq data for dev1 and dev2, on which I performed the following steps:

1) Aligned reads with TopHat.
2) Assembled transcripts with Cufflinks.
3) Merged the two sets of transcripts with Cuffcompare, using refFlat as the reference.
4) Ran Cuffdiff on dev1 and dev2, using the combined GTF from Cuffcompare, to test for differential expression.
5) Joined the Cuffdiff output with the Cuffcompare transcript tracking file so that transcripts can be identified by their Cufflinks IDs.
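The join in step 5 can be sketched in Python roughly as below. This is a minimal sketch under several assumptions: the Cuffcompare `.tracking` file is tab-separated with the TCONS id, the XLOC locus id, the reference `gene|transcript` (or `-`), the class code, and then one column per input sample that is `-` when that sample contributed no transfrag (only the `-` test is used here); the Cuffdiff `isoform_exp.diff` table has a header whose `test_id` column holds the TCONS id and whose `value_1`/`value_2` columns hold FPKM; and the Cuffcompare inputs were given in the same order as the Cuffdiff samples (q1 = dev1, q2 = dev2). The example rows, IDs, and the helper names `parse_tracking`, `parse_isoform_diff`, `join`, and `flag` are all hypothetical. `flag` only labels the situations described in questions A and C below; it does not explain them.

```python
import csv
import io

# Hypothetical excerpts; real files have many more rows.
TRACKING = (
    "TCONS_00000001\tXLOC_000001\tGENE_A|TX_A\t=\t"
    "q1:CUFF.1|CUFF.1.1|100|12.5|10.0|15.0|5.2|1500\t-\n"
    "TCONS_00000002\tXLOC_000002\t-\tu\t"
    "q1:CUFF.2|CUFF.2.1|100|3.1|2.0|4.2|1.3|900\t"
    "q2:CUFF.7|CUFF.7.1|100|8.0|6.1|9.9|3.4|900\n"
)
DIFF = (
    "test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\tstatus\tvalue_1\t"
    "value_2\tlog2(fold_change)\ttest_stat\tp_value\tq_value\tsignificant\n"
    "TCONS_00000001\tXLOC_000001\tGENE_A\tchr1:100-1600\tdev1\tdev2\tOK\t"
    "12.5\t0.8\t-3.97\t2.1\t0.01\t0.05\tyes\n"
)


def parse_tracking(text):
    """TCONS id -> (class code, per-sample flags: True if assembled there)."""
    records = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue
        tcons, _locus, _ref, class_code = row[:4]
        records[tcons] = (class_code, [col != "-" for col in row[4:]])
    return records


def parse_isoform_diff(text):
    """test_id (assumed to be the TCONS id) -> (value_1, value_2) FPKM pair."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {r["test_id"]: (float(r["value_1"]), float(r["value_2"])) for r in reader}


def join(tracking, diff):
    """Left join: every tracked transcript, with its Cuffdiff FPKMs or None."""
    return {
        tcons: {"class_code": cc, "assembled": assembled, "fpkm": diff.get(tcons)}
        for tcons, (cc, assembled) in tracking.items()
    }


def flag(record):
    """'C': tracked but no Cuffdiff row; 'A': FPKM > 0 in a sample where
    Cufflinks did not assemble the transcript; '' otherwise."""
    if record["fpkm"] is None:
        return "C"
    for assembled, fpkm in zip(record["assembled"], record["fpkm"]):
        if not assembled and fpkm > 0:
            return "A"
    return ""


if __name__ == "__main__":
    joined = join(parse_tracking(TRACKING), parse_isoform_diff(DIFF))
    for tcons, rec in joined.items():
        print(tcons, rec["class_code"], rec["fpkm"], flag(rec) or "-")
```

A left join (keeping every tracked TCONS id) is used deliberately, so that transcripts with no Cuffdiff row show up as `None` instead of silently disappearing.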

This has led to a few questions:
A. In Cuffdiff's transcript-level differential expression output, a transcript is given an FPKM in both dev1 and dev2, even when it does not appear among the assembled transcripts for dev2.
B. The FPKM values reported by Cufflinks and Cuffdiff differ. I've seen other people ask this question, but still: why?
C. Essentially the opposite of question A: my Cuffcompare transcript tracking file lists an identified transcript TCONS_xxx, but it has no values in my Cuffdiff output.

Any insight would be greatly appreciated, especially for question A, which I suspect would shed light on the others.


Tags: rna-seq, cuffdiff, cufflinks
Reema Singh (United Kingdom) wrote, 5.2 years ago:

The novelty of a transcript can be assessed as follows:

1) Align it back to the genome and to the available annotation (coding sequences). If your assembled transcript aligns well to the genome (in length and identity) but not to any annotated coding sequence, there is a reasonable chance that it is a novel transcript.

2) To get more evidence: once you are sure that it aligns well to the genome and not to any CDS, check the read depth over that transcript and over its genomic locus (align your input reads back to the genome).

3) Then run a similarity search against different databases (e.g. BLAST against GenBank) and make sure the transcript is not a bacterial contig, a plasmid, or some other contaminating sequence. If you get no hit, translate the sequence into protein (e.g. its longest ORF) and run a protein search; if there is still no hit, try PSI-BLAST. If the transcript really is novel, you should get at least some hit in a closely related species (do correct me if I am wrong).
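Steps 1 and 2 above can be sketched in Python as a pair of checks. This is only a sketch under stated assumptions: the transcript's genome alignment is assumed to have already been summarized as an aligned fraction and an identity by whatever aligner you use, annotated CDS are assumed to be available as intervals, reads are assumed to be already reduced to genomic blocks on the transcript's chromosome (spliced reads split at their introns), and all coordinates are 1-based inclusive. The function names and the `min_aligned`, `min_identity`, and `min_depth` thresholds are hypothetical, not recommendations. In practice the depth would normally come straight from the BAM via a tool such as `samtools depth` or `bedtools coverage`; the code just shows the arithmetic.

```python
from typing import List, Tuple

# (chrom, strand, start, end), 1-based inclusive coordinates (assumed).
Interval = Tuple[str, str, int, int]
Span = Tuple[int, int]  # (start, end) on the transcript's chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    """Same chromosome, compatible strand ('.' matches either), >= 1 shared base."""
    same_strand = a[1] == b[1] or "." in (a[1], b[1])
    return a[0] == b[0] and same_strand and a[2] <= b[3] and b[2] <= a[3]


def is_candidate_novel(exons: List[Interval], cds: List[Interval],
                       aligned_fraction: float, identity: float,
                       min_aligned: float = 0.90,
                       min_identity: float = 0.95) -> bool:
    """Step 1: aligns well to the genome, but no exon overlaps an annotated CDS."""
    if aligned_fraction < min_aligned or identity < min_identity:
        return False
    return not any(overlaps(e, c) for e in exons for c in cds)


def per_base_depth(exons: List[Span], reads: List[Span]) -> List[int]:
    """Step 2: read depth at every exonic base, exons concatenated in order."""
    depths = []
    for ex_start, ex_end in exons:
        length = ex_end - ex_start + 1
        diff = [0] * (length + 1)  # difference array over this exon
        for r_start, r_end in reads:
            s, e = max(r_start, ex_start), min(r_end, ex_end)
            if s <= e:  # read block overlaps the exon
                diff[s - ex_start] += 1
                diff[e - ex_start + 1] -= 1
        running = 0
        for i in range(length):
            running += diff[i]
            depths.append(running)
    return depths


def depth_summary(exons: List[Span], reads: List[Span], min_depth: int = 1):
    """(mean depth, fraction of exonic bases with depth >= min_depth)."""
    depths = per_base_depth(exons, reads)
    if not depths:
        return 0.0, 0.0
    covered = sum(1 for d in depths if d >= min_depth)
    return sum(depths) / len(depths), covered / len(depths)


if __name__ == "__main__":
    tx = [("chr1", "+", 100, 200), ("chr1", "+", 300, 400)]
    print(is_candidate_novel(tx, [("chr1", "+", 350, 450)], 0.98, 0.99))
    print(depth_summary([(1, 10), (21, 25)], [(1, 10), (6, 15), (21, 22)]))
```

For example, with exons (1, 10) and (21, 25) and read blocks (1, 10), (6, 15), and (21, 22), `depth_summary` gives a mean depth of 17/15 (about 1.13), with 80% of exonic bases covered by at least one read.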

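The "translate the longest ORF" part of step 3 can be sketched without any external library. This assumes the standard genetic code (NCBI translation table 1), requires an ATG start and an in-frame stop codon, and ignores alternative start codons and ORFs that run off the end of the transcript; all of these are simplifications, and the function names are hypothetical. The returned protein is what you would then submit to a protein search (BLASTP, then PSI-BLAST if nothing is found).

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def revcomp(seq: str) -> str:
    """Reverse complement; characters other than ACGTN are left unchanged."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def longest_orf_protein(seq: str) -> str:
    """Longest ATG...stop ORF over all six frames, as protein without the stop."""
    seq = seq.upper()
    best = ""
    for strand in (seq, revcomp(seq)):
        for frame in range(3):
            start = None  # index of the first ATG in the current open stretch
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i
                elif CODON_TABLE.get(codon) == "*":
                    protein = translate(strand[start:i])
                    if len(protein) > len(best):
                        best = protein
                    start = None
    return best


if __name__ == "__main__":
    print(longest_orf_protein("CCATGAAATTTTAGCC"))
```

For example, `longest_orf_protein("CCATGAAATTTTAGCC")` returns `"MKF"`, and the same result comes back for its reverse complement, since both strands are scanned.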
For the rest of the questions I don't have a clear answer, sorry. :P




