Question: Differential Expression Analysis clarifications
Robyn Edgar wrote (4.1 years ago):



I am working on a comparative transcriptomics project and have a few logistics questions.

I have 4 transcriptomes (paired-end Illumina) that I have been working with since last summer. I completed a combined reference assembly of these four transcriptomes (using ABySS, CD-HIT, scaffolding and a few other steps). I then generated read counts using RSEM and completed differential expression analysis using Bioconductor's DESeq package.

A few weeks ago, my lab received the 4kb draft genome for the species that I am working with. It's not the finalized version, and there is still work being done to obtain a better quality genome. My main question is: should I re-run my DE analysis using the draft genome instead of the combined transcriptomes? I know that this is typically how DE analysis is done when a reference genome is available. My concern is that, since the genome is still only at the draft stage, the combined reference transcriptomes might still be the more accurate choice.

My second question is: Is there a way for me to determine which method is more accurate in this case?

My initial thought was to align the combined transcriptomes to the draft genome to get an idea of the coverage.  If the coverage was relatively high, then I would continue to use the DE analysis that I have already completed.  Does this make sense to do?  I have the output from GMAP and have been viewing it in IGV, but I'm not sure how to quantify the results of the alignment.
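One possible way to put a number on this, offered only as a minimal sketch: take each transcript's best alignment and report what fraction of the assembly aligns at all, and what fraction aligns well. The function name, the input layout (transcript ID, percent coverage, percent identity) and the 90%/95% cutoffs are all assumptions for illustration; the values would have to be pulled out of whatever format the GMAP output was written in.

```python
from typing import Dict, Iterable, Tuple


def summarize_alignments(
    records: Iterable[Tuple[str, float, float]],
    all_transcripts: Iterable[str],
    min_coverage: float = 90.0,   # assumed cutoff, percent of transcript aligned
    min_identity: float = 95.0,   # assumed cutoff, percent identity
) -> Dict[str, float]:
    """Summarize how much of a transcriptome assembly aligns to a genome.

    records: (transcript_id, coverage_pct, identity_pct) tuples, one per
             alignment; a transcript may appear more than once.
    all_transcripts: IDs of every transcript in the assembly, including
                     those with no alignment at all.
    """
    # Keep only the best alignment per transcript (highest coverage,
    # ties broken by identity).
    best: Dict[str, Tuple[float, float]] = {}
    for tid, cov, ident in records:
        if tid not in best or (cov, ident) > best[tid]:
            best[tid] = (cov, ident)

    transcripts = set(all_transcripts)
    total = len(transcripts)
    aligned = sum(1 for t in transcripts if t in best)
    well_aligned = sum(
        1
        for t in transcripts
        if t in best
        and best[t][0] >= min_coverage
        and best[t][1] >= min_identity
    )
    return {
        "total": total,
        "aligned": aligned,
        "well_aligned": well_aligned,
        "fraction_aligned": aligned / total if total else 0.0,
        "fraction_well_aligned": well_aligned / total if total else 0.0,
    }


if __name__ == "__main__":
    # Toy data, not real results.
    demo_records = [("t1", 99.0, 98.5), ("t1", 50.0, 99.0), ("t2", 60.0, 97.0)]
    demo_ids = ["t1", "t2", "t3", "t4"]
    print(summarize_alignments(demo_records, demo_ids))
```

A low fraction of well-aligned transcripts could point either to gaps in the draft genome or to misassembled transcripts, so the number on its own would not say which reference is more accurate.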


The last thing that I have a question about is annotation-related. My DE analysis was completed on the transcripts, not on predicted genes. My problem is that I'm not sure how to annotate my results from the DE analysis without using something like Blast2GO (which is soooo slow). Any suggestions would be greatly appreciated!


Sorry for the abundance of questions!

Thanks for the help!

Charles Warden (Duarte, CA) wrote (4.1 years ago):

There are k-mer-based strategies (NIKS, RUFUS, etc.) for differential expression without a reference assembly. I think the BLAST annotations will be harder than you might imagine (because there is often not a 1:1 match between assemblies).


This message thread (and the link provided in my comment) might be helpful:

Trinity/Rsem/Edger Pipeline...Now What?


Reply from Robyn Edgar (4.1 years ago):

Thanks for the links!
Istvan Albert (University Park, USA) wrote (4.1 years ago):

These are difficult questions to answer because they lack specificity.

In general, you need to recognize that even the evaluation metric you are looking for is undefined. It is difficult to quantify what "better" means when the data you are using are themselves incomplete. For example, is it better to find two genes that are each 60% likely to exist, or one gene that is 90% likely to be expressed?

In these cases one is left with estimates, and I recommend not sweating it too much and forging ahead with the choice that yields answers more quickly. The most important part of any analysis is to get to a point of actionable information. Along the way you learn a lot about the data and the process.

Finally, as I mentioned, this site works best when a post contains just one specific, well-defined, short question. Longer posts with multiple questions make it very difficult to contribute.


Reply from Robyn Edgar (4.1 years ago):

Thanks for your help!