Question: Trinity metatranscriptome assembly without a reference genome
steph_tf wrote (2.1 years ago):

Hi, I need advice on the processing steps for my project, and any feedback would be much appreciated.

I want to perform de novo assembly of metatranscriptome samples. Since the samples contain many different microorganisms, I plan to use one large assembly as a reference, map the individual samples (replicates under two different conditions) back to it, and count reads per transcript to test for differential expression.

My problem is that the large assembly will use around 200 million paired-end reads or more, depending on how I process the data. I could build two large assemblies, one per condition, each with less data, or I could pool all the data to get a better "reference assembly". I don't know whether Trinity can handle this within the resources I can request on our HPC cluster. So far I have assembled at most 40 million reads, and even then it was very difficult to keep a job running for that long. Could you give me some advice on how to improve the data (pre)processing steps?
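The map-and-count workflow described above ends in a transcript-by-sample count table. As a minimal sketch, assuming each read (or read pair) has already been assigned to at most one transcript of the pooled assembly by an aligner or quantifier, the table could be built like this (the function name `count_matrix` and the transcript and sample IDs are hypothetical):

```python
from collections import Counter


def count_matrix(assignments):
    """Build a transcript x sample count table.

    assignments: dict mapping sample name -> iterable of transcript IDs,
    one entry per read (or pair) assigned to that transcript.
    Unassigned reads are assumed to have been filtered out already.
    """
    # Count how many reads each transcript received in each sample
    per_sample = {s: Counter(ids) for s, ids in assignments.items()}
    samples = sorted(per_sample)
    # Union of all transcripts seen in any sample
    transcripts = sorted(set().union(*(c.keys() for c in per_sample.values())))
    # Counter returns 0 for missing keys, so transcripts absent from a
    # sample get an explicit zero and every row has one value per sample
    return samples, {t: [per_sample[s][t] for s in samples] for t in transcripts}


samples, matrix = count_matrix({
    "condA_rep1": ["tx1", "tx1", "tx2"],
    "condB_rep1": ["tx2", "tx3"],
})
print(samples, matrix)
```

Because every sample is counted against the same pooled reference, each column refers to the same set of transcripts, which is what a differential expression test between the two conditions needs.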

Another thing I'm not sure about: since I don't have a reference genome and I don't expect a large fraction of the transcripts to be annotated later, it would be better for me to use merged PE reads (longer reads), which make up about 20-30% of my sequences. But then I would lose the information in the pairs that did not merge, AND I would have to run Trinity in single-end mode. Is there a way to combine the merged reads with the unmerged pairs and include everything in my analysis, without treating all the data as single-end? Thanks in advance!
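To make the merged/unmerged split concrete, here is a simplified sketch of how overlapping pairs are commonly merged: R2 is reverse-complemented and compared against the 3' end of R1, trying the longest overlaps first, until an overlap with few mismatches is found. This is an illustration only, assuming plain uppercase sequences; dedicated read-merging tools also use base qualities and handle adapter read-through, which this sketch ignores. The names `merge_pair`, `partition`, `min_overlap` and `max_mismatch` are hypothetical. The output is a list of merged single reads plus the pairs that stayed unmerged, i.e. the two kinds of data the question asks to combine.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1, r2, min_overlap=10, max_mismatch=1):
    """Return the merged fragment if R1's 3' end overlaps reverse-complemented R2, else None.

    Simplified: ignores base qualities and pairs whose fragment is shorter
    than the reads (adapter read-through).
    """
    r2rc = revcomp(r2)
    # Try the longest possible overlap first, down to min_overlap
    for olen in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        tail = r1[-olen:]   # 3' end of R1
        head = r2rc[:olen]  # 5' end of reverse-complemented R2
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch:
            # Keep R1's bases in the overlap, then append the rest of R2
            return r1 + r2rc[olen:]
    return None


def partition(pairs, **kwargs):
    """Split read pairs into merged single reads and still-unmerged pairs."""
    merged, unmerged = [], []
    for r1, r2 in pairs:
        m = merge_pair(r1, r2, **kwargs)
        if m is None:
            unmerged.append((r1, r2))
        else:
            merged.append(m)
    return merged, unmerged
```

At mismatching overlap positions the sketch simply keeps the R1 base; a real merger would choose the higher-quality call. How to feed both outputs into a single Trinity run is a separate question best checked against the Trinity documentation.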

zhuofei.xu wrote (13 months ago):

I don't think assembling metatranscriptome short reads is worth doing. Any downstream analysis based on the assembled, often partial, transcript sequences would also be of limited meaning and too biased. Metagenome sequencing of the same sample should produce a high-quality reference gene set, which can then be used as a reference gene database for the metatranscriptomic analysis. Does anyone else share this opinion?
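Whichever reference is used, a metagenome-derived gene catalog or a de novo assembly, read counts are usually normalized for gene length before expression levels are compared within a sample. Below is a minimal sketch of transcripts per million (TPM), assuming reads have already been mapped and counted per reference gene, and using raw gene lengths rather than the effective lengths that quantification tools typically compute; the function name and gene IDs are hypothetical.

```python
def tpm(counts, lengths):
    """Transcripts per million from raw read counts and gene lengths (bp).

    counts:  dict gene_id -> number of reads assigned to that gene
    lengths: dict gene_id -> gene length in bp (raw, not effective, length)
    """
    # Reads per base for each gene
    rates = {g: counts[g] / lengths[g] for g in counts}
    total = sum(rates.values())
    if total == 0:
        # No reads at all: avoid dividing by zero
        return {g: 0.0 for g in counts}
    # Scale so the values sum to one million
    return {g: r / total * 1e6 for g, r in rates.items()}


print(tpm({"geneA": 10, "geneB": 20}, {"geneA": 1000, "geneB": 2000}))
```

Here both genes have the same reads-per-base rate, so each gets half of the one million.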
