Question

Trinity job design: one big trinity vs two sequential trinity jobs

9.6 years ago

niu2rseq ▴ 90

Hello everyone,

I have 44 pairs of FASTQ files that I want to assemble using Trinity. I then want to use the Trinity output FASTA as a reference, map the reads from each pair of FASTQ files back to it (using bowtie) to get expression profiles, and run a differential expression analysis. I want to do a de novo Trinity assembly because we don't have a good reference genome.

However, I am facing a problem: assembling all 44 pairs of FASTQ files in one Trinity job may be very resource-intensive, and my cluster doesn't have enough space for the temporary files generated during this process. Is there an alternative approach I could use?

Can I assemble each pair first and then assemble the 44 resulting Trinity FASTA files together? Would the two approaches give identical results? Please let me know. Thank you very much!

assembly RNA-Seq trinity • 3.3k views

updated 2.3 years ago by Ram 43k • written 9.6 years ago by niu2rseq ▴ 90

Assembling each pair first will cause problems in the downstream DE analysis when you map your reads back with RSEM (which uses bowtie). Try the in silico read normalization parameter instead. It reduces the number of reads Trinity has to assemble by discarding reads from regions whose sequencing depth exceeds a cutoff, and with 44 samples you have more than enough coverage to spare. Do you have multiple treatments? Why so many samples?
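To make the normalization idea concrete, here is a minimal Python sketch of streaming "digital normalization" (the diginorm approach): a read is kept only if the median count of its k-mers, counted over the reads kept so far, is still below a coverage cutoff, so extra reads from already deeply covered transcripts are dropped. It illustrates the general technique only. Trinity's built-in normalization is implemented differently. The function names and the k / cutoff values are illustrative choices, not Trinity parameters.

```python
from collections import Counter
from statistics import median
from typing import Iterable, List


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping k-mers of seq (empty if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalize(reads: Iterable[str], k: int = 25, cutoff: int = 20) -> List[str]:
    """Keep a read only while its median k-mer coverage is below `cutoff`.

    Coverage is counted over the reads kept so far, so once a region reaches
    roughly `cutoff`-fold coverage, further reads from it are discarded.
    """
    counts: Counter = Counter()
    kept: List[str] = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: dropped in this sketch
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept


if __name__ == "__main__":
    deep = ["ACGTACGGTCAGTTAGCCATGGACTTGCAAT"] * 100  # one very highly covered fragment
    rare = ["T" * 31]                                # one low-coverage fragment
    kept = digital_normalize(deep + rare, k=25, cutoff=5)
    print(len(kept))  # 6: five copies of the deep read plus the rare read
```

For paired-end data a real tool must keep or drop both mates together; this sketch ignores pairing to keep the core idea visible.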

You could also run Trinity in steps: stop after each step (the parameters are listed by ./Trinity --show_full_usage_info) and remove intermediate files before continuing.
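A stepwise run can be planned as repeated invocations of the same command, each with a different "stop here" flag, checking disk usage between steps. The Python sketch below only builds and prints the command lines; it executes nothing. Assumptions to verify against ./Trinity --show_full_usage_info for your installed version: the stop flags --no_run_inchworm, --no_run_chrysalis and --no_distributed_trinity_exec, and that re-running the same command without a stop flag resumes from Trinity's checkpoints. File names, memory and CPU values are placeholders.

```python
from pathlib import Path
from typing import List, Optional, Sequence, Tuple

# ASSUMPTION: stop flags as we recall them from Trinity 2.x's full usage text;
# check `./Trinity --show_full_usage_info` for your version before relying on them.
STAGES: List[Tuple[str, Optional[str]]] = [
    ("stop after jellyfish", "--no_run_inchworm"),
    ("stop after inchworm", "--no_run_chrysalis"),
    ("stop after chrysalis (phase 1)", "--no_distributed_trinity_exec"),
    ("run to completion", None),
]


def base_command(left: Sequence[str], right: Sequence[str], out_dir: str,
                 max_memory: str = "50G", cpu: int = 8) -> List[str]:
    """Arguments shared by every step; --left/--right take comma-separated lists."""
    return ["Trinity", "--seqType", "fq",
            "--max_memory", max_memory, "--CPU", str(cpu),
            "--left", ",".join(left), "--right", ",".join(right),
            "--output", out_dir]


def staged_plan(left: Sequence[str], right: Sequence[str], out_dir: str,
                **kwargs) -> List[Tuple[str, List[str]]]:
    """One command per step: identical arguments plus that step's stop flag."""
    base = base_command(left, right, out_dir, **kwargs)
    return [(label, base + ([flag] if flag else [])) for label, flag in STAGES]


def dir_size_bytes(path: str) -> int:
    """Total size of regular files under path (0 if it does not exist yet)."""
    root = Path(path)
    if not root.exists():
        return 0
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Hypothetical file names; commands are only printed, never run.
    plan = staged_plan(["s1_R1.fq", "s2_R1.fq"], ["s1_R2.fq", "s2_R2.fq"], "trinity_out_dir")
    for label, cmd in plan:
        print(f"# {label}; output dir is {dir_size_bytes('trinity_out_dir')} bytes")
        print(" ".join(cmd))
```

Which intermediates are safe to delete between steps depends on the Trinity version, so the sketch only reports the output directory's size rather than deleting anything.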

9.6 years ago by st.ph.n ★ 2.7k

Thank you! I understand that mapping the reads back with RSEM would be a problem if I assembled each pair separately, since the contigs in each assembly would be labeled with different IDs.

The reason I have so many samples is that they come from different body sites, time points, and animals. So yes, it is similar to having multiple treatments.

I am not sure how to use the in silico normalization parameter. Could you please provide more information or a protocol so I can look into it in depth? Thank you very much!

updated 2.3 years ago by Ram 43k • written 9.6 years ago by niu2rseq ▴ 90

In your Trinity directory, type ./Trinity --show_full_usage_info. It will list the full set of parameters. Are you interested in DE genes between different animals, or between body sites/time points within each animal separately? If the former, I would concatenate all your read 1 files into one file and all your read 2 files into another, and run Trinity with the in silico normalization parameter. Simply provide the flag, and Trinity will reduce your total number of reads. If the latter, concatenate the read 1 and read 2 files for each animal, and then run Trinity/RSEM/DE analysis on each animal's assembly individually. In the end you'll get one heatmap per animal across the different body sites/time points.
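Combining the read files can be scripted so that read 1 and read 2 files are always concatenated in the same sample order, which keeps mates paired. The Python sketch below assumes a hypothetical naming scheme <animal>_<site>_<time>_R1.fastq with a matching _R2 file; the grouping key chooses between one combined assembly and one assembly per animal. Byte-wise concatenation also works for .gz files, because concatenated gzip members form a valid gzip stream.

```python
import shutil
from collections import defaultdict
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[List[Path], List[Path]]  # (read 1 files, read 2 files) in matching order


def mate_of(r1: Path) -> Path:
    """R2 path for an R1 file, under the hypothetical '_R1.' / '_R2.' naming."""
    if "_R1." not in r1.name:
        raise ValueError(f"not an R1 file under the assumed naming: {r1}")
    return r1.with_name(r1.name.replace("_R1.", "_R2.", 1))


def group_pairs(r1_files: Iterable[Path], key: Callable[[str], str]) -> Dict[str, Pair]:
    """Group R1 files and their R2 mates by key(file name), in sorted order."""
    groups: Dict[str, Pair] = defaultdict(lambda: ([], []))
    for r1 in sorted(Path(p) for p in r1_files):
        left, right = groups[key(r1.name)]
        left.append(r1)
        right.append(mate_of(r1))  # same position as its R1, so mates stay paired
    return dict(groups)


def concatenate(files: Iterable[Path], dest: Path) -> None:
    """Byte-wise concatenation (works for plain FASTQ and for .gz files)."""
    with open(dest, "wb") as out:
        for f in files:
            with open(f, "rb") as src:
                shutil.copyfileobj(src, out)


def all_samples(name: str) -> str:
    """Key for a single combined assembly (DE between animals)."""
    return "all"


def per_animal(name: str) -> str:
    """Key for one assembly per animal (first '_'-separated field of the name)."""
    return name.split("_")[0]
```

For example, group_pairs(Path("reads").glob("*_R1.fastq.gz"), per_animal) would return one (left, right) pair of file lists per animal, and each list can be passed to concatenate().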

updated 2.3 years ago by Ram 43k • written 9.6 years ago by st.ph.n ★ 2.7k

Ram · Answer 1 · 2014-09-04

9.6 years ago

Biojl ★ 1.7k

Hi,

For the second part (estimating expression) you will hit a bottleneck, since you have a huge number of reads and RSEM scales very poorly.

I would recommend giving eXpress a try.

updated 2.3 years ago by Ram 43k • written 9.6 years ago by Biojl ★ 1.7k

With RSEM, when I have multiple treatments with many samples per treatment, I combine the read 1 and read 2 files by treatment, map them to the assembly, and do the DE analysis that way. It gives a clearer picture in the heatmap and makes later comparisons easier when identifying up/down-regulated transcripts. Also, if you're not doing any pre-assembly quality control, I recommend trimming all your PE reads with Trimmomatic to remove barcodes and low-quality bases; Trinity provides a parameter to run it for you. If you're unsure of the quality of your reads, run them through FastQC if your sequencing facility didn't already. Facilities often do their own QC before giving you your data.
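The quality-trimming step can be illustrated with a simplified trimmer in the spirit of Trimmomatic's SLIDINGWINDOW and MINLEN steps (not its exact algorithm): scan from the 5' end, cut the read at the start of the first window whose mean Phred quality drops below a threshold, then discard reads that end up too short. Phred+33 encoding and the window, threshold and minimum-length values are assumptions for the example; in practice you would run Trimmomatic itself.

```python
from typing import List, Optional, Tuple


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def sliding_window_trim(seq: str, qual: str, window: int = 4,
                        threshold: float = 15.0) -> Tuple[str, str]:
    """Cut at the start of the first window whose mean quality < threshold."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred33(qual)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < threshold:
            return seq[:start], qual[:start]
    return seq, qual  # no failing window (or read shorter than the window)


def trim_read(seq: str, qual: str, window: int = 4, threshold: float = 15.0,
              min_len: int = 36) -> Optional[Tuple[str, str]]:
    """Sliding-window trim, then drop the read (return None) if shorter than min_len."""
    seq, qual = sliding_window_trim(seq, qual, window, threshold)
    return (seq, qual) if len(seq) >= min_len else None


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
    print(sliding_window_trim("ACGTACGTAC", "IIIIII####"))  # ('ACGTA', 'IIIII')
```

With paired-end data, dropping one mate orphans the other; Trimmomatic's paired-end mode writes such reads to separate unpaired outputs, which this sketch does not attempt.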

updated 2.3 years ago by Ram 43k • written 9.6 years ago by st.ph.n ★ 2.7k