I'm assembling a transcriptome de novo. I have RNA-seq data from treatment and control groups at two time points, with three replicates for each group. For the assembly, should I pool all reads (from both control and treatment groups), or assemble each replicate separately? Is it OK to pool them, and if I assemble each replicate separately, what should I do to make the results comparable between groups and time points? Thank you.
The Trinity assembler developers recommend combining the reads from all samples for the assembly step (possibly using digital normalization to speed the assembly up), then aligning each sample's reads back to the assembly for filtering and DE analysis (the latter step using salmon, RSEM, or alternative tools). This is explicitly stated in the notes for this workflow. Because every sample is then quantified against the same set of transcripts, expression values are directly comparable across groups and time points.
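As a rough sketch of that workflow for a design like yours (2 groups x 2 time points x 3 replicates), the Python below builds a Trinity-style samples file and the two command lines, without running anything. Several things here are assumptions: paired-end reads, the hypothetical `reads/<sample>_R1.fastq.gz` naming, the `t1`/`t2` labels, and the CPU and memory values. The flags are the ones I recall from the Trinity documentation (`--samples_file`, `--seqType`, `--max_memory`, `--CPU`, `--output`, and `align_and_estimate_abundance.pl` with `--est_method salmon`), so check them against `Trinity --help` for your installed version.

```python
from itertools import product


def build_samples_rows(groups, timepoints, n_reps, read_dir="reads"):
    """Return (condition, replicate, left_fastq, right_fastq) tuples.

    File names are hypothetical; adapt them to your own naming scheme.
    """
    rows = []
    for group, tp in product(groups, timepoints):
        condition = f"{group}_{tp}"
        for rep in range(1, n_reps + 1):
            replicate = f"{condition}_rep{rep}"
            left = f"{read_dir}/{replicate}_R1.fastq.gz"
            right = f"{read_dir}/{replicate}_R2.fastq.gz"
            rows.append((condition, replicate, left, right))
    return rows


def samples_file_text(rows):
    """Tab-delimited text, one replicate per line."""
    return "".join("\t".join(row) + "\n" for row in rows)


def trinity_command(samples_path, out_dir="trinity_out", cpu=8, max_memory="50G"):
    """One pooled assembly from all replicates listed in the samples file."""
    return [
        "Trinity", "--seqType", "fq",
        "--samples_file", samples_path,
        "--CPU", str(cpu), "--max_memory", max_memory,
        "--output", out_dir,
    ]


def quant_command(transcripts, samples_path):
    """Quantify every replicate against the single pooled assembly."""
    return [
        "align_and_estimate_abundance.pl",
        "--transcripts", transcripts,
        "--seqType", "fq",
        "--samples_file", samples_path,
        "--est_method", "salmon",
        "--trinity_mode", "--prep_reference",
    ]


rows = build_samples_rows(["control", "treatment"], ["t1", "t2"], 3)
print(samples_file_text(rows), end="")
print(" ".join(trinity_command("samples.txt")))
# The assembly FASTA path differs between Trinity versions, so it is a parameter.
print(" ".join(quant_command("PATH/TO/Trinity.fasta", "samples.txt")))
```

The point of the design is that there is one assembly command and twelve quantifications: every replicate ends up with counts against the same transcript IDs, which is what the DE step needs.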
In addition, I recommend following up the assembly with Transrate to assess assembly quality and filter out low-quality assembly artifacts, and then annotating the transcriptome (I'm biased towards tools like Trinotate, though there are others, including commercial tools like BLAST2GO); annotation helps identify additional elements such as rRNA that you can disregard. Since BLAST is a typical step in annotation, you can also use it to screen for contaminants, if that is a potential issue in your assembly.