Good morning everyone. I'm trying to work out the best Trimmomatic filtering parameters for assembling a reference transcriptome with Trinity (the reads come from different tissues of the same organism).
My Trinity assembly command currently looks like this:
nohup Trinity --seqType fq --max_memory 200G --left reads/R1_001.fastq.gz, [...] --right reads/R2_001.fastq.gz, [...] --CPU 90 --min_contig_length 300 --verbose --trimmomatic --quality_trimming_params "ILLUMINACLIP:Adapters_Overrepresented.fa:2:30:10 SLIDINGWINDOW:5:20 MINLEN:25 LEADING:25 TRAILING:25" --output trascrittoma_diriferimento_trinity >/dev/null 2>&1
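Trimmomatic reads the trimming string as space-separated steps written as NAME:value, so a purely syntactic sanity check before launching a long run could look like the sketch below. The function name check_trim_params is hypothetical (it is not part of Trinity or Trimmomatic), it assumes a bash-like shell, and it does not check whether the values themselves are sensible.

```shell
# Hypothetical helper (not part of Trinity or Trimmomatic): print every
# space-separated step that is not written as NAME:value.
check_trim_params() {
    local step bad=0
    for step in $1; do                  # unquoted on purpose: split on whitespace
        case "$step" in
            TOPHRED33|TOPHRED64) ;;     # quality-encoding steps take no argument
            [A-Z]*:?*) ;;               # NAME:value, e.g. SLIDINGWINDOW:5:20
            *) echo "malformed step: $step"; bad=1 ;;
        esac
    done
    return "$bad"
}

# Example call with a well-formed string
check_trim_params "SLIDINGWINDOW:5:20 LEADING:25 TRAILING:25 MINLEN:25" && echo "syntax looks fine"
```

Note that Trimmomatic applies the steps in the order they are written, so the position of MINLEN relative to the other steps matters; the sketch does not check ordering.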
I'm not sure whether these parameters are optimal for my case. According to some of the literature I've read, it is better not to apply strict quality filtering (or even to skip it altogether!) when assembling transcriptomes.
I used a file called Adapters_Overrepresented.fa, which contains all the expected adapters for my libraries (Illumina TruSeq) plus the overrepresented sequences found in the FQ statistics files. (For instance, I added GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG, which in some samples accounted for up to 2% of the reads.)
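For anyone wanting to check the poly-G share independently of the FQ statistics, a minimal sketch of a per-file count follows. The function name polyg_count is hypothetical, the default run length of 50 is an arbitrary assumption, and the input is assumed to be a gzipped FASTQ with plain 4-line records.

```shell
# Hypothetical helper: count reads containing a run of at least N consecutive G
# bases in a gzipped FASTQ file. Assumes plain 4-line FASTQ records.
# Usage: polyg_count FILE.fastq.gz [N]   (N defaults to 50, an arbitrary choice)
polyg_count() {
    gzip -cd < "$1" | awk -v n="${2:-50}" '
        BEGIN { for (i = 0; i < n; i++) run = run "G" }   # build the poly-G probe
        NR % 4 == 2 {                                      # sequence lines only
            total++
            if (index($0, run) > 0) polyg++
        }
        END { printf "%d\t%d\n", polyg, total }            # poly-G reads, all reads
    '
}
```

Calling polyg_count reads/R2_001.fastq.gz prints two tab-separated numbers, the reads with a poly-G run and the total number of reads; dividing the first by the second gives the fraction for that file.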
I chose the remaining parameters mostly on the basis of forum posts and the manual.
Looking at the individual FQ statistics for my reads, the parameters don't appear to be strict enough for most R2 files (I'm attaching an example of the quality box plot). At the same time, I'm afraid that trimming too much of the sequence may result in an inaccurate transcriptome, which will also be used at a later stage to discriminate between isoforms across tissues.
Am I proceeding correctly? Any suggestions?
What about the Trinity parameter --min_contig_length 300? Is that fine?
Thanks in advance!