Trimmomatic parameters for Trinity transcriptome assembly
2.5 years ago
Raito92 ▴ 60

Good morning everyone. I'm trying to work out the best Trimmomatic filtering parameters for assembling a reference transcriptome (obtained from different tissues of the same organism) with Trinity.

My Trinity assembly command currently looks like this:

nohup Trinity --seqType fq --max_memory 200G --left reads/R1_001.fastq.gz, [...] --right reads/R2_001.fastq.gz, [...] --CPU 90 --min_contig_length 300 --verbose --trimmomatic --quality_trimming_params "ILLUMINACLIP:Adapters_Overrepresented.fa:2:30:10 SLIDINGWINDOW:5:20 MINLEN:25 LEADING 25 TRAILING 25" --output trascrittoma_diriferimento_trinity >/dev/null 2>&1


I'm not sure whether these parameters are optimal in my case. According to some of the literature I have read, it is better not to apply strict quality filtering (or even not to filter at all!) when assembling transcriptomes...

I've used a file called Adapters_Overrepresented.fa, which contains all the expected adapters for my libraries (Illumina TruSeq) plus the overrepresented sequences found by analyzing the FQ statistics files. (For instance, I've added the poly-G sequence GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG, which in some samples accounted for up to 2% of the reads.)

I chose the remaining parameters mostly on the basis of forum posts and the manual...

Looking at the individual FQ statistics for my reads, the parameters don't appear to be strict enough for most R2 files (I'm attaching an example of the quality box-plot). At the same time, I'm afraid that cutting too much of the sequence may result in an inaccurate transcriptome (which will also be used at a later stage to discriminate between isoforms across tissues)...

Am I proceeding correctly? Any suggestions there?

What about the Trinity parameter --min_contig_length 300? Is that fine?

1

I would suggest running the Trimmomatic step separately from the Trinity step, as this gives you more control over, and insight into, what is going on.

Otherwise I see no big issue with the Trimmomatic parameter settings, though I wonder why you want to trim 25 bases by default from the beginning and end of your reads?

Is the attached plot a pre- or post-filtering version? If it is post-filtering, then I suspect something went wrong, as those Q-values should no longer be present.
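
As a rough illustration of running Trimmomatic as its own step, the sketch below only assembles a paired-end command line in Python and prints it; nothing is executed. The jar name `trimmomatic-0.39.jar`, the `trimmed/sample` output prefix, the thread count, and the helper `trimmomatic_pe_command` are assumptions made up for this example. The input files and trimming steps come from the question, with LEADING/TRAILING left out and MINLEN moved to the end, since Trimmomatic applies the steps in the order given.

```python
import shlex


def trimmomatic_pe_command(jar, r1, r2, out_prefix, steps, threads=4):
    """Build (but do not run) a Trimmomatic paired-end command as an argv list.

    `jar` and `out_prefix` are placeholders; adjust them to your installation.
    """
    return [
        "java", "-jar", jar, "PE",
        "-threads", str(threads),
        r1, r2,
        # PE mode expects four outputs: paired/unpaired for R1, then for R2.
        f"{out_prefix}_R1_paired.fq.gz", f"{out_prefix}_R1_unpaired.fq.gz",
        f"{out_prefix}_R2_paired.fq.gz", f"{out_prefix}_R2_unpaired.fq.gz",
        *steps,
    ]


steps = [
    "ILLUMINACLIP:Adapters_Overrepresented.fa:2:30:10",
    "SLIDINGWINDOW:5:20",
    "MINLEN:25",  # placed last so it sees the reads after trimming
]

cmd = trimmomatic_pe_command(
    jar="trimmomatic-0.39.jar",          # assumed jar name
    r1="reads/R1_001.fastq.gz",
    r2="reads/R2_001.fastq.gz",
    out_prefix="trimmed/sample",         # assumed output prefix
    steps=steps,
)

# Print a copy-pasteable shell command for inspection.
print(shlex.join(cmd))
```

Once the printed command looks right, it could be pasted into a shell or passed to `subprocess.run(cmd, check=True)`. The paired output files would then go to Trinity via `--left` and `--right`, without the `--trimmomatic` flag.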

0
I would suggest running the Trimmomatic step separately from the Trinity step, as this gives you more control over, and insight into, what is going on.


Good suggestion, although Trimmomatic is launched before the Trinity assembly starts, and it even produces .gz-compressed reads that you can use separately...

though I wonder why you want to trim 25 bases by default from the beginning and end of your reads?


Isn't that 25 a quality threshold rather than a length, according to what I have read here, or am I getting something wrong?

Is the attached plot a pre- or post-filtering version? If it is post-filtering, then I suspect something went wrong, as those Q-values should no longer be present.


The attached plot is just an example from one of my libraries (an R2 one).

P.S. I ran Trimmomatic with the parameters shown, and it reported:

Input Read Pairs: 30479762 Both Surviving: 24211503 (79.43%) Forward Only Surviving: 5765950 (18.92%) Reverse Only Surviving: 261744 (0.86%) Dropped: 240565 (0.79%)
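
To make the summary easier to read, here is a small sketch that parses the line above and recomputes each category as a share of the input pairs. The helper name `parse_summary` is invented for this example and is not part of Trimmomatic. The point that stands out is that almost 19% of pairs lose only their reverse read, which would be consistent with the lower R2 quality mentioned in the question, though the plot itself would be needed to confirm that.

```python
import re

summary = (
    "Input Read Pairs: 30479762 Both Surviving: 24211503 (79.43%) "
    "Forward Only Surviving: 5765950 (18.92%) "
    "Reverse Only Surviving: 261744 (0.86%) Dropped: 240565 (0.79%)"
)


def parse_summary(line):
    """Extract the read-pair counts from a Trimmomatic PE summary line."""
    pattern = (
        r"(Input Read Pairs|Both Surviving|Forward Only Surviving|"
        r"Reverse Only Surviving|Dropped): (\d+)"
    )
    return {label: int(count) for label, count in re.findall(pattern, line)}


counts = parse_summary(summary)
total = counts["Input Read Pairs"]

# Share of input pairs in each outcome category, in percent.
shares = {
    label: 100 * count / total
    for label, count in counts.items()
    if label != "Input Read Pairs"
}

for label, pct in shares.items():
    print(f"{label}: {counts[label]} ({pct:.2f}%)")

# Pairs in which the reverse read (R2) did not survive.
r2_lost = counts["Forward Only Surviving"] + counts["Dropped"]
print(f"Pairs without a surviving R2: {100 * r2_lost / total:.2f}%")
```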

1

Isn't that 25 a quality threshold rather than a length, according to what I have read here, or am I getting something wrong?

My bad, that is indeed the correct parameter, as you intended. However, I would rather go for the SLIDINGWINDOW one (instead of LEADING or TRAILING). There is also a syntax error: it should read LEADING:25 TRAILING:25 (no spaces, with a colon).

That is indeed the expected output of Trimmomatic.
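
A quick way to catch this kind of typo before launching a long job is to check that every whitespace-separated token has the NAME:value form. The sketch below is a hypothetical pre-flight check, not something Trimmomatic or Trinity provides, and it is deliberately crude: argument-free steps such as TOPHRED33 would be flagged too.

```python
import re

# A step is an upper-case name followed by one or more colon-separated values.
STEP_PATTERN = re.compile(r"^[A-Z]+(:[^:\s]+)+$")


def find_malformed_steps(params):
    """Return the tokens that do not look like NAME:value[:value...]."""
    return [tok for tok in params.split() if not STEP_PATTERN.match(tok)]


original = (
    "ILLUMINACLIP:Adapters_Overrepresented.fa:2:30:10 SLIDINGWINDOW:5:20 "
    "MINLEN:25 LEADING 25 TRAILING 25"
)
corrected = (
    "ILLUMINACLIP:Adapters_Overrepresented.fa:2:30:10 SLIDINGWINDOW:5:20 "
    "MINLEN:25 LEADING:25 TRAILING:25"
)

print(find_malformed_steps(original))   # tokens with the missing colon
print(find_malformed_steps(corrected))  # empty list when every step is well formed
```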

0

There is also a syntax error: it should read LEADING:25 TRAILING:25 (no spaces, with a colon).

Thanks for pointing this out, you're right (and Trimmomatic notified me after a while by showing a NumberFormatException in the log...)

I would rather go for the SLIDINGWINDOW one (instead of LEADING or TRAILING)

That's what I have read a lot online too, and it is one of the things I was undecided about... but I have also noticed that they are sometimes used together, even though I'm not sure what difference that could make... I guess I'll try both and compare the number of reads trimmed and the overall transcriptome quality.

1

but I have also noticed that they are sometimes used together, even though I'm not sure what difference that could make.

The main and crucial difference is that LEADING and TRAILING work at the per-base level: working inwards from the beginning and the end of the read respectively, each base whose quality is below the value you set is removed, and trimming stops at the first base that meets the threshold. With SLIDINGWINDOW, the average score within the window has to drop below a certain value before anything is removed. The sliding window works from the left towards the right: as soon as the window score drops below the set value, all bases towards the right are removed.
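
The difference described above can be sketched on a toy list of Phred scores. This is a simplified model written for illustration, not Trimmomatic's actual code: in particular, the sliding-window function here cuts at the start of the first failing window, and Trimmomatic's exact cut position may differ. The quality values are invented.

```python
def trim_leading(quals, threshold):
    """Drop bases from the start while their quality is below the threshold."""
    start = 0
    while start < len(quals) and quals[start] < threshold:
        start += 1
    return quals[start:]


def trim_trailing(quals, threshold):
    """Drop bases from the end while their quality is below the threshold."""
    end = len(quals)
    while end > 0 and quals[end - 1] < threshold:
        end -= 1
    return quals[:end]


def trim_sliding_window(quals, window, threshold):
    """Scan left to right; cut at the first window whose mean is too low.

    Simplification: everything from the start of that window is removed.
    """
    for start in range(len(quals) - window + 1):
        chunk = quals[start:start + window]
        if sum(chunk) / window < threshold:
            return quals[:start]
    return quals


# Invented scores: one bad first base, an isolated dip, then a decaying tail
# that happens to end on a single good base.
quals = [10, 30, 32, 35, 12, 34, 33, 31, 30, 15, 12, 10, 14, 30]

print("LEADING:25       ", trim_leading(quals, 25))
print("TRAILING:25      ", trim_trailing(quals, 25))
print("SLIDINGWINDOW:5:20", trim_sliding_window(quals, 5, 20))
```

In this toy read, TRAILING removes nothing because the very last base is good, even though the tail before it is poor. The sliding window tolerates the isolated dip but cuts once the average collapses. That difference may be one reason the steps are sometimes combined.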

0

I see! So I guess SLIDINGWINDOW is a more accurate approach! Thank you!

0

Unless you have very specific goals, SLIDINGWINDOW is indeed the more sensible (and most commonly used) approach.