Question: Paired End Trimmomatic producing asymmetric paired read files.
2.7 years ago by
Biogeek280
Biogeek280 wrote:

Hey guys,

I recently got back some 101 bp paired-end Illumina reads and I'm in the middle of QC. I decided to use Trimmomatic instead of SolexaQA, which I normally use. The problem is that the paired 'cleaned' read files output by Trimmomatic differ in both file size and number of reads. I have used the ILLUMINACLIP option with keepBothReads (":TRUE"). I decided this was best because, to my knowledge, the Trinity de novo assembler needs paired-end data to work best.

The Trimmomatic-specific command I run (as part of a loop):

java -jar directory/trimmomatic-0.35.jar PE -phred33 -basein $f1 -baseout "$f1"filtered.fastq ILLUMINACLIP: directory/adapters/TruSeq3-PE-2.fa:2:30:10:8:TRUE HEADCLIP:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

Trimmomatic runs as normal and produces all the output files. Can I also ask a few basic questions? Am I correct in assuming headclip removes bases from the 5' end of the reads? Additionally, after trimming has succeeded I still have flagged high k-mer content at the start of the 5' end, where hexamer primer bias usually exists. Any suggestions, or is it just an artifact to ignore?

I am using TruSeq adaptors. I am assuming all sequencing facilities now work with TruSeq3?

Thanks for helping.

rna-seq qc trimmomatic • 3.9k views
modified 2.7 years ago by geek_y8.7k • written 2.7 years ago by Biogeek280
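
One way to check whether the paired outputs really are asymmetric is to count the records in each file. The following is only a minimal sketch, assuming uncompressed FASTQ files with exactly four lines per read; the function names `count_reads` and `check_pair` are made up for illustration.

```shell
# Count FASTQ records, assuming uncompressed files with exactly 4 lines per read.
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Report whether two paired output files hold the same number of reads.
# Returns 0 when the counts match and 1 when they differ.
check_pair() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    if [ "$n1" -eq "$n2" ]; then
        echo "OK: $n1 reads in each file"
    else
        echo "MISMATCH: $1 has $n1 reads, $2 has $n2 reads"
        return 1
    fi
}
```

It could be called as `check_pair sample_1P.fastq sample_2P.fastq`, where both names are placeholders for the two paired output files. Differing file sizes alone do not imply differing read counts, since the two mates of a pair can be trimmed to different lengths.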

@Biogeek could you please provide the reference for your usage of keepbothreads (":TRUE") and HEADCLIP:10? I cannot find these options in the manual at: http://www.usadellab.org/cms/?page=trimmomatic

written 2.2 years ago by dovah30

I believe he meant HEADCROP instead of HEADCLIP.

written 2.1 years ago by bounlu140
2.7 years ago by
geek_y8.7k
geek_y8.7k wrote:

Since you specify PE, where is the R2 file in your command? You gave only $f1 but not $f2. Also, you don't need to give the path to the adapters if it's standard TruSeq, I guess.

What is TRUE here?

ILLUMINACLIP: directory/adapters/TruSeq3-PE-2.fa:2:30:10:8:TRUE

The Trimmomatic instructions on their website are pretty clear:

java -jar trimmomatic-0.35.jar PE -phred33 \
    input_forward.fq.gz input_reverse.fq.gz \
    output_forward_paired.fq.gz output_forward_unpaired.fq.gz \
    output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz \
    ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
modified 2.7 years ago • written 2.7 years ago by geek_y8.7k

The manual says that if you give $f1, the first (forward) read file, via -basein, Trimmomatic will automatically detect the matching reverse read file.

As for :8:TRUE, this is where the keepbothreads option comes in. I've read on other forums that adding "TRUE" signifies that I want to keep both reads. Apologies if I'm confused; this is my first time using the software.

written 2.7 years ago by Biogeek280

Additionally, the reason I don't add $f2 is that it returns the error 'unknown trimmer'.

Is there any way to write this better so that I can still use a loop, rather than processing each read pair individually?

written 2.7 years ago by Biogeek280
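
A loop that names both mates and all four outputs explicitly, instead of relying on -basein and -baseout, might look like the sketch below. It assumes the forward and reverse files are named `<sample>_R1.fastq` and `<sample>_R2.fastq`. The `mate_of` helper, the output file names, and the jar and adapter paths are illustrative, and HEADCROP is used in place of HEADCLIP.

```shell
#!/usr/bin/env bash
# Sketch only: the _R1/_R2 naming scheme and both paths below are assumptions.
TRIMMOMATIC_JAR="directory/trimmomatic-0.35.jar"
ADAPTERS="directory/adapters/TruSeq3-PE-2.fa"

# Derive the reverse-read file name from the forward-read file name.
mate_of() {
    printf '%s\n' "${1%_R1.fastq}_R2.fastq"
}

for f1 in *_R1.fastq; do
    [ -e "$f1" ] || continue            # skip if the glob matched nothing
    f2=$(mate_of "$f1")
    base="${f1%_R1.fastq}"
    # Input pair first, then paired/unpaired outputs for each mate, then the steps.
    # Note there is no space after "ILLUMINACLIP:".
    java -jar "$TRIMMOMATIC_JAR" PE -phred33 \
        "$f1" "$f2" \
        "${base}_R1_paired.fastq" "${base}_R1_unpaired.fastq" \
        "${base}_R2_paired.fastq" "${base}_R2_unpaired.fastq" \
        ILLUMINACLIP:"$ADAPTERS":2:30:10:8:TRUE \
        HEADCROP:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
done
```

Quoting every file variable keeps the loop working for file names that contain spaces.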

Okay. Can you run the standard command given in the manual on one of the samples and see if it works? Take care of spaces as well, such as the one after the colon here:

ILLUMINACLIP: 
modified 2.7 years ago • written 2.7 years ago by geek_y8.7k

Goutham,

My script works fine. I changed a few settings. I read up on it and I am using the TruSeq3-PE-2 file for adaptors, as TruSeq2 refers to the older GAII sequencing methods. FastQC indicated no adaptors were present anyway, but I went through with this step nevertheless. TRAILING and LEADING were at 3, but this is not logical, as it only gets rid of very poor quality Illumina bases and N's. I've changed these to Q25, which has improved the reads a lot; I felt Q20 was too lenient and Q30 too harsh. I've head-cropped the reads by 16 bp (final read size 85 bp) to remove the hexamer primer bias. The sliding window is over 4 bases with an average of Q25. Minimum length of sequences to be kept: 30 bp. The FastQC results are now looking a lot better. Thanks for the input.

written 2.7 years ago by Biogeek280
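
The settings described in the comment above would correspond roughly to the step list below. This is only a sketch: the ILLUMINACLIP thresholds (2:30:10) are carried over from the earlier command, the order of the steps is an assumption, and the paths and sample file names are placeholders.

```shell
# Assumed paths; adjust to your own layout.
TRIMMOMATIC_JAR="directory/trimmomatic-0.35.jar"
ADAPTERS="directory/adapters/TruSeq3-PE-2.fa"

# Trimmomatic applies steps in the order listed.
# HEADCROP:16 on 101 bp reads leaves at most 101 - 16 = 85 bp.
STEPS="ILLUMINACLIP:${ADAPTERS}:2:30:10 HEADCROP:16 LEADING:25 TRAILING:25 SLIDINGWINDOW:4:25 MINLEN:30"

# Print the command instead of running it; drop 'echo' to run it for real.
# $STEPS is left unquoted on purpose so each step becomes its own argument.
echo java -jar "$TRIMMOMATIC_JAR" PE -phred33 \
    sample_R1.fastq sample_R2.fastq \
    sample_R1_paired.fastq sample_R1_unpaired.fastq \
    sample_R2_paired.fastq sample_R2_unpaired.fastq \
    $STEPS
```

Printing the command first makes it easy to spot stray spaces before committing to a long run.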