Paired End Trimmomatic producing asymmetric paired read files.
5.5 years ago
Biogeek

Hey guys,

I recently got back some 101 bp paired-end Illumina reads and I'm in the middle of QC. I decided to use Trimmomatic instead of SolexaQA, which I normally use. The problem is that the paired 'cleaned' read files output by Trimmomatic differ in file size and in number of reads. I used the keepBothReads option of ILLUMINACLIP (":TRUE"), because to my knowledge the Trinity de novo assembler works best with paired-end data.

The Trimmomatic command I run (as part of a loop):

java -jar directory/trimmomatic-0.35.jar PE -phred33 -basein $f1 -baseout "$f1"filtered.fastq ILLUMINACLIP: directory/adapters/TruSeq3-PE-2.fa:2:30:10:8:TRUE HEADCLIP:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
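To see whether the paired outputs really differ in read count rather than just in byte size, it helps to count the FASTQ records in each file. In paired-end mode Trimmomatic trims each mate independently and writes a read to the paired files only when its mate also survives (otherwise it goes to an unpaired file), so the two paired files should hold the same number of reads even though their sizes can differ. The helper below is a minimal sketch assuming plain 4-line FASTQ records; `count_reads` and `check_pair` are hypothetical names, not part of Trimmomatic.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print the number of FASTQ records (assumed to be 4 lines each)
# in a plain or gzip-compressed file.
count_reads() {
  case "$1" in
    *.gz) gzip -dc -- "$1" ;;
    *)    cat -- "$1" ;;
  esac | awk 'END { print NR / 4 }'
}

# Report both counts; return success only if the two files hold
# the same number of reads.
check_pair() {
  local n1 n2
  n1=$(count_reads "$1")
  n2=$(count_reads "$2")
  echo "$1: $n1 reads, $2: $n2 reads"
  [ "$n1" = "$n2" ]
}
```

For example, `check_pair sampleAfiltered_1P.fastq sampleAfiltered_2P.fastq`. The file names are hypothetical and assume the `_1P`/`_2P` suffixes that `-baseout` adds, so check the actual names in your output directory.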

Trimmomatic runs normally and produces all the output files. Can I also ask a few basic questions? Am I correct in assuming that HEADCLIP removes reads from the 5' end? Also, after trimming, high k-mer content is still flagged at the 5' start of the reads, where random hexamer priming bias usually shows up. Any suggestions, or is this just an artifact to ignore?
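On the HEADCLIP question: the step listed in the Trimmomatic manual is HEADCROP:<length>, which removes that many bases from the start (5' end) of every read regardless of quality; it trims bases, it does not remove whole reads. Cropping the first few bases is one common way of dealing with the hexamer priming bias at the 5' end. The awk sketch below only illustrates the effect on FASTQ records and is not Trimmomatic's own code; `headcrop` is a hypothetical name, and it assumes 4-line records.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Illustration of HEADCROP:N: drop the first N characters of the sequence
# line (line 2) and the quality line (line 4) of every 4-line FASTQ record
# read from stdin. Reads shorter than N simply become empty here; this
# sketch does not try to reproduce how Trimmomatic handles them.
headcrop() {
  local n="$1"
  awk -v n="$n" 'NR % 4 == 2 || NR % 4 == 0 { $0 = substr($0, n + 1) } { print }'
}
```

For example, `printf '@r1\nACGTACGT\n+\nABCDEFGH\n' | headcrop 3` prints the record with sequence `TACGT` and quality `DEFGH`.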

I am using TruSeq adapters. Am I right in assuming that all sequencing facilities now use TruSeq3?

Thanks for helping.

trimmomatic RNA-Seq qc
5.5 years ago

When you say PE, where is the R2 file in your command? You gave only $f1, not $f2. Also, you don't need to give the path to the adapters if it's the standard TruSeq file, I guess.

What is TRUE here?

ILLUMINACLIP: directory/adapters/TruSeq3-PE-2.fa:2:30:10:8:TRUE
___________________________________________________________^^^^
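For reference, the Trimmomatic manual documents the ILLUMINACLIP fields in this order: adapter FASTA, seed mismatches, palindrome clip threshold, simple clip threshold, minAdapterLength and keepBothReads, with the last two optional. So in the command above, 8 is minAdapterLength and TRUE is keepBothReads. As we understand the manual, keepBothReads only matters in palindrome (read-through) mode: after the adapter is removed the reverse read duplicates the forward read, and TRUE keeps it instead of dropping it, so it is not a general "keep pairs together" switch. The sketch below just splits the argument into those named fields; `explain_illuminaclip` is a hypothetical helper.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Split an ILLUMINACLIP step into its fields, in the order given by the
# Trimmomatic manual. The step must be one shell word (no space after the
# colon). Optional fields that are absent are reported as "(not given)".
explain_illuminaclip() {
  local step="${1#ILLUMINACLIP:}"   # strip the step name
  local fasta seed palin simple minlen keep
  IFS=: read -r fasta seed palin simple minlen keep <<< "$step"
  echo "adapter FASTA:             $fasta"
  echo "seed mismatches:           $seed"
  echo "palindrome clip threshold: $palin"
  echo "simple clip threshold:     $simple"
  echo "minAdapterLength:          ${minlen:-(not given)}"
  echo "keepBothReads:             ${keep:-(not given)}"
}
```

For example, `explain_illuminaclip ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10:8:TRUE` labels `8` as minAdapterLength and `TRUE` as keepBothReads.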


Trimmomatic instructions are pretty clear on their website:

java -jar trimmomatic-0.35.jar PE -phred33 \
input_forward.fq.gz input_reverse.fq.gz \
output_forward_paired.fq.gz output_forward_unpaired.fq.gz \
output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz \
ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36


The manual says that if you give only $f1, the first (forward) read file, it automatically detects the complementary reverse read file. As for :8:TRUE, this is where the keepBothReads option comes in: I've read on other forums that adding "TRUE" signifies that I want to keep both reads. Apologies if I'm confused; it's my first time using the software.

Additionally, the reason I don't add $f2 is that it returns the error 'unknown trimmer'.

Is there any way to write this code better so that I can still use a loop rather than processing each read pair individually?
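Since the manual's PE form takes the forward and reverse files explicitly, one way to keep a loop is to derive each R2 name from its R1 name. With `-basein`, only one input name is expected and the mate is inferred, which is presumably why an extra `$f2` was read as a trimming step and gave 'unknown trimmer'. This is a minimal sketch assuming files named `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` in the current directory; `trim_all`, `JAR`, `ADAPTERS` and `DRY_RUN` are hypothetical names, and the trimming steps are only placeholders (the manual's defaults plus the ILLUMINACLIP settings above), so substitute your own.

```bash
#!/usr/bin/env bash
set -euo pipefail

JAR="${JAR:-trimmomatic-0.35.jar}"        # hypothetical path to the jar
ADAPTERS="${ADAPTERS:-TruSeq3-PE-2.fa}"   # hypothetical path to the adapter FASTA

# Run (or, with DRY_RUN=1, only print) one Trimmomatic PE job per
# <sample>_R1.fastq.gz file in the current directory.
trim_all() {
  local r1 r2 sample
  local -a cmd
  for r1 in *_R1.fastq.gz; do
    [ -e "$r1" ] || continue            # the glob matched nothing
    sample="${r1%_R1.fastq.gz}"         # e.g. liverA_R1.fastq.gz -> liverA
    r2="${sample}_R2.fastq.gz"
    if [ ! -e "$r2" ]; then
      echo "skipping $r1: mate $r2 not found" >&2
      continue
    fi
    # Two explicit inputs and four outputs, as in the manual's PE example.
    cmd=(java -jar "$JAR" PE -phred33
         "$r1" "$r2"
         "${sample}_R1_paired.fq.gz" "${sample}_R1_unpaired.fq.gz"
         "${sample}_R2_paired.fq.gz" "${sample}_R2_unpaired.fq.gz"
         "ILLUMINACLIP:${ADAPTERS}:2:30:10:8:TRUE"
         LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36)
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}

trim_all
```

Running the script once with `DRY_RUN=1` set in the environment prints the generated commands so you can check the file pairing before any trimming is done.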


Okay. Can you just run the standard command given in the manual on one of the samples and see if it works? Watch out for spaces as well:

ILLUMINACLIP:
_____________^
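The space matters because the shell splits arguments on whitespace, so Trimmomatic never sees `ILLUMINACLIP:` and the adapter settings as one step. A small demonstration, using a hypothetical `show_args` helper that prints each argument it receives in brackets and a placeholder adapter path:

```bash
#!/usr/bin/env bash
# Print every argument passed in, one per line, wrapped in brackets.
show_args() { printf '[%s]\n' "$@"; }

# With a space after the colon, the command receives two separate arguments:
show_args ILLUMINACLIP: adapters/TruSeq3-PE-2.fa:2:30:10:8:TRUE
# [ILLUMINACLIP:]
# [adapters/TruSeq3-PE-2.fa:2:30:10:8:TRUE]

# Without the space, the whole step is a single argument, as intended:
show_args ILLUMINACLIP:adapters/TruSeq3-PE-2.fa:2:30:10:8:TRUE
# [ILLUMINACLIP:adapters/TruSeq3-PE-2.fa:2:30:10:8:TRUE]
```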


Goutham,

My script works fine. I changed a few settings. I read up and I am using the Truseq3-2-PE file for adaptors as Truseq2 refers to older GAII sequencing methods. Fastqc indicated no presence of adaptors anyway, but nevertheless, I went through with this step. Trailing and leading were at 3, but this is not logical, as this just gets rid of poor quality Illumina reads and N's. I've changed these to Q25 - this has improved the reads a lot. I felt Q20 was too lenient and Q30 was too harsh. I've headcropped the sequence by 16bp (final size 85 bp reads) to remove the primer hexamer bias. Sliding window is over 4 bases with an average of Q25. Minlength of sequences to be kept: 30 bp and over. The fastQC results are now looking a lot better. Thanks for the input.