Question: Illumina quality trimming - FastQC
4.1 years ago, sumudu_rangika (30) wrote:


I have a few things to clarify.

1) I have an Illumina MiSeq dataset for a parasite genome. The machine itself produced the paired-end reads as two separate datasets, one forward (R1) and the other reverse (R2). When using the FASTQC tool on one set, e.g. filtering reads <70 bp in the R1 dataset, should we treat R1 as paired-end or not?

2) During quality trimming, I tried adjusting the sliding window size to see how the per-base quality improves. Increasing the sliding window size resulted in more aggressive trimming. I selected a sliding window of 5, a step of 2 and quality <20, and filtered out reads shorter than 70 bp.

Appreciate any advice on this.
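For readers unfamiliar with the settings in point 2, here is a minimal sketch of how a sliding-window quality trim can work. It is a simplified illustration, not the exact algorithm of the Galaxy trimmer or any other tool: it assumes Phred+33 qualities, scans from the 5' end, and cuts the read at the start of the first window whose mean quality falls below the threshold. The function names `phred33` and `sliding_window_keep` are made up for this example.

```python
def phred33(quality_string):
    """Decode a Phred+33 FASTQ quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_keep(quals, window=5, step=2, min_mean_q=20):
    """Return how many leading bases survive a simplified sliding-window trim."""
    n = len(quals)
    if n < window:
        # Read shorter than one window: judge the whole read as a single window.
        return n if n and sum(quals) / n >= min_mean_q else 0
    for start in range(0, n - window + 1, step):
        if sum(quals[start:start + window]) / window < min_mean_q:
            return start  # cut at the start of the first failing window
    return n


# Toy read: ten Q40 bases ("I") followed by five Q10 bases ("+").
quals = phred33("I" * 10 + "+" * 5)
kept = sliding_window_keep(quals)
status = "passes" if kept >= 70 else "fails"
print(f"{kept} bases kept; read {status} a 70 bp minimum-length filter")
```

Note that this works on a single read in isolation, so on its own it says nothing about what happens to the read's mate in the other file.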



Never filter or trim paired reads with a tool that does not support paired reads, or you may get broken pair ordering. And it sounds to me like your trimming to Q20 is too aggressive for most purposes; it increases bias. I recommend using BBDuk for quality operations like filtering and trimming, but whether it's advisable to do them at all depends on what you're doing with the data. What's your experiment?

written 4.1 years ago by Brian Bushnell (17k)

It's whole-genome sequencing of 8 clinical isolates. I have two separate sets for each: a forward set and a reverse set.

So do you think that quality trimming the forward and reverse sets separately, using a tool like FASTQ quality trimmer, is not a good option?

written 4.1 years ago by sumudu_rangika (30)

It's not! You will end up with different numbers of reads in the R1 and R2 files, which will cause problems in downstream processing, e.g. mapping. You need to trim the "connected" reads from the R1 and R2 files together.

written 4.1 years ago by agata88 (800)
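One way to check whether R1 and R2 are still "connected" after trimming is to walk both files record by record and compare read IDs. The following is only a rough sketch: it assumes plain 4-line FASTQ records (no wrapped sequences), IDs taken as the first whitespace-delimited token of the header, and optional `/1` and `/2` suffixes. The names `fastq_ids` and `first_pair_mismatch` are hypothetical.

```python
from itertools import zip_longest


def fastq_ids(lines):
    """Yield the read ID from each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 != 0:
            continue  # only every fourth line is a header
        header = line.strip()
        if not header:
            continue  # tolerate a trailing blank line
        read_id = header[1:].split()[0]  # drop '@' and any description
        if read_id.endswith(("/1", "/2")):
            read_id = read_id[:-2]  # drop old-style mate suffix
        yield read_id


def first_pair_mismatch(r1_lines, r2_lines):
    """Return (record_number, r1_id, r2_id) for the first mismatch, else None."""
    pairs = zip_longest(fastq_ids(r1_lines), fastq_ids(r2_lines))
    for number, (id1, id2) in enumerate(pairs, start=1):
        if id1 != id2:
            return number, id1, id2  # None on one side means a missing record
    return None


r1 = ["@read1/1", "ACGT", "+", "IIII", "@read2/1", "ACGT", "+", "IIII"]
r2 = ["@read1/2", "ACGT", "+", "IIII", "@read2/2", "ACGT", "+", "IIII"]
print(first_pair_mismatch(r1, r2))      # files in sync
print(first_pair_mismatch(r1, r2[4:]))  # R2 lost its first read
```

For real files, pass open file handles instead of lists, e.g. `first_pair_mismatch(open(r1_path), open(r2_path))`; gzip-compressed input would need `gzip.open(path, "rt")`.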

Assuming you mean the trimmer from FASTX-Toolkit: no, that should not be used.

written 4.1 years ago by Devon Ryan (97k)

I'm using a local instance of Galaxy and the FASTQC quality trimmer.

written 4.1 years ago by sumudu_rangika (30)

Do you mean you use FASTX-Toolkit? FastQC just produces the report. The report you showed in your previous post was good, so I'm not sure why trimming or filtering (especially sliding window) is necessary. R1 and R2 are paired-end reads. Generally, the FastQC profiles of paired-end reads are similar.

written 4.1 years ago by Satyajeet Khare (1.6k)
4.1 years ago, agata88 (800) wrote:

I would suggest using Trimmomatic for PE reads.

For DNA-seq I use SLIDINGWINDOW:4:30 and MINLEN:30, but it is up to you what quality and read length you set.
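As a rough illustration, a Trimmomatic paired-end call with these settings could be assembled like this. The jar path, the input file names and the output naming scheme are placeholders for this sketch, not values from the thread; adjust them to your installation. The script only builds and prints the command, it does not run it.

```python
def trimmomatic_pe_command(r1, r2, out_prefix, jar="trimmomatic.jar",
                           window=4, window_q=30, minlen=30):
    """Build the argument list for a Trimmomatic paired-end (PE) run."""
    # PE mode expects four outputs in this order:
    # forward paired, forward unpaired, reverse paired, reverse unpaired.
    outputs = [
        f"{out_prefix}_R1_paired.fastq.gz",
        f"{out_prefix}_R1_unpaired.fastq.gz",
        f"{out_prefix}_R2_paired.fastq.gz",
        f"{out_prefix}_R2_unpaired.fastq.gz",
    ]
    steps = [f"SLIDINGWINDOW:{window}:{window_q}", f"MINLEN:{minlen}"]
    return ["java", "-jar", jar, "PE", r1, r2, *outputs, *steps]


# Hypothetical file names for one isolate.
cmd = trimmomatic_pe_command("isolate1_R1.fastq.gz", "isolate1_R2.fastq.gz", "isolate1")
print(" ".join(cmd))
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)`. Because both files go through one call, mates are kept or moved to the unpaired outputs together, so the paired outputs stay in sync.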

As for FastQC: this tool computes statistics for one file at a time, e.g. R1 or R2. Both files have the same number of reads before trimming and should still have the same number after it.

So, check R1 and R2 separately with FastQC before and after trimming, and you'll see how much your data changed after the quality cut.




Trimmomatic generates 4 sets of output. If I use it, should I consider only the two paired sets and ignore the two unpaired datasets?

written 4.1 years ago by sumudu_rangika (30)

The output you want is the paired sets, and those are what you should use for further analysis. The unpaired datasets contain the trimmed reads that no longer have a mate (because the mate was removed by trimming). Having all 4 sets matters when trimming is too aggressive, which results in a large number of unpaired reads.

written 4.1 years ago by agata88 (800)
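The logic behind the paired and unpaired outputs can be sketched in a few lines. This is a conceptual illustration only, not Trimmomatic's implementation: given the IDs of the reads that survived trimming in each file, a read goes to the paired set if its mate also survived, and to that file's unpaired set otherwise. The function name `split_survivors` is made up for this example.

```python
def split_survivors(r1_ids, r2_ids):
    """Split surviving read IDs into paired, R1-only and R2-only lists."""
    r1_set, r2_set = set(r1_ids), set(r2_ids)
    paired = [rid for rid in r1_ids if rid in r2_set]           # both mates survived
    r1_unpaired = [rid for rid in r1_ids if rid not in r2_set]  # R2 mate was dropped
    r2_unpaired = [rid for rid in r2_ids if rid not in r1_set]  # R1 mate was dropped
    return paired, r1_unpaired, r2_unpaired


# Toy example: read "b" lost its R2 mate, read "c" lost its R1 mate.
paired, r1_only, r2_only = split_survivors(["a", "b", "d"], ["a", "c", "d"])
print("paired:", paired)
print("R1 unpaired:", r1_only)
print("R2 unpaired:", r2_only)
```

The share of reads landing in the unpaired lists gives a quick sense of how aggressive the trimming was.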