Question: Trimmomatic usage issue
14 months ago by
banerjeeshayantan (110) wrote:

I have two paired FASTQ read files (Read_1.fq, Read_2.fq). I ran a quality check and found that the base quality of both files is very poor; all the other FastQC modules look fine. Hence I decided to use Trimmomatic, but there are some parts of the tool that I don't understand.
If I want to retain only bases with a minimum quality of 28, what command should I use? I don't understand what LEADING/TRAILING means here. After trimming, will all my reads still have the same length? Can someone please help?

sequencing next-gen • 667 views
modified 14 months ago by chen (1.9k) • written 14 months ago by banerjeeshayantan (110)

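The syntax is SLIDINGWINDOW:<windowSize>:<requiredQuality>, so you could use something like SLIDINGWINDOW:5:28. It scans the read from the 5' end and cuts once the average quality within the window drops below the required quality, so it works on window averages and does not guarantee that every retained base is at least Q28.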
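To make the window logic concrete, here is a rough Python sketch of the idea. It is a simplified model, not Trimmomatic's actual code: the function name `sliding_window_trim` is made up for illustration, reads shorter than the window are simply kept whole, and the real tool may treat edge cases differently.

```python
def sliding_window_trim(quals, window_size, required_quality):
    """Return the number of bases to keep from the 5' end.

    Simplified model (an assumption, not Trimmomatic's implementation):
    slide a window along the read and cut at the start of the first
    window whose mean quality is below required_quality.
    """
    n = len(quals)
    for start in range(0, n - window_size + 1):
        window = quals[start:start + window_size]
        if sum(window) / window_size < required_quality:
            return start  # cut here; everything from this base on is dropped
    return n  # no window failed, keep the whole read


# Quality decays towards the 3' end: the read is cut early.
decaying = [35, 35, 34, 33, 30, 28, 20, 15, 10, 5]
kept_decaying = sliding_window_trim(decaying, 5, 28)

# A single bad base inside good sequence: the window average stays high,
# so the Q10 base is retained.
one_bad_base = [40, 40, 40, 40, 10, 40, 40, 40, 40, 40]
kept_one_bad = sliding_window_trim(one_bad_base, 5, 28)

print(kept_decaying, kept_one_bad)
```

The second read shows why a window of 5 at Q28 is not the same thing as "keep only bases with quality 28 or more": an isolated low-quality base survives as long as its neighbours pull the average up.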

If no bases are trimmed from any read, you will still have reads of the original length. Otherwise you will have reads of variable length, depending on how many bases are removed from each read.

LEADING/TRAILING means cut bases off the beginning and end of the read, respectively, for as long as they are below the given quality value.
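As a rough illustration of LEADING/TRAILING, here is a small Python sketch. Again, this is a simplified model under my own assumptions rather than Trimmomatic's code, and `trim_leading_trailing` is a hypothetical name.

```python
def trim_leading_trailing(quals, leading, trailing):
    """Return (start, end) so that quals[start:end] is the retained part.

    Simplified model (an assumption): strip bases from the 5' end while
    they are below `leading`, then from the 3' end while they are below
    `trailing`. Bases in the middle of the read are never examined.
    """
    start = 0
    while start < len(quals) and quals[start] < leading:
        start += 1
    end = len(quals)
    while end > start and quals[end - 1] < trailing:
        end -= 1
    return start, end


quals = [2, 2, 30, 35, 12, 33, 8, 2]
start, end = trim_leading_trailing(quals, leading=20, trailing=20)
print(quals[start:end])
```

Note that the interior Q12 base is kept: these two steps only nibble at the ends of the read, which is why they are usually combined with SLIDINGWINDOW.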


All that said, I suggest that you use bbduk.sh from the BBMap suite instead. Its options are easy to use and understand. Help documentation here.

modified 14 months ago • written 14 months ago by genomax (68k)

What do you intend to do downstream? Your quality cut-off is really stringent and, depending on the analysis you want to perform, you may throw away a lot of good data. Also, it is a good idea to post a picture of the FastQC quality plot here.

written 14 months ago by h.mon (26k)

Thanks for your reply. I intend to call variants using Mutect2 and apply driver detection algorithms downstream. One thing that baffles me: how can poor quality bases be considered "good data"? Am I really losing information by throwing out poor quality bases?
This is the FastQC report image (image not preserved).

modified 14 months ago by genomax (68k) • written 14 months ago by banerjeeshayantan (110)

You do have a significant quantity of poor quality data (your success in calling variants may be limited by this). But Q28 is a stringent cut-off, as @h.mon stated. When you have a reference genome available, Q15 or Q20 may be a stringent enough cut-off.
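For a sense of scale, a Phred score Q corresponds to an estimated error probability of 10^(-Q/10). A quick Python sketch to compare the cut-offs mentioned in this thread (the helper name `phred_to_error_prob` is just for illustration):

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the estimated base-call error probability."""
    return 10 ** (-q / 10)


for q in (15, 20, 28):
    print(f"Q{q}: about {phred_to_error_prob(q):.4f} chance the base call is wrong")
```

So a Q20 base is already expected to be correct about 99% of the time, and Q15 about 97% of the time, which is why discarding everything under Q28 can cost a lot of usable sequence.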

written 14 months ago by genomax (68k)
14 months ago by
chen (1.9k), OpenGene, wrote:

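Your goal can be achieved by using sliding window trimming. You may try fastp with the following command: fastp -i Read_1.fq -I Read_2.fq -o Read_1.out.fq -O Read_2.out.fq -5 -3 -M 28 (-5 and -3 turn on sliding window trimming from the 5' and 3' ends, and -M sets the mean quality the window must reach).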

modified 14 months ago • written 14 months ago by chen (1.9k)

Thanks for your reply! Will try that.

written 14 months ago by banerjeeshayantan (110)