Trimming FASTQ reads by quality
5
3
7.2 years ago
RafaelMP ▴ 120

Hello everyone!

I'm new to RNA-Seq analysis.
I have paired-end read data (Sanger / Illumina 1.9 encoding) and would like to trim the reads by quality score.
We inspected the data with FastQC and tested the following tools: sickle, Spade, fastq quality filter (fastx) and AllPaths.
The best result came from the fastq quality filter, but we lost a lot of reads in the process (it does not trim part of a read, it only discards the entire read).

I wonder if there is already a program that finds the first occurrence of a given symbol (e.g. '#') in the quality string and cuts both the sequence and the quality string from that point.

RNA-Seq fastx fastq trim FastQC • 17k views
1

Did you try fastq-Trimmer instead? You could also try running Prinseq with -trim_qual_left and -trim_qual_right and a given quality threshold. You mentioned "#": from a quick look at the encoding chart on the FASTQ wiki page, the score for "#" is 2 (assuming the 1.8+ and 1.9 encodings are the same). So try a threshold of 2, which should trim the low-quality regions from the left and right ends of the reads.
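To make the "#" arithmetic concrete, here is a minimal Python sketch, assuming Phred+33 encoding (Sanger / Illumina 1.8+). The function names `phred33` and `cut_at_first_symbol` are made up for illustration and do not come from any of the tools mentioned; the second one simply does what the original question describes, cutting sequence and quality at the first occurrence of a symbol.

```python
from typing import Tuple


def phred33(ch: str) -> int:
    """Decode one Phred+33 quality character into its score."""
    return ord(ch) - 33


def cut_at_first_symbol(seq: str, qual: str, symbol: str = "#") -> Tuple[str, str]:
    """Truncate a read at the first occurrence of `symbol` in the quality string."""
    pos = qual.find(symbol)
    if pos == -1:
        # Symbol not present: keep the read unchanged.
        return seq, qual
    return seq[:pos], qual[:pos]


print(phred33("#"))  # 2
print(cut_at_first_symbol("ACGTACGTAC", "IIIIII##I#"))  # ('ACGTAC', 'IIIIII')
```

Note that this cuts at the very first "#", even if good bases follow it, so a dedicated trimmer is probably the better choice for real data.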

5
7.2 years ago

When a major part of a read consists of low-quality bases, trimming shortens the read, and a short read cannot be aligned against the reference genome with high confidence. So almost all trimming tools discard reads whose length has been reduced below some minimum (say n=30). If most of your reads suffer from this problem, then all trimming tools will behave in the same way, i.e. they will discard the majority of the reads. This is not a solution, but I would suggest trying Trimmomatic to see if it helps (http://www.usadellab.org/cms/?page=trimmomatic).
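The trim-then-discard behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Phred+33 encoding, trimming from the 3' end only, and hypothetical function names and defaults (`min_q=20`, `min_len=30`). It is not how any particular tool is implemented.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Remove trailing bases whose quality score is below min_q."""
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def trim_and_filter(reads, min_q=20, min_len=30):
    """Trim each (seq, qual) pair, then drop reads shorter than min_len."""
    kept = []
    for seq, qual in reads:
        s, q = trim_3prime(seq, qual, min_q)
        if len(s) >= min_len:
            kept.append((s, q))
    return kept


reads = [
    ("A" * 40, "I" * 35 + "#" * 5),   # 35 good bases: survives trimming
    ("A" * 40, "I" * 10 + "#" * 30),  # only 10 good bases: discarded
]
kept = trim_and_filter(reads)
print(len(kept), len(kept[0][0]))  # 1 35
```

If most reads look like the second one, lowering the minimum length or the quality threshold changes how many survive, but no tool can recover the low-quality bases themselves.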

1
7.2 years ago
Ming Tang ★ 2.7k

If you are doing de novo transcriptome assembly, trim the fastq files at phred score 5, not 20

0
7.2 years ago

seqtk is one of the fastest tools available and is easy to install and use (it can process huge FASTQ files in seconds to minutes). It trims reads from both ends based on quality and leaves the good parts intact. You can set the threshold with the "-q" option, which takes an error rate rather than a Phred score.

You can get it here https://github.com/lh3/seqtk/. Just 'make' it and you're good to go.

0

How do I trim bases below Phred score 20 using seqtk? What does the default of 0.05 mean in terms of Phred score? (-q FLOAT error rate threshold (disabled by -b/-e) [0.05])
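The conversion asked about here follows from the standard Phred definition, Q = -10 * log10(P), where P is the error probability. A small Python sketch (the function names are made up for illustration):

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability for a given Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred score for a given error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


print(phred_to_error(20))              # 0.01
print(round(error_to_phred(0.05), 2))  # 13.01
```

So the default of 0.05 corresponds to roughly Q13, and, assuming -q is interpreted as the help text says, passing 0.01 (e.g. `seqtk trimfq -q 0.01 in.fq > out.fq`) should correspond to a Phred 20 threshold.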

0
2.6 years ago

We use Trimmomatic for this. It lets you trim bases below a certain quality from both the 3' and 5' ends, and also trim using the average quality within a sliding window.
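For readers unfamiliar with window-based trimming, here is a simplified Python sketch of the idea. It assumes Phred+33 encoding, and the function name and defaults (`window=4`, `min_avg=15`) are hypothetical; it is not Trimmomatic's actual implementation, which may differ in how it picks the exact cut point.

```python
def sliding_window_trim(seq, qual, window=4, min_avg=15, offset=33):
    """Scan from the 5' end and cut the read at the start of the first
    window whose mean quality falls below min_avg."""
    scores = [ord(c) - offset for c in qual]
    for start in range(len(scores)):
        # Windows near the 3' end are shorter than `window` in this sketch.
        win = scores[start:start + window]
        if sum(win) / len(win) < min_avg:
            return seq[:start], qual[:start]
    return seq, qual


print(sliding_window_trim("ACGTACGTAC", "IIIIII####"))  # ('ACGTA', 'IIIII')
```

In this example the cut lands one base before the first "#", because the window starting there already averages below the threshold. Averaging over a window makes the trimmer tolerant of a single bad base inside an otherwise good read.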

0
2.6 years ago
GenoMax 107k

Since an old thread got reactivated, I will add that bbduk.sh from the BBMap suite can also do quality-based trimming, in addition to a host of other things. A guide is available here.

qtrim=f             Trim read ends to remove bases with quality below trimq.
                    Performed AFTER looking for kmers.  Values:
                        rl (trim both ends),
                        f (neither end),
                        r (right end only),
                        l (left end only),
                        w (sliding window).
trimq=6             Regions with average quality BELOW this will be trimmed,
                    if qtrim is set to something other than f.  Can be a
                    floating-point number like 7.3.
minavgquality=0     (maq) Reads with average quality (after trimming) below
maqb=0              If positive, calculate maq from this many initial bases.
minbasequality=0    (mbq) Reads with any base below this quality (after