FASTQC after trimming: how do I know if it is good enough to proceed to alignment?
3.3 years ago
whb ▴ 60

Hi,

I have some NovaSeq DNA sequencing data (100 bp PE). I ran FastQC on the raw FASTQ files, then trimmed the adapters with Trimmomatic, and then used fastp to remove the overrepresented poly-G sequences. After that, I ran FastQC again. My first question: is this a good trimming workflow, or should I just stick to one trimmer?

Trimmomatic parameters are:

...ILLUMINACLIP:${adaptors}:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

For fastp, I followed the simple usage:

fastp -i in.R1.fq.gz -I in.R2.fq.gz -o out.R1.fq.gz -O out.R2.fq.gz
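
For readers unfamiliar with poly-G trimming: on two-colour instruments such as the NovaSeq, a missing signal is called as G, so reads can end in long runs of G. As far as I know, fastp trims these tails by default for NovaSeq/NextSeq data, with a minimum run length of 10 (`--poly_g_min_len`). The Python sketch below only illustrates the idea. The function name `trim_poly_g_tail` is hypothetical, and unlike fastp this simplified version tolerates no mismatches inside the G run.

```python
def trim_poly_g_tail(seq, qual, min_len=10):
    """Drop a trailing run of G bases if it is at least min_len long.

    Simplified illustration only: fastp's own detection is more tolerant
    (this sketch requires an uninterrupted run of G at the 3' end).
    """
    stripped = seq.rstrip("G")
    run_length = len(seq) - len(stripped)
    if run_length >= min_len:
        # Keep the quality string in sync with the shortened sequence.
        return stripped, qual[:len(stripped)]
    return seq, qual


read = "ACGTTAGCCA" + "G" * 12   # 10 real bases followed by a 12 bp poly-G tail
quals = "F" * len(read)
print(trim_poly_g_tail(read, quals))   # ('ACGTTAGCCA', 'FFFFFFFFFF')
```

A read that ends in only a few G bases (fewer than `min_len`) is returned unchanged, which is why short genuine G stretches survive.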

Q2) FastQC showed that all of my R1 reads have lengths of 10-100 while R2 reads are all 100. Is that normal? Also, every R1 file fails the per base sequence content module in FastQC: A% drops at the end of the reads, and trimming did not remove this, probably because the quality scores there are good (>= 35). Do I need to fix this? If so, what should I do?
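
To make the point about quality scores concrete, here is a minimal Python sketch of a sliding-window quality cut in the spirit of `SLIDINGWINDOW:4:15`. It is an assumption-laden simplification, not Trimmomatic's actual implementation, and the function name `sliding_window_cut` is hypothetical. It looks only at Phred scores, so bases with quality around 35 are kept whatever their base composition.

```python
def sliding_window_cut(quals, window=4, min_avg=15):
    """Return how many bases to keep from the 5' end.

    The read is cut at the start of the first window whose mean Phred
    quality falls below min_avg. Simplified sketch, not Trimmomatic's code.
    """
    for start in range(0, len(quals) - window + 1):
        window_quals = quals[start:start + window]
        if sum(window_quals) / window < min_avg:
            return start
    return len(quals)


high_quality = [37] * 100              # good scores along the whole read
bad_tail = [37] * 90 + [2] * 10        # quality collapses in the last 10 bases

print(sliding_window_cut(high_quality))  # 100, nothing is trimmed
print(sliding_window_cut(bad_tail))      # 89, the low-quality tail is cut
```

Because the decision depends only on the scores, a composition bias in high-quality bases passes through this kind of trimming untouched.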

R1 has good quality scores at bases 96-100, but A% drops there. R1 also failed per base sequence content and Kmer content (three FastQC screenshots were attached here).
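
The per base sequence content check can be reproduced roughly by hand. The Python sketch below computes the fraction of a given base at each read position. The helper names `read_sequences` and `base_fraction_by_position` are hypothetical, and the calculation is a plain per-position fraction, not necessarily FastQC's exact method (FastQC, for example, groups later positions into bins on long reads).

```python
import gzip
from collections import Counter


def read_sequences(path):
    """Yield the sequence line of every record in a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:      # 2nd line of each 4-line record
                yield line.strip()


def base_fraction_by_position(reads, base="A"):
    """Return a list with the fraction of `base` at each read position."""
    counts = []                           # counts[i] holds base counts at position i
    for seq in reads:
        for i, b in enumerate(seq):
            if i == len(counts):
                counts.append(Counter())
            counts[i][b] += 1
    return [c[base] / sum(c.values()) for c in counts]


# Toy input with variable read lengths, as after trimming.
fractions = base_fraction_by_position(["ACGA", "AAGC", "ACG"])
print([round(f, 2) for f in fractions])   # [1.0, 0.33, 0.0, 0.5]
```

On real data, something like `base_fraction_by_position(read_sequences("out.R1.fq.gz"))` would show whether the drop in A% is confined to the last few cycles. Note that after trimming, fewer reads reach the last positions, so the fractions there are based on fewer bases.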

Q3) About 2% adapter content (down from ~6-9%) is still found in some samples after trimming. Do I need to remove the adapters entirely even though the samples passed FastQC? Are the parameters I used in Trimmomatic not strict enough?
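
One way to sanity-check a residual adapter percentage is to count the reads that still contain an adapter prefix. The Python sketch below does an exact substring search. The sequence `AGATCGGAAGAGC` is an assumption (a prefix shared by common Illumina TruSeq adapters) and should be replaced with whatever is in the `${adaptors}` file; the function name `adapter_read_fraction` is hypothetical.

```python
def adapter_read_fraction(reads, adapter="AGATCGGAAGAGC"):
    """Return the fraction of reads containing `adapter` as an exact substring.

    Assumption: the default adapter is only an example prefix. Partial
    adapters at the very 3' end and adapters with mismatches are not counted.
    """
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for seq in reads if adapter in seq)
    return hits / len(reads)


example_reads = [
    "ACGTAGATCGGAAGAGCACAC",   # adapter in the middle
    "ACGTACGTACGT",            # no adapter
    "TTTTAGATCGGAAGAGC",       # adapter at the 3' end
    "GGGGCCCC",                # no adapter
]
print(adapter_read_fraction(example_reads))   # 0.5
```

Because only exact, full-length matches are counted, this gives a lower bound; a tool that allows mismatches or partial matches may report a somewhat different figure.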


FASTQC Trimming per base sequence content • 2.8k views

This is all fine; you can proceed with your downstream processing. Don't worry too much about these low-level metrics. You have basically no adapters and good base quality, and that is mainly what matters.


Just a side question: do you (@ATpoint) know whether the bias close to the 3' end in the per base sequence content plot is related to the library or to the sequencing technology?

I've been observing this quite often and am just curious about it. I know that 5' bias in RNA-seq is common and related to the sequencing library, but I could not find any good documentation or blog post explaining the 3' bias. In the experiments I'm working on, I preferred to remove those bases.
