Question: Why does read quality drop after adapter trimming with cutadapt?
Nabeel Ahmed (United States) wrote:

I am using cutadapt (v1.14) to trim the adapter from a published ribosome profiling dataset (short single-end reads of 51 nt). When I run FastQC on the raw data, the read quality is good at the 3' end, with the entire box plot above Q30. However, when I trim the adapter and run FastQC on the processed data, the quality drops at the 3' end. I don't understand why quality would drop after adapter trimming when the original reads were of high quality. I would appreciate it if someone could shed some light on this.

The adapter trimming command is as follows

cutadapt  -a CTGTAGGCACCATCAATATCTCGTATGC -q 20 -m 20 -M 45 -O 6 -o SRR1562913_trimmed.fastq SRR1562913_1.fastq
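
For readers less familiar with the options, here is a small annotated breakdown of the command above, written as a Python sketch. The option meanings follow the cutadapt help text; the `annotated` list and the printing are only illustrative and are not part of the original post.

```python
import shlex

# Each entry: (arguments exactly as in the post, what the option does).
annotated = [
    (["cutadapt"], "the tool itself (v1.14 in the post)"),
    (["-a", "CTGTAGGCACCATCAATATCTCGTATGC"], "3' adapter sequence to remove"),
    (["-q", "20"], "trim low-quality bases from the 3' end, cutoff Q20"),
    (["-m", "20"], "discard reads shorter than 20 nt after trimming"),
    (["-M", "45"], "discard reads longer than 45 nt after trimming"),
    (["-O", "6"], "require at least 6 nt of overlap between read and adapter"),
    (["-o", "SRR1562913_trimmed.fastq"], "output file"),
    (["SRR1562913_1.fastq"], "input file"),
]

# Rebuild the full command line from the annotated pieces.
command = shlex.join(arg for args, _ in annotated for arg in args)

for args, meaning in annotated:
    print(f"{' '.join(args):45s} {meaning}")
print(command)
```

Note that with 51 nt input reads, -M 45 means that only reads that lost at least 6 nt (to adapter or quality trimming) end up in the output file.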

FastQC on the raw data: [image: raw data]

FastQC on the processed data: [image: processed data after adapter trimming]

modified 12 months ago by genomax • written 12 months ago by Nabeel Ahmed
Buffo wrote:

Because the adapter has a high-quality consensus sequence: if you remove it, the mean quality (and the length) of your experimental reads drops.

written 12 months ago by Buffo

But shouldn't the lower quality seen at positions 33-43 in the processed data also be visible at these positions in the raw data? Of course such reads would be few, so the mean quality would be higher, but even the lower bound of the box plot is > 30 in the raw data.

written 12 months ago by Nabeel Ahmed

The lower bound of the box plot (the lower whisker) is not the minimum observed value. The exact definition varies; it may represent the 10th percentile, for example. If the raw adapter-content plot shows high values (say above 50%), the processed box plot may be showing what were outliers in the raw data.

written 12 months ago by jomo018470

Thanks, I think this explains it. According to the FastQC documentation, the lower whisker is the 10th percentile. The low-quality reads must fall within the lowest 10% at those positions and hence do not show up in the raw data plot.

written 12 months ago by Nabeel Ahmed
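
The percentile argument above can be illustrated with a toy calculation. This is only a sketch with made-up numbers (95 adapter bases at Q38 and 5 insert bases at lower quality at one read position); the `percentile` helper is a simple nearest-rank definition and may differ in detail from how FastQC computes its whiskers.

```python
def percentile(values, pct):
    """Nearest-rank percentile (pct is an integer 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]

# Hypothetical base qualities at one read position (say position 40), raw data.
adapter_bases = [38] * 95            # reads where this position is adapter sequence
insert_bases = [12, 15, 18, 22, 25]  # reads where this position is real insert

raw = adapter_bases + insert_bases
# After trimming, the adapter-containing reads end before this position,
# so only the insert bases still contribute to it.
trimmed = insert_bases

raw_p10 = percentile(raw, 10)
trimmed_p10 = percentile(trimmed, 10)
raw_mean = sum(raw) / len(raw)
trimmed_mean = sum(trimmed) / len(trimmed)

print("raw:     10th percentile", raw_p10, "mean", raw_mean)
print("trimmed: 10th percentile", trimmed_p10, "mean", trimmed_mean)
```

In this toy case the low-quality bases make up only 5% of the raw data at that position, so they sit below the 10th percentile whisker; once the adapter bases are gone they are all that is left and dominate the plot.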

The adapter does not necessarily have to occur at the very end of your reads, so I think some adapters might occur in the 33-43 range, boosting the quality scores in that region (prior to trimming) as well.

written 12 months ago by lieven.sterck

So this implies that after adapter trimming you always need to do another round of quality trimming? And that the order is important: first adapter, then quality?

written 12 months ago by lieven.sterck

Every dataset is different. Even in this case most of the data is still above Q20, so as long as there is a reference genome available to align against, no quality trimming should be needed.

modified 12 months ago • written 12 months ago by genomax

True, especially if you assume that the aligner will soft-clip/trim the data (which most do, I think).

I was, however, thinking of the assembly case (which is obviously not the case in the question asked here).

written 12 months ago by lieven.sterck

For any de novo work it would be appropriate (perhaps required) to quality trim the data at Q20 (or stricter).

modified 12 months ago • written 12 months ago by genomax

My thought exactly. However, I'm a little nervous about the order of trimming, which apparently has a (severe) impact on the result. And OK, normally you would probably get rid of the adapters first and then do Q-trimming.

I'm a bit rusty on the cutadapt syntax :/ but isn't the command line given in this post also doing Q-trimming ( -q 20 )? If so, I'm concerned that other tools that combine adapter removal and Q-trimming might also not apply the "correct" order.

written 12 months ago by lieven.sterck
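
On the order question: as far as the cutadapt documentation describes it, -q quality trimming is applied before adapter removal, so the command in the question does quality-trim first. The documented 3' trimming rule is: subtract the cutoff from every quality, take partial sums from each position to the end of the read, and cut where that sum is smallest. Below is a minimal Python sketch of that rule; the function name `quality_trim_index` is my own and this is not cutadapt's actual code.

```python
def quality_trim_index(qualities, cutoff):
    """Return how many bases to keep after 3' quality trimming.

    Walks from the 3' end, accumulating (cutoff - quality). The cut point is
    where this running sum is largest, which is the same as where the partial
    sum of (quality - cutoff) from that position to the end is smallest.
    """
    running = 0
    best = 0
    keep = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running += cutoff - qualities[i]
        if running < 0:
            # Good bases now outweigh the bad ones seen so far; stop scanning.
            break
        if running > best:
            best = running
            keep = i
    return keep

# Worked example with a cutoff of 10.
quals = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
keep = quality_trim_index(quals, 10)
print(quals[:keep])
```

Because the rule works on sums rather than single bases, one base above the cutoff (the 11 in the example) does not stop the trimming if it is surrounded by low-quality bases.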