bbduk - trimming ITS primer sequence
AfinaM, 3.6 years ago

Hi everyone,

I am currently trying to analyse my ITS1 samples. As far as I know, ITS is a special case because the length of the region varies between taxa. So, to trim off the primers with bbduk, I use the parameters below:

bbduk.sh in1=MSA_S21_L001_R1_001.fastq.gz in2=MSA_S21_L001_R2_001.fastq.gz \
        out1=MSA_S21_L001_R1.fastq out2=MSA_S21_L001_R2.fastq \
        ktrim=l k=22 mink=20 hdist=1 copyundefined=t ordered=t rcomp=t \
        literal="GGAAGTAAAAGTCGTAACAAGG,GCTGCGTTCTTCATCGATGC" tpe tbo

I then ran DADA2 and noticed that a large fraction of the input is filtered out. I suspect this is due to the primer trimming.
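
One way to test whether the primer trimming is really to blame is to count how many reads still contain each primer, before and after bbduk. ITS1 varies in length, so reads from short amplicons can run through into the reverse complement of the opposite primer. For that reason it is worth checking both orientations; the DADA2 ITS workflow does a similar orientation check. The shell functions below are a minimal, hypothetical sketch and are not part of bbduk or DADA2. They assume uncompressed four-line FASTQ records on stdin and count exact matches only, so they will undercount compared with bbduk's hdist=1 matching.

```bash
# Hypothetical helpers (not part of bbduk or DADA2).

# Reverse-complement a DNA string; handles A/C/G/T only (no IUPAC codes).
revcomp() {
  printf '%s\n' "$1" | tr 'ACGTacgt' 'TGCAtgca' |
    awk '{ s = ""; for (i = length($0); i >= 1; i--) s = s substr($0, i, 1); print s }'
}

# Read 4-line FASTQ records on stdin and count sequence lines that contain
# the primer as given, and as its reverse complement (exact matches only).
count_primer_hits() {
  local primer=$1 rc
  rc=$(revcomp "$primer")
  awk -v p="$primer" -v r="$rc" '
    NR % 4 == 2 {
      if (index($0, p) > 0) fwd++
      if (index($0, r) > 0) rc_hits++
    }
    END { printf "forward=%d revcomp=%d\n", fwd, rc_hits }'
}
```

For example, `gzip -dc MSA_S21_L001_R1_001.fastq.gz | count_primer_hits GCTGCGTTCTTCATCGATGC` versus `count_primer_hits GCTGCGTTCTTCATCGATGC < MSA_S21_L001_R1.fastq` compares the raw and trimmed R1 files for the second primer.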

Has anyone else done the same thing?

What is the length of the sequences? What filterAndTrim() parameters are you using? Input is the number of reads after trimming the primers, correct? The filtering has nothing to do with bbduk trimming the primers. Your parameters are probably too stringent, or the reads are of poor quality.
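
For the read-length question, a quick summary of the trimmed reads can be pulled straight from the FASTQ. This is a small illustrative sketch that assumes four-line FASTQ records on stdin. The function name is made up and does not come from any tool. It is handy for ITS1, where amplicon length varies.

```bash
# Hypothetical helper: print read count and min/max/mean read length
# for 4-line FASTQ records read from stdin.
read_length_summary() {
  awk 'NR % 4 == 2 {
         n = length($0); total += n; reads++
         if (reads == 1 || n < min) min = n
         if (n > max) max = n
       }
       END {
         if (reads) printf "reads=%d min=%d max=%d mean=%.1f\n", reads, min, max, total / reads
       }'
}
```

Usage: `read_length_summary < MSA_S21_L001_R1.fastq`.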

I set the trim and trunc parameters to 0 (I am using QIIME2, by the way) so that all of the data go into DADA2. Yes, input is the number of reads after primer trimming. I am worried that, because the reads were trimmed, DADA2 is treating some of them as bad quality. I tried again with max_ee_r = 6, and the "filtered" count went up to 85811.
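
The max_ee_r change is easier to interpret once you see what it thresholds. DADA2's maxEE filter, exposed in QIIME2 as max_ee_f/max_ee_r, discards reads whose expected number of errors exceeds the cutoff. The expected number of errors is the sum of 10^(-Q/10) over all bases, so raising the cutoff to 6 lets noisier reverse reads through. The awk-based functions below are a hypothetical sketch of that calculation. They assume Phred+33 qualities and ignore DADA2's other filters (truncQ truncation, Ns, length limits), so their counts will only approximate DADA2's.

```bash
# Hypothetical helpers. Expected errors: EE = sum over bases of 10^(-Q/10).
# Assumes Phred+33 encoding; ignores truncQ, N filtering and length limits.

# Print EE (4 decimals) for a single quality string.
expected_errors() {
  printf '%s\n' "$1" | awk '
    BEGIN { for (i = 33; i <= 126; i++) q[sprintf("%c", i)] = i - 33 }
    {
      ee = 0
      for (j = 1; j <= length($0); j++) ee += 10 ^ (-q[substr($0, j, 1)] / 10)
      printf "%.4f\n", ee
    }'
}

# Read 4-line FASTQ records on stdin; count reads with EE <= MAXEE ($1).
count_passing_maxee() {
  awk -v max="$1" '
    BEGIN { for (i = 33; i <= 126; i++) q[sprintf("%c", i)] = i - 33 }
    NR % 4 == 0 {
      ee = 0
      for (j = 1; j <= length($0); j++) ee += 10 ^ (-q[substr($0, j, 1)] / 10)
      total++
      if (ee <= max) passed++
    }
    END { printf "passed=%d total=%d\n", passed, total }'
}
```

Running `count_passing_maxee 2 < MSA_S21_L001_R2.fastq` and then the same with 6 gives a rough idea of how much of the loss is driven by expected errors in R2.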

Because it seems like too many reads are being filtered out, I am trying to get as many reads into DADA2 as possible. Or is this normal for ITS data? Even with the rcomp=t parameter in bbduk, I don't see any difference.

It seems like your data is moderately bad (or moderately good, if you are an optimist). You have to evaluate the quality of the reads to choose suitable values for maxEE, truncLen, and other parameters. Did you examine the DADA2 quality profile plots or the FastQC quality plots? They will help you decide on the best parameters.
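
If you want the numbers behind a quality profile plot, you can also compute the per-position mean quality directly. Below is a minimal sketch that assumes Phred+33 four-line FASTQ on stdin; the function name is hypothetical. For each position it prints the position, the mean Phred score, and how many reads reach that position. The read count matters when choosing truncLen for variable-length ITS reads.

```bash
# Hypothetical helper: per-position mean Phred score (Phred+33 assumed).
# Output columns: position, mean quality, number of reads covering it.
mean_quality_by_position() {
  awk '
    BEGIN { for (i = 33; i <= 126; i++) q[sprintf("%c", i)] = i - 33 }
    NR % 4 == 0 {
      n = length($0)
      if (n > maxlen) maxlen = n
      for (j = 1; j <= n; j++) { sum[j] += q[substr($0, j, 1)]; cnt[j]++ }
    }
    END {
      for (j = 1; j <= maxlen; j++) printf "%d\t%.1f\t%d\n", j, sum[j] / cnt[j], cnt[j]
    }'
}
```

For example, run `mean_quality_by_position < MSA_S21_L001_R2.fastq` and look for the position where the mean drops off.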

Can I ask how you determined that the data is 'moderately bad'? I ran FastQC both before and after trimming, and it looks okay to me.

It is just a guess, based on the fact that you are discarding ~43% of the reads at the filterAndTrim() step. In my experience, one discards ~5-20% of the reads for good datasets. But the loss may not be related to quality at all; it could instead be caused by the truncLen parameter.
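
The ~43% figure is simply the drop from DADA2's "input" count to its "filtered" count. Below is a trivial helper for that arithmetic. The function name is hypothetical, and the numbers in the example are made up rather than taken from this thread.

```bash
# Hypothetical helper: percentage of reads discarded between two counts,
# e.g. DADA2's "input" and "filtered" numbers. Prints NA if input <= 0.
pct_discarded() {
  awk -v input="$1" -v kept="$2" 'BEGIN {
    if (input <= 0) { print "NA"; exit }
    printf "%.1f\n", 100 * (input - kept) / input
  }'
}
```

For example, `pct_discarded 1000 570` prints `43.0`.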
