Question: bbduk - trimming ITS primer sequence
5 months ago, AfinaM wrote:

Hi everyone,

I am currently trying to analyse my ITS1 samples. As far as I know, ITS data behave differently because the length of the region is variable. To trim off the primers with bbduk, I use the parameters below:

bbduk.sh in1=MSA_S21_L001_R1_001.fastq.gz in2=MSA_S21_L001_R2_001.fastq.gz \
        out1=MSA_S21_L001_R1.fastq out2=MSA_S21_L001_R2.fastq \
        ktrim=l k=22 mink=20 hdist=1 copyundefined=t ordered=t rcomp=t \
        literal="GGAAGTAAAAGTCGTAACAAGG,GCTGCGTTCTTCATCGATGC" tpe tbo

I then ran DADA2 and noticed that most of the input is filtered out. I think this is due to the primer trimming.

[Image: DADA2 filtering statistics; not preserved]

Has anyone else done the same thing?
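
Since the ITS1 region varies in length, a read from a short amplicon can run past the insert and into the reverse complement of the opposite primer, so it can help to know what those reverse complements look like when checking reads by eye. The sketch below is only an illustration: it takes the two primer sequences from the bbduk command above and assumes they contain no ambiguity codes. The function name is made up for this example.

```python
# Reverse-complement the two primers used in the bbduk command.
# Assumption: plain A/C/G/T only (no IUPAC ambiguity codes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


primers = {
    "forward": "GGAAGTAAAAGTCGTAACAAGG",
    "reverse": "GCTGCGTTCTTCATCGATGC",
}

for name, seq in primers.items():
    # Print each primer, its length, and its reverse complement.
    print(f"{name}: {seq} ({len(seq)} nt) -> {reverse_complement(seq)}")
```

The printed reverse complements are the strings to look for near the 3' ends of reads from short amplicons.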


What is the length of the sequences? What filterAndTrim() parameters are you using? Input is the number of reads after trimming the primers, correct? The filtering has nothing to do with bbduk trimming the primers: your parameters are probably too stringent, or the reads are of bad quality.

written 5 months ago by h.mon

I set the trim and trunc parameters to 0 (I am using QIIME2, by the way) so that all the data go into DADA2. Yes, input is the number of reads after primer trimming. I am worried that, because the reads were trimmed, DADA2 is treating some of them as bad quality. I tried again with max_ee_r = 6 and the number of filtered reads went up to 85811.

It seems like too many reads are being filtered out, so I am trying to get as many reads into DADA2 as possible. Or is this normal for ITS data? Even with the rcomp=t parameter in bbduk, I don't see any difference.

modified 5 months ago • written 5 months ago by AfinaM
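
For context on why raising max_ee_r lets more reads through: DADA2's maxEE filter works on a read's expected errors, EE = sum(10^(-Q/10)) over its quality scores, and discards reads whose EE exceeds the threshold. Here is a minimal sketch of that calculation. It assumes Phred+33 encoded qualities, the two quality strings are made up (they are not from the poster's data), and the threshold of 2 is only an illustrative stricter value next to the 6 tried above.

```python
def expected_errors(quality_line: str, phred_offset: int = 33) -> float:
    """Sum of per-base error probabilities: EE = sum(10 ** (-Q / 10))."""
    return sum(10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_line)


# Hypothetical quality strings (assumption: Phred+33 encoding).
good_read = "I" * 200                   # Q40 at every position
decaying_read = "I" * 160 + "+" * 40    # Q40, then Q10 over the last 40 bases

for label, qual in [("good", good_read), ("decaying", decaying_read)]:
    ee = expected_errors(qual)
    # A read is kept when its expected errors do not exceed the threshold.
    print(f"{label}: EE={ee:.3f} keep at maxEE=2: {ee <= 2} keep at maxEE=6: {ee <= 6}")
```

A read with a low-quality tail can fail a strict threshold and pass a looser one, which is consistent with more reads surviving at max_ee_r = 6. Truncating the tail would lower EE as well.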

It seems like your data are moderately bad (or moderately good, if you are an optimist). You have to evaluate the quality of the reads to set optimal parameters such as maxEE and truncLen. Did you examine the DADA2 quality profile plots or the FastQC quality plots? They will help you decide on the best parameters.

written 5 months ago by h.mon
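
As a rough idea of what those quality plots summarize, the sketch below computes the mean Phred score at each read position from a FASTQ stream. It is not a replacement for the DADA2 or FastQC plots. It assumes Phred+33 encoding and unwrapped four-line FASTQ records, and the tiny in-memory FASTQ and the function name are made up for the example.

```python
import io
from statistics import mean


def mean_quality_by_position(fastq_handle, phred_offset=33):
    """Return a list whose element i is the mean Phred score at read position i."""
    columns = []
    for line_number, line in enumerate(fastq_handle):
        if line_number % 4 != 3:  # the quality string is the 4th line of each record
            continue
        for i, ch in enumerate(line.rstrip("\n")):
            if i == len(columns):
                columns.append([])  # first read long enough to reach this position
            columns[i].append(ord(ch) - phred_offset)
    return [mean(col) for col in columns]


# Made-up records of different lengths, as happens after primer trimming.
demo = io.StringIO(
    "@r1\nACGTAC\n+\nIIII5+\n"
    "@r2\nACGT\n+\nII5+\n"
)
profile = mean_quality_by_position(demo)
for position, q in enumerate(profile, start=1):
    print(position, q)
```

The position where the mean quality starts to drop is a reasonable first candidate for a truncation length, keeping in mind that ITS reads vary in length.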

Can I ask how you determined that the data are 'moderately bad'? I ran FastQC both before and after trimming, and it looks okay to me.

written 5 months ago by AfinaM

It is just a guess, based on the fact that you are discarding ~43% of the reads at the filterAndTrim() step. In my experience, good datasets lose ~5-20% of their reads at that step. But the loss may not be related to quality; it may instead be related to the truncLen parameter.

written 5 months ago by h.mon
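
The rule of thumb in the reply above comes down to one division. A small sketch, with made-up read counts (the actual input count is not given in the thread) and the 20% upper bound taken from the ~5-20% range quoted above:

```python
def discarded_fraction(reads_in: int, reads_passed: int) -> float:
    """Fraction of input reads removed at the filtering step."""
    if reads_in <= 0:
        raise ValueError("reads_in must be positive")
    return 1 - reads_passed / reads_in


def looks_typical(fraction: float, upper: float = 0.20) -> bool:
    """True when the loss is within the ~5-20% range quoted in the thread."""
    return fraction <= upper


# Hypothetical counts, chosen only to reproduce a ~43% and a ~10% loss.
for reads_in, reads_passed in [(100_000, 57_000), (100_000, 90_000)]:
    lost = discarded_fraction(reads_in, reads_passed)
    print(f"in={reads_in} passed={reads_passed} lost={lost:.0%} typical={looks_typical(lost)}")
```

A loss above the range is a prompt to look at the quality profiles and the truncation settings. On its own it does not say which of the two is responsible.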