Question: Sequence length distribution after adapter trimming
lamteva.vera (Ukraine, Kyiv) wrote, 3.1 years ago:

I work with TruSeq Custom Amplicon 1.5 data.

I trimmed adapters with bbduk, using the parameters recommended in the manual: bbduk.sh -Xmx1g in1=forward.fastq.gz in2=reverse.fastq.gz out1=forward.clean.fq out2=reverse.clean.fq ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo

I ran FastQC afterwards and the adapter contamination seems to be gone, but now I'm puzzled by the Sequence length distribution. Basic statistics now reports a Sequence length of 10-251, whereas before trimming every read was 251 bases long.

I've run awk '{if(NR%4==2) print length($0)}' FILE.fastq | sort -n | uniq -c > read_length.txt for fastq files before and after trimming.

  • Before: 930813 251

  • After: 4 10 ... (the long list of numbers counting truncated reads)... 917278 251
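For anyone who prefers Python to awk, the same tally can be sketched as below. This is only an illustration, not part of the original workflow: the function names are made up for this example, it assumes a strict four-lines-per-record FASTQ, and it assumes a .gz suffix means gzip compression.

```python
from collections import Counter
import gzip
import io


def read_length_counts(handle):
    """Tally sequence lengths from a 4-line-per-record FASTQ text stream."""
    counts = Counter()
    for line_number, line in enumerate(handle):
        if line_number % 4 == 1:  # second line of each record is the sequence
            counts[len(line.rstrip("\r\n"))] += 1
    return counts


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text.

    Assumption of this sketch: a '.gz' suffix means gzip.
    """
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


# Tiny in-memory example (toy records, not real data).
example = io.StringIO(
    "@r1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nACGTACGTAC\n+\nIIIIIIIIII\n"
)
for length, n in sorted(read_length_counts(example).items()):
    print(n, length)  # count first, then length, like `uniq -c`
```

On a real file the call would be something like read_length_counts(open_fastq("FILE.fastq")).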

Should I be worried? What should I do with the truncated reads? Can BWA-MEM handle such reads? They are going to be discarded because of low MAPQ, right?

Thank you!


Extremely short reads are not going to be very useful, since they are more likely to multimap. You could have filtered such reads out with the minlen= option when you ran bbduk.sh. Since you are already using the BBMap package, why not stay with it and use its aligner, bbmap.sh, for the alignments? It can handle reads of varying lengths.

written 3.1 years ago by genomax
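If the reads have already been trimmed without a length cutoff, a post-hoc filter along these lines is one way to drop the short ones while keeping the two files in sync. It is a rough sketch, not a description of what bbduk.sh does internally: the function names are hypothetical, and the policy of dropping a pair when either mate is too short is an assumption of this example.

```python
import io


def parse_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ text stream."""
    while True:
        record = [handle.readline().rstrip("\r\n") for _ in range(4)]
        if not record[0]:  # empty header line means end of input
            return
        yield tuple(record)


def filter_pairs(forward, reverse, min_len):
    """Keep a pair only if BOTH mates are at least min_len bases long.

    Dropping the whole pair keeps the forward and reverse files in the
    same order, which paired-end aligners generally expect.
    """
    for r1, r2 in zip(parse_fastq(forward), parse_fastq(reverse)):
        if len(r1[1]) >= min_len and len(r2[1]) >= min_len:
            yield r1, r2


# Toy paired input: the second pair has a 4-base forward read.
fwd = "@p1/1\nACGTACGTACGT\n+\nIIIIIIIIIIII\n@p2/1\nACGT\n+\nIIII\n"
rev = "@p1/2\nTTTTGGGGCCCC\n+\nIIIIIIIIIIII\n@p2/2\nTTTTGGGGCCCC\n+\nIIIIIIIIIIII\n"

kept = list(filter_pairs(io.StringIO(fwd), io.StringIO(rev), min_len=10))
print([r1[0] for r1, _ in kept])
```

In practice, re-running the trimming step with a length cutoff is simpler than a separate script; the sketch only shows what such a filter amounts to.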

Thank you, genomax. What are the factors to consider when setting the threshold for minlen or mlf?

written 3.1 years ago by lamteva.vera
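One practical way to approach the threshold question is to check how many reads each candidate cutoff would cost, starting from a length tally like the one in the question. The sketch below is only an illustration: the histogram is invented toy data, the function names are hypothetical, and it assumes mlf is interpreted as a fraction of the original read length (251 in this thread).

```python
import math


def fraction_retained(length_counts, min_len):
    """Fraction of reads with length >= min_len, given {length: count}."""
    total = sum(length_counts.values())
    kept = sum(n for length, n in length_counts.items() if length >= min_len)
    return kept / total if total else 0.0


def min_len_from_fraction(original_len, fraction):
    """Convert a length fraction into an absolute cutoff.

    Assumption of this sketch: the fraction is relative to the
    untrimmed read length.
    """
    return math.ceil(fraction * original_len)


# Invented toy histogram, NOT the poster's data.
toy_counts = {10: 5, 50: 15, 150: 30, 251: 950}

for cutoff in (20, 100, 200):
    print(cutoff, round(fraction_retained(toy_counts, cutoff), 3))

print(min_len_from_fraction(251, 0.5))
```

If nearly all reads survive even a fairly strict cutoff, the choice matters little for yield and can be driven by how short a read can be and still map uniquely to the amplicon targets.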

plat (Barcelona) wrote, 3.1 years ago:

I am not familiar with bbduk, but if you cut the adapters off, the read lengths should no longer all be the same. Some reads will be shortened by, for instance, 5 nucleotides, others by 10, and others not at all. That is why the length distribution is no longer uniform.
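A toy illustration of that point: when the sequenced fragment is shorter than the read length, the read runs into the adapter, and cutting the adapter off leaves a read whose length depends on the fragment. The sketch below uses naive exact string matching, which is not how bbduk works (it matches k-mers and tolerates mismatches); the function name, adapter string and reads are made up for the example.

```python
def trim_right(read, adapter, min_overlap=3):
    """Remove a 3' adapter using exact matching (toy sketch).

    Cuts at the first full adapter occurrence; otherwise cuts a partial
    adapter prefix of at least min_overlap bases hanging off the read end.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read


ADAPTER = "AGATCGGAAG"  # illustrative adapter string for this sketch only

reads = [
    "ACGTACGTACGTACGT",  # fragment longer than the read: no adapter
    "ACGTACGTAGATCGGA",  # 8-base fragment, then a partial adapter
    "ACGTAGATCGGAAGTT",  # 4-base fragment, full adapter, then extra bases
]

print([len(trim_right(r, ADAPTER)) for r in reads])
```

All three toy reads start at 16 bases, and each ends up a different length after trimming, which is the same effect seen in the FastQC report.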
