Question: Many 0 length reads after trimming
3.7 years ago, marcon (10) wrote:


I'm using cutadapt to trim adapter sequences from a small RNA-seq dataset. However, I'm getting a lot of very short reads after trimming, with around 35% being 0-length reads. I'm using the following command:

cutadapt --discard-untrimmed -O 7 --minimum-length=18 --maximum-length=40 -a AGATCGGAAGAGC file.fastq > trimmed_file.fastq

With these settings I'm losing almost half of my data after trimming because the reads become too short (<18 nt). Am I doing something wrong? Is it possible to get reads without inserts (only adapter sequenced)?

Maybe I'm using the wrong adapter sequence, but from what I have read on forums the sequence 'AGATCGGAAGAGC' should trim all adapters from Illumina sequencing. Or am I wrong?

Thanks in advance.

P.S.: I have tried other overlap settings (3, 5 and 6) and the results are the same.

fastqc rna-seq rna small trimming • 2.3k views
modified 3.6 years ago by Biostar ♦♦ 20 • written 3.7 years ago by marcon (10)

Try "Trim Galore!" with the default settings. If you get similar results then you did everything right (Trim Galore! is a wrapper around cutadapt).

BTW, you have FastQC as a tag, so if you have a lot of adapter contamination (the likely cause) then it'll show up there.

written 3.7 years ago by Devon Ryan (91k)

I have tried Trim Galore! with default settings and the results are the same. FastQC shows a lot of overrepresented sequences, including "TruSeq Adapter, Index 7" and "Illumina Multiplexing PCR Primer 2.01". After trimming, these sequences are gone, but there are still a lot of overrepresented sequences.

written 3.7 years ago by marcon (10)

"Is it possible to get reads without inserts (only adapter sequenced)?"

Yes, and it's more common with small RNA libraries because purification by size selection is less effective. Insert-positive and insert-negative clones are similar in size and therefore difficult to resolve.

Follow Devon Ryan's recommendation and use FastQC to detect adapter contamination. If it's 35%, that will be clearly visible in the per-base sequence content plot and also flagged as overrepresented k-mers.
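As a rough cross-check alongside FastQC, one could count how many raw reads begin directly with the adapter, since those are the insert-negative (adapter-only) reads that end up with length 0 after trimming. The following is only a minimal sketch: it assumes plain 4-line FASTQ records, exact matching with no mismatches, and the adapter from the cutadapt command in the question; the function names are made up for illustration.

```python
import io

# Adapter taken from the cutadapt command in the question.
ADAPTER = "AGATCGGAAGAGC"


def fastq_sequences(handle):
    """Yield the sequence line of each record, assuming 4 lines per record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.rstrip("\n")


def adapter_only_fraction(handle, adapter=ADAPTER):
    """Fraction of reads whose sequence starts with the adapter (exact match)."""
    total = 0
    adapter_only = 0
    for seq in fastq_sequences(handle):
        total += 1
        if seq.startswith(adapter):
            adapter_only += 1
    return adapter_only / total if total else 0.0


# Tiny in-memory example: one adapter-only read, one read with a 22 nt insert.
example = io.StringIO(
    "@r1\nAGATCGGAAGAGCACACGTC\n+\n" + "I" * 20 + "\n"
    "@r2\nTGAGGTAGTAGGTTGTATAGTT" + ADAPTER + "\n+\n" + "I" * 35 + "\n"
)
print(adapter_only_fraction(example))
```

For a real file, pass an open handle such as open("file.fastq") instead of the in-memory example. Exact matching will undercount reads with sequencing errors in the adapter, so treat the result as a lower bound.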

written 3.7 years ago by harold.smith.tarheel (4.4k)

When I trim, adapters are found in 94% of the reads, which I suppose is normal for small RNA-seq, since the inserts are shorter than the read length. So I guess the 0-length reads are probably "empty" reads and the only thing I can do is discard them, right?

Also, after trimming I'm getting reads of different lengths, with peaks at 18 and 33 nt. I know that most miRNAs range from 20 to 23 nt and that other sRNAs peak around 35 nt. Is it possible that those 18 nt reads are miRNAs?
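The length peaks mentioned above can be tabulated directly from a trimmed FASTQ file. This is a minimal sketch, assuming 4-line FASTQ records and a cutadapt run without the length filters, so that 0-length reads are still present in the output; the function names and the 18 to 40 nt window (taken from the command in the question) are illustrative only.

```python
import io
from collections import Counter


def length_histogram(handle):
    """Count reads per sequence length, assuming 4 lines per FASTQ record."""
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:
            counts[len(line.rstrip("\n"))] += 1
    return counts


def summarize(counts, min_len=18, max_len=40):
    """Report total reads, empty reads, and reads inside the length window."""
    total = sum(counts.values())
    kept = sum(n for length, n in counts.items() if min_len <= length <= max_len)
    return {"total": total, "empty": counts.get(0, 0), "kept": kept}


# Tiny in-memory example with reads of length 0, 18 and 22.
example = io.StringIO(
    "@r1\n\n+\n\n"
    "@r2\n" + "A" * 18 + "\n+\n" + "I" * 18 + "\n"
    "@r3\n" + "C" * 22 + "\n+\n" + "I" * 22 + "\n"
)
hist = length_histogram(example)
for length in sorted(hist):
    print(length, hist[length])
print(summarize(hist))
```

A histogram like this only shows where the peaks are; whether the 18 nt reads are miRNAs would still have to be checked by mapping or annotating them.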

modified 3.7 years ago • written 3.7 years ago by marcon (10)