Question: Trimmomatic output: how to include unpaired reads in the assembly?
2.3 years ago, dark.lord wrote:

Hi everyone,

First post here - this forum has been an immense help in my bioinformatics journey so far.

Brief explanation: I have metagenomic reads from an Illumina MiSeq run (2x300 bp). I used BBduk + Trimmomatic to remove adapters and quality-trim the sequences. I have four output files: forward paired, forward unpaired, reverse paired and reverse unpaired. I ran FastQC on all of them, and the quality of the unpaired output is slightly worse than that of the paired output.

Input Read Pairs: 3163058 Both Surviving: 2631476 (83.19%) Forward Only Surviving: 363260 (11.48%) Reverse Only Surviving: 48940 (1.55%) Dropped: 119382 (3.77%)

Forward only surviving: % of forward reads that were high quality but couldn't be kept because the paired reverse read was low quality.
Reverse only surviving: % of reverse reads that were high quality but couldn't be kept because the paired forward read was low quality.
Dropped: sequences that were dropped because BOTH forward and reverse were bad quality.
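
As a quick sanity check on the summary line above, the four categories can be recomputed from the raw counts. This is only a minimal sketch: the counts are copied from the Trimmomatic output quoted in this post, while the dictionary keys and variable names are made up for illustration.

```python
# Counts copied from the Trimmomatic summary line quoted in this post.
input_pairs = 3163058
counts = {
    "both_surviving": 2631476,
    "forward_only": 363260,
    "reverse_only": 48940,
    "dropped": 119382,
}

# Every input pair falls into exactly one category, so the counts must add up.
assert sum(counts.values()) == input_pairs

# Percentage of input pairs in each category.
percentages = {name: 100 * n / input_pairs for name, n in counts.items()}
for name, pct in percentages.items():
    print(f"{name}: {counts[name]} ({pct:.2f}%)")

# Reads available to an assembler: two per surviving pair, one per orphan.
paired_reads = 2 * counts["both_surviving"]
unpaired_reads = counts["forward_only"] + counts["reverse_only"]
print(f"unpaired share of surviving reads: "
      f"{100 * unpaired_reads / (paired_reads + unpaired_reads):.2f}%")
```

The last figure gives a rough idea of how much data is at stake when deciding whether to pass the unpaired files to the assembler.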

From what I understood, Trimmomatic drops both forward and reverse reads when one or both of the reads do not pass the quality threshold. The whole pair will be dropped and will end up in the unpaired output.

First question: why is that? Even though the quality is not high enough, the sequences will BOTH be dropped and end up in the UNPAIRED output, but they will still be paired, right? So why is it called unpaired?

Second question: should I include my unpaired output in the assembly process? In my opinion, it would add a lot of information. I am going to use Megahit, and I would run it in a way that includes both paired and unpaired sequences. Like this:

megahit  -1 1_R1_paired, 1_R1_unpaired -2 1_R2_paired, 1_R2_unpaired -o output/

Is Megahit going to recognize the R1_unpaired and the R2_unpaired as still paired?

Sorry for any confusion I might have created, if necessary I will try to explain it better. Thanks in advance for your help.


2.3 years ago, cschu181 wrote:

If I remember correctly: if you apply a length-filter step and one of the mates is too short after the trimmers/clippers have been applied, then that read is dropped (i.e. it never ends up in any output) and its mate ends up in either the unpaired forward or the unpaired reverse output.
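
To make the routing concrete, here is a simplified sketch of the behaviour described above. It is not Trimmomatic's actual code: it assumes that survival is decided purely by read length after trimming, and the threshold `MIN_LEN`, the function name and the example lengths are all hypothetical.

```python
from collections import Counter

MIN_LEN = 36  # hypothetical length threshold, chosen only for illustration


def route_pair(fwd_len, rev_len, min_len=MIN_LEN):
    """Return the category a read pair falls into, given post-trimming lengths."""
    fwd_ok = fwd_len >= min_len
    rev_ok = rev_len >= min_len
    if fwd_ok and rev_ok:
        return "both_surviving"  # forward -> paired R1, reverse -> paired R2
    if fwd_ok:
        return "forward_only"    # forward -> unpaired R1, reverse is discarded
    if rev_ok:
        return "reverse_only"    # reverse -> unpaired R2, forward is discarded
    return "dropped"             # neither read is written anywhere


# Made-up post-trimming lengths for four read pairs.
pairs = [(250, 240), (180, 20), (10, 150), (5, 0)]
summary = Counter(route_pair(fwd, rev) for fwd, rev in pairs)
print(dict(summary))
```

The point of the sketch is that a read in an unpaired file has no mate anywhere in the output, so the two unpaired files do not line up with each other read by read.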

Second, if your assembler allows it, you might as well add the unpaired reads. They might cover some extra sequence or just bump coverage over the edge. In my experience, it doesn't matter whether you include them or not, but that is anecdotal and personal preference. What you lose is the pairing information, so the unpaired reads cannot contribute pair-based links to the contigging/scaffolding process.

Edit: grammar/clarity



I got it now! I thought both reads were included in the unpaired output if one or both were not of high enough quality.

In reality, when one read of a pair is of bad quality, it is dropped; the other one is kept, but in the unpaired file. At this point, I am wondering: can I treat the unpaired output as single-end reads, given that they have lost their mates?

If this is the case, Megahit has an option (-r) to list single-end files and use them as input. Regarding whether I should use them or not, I think I will run the assembler twice and evaluate the output.

Thanks again, you have been extremely clear and helpful.

written 2.3 years ago by dark.lord

You're welcome. And yes, you can treat the unpaired as single.
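
For reference, a minimal sketch of how such a call could be put together, with the paired files under -1/-2 and both unpaired files under -r as a comma-separated list (no spaces after the commas). The file names are the ones from the question, the helper function is hypothetical, and the script only prints the command rather than running Megahit.

```python
import shlex

# File names as used in the question; adjust to the real paths/extensions.
paired_r1 = ["1_R1_paired"]
paired_r2 = ["1_R2_paired"]
single_end = ["1_R1_unpaired", "1_R2_unpaired"]


def megahit_command(r1, r2, singles, out_dir):
    """Build a Megahit argument list (hypothetical helper, nothing is executed)."""
    if len(r1) != len(r2):
        raise ValueError("each -1 file needs a matching -2 file")
    cmd = ["megahit", "-1", ",".join(r1), "-2", ",".join(r2)]
    if singles:
        # Orphaned reads go in as single-end input, not as extra -1/-2 files.
        cmd += ["-r", ",".join(singles)]
    cmd += ["-o", out_dir]
    return cmd


cmd = megahit_command(paired_r1, paired_r2, single_end, "output/")
print(shlex.join(cmd))
```

Calling the helper a second time with an empty `singles` list and a different output directory gives the paired-only run for the comparison mentioned above.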

written 2.3 years ago by cschu181

