First post here - this forum has been an immense help in my bioinformatics journey so far.
Brief explanation: I have metagenomes sequenced as Illumina MiSeq 2x300 bp reads. I used BBduk + Trimmomatic to remove adapters and quality-trim the sequences. I have four output files: forward paired, forward unpaired, reverse paired and reverse unpaired. I ran FastQC on all of them, and the quality of the unpaired output is slightly worse than that of the paired output.
Input Read Pairs: 3163058 Both Surviving: 2631476 (83.19%) Forward Only Surviving: 363260 (11.48%) Reverse Only Surviving: 48940 (1.55%) Dropped: 119382 (3.77%)
Forward only surviving: % of forward reads that were high quality but couldn't be kept because the paired reverse read was low quality.
Reverse only surviving: % of reverse reads that were high quality but couldn't be kept because the paired forward read was low quality.
Dropped: sequences that were dropped because BOTH the forward and the reverse read were bad quality.
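As a sanity check on the summary line, each percentage appears to be the count divided by the number of input read pairs. A minimal sketch, assuming awk is available; the pct helper and the variable names are mine, not part of Trimmomatic:

```shell
# Counts copied from the Trimmomatic summary above.
input_pairs=3163058
both=2631476
forward_only=363260
reverse_only=48940
dropped=119382

# Hypothetical helper: a count as a percentage of input pairs, two decimals.
pct() {
    awk -v n="$1" -v d="$input_pairs" 'BEGIN { printf "%.2f", 100 * n / d }'
}

# The four categories should add up to the number of input pairs.
total=$((both + forward_only + reverse_only + dropped))

echo "Both surviving:         $(pct "$both")%"
echo "Forward only surviving: $(pct "$forward_only")%"
echo "Reverse only surviving: $(pct "$reverse_only")%"
echo "Dropped:                $(pct "$dropped")%"
echo "Sum of categories: $total (input pairs: $input_pairs)"
```

The four counts add up to the number of input pairs, so every input pair falls into exactly one of the four categories.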
From what I understood, Trimmomatic drops both the forward and the reverse read when one or both of them do not pass the quality threshold. The whole pair will be dropped and will end up in the unpaired output.
First question: why is that? Even though the quality is not high enough, the sequences will BOTH be dropped and end in the UNPAIRED output, but they will still be paired, right? So why is it called unpaired?
Second question: should I include my unpaired output in the assembly process? In my opinion, it would add a lot of information. I am going to use Megahit, and I would run it in a way that includes both paired and unpaired sequences. Like this:
megahit -1 1_R1_paired,1_R1_unpaired -2 1_R2_paired,1_R2_unpaired -o output/
Is Megahit going to recognize the R1_unpaired and the R2_unpaired as still paired?
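For comparison, this is the alternative I am considering: a sketch that assumes the orphaned reads belong under MEGAHIT's -r option for single-end reads rather than under -1/-2. The file names are the same placeholders as in my command, and I have not tested whether this is the recommended way:

```shell
# Paired files: passed with -1 and -2, reads in the same order in both files.
paired_fwd="1_R1_paired"
paired_rev="1_R2_paired"

# Orphaned reads: treated as single-end (my assumption), comma-separated, no spaces.
unpaired="1_R1_unpaired,1_R2_unpaired"

# Build the command as a string first so it can be inspected before running.
cmd="megahit -1 $paired_fwd -2 $paired_rev -r $unpaired -o output/"
echo "$cmd"

# To run the assembly, paste the printed line into the terminal.
```

Written this way, the paired files stay in -1/-2 and the unpaired files are not presented to the assembler as pairs.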
Sorry for any confusion I might have created; if necessary, I will try to explain it better. Thanks in advance for your help.