Apologies for the long post.
I am performing a de novo genome assembly using Illumina paired-end short reads and am currently at the adapter-trimming stage. Below are the basic statistics and the adapter content from the FastQC report for R1.
Basic statistics of the raw reads:
Adapter content of the raw reads:
I used Trimmomatic to trim the adapters, with the following settings:
Below, you can see the basic statistics and adapter content of the Trimmed reads.
The Trimmomatic output was:
Both surviving: 566832403
Forward only surviving: 39244376
Reverse only surviving: 0.00
Dropped reads: <1%
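To put the read loss in perspective, here is a rough sketch of how the surviving categories compare with each other. It uses only the counts reported above; the function name `survival_fractions` is my own, and the dropped reads are left out because I only have "<1%" rather than an exact count, so the shares are relative to surviving reads, not to the raw input.

```python
def survival_fractions(both, forward_only, reverse_only):
    """Return each category's share of all surviving read records."""
    total = both + forward_only + reverse_only
    if total == 0:
        raise ValueError("no surviving reads")
    return {
        "both": both / total,
        "forward_only": forward_only / total,
        "reverse_only": reverse_only / total,
    }


# Counts copied from the Trimmomatic summary above; dropped reads are
# excluded because only "<1%" is reported, not an exact count.
fractions = survival_fractions(566832403, 39244376, 0)
for name, value in fractions.items():
    print(f"{name}: {value:.2%}")
```

The forward-only share is the part I am unsure about, since those reads have lost their mate and can no longer be used as pairs.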
My questions are:
Can I go ahead with the assembly, given that no adapter content remains in the trimmed reads? Should I be concerned about the loss of reads?
There are over-represented sequences in both read 1 and read 2. I am not sure whether I can leave them as they are or whether I should trim them too. Can these over-represented sequences be removed with Trimmomatic? Any suggestions would be appreciated.
Over-represented sequences for R1:
Over-represented sequences for R2: