Hi all!
I have been trying to get a good assembly of my fungal organism from paired-end 250 bp reads. Here are the steps I follow:
- adapter removal
- quality trimming of reads based on base quality
- removal of contaminating bacterial (E. coli) reads
Even after this, I still see some lower-coverage bacterial contamination, so my final assembly is larger than I would expect for this organism. I have pretty high coverage (1400X), so I decided to normalize using bbnorm (target=100, mindepth=6). However, this resulted in far fewer reads, a lower N50, and an even larger assembly size (see below, M = million):
             Total reads (paired)   Scaffold N50   Sum
Original     21M                    50,501         5M
Normalized   2.9M                   17,100         7M
I am wondering whether the bbnorm target value I selected is too stringent. Or perhaps there are errors in the reads that are causing this contamination to still come through, and I should look into error correction? Any suggestions would be very helpful. Thank you!
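
For anyone who wants to sanity-check the numbers, here is a minimal Python sketch of the two quantities discussed above: average read depth and scaffold N50. It is only a rough illustration under stated assumptions. The function names are made up for this example, reads are assumed to keep their full 250 bp after trimming, the read counts are used exactly as listed in the table, and the "Sum" column is assumed to be the assembly size in base pairs. None of this reproduces what bbnorm does internally, and bbnorm's target refers to k-mer depth, which is not the same as read depth.

```python
def estimated_coverage(n_reads, read_len_bp, genome_size_bp):
    """Average read depth: total sequenced bases divided by genome size."""
    return n_reads * read_len_bp / genome_size_bp


def n50(lengths):
    """Length L such that scaffolds of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no scaffold lengths given")


# Assumed inputs (see the caveats above): untrimmed 250 bp reads and a
# 5 Mbp genome, with read counts taken as written in the table.
READ_LEN = 250
GENOME_SIZE = 5_000_000

original_depth = estimated_coverage(21_000_000, READ_LEN, GENOME_SIZE)
normalized_depth = estimated_coverage(2_900_000, READ_LEN, GENOME_SIZE)
print(f"original: ~{original_depth:.0f}X, normalized: ~{normalized_depth:.0f}X")

# Toy scaffold lengths, purely to show how N50 is computed.
print(n50([10, 8, 5, 3, 2]))  # prints 8
```

Under those assumptions the estimates come out at roughly 1050X before and 145X after normalization, so the large drop in read count is at least the right order of magnitude for target=100 and does not by itself indicate a problem.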