I am doing single-cell genomics, but I am a newbie in bioinformatics.
My samples are single picoeukaryote cells from the ocean. I did MDA followed by NGS on a HiSeq 2500 (PE 150 bp) with Nextera library preparation.
Here are my questions:
1) To get better assemblies, which trimming tool do you recommend for trimming the reads?
2) I have not found a suitable assembler for de novo assembly of single eukaryotic cells.
Could you recommend some? P.S. I tried several assemblers, such as IDBA-UD and SPAdes, but I can't get long contigs; the N50 is only around 1500 bp.
3) Could you recommend some experts or research centers that specialize in de novo assembly?
4) BTW, the genome size of my sample is around 20 Mbp to 40 Mbp according to a reference paper.
My whole PhD project is stuck here; I hope you can help me out.
Thank you very much!
For MDA'd single cells, I recommend BBDuk for trimming (both adapter and quality) and SPAdes for assembly. However, 20 to 40 Mbp might be big for SPAdes; we usually use it on bacteria. MDA'd single cells typically have very uneven coverage; depending on the degree of nonuniformity, it can be helpful to normalize the data (for example, with BBNorm). In my testing this often improves SPAdes assemblies (particularly if the depth is very high), and always improves Velvet assemblies. Also, with 150 bp reads, be sure that you are using long kmers. In particular, SPAdes defaults to a max of 55, which is too short when you have good coverage.
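To make the trim, normalize, assemble order concrete, here is a rough Python sketch that only builds the three command lines. All file names are hypothetical, the adapters.fa path is an assumption (BBTools ships one in its resources folder), and the parameter values (k=23, trimq=10, target=100, min=5, the kmer list) are starting points taken from the tools' own usage examples, not settings tuned for this dataset.

```python
import shlex

def bbduk_cmd(r1, r2, out1, out2, adapters="adapters.fa"):
    """Adapter trimming (ktrim) plus quality trimming (qtrim) in one BBDuk pass."""
    return ["bbduk.sh", f"in1={r1}", f"in2={r2}", f"out1={out1}", f"out2={out2}",
            f"ref={adapters}", "ktrim=r", "k=23", "mink=11", "hdist=1",
            "tpe", "tbo", "qtrim=rl", "trimq=10"]

def bbnorm_cmd(r1, r2, out1, out2, target=100, min_depth=5):
    """Normalize uneven MDA coverage down to roughly `target` depth."""
    return ["bbnorm.sh", f"in={r1}", f"in2={r2}", f"out={out1}", f"out2={out2}",
            f"target={target}", f"min={min_depth}"]

def spades_cmd(r1, r2, outdir, kmers=(21, 33, 55, 77, 99, 127)):
    """SPAdes in single-cell mode with an explicit, longer kmer list."""
    kmers = list(kmers)
    # SPAdes wants odd kmer sizes below 128, listed in ascending order.
    if any(k % 2 == 0 or k >= 128 for k in kmers) or kmers != sorted(kmers):
        raise ValueError("kmers must be odd, < 128, and ascending")
    return ["spades.py", "--sc", "-1", r1, "-2", r2,
            "-k", ",".join(str(k) for k in kmers), "-o", outdir]

if __name__ == "__main__":
    # Hypothetical file names; print the commands instead of running them.
    steps = [
        bbduk_cmd("raw_R1.fq.gz", "raw_R2.fq.gz", "trim_R1.fq.gz", "trim_R2.fq.gz"),
        bbnorm_cmd("trim_R1.fq.gz", "trim_R2.fq.gz", "norm_R1.fq.gz", "norm_R2.fq.gz"),
        spades_cmd("norm_R1.fq.gz", "norm_R2.fq.gz", "spades_out"),
    ]
    for cmd in steps:
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check them by eye; to actually run them you could pass each list to subprocess.run(cmd, check=True). It is worth assembling both the normalized and the non-normalized reads and comparing the results.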
If your inserts are short enough to overlap, it may help to first merge the reads with BBMerge, then assemble. That will allow you to use a longer kmer, and will reduce the error rate in the reads.
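Whether merging is worth trying depends on the insert size relative to the read length. A small sketch, assuming 150 bp reads and hypothetical file names: the first function estimates how many bases the two mates share for a given insert size, and the second builds a BBMerge command that keeps the unmerged pairs as well.

```python
def expected_overlap(read_len, insert_size):
    """Bases covered by both mates of a pair; 0 if the mates cannot overlap."""
    # Inserts shorter than the read are covered entirely by both mates.
    return max(0, min(insert_size, 2 * read_len - insert_size))

def bbmerge_cmd(r1, r2, merged, unmerged1, unmerged2):
    """Merge overlapping pairs; pairs that do not merge go to outu1/outu2."""
    return ["bbmerge.sh", f"in1={r1}", f"in2={r2}", f"out={merged}",
            f"outu1={unmerged1}", f"outu2={unmerged2}"]

if __name__ == "__main__":
    for insert in (100, 250, 300, 400):
        print(insert, expected_overlap(150, insert))
    # Hypothetical file names.
    print(" ".join(bbmerge_cmd("trim_R1.fq.gz", "trim_R2.fq.gz",
                               "merged.fq.gz", "unmerged_R1.fq.gz",
                               "unmerged_R2.fq.gz")))
```

With 2 x 150 bp reads, only inserts shorter than 300 bp can overlap at all, and a real merger needs a reasonable number of overlapping bases to call a merge confidently, so the insert-size distribution is the thing to look at first.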
Do related organisms seem to have a very high repeat content, or do your reads have extreme GC content? It could be that the bad assemblies are simply inherent to the organism rather than the methodology. Also, what are your coverage, insert-size, and quality distributions like?
Posting your FastQC results may help.
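For the coverage and GC questions, a back-of-the-envelope check is easy to do yourself. This is only a sketch: the read count below is a made-up number, the genome sizes are the 20 to 40 Mbp range from the original post, and with MDA the mean depth hides the fact that some regions are covered far more deeply than others.

```python
def mean_coverage(n_reads, read_len, genome_size):
    """Average depth = total sequenced bases / genome size."""
    return n_reads * read_len / genome_size

def gc_fraction(seq):
    """Fraction of G and C among the unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt

if __name__ == "__main__":
    n_reads = 20_000_000  # hypothetical: counts both mates of every pair
    for genome_size in (20_000_000, 40_000_000):
        print(genome_size, mean_coverage(n_reads, 150, genome_size))
    print(gc_fraction("ATGCGCNNAT"))
```

With those made-up numbers the mean depth would be 150x for a 20 Mbp genome and 75x for a 40 Mbp genome.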