4.3 years ago
doron • 0
I have performed de novo assembly for several metagenomic samples (using MegaHIT). The overall statistics (e.g. N50) seem OK. I am now interested in mapping the original reads to the contigs (to estimate coverage, which I need for binning).
However, it is not entirely clear which parameters I should use for the mapping, in terms of the number of allowed indels and mismatches, the longest indel, etc. Are there any known best practices? I was not able to find any.
Note: Olson et al. discuss the various sources of assembly errors, but I wasn't able to put that to practical use.
I think the best thing would be to just use the default parameters in a mapping tool such as bowtie2 and then look at the alignment stats. You can use tools such as weeSAM to produce the alignment stats.
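In case it helps, here is a minimal sketch of what a default-parameter bowtie2 run against the contigs could look like, written as a small Python helper that only assembles the two command lines (it does not run them). The function name `bowtie2_commands`, the index prefix and all file names are placeholders I made up for illustration; the flags themselves (`-x`, `-1`, `-2`, `-S`, `-p`) are standard bowtie2 options, and paired-end reads are assumed.

```python
import shlex


def bowtie2_commands(contigs, reads_1, reads_2, prefix="contigs_idx",
                     preset=None, threads=4):
    """Return (index-building command, mapping command) as argument lists.

    Hypothetical helper: it only builds the commands, it does not execute them.
    """
    # Step 1: index the assembled contigs.
    build = ["bowtie2-build", contigs, prefix]
    # Step 2: map the paired-end reads back to the indexed contigs.
    align = ["bowtie2", "-p", str(threads), "-x", prefix,
             "-1", reads_1, "-2", reads_2, "-S", prefix + ".sam"]
    # Optionally add a sensitivity preset such as "--very-sensitive";
    # leaving it out keeps the bowtie2 defaults.
    if preset is not None:
        align.insert(1, preset)
    return build, align


# Placeholder file names, adjust to your own data.
build_cmd, align_cmd = bowtie2_commands(
    "final.contigs.fa", "sample_R1.fastq.gz", "sample_R2.fastq.gz"
)
print(shlex.join(build_cmd))
print(shlex.join(align_cmd))
```

The printed lines can be pasted into a shell, or the lists can be passed to `subprocess.run` if bowtie2 is installed. bowtie2 prints its alignment summary to stderr when the run finishes, which gives a first impression before running any dedicated stats tool.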
Thanks! However, I'm confident that different parameters will yield different alignment stats... I doubt that the same default parameters suit all alignment objectives. Does my concern make sense?
If you are too paranoid then you can use the --sensitive (I think this is the default) or --very-sensitive parameters with bowtie2. However, the default parameters would still help you get an idea of the number of mapped reads, read depth and other assembly stats. Alternatively, you could use other tools such as QUAST to assess the assembly quality.
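Since read depth per contig is what you need for binning, here is a rough sketch of how it could be summarised. It assumes (this is my assumption, not something from the thread) that you have converted the alignments to a sorted BAM and produced per-position depth with something like `samtools depth -a`, which writes three tab-separated columns: contig name, position, depth. The function name `mean_depth_per_contig` and the contig names in the example are made up.

```python
from collections import defaultdict


def mean_depth_per_contig(lines):
    """Average the depth column per contig.

    Expects lines of the form "contig<TAB>position<TAB>depth".
    """
    total_depth = defaultdict(int)   # summed depth per contig
    n_positions = defaultdict(int)   # number of reported positions per contig
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        contig, _position, depth = line.split("\t")[:3]
        total_depth[contig] += int(depth)
        n_positions[contig] += 1
    return {c: total_depth[c] / n_positions[c] for c in total_depth}


# Tiny made-up example: two contigs with two positions each.
example = [
    "k141_1\t1\t10\n",
    "k141_1\t2\t12\n",
    "k141_2\t1\t0\n",
    "k141_2\t2\t4\n",
]
print(mean_depth_per_contig(example))  # {'k141_1': 11.0, 'k141_2': 2.0}
```

Note that the average is taken over the positions present in the input, so if zero-depth positions are not reported the mean will be inflated. Running this once per parameter setting would let you check how much the coverage estimates actually change between the default and --very-sensitive.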