There is one main consideration in trying to reproduce the coverage profile from an assembly of short reads:
- You do it during assembly without losing information, or
- You do it post-assembly, possibly losing information.
Most assemblers for NGS data won't do it by default during assembly, since keeping track of the read pileup takes up more memory. The remaining option is to do it post-assembly, which is not perfect. The main reason is that a different tool piles the reads back onto the contigs, and it won't reproduce with 100% fidelity the decisions the assembler took during the process. Repetitive regions are one of the main sources of difference in the post-assembly pileup. Maybe other people know of other genomic features that cause differences.
One example of a tool for the post-assembly pileup of short reads is BWA-SW.
I agree that using the read support produced directly by the assembler is preferable to mapping the reads back afterwards.
There are quite a few good tools for this. I suspect you want some idea of the assembly quality, right? If you have paired-end or mate-pair data, you should definitely look for regions where the mapped/assembled reads significantly violate the expected distance between reads or their expected mapped orientation (e.g. compressed regions).
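To make the paired-end check a bit more concrete, here is a minimal sketch, assuming the reads have already been mapped back to the contigs and the result is stored in SAM format. The function name `flag_suspicious_pairs` and the insert-size bounds are made up for illustration. It only looks at the first read of each pair and does not filter secondary or supplementary alignments, so treat it as a starting point rather than a replacement for a dedicated tool.

```shell
#!/bin/sh
# Sketch: list read pairs whose insert size or orientation looks suspicious.
# Usage: flag_suspicious_pairs SAM_FILE MIN_INSERT MAX_INSERT
# (function name and thresholds are hypothetical, not part of any tool)
flag_suspicious_pairs() {
    awk -F '\t' -v min="$2" -v max="$3" '
        /^@/ { next }                            # skip SAM header lines
        {
            flag = $2
            paired        = flag % 2             # bit 0x1: read is paired
            unmapped      = int(flag / 4) % 2    # bit 0x4: read unmapped
            mate_unmapped = int(flag / 8) % 2    # bit 0x8: mate unmapped
            reverse       = int(flag / 16) % 2   # bit 0x10: read on reverse strand
            mate_reverse  = int(flag / 32) % 2   # bit 0x20: mate on reverse strand
            first         = int(flag / 64) % 2   # bit 0x40: first read of the pair

            # Report each pair once, and only if both ends are mapped.
            if (!paired || unmapped || mate_unmapped || !first) next

            # RNEXT (column 7) is "=" when the mate is on the same contig.
            if ($7 != "=") { print $1 "\tdifferent_contig"; next }

            # Both ends on the same strand is an orientation violation.
            if (reverse == mate_reverse) { print $1 "\tsame_strand"; next }

            # TLEN (column 9) is signed; compare its absolute value.
            # A TLEN of 0 (not available) is reported as too short.
            tlen = ($9 < 0) ? -$9 : $9
            if (tlen < min)      print $1 "\tinsert_too_short"
            else if (tlen > max) print $1 "\tinsert_too_long"
        }
    ' "$1"
}
```

Something like `flag_suspicious_pairs my_mapping.sam 200 600` (file name and bounds hypothetical) prints one read name and reason per line. Contigs where such lines pile up are the regions worth a closer look.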
First use BWA to align reads to contigs:
    bwa index contigs.fasta
    bwa aln -t NUMBER_OF_THREADS contigs.fasta short_reads.fastq > alignment.sai
    bwa samse contigs.fasta alignment.sai short_reads.fastq > alignment.sam
If you have paired-end reads, use bwa sampe instead of bwa samse.
If you have long reads, use bwa bwasw instead of both bwa aln and bwa samse.
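For reference, the three variants can be written out side by side. This is only a sketch: the helper `bwa_commands` is made up, it prints the commands instead of running them, and the file names `reads_1.fastq`, `reads_2.fastq` and `long_reads.fastq` are placeholders I am assuming for the paired-end and long-read cases.

```shell
#!/bin/sh
# Sketch: print (not run) the BWA commands for a given read type.
# Usage: bwa_commands single|paired|long
# (helper name and the paired/long input file names are hypothetical)
bwa_commands() {
    case "$1" in
        single)
            echo "bwa index contigs.fasta"
            echo "bwa aln contigs.fasta short_reads.fastq > alignment.sai"
            echo "bwa samse contigs.fasta alignment.sai short_reads.fastq > alignment.sam"
            ;;
        paired)
            # Each end is aligned separately, then bwa sampe pairs them up.
            echo "bwa index contigs.fasta"
            echo "bwa aln contigs.fasta reads_1.fastq > reads_1.sai"
            echo "bwa aln contigs.fasta reads_2.fastq > reads_2.sai"
            echo "bwa sampe contigs.fasta reads_1.sai reads_2.sai reads_1.fastq reads_2.fastq > alignment.sam"
            ;;
        long)
            # bwa bwasw writes SAM directly, so no separate .sai step is needed.
            echo "bwa index contigs.fasta"
            echo "bwa bwasw contigs.fasta long_reads.fastq > alignment.sam"
            ;;
        *)
            echo "usage: bwa_commands single|paired|long" >&2
            return 1
            ;;
    esac
}
```

Running `bwa_commands paired` shows the commands for review; once the file names match your data, the output can be piped to `sh` to execute them.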
Alternatively, you can use Bowtie to do the alignment.
Second, use Tablet to visualize the resulting mapping.
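If you also want a number per contig next to the visual inspection, a rough mean depth can be pulled straight from the SAM file. The sketch below is an approximation under stated assumptions: the function name `mean_depth_per_contig` is made up, the SAM file is assumed to carry `@SQ` header lines, and depth is estimated as the summed read lengths divided by the contig length, ignoring clipping and indels in the CIGAR string.

```shell
#!/bin/sh
# Sketch: approximate mean read depth per contig from a SAM file.
# Usage: mean_depth_per_contig SAM_FILE
# (function name is hypothetical; depth = summed read lengths / contig length)
mean_depth_per_contig() {
    awk -F '\t' '
        /^@SQ/ {
            # Read contig name (SN) and length (LN) from the header line.
            name = ""; ln = 0
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) ln = substr($i, 4)
            }
            ctg_len[name] = ln
            order[++n] = name                    # remember header order
            next
        }
        /^@/ { next }                            # skip other header lines
        # Count only mapped reads (flag bit 0x4 not set, RNAME not "*").
        int($2 / 4) % 2 == 0 && $3 != "*" { bases[$3] += length($10) }
        END {
            for (i = 1; i <= n; i++) {
                c = order[i]
                if (ctg_len[c] > 0)
                    printf "%s\t%.2f\n", c, bases[c] / ctg_len[c]
            }
        }
    ' "$1"
}
```

Calling `mean_depth_per_contig alignment.sam` prints one contig name and its estimated mean depth per line, in header order. For a true per-base profile a dedicated pileup tool is still the better choice.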