metagenomic assembly has low coverage; what are my options?
3.8 years ago
willnotburn ▴ 40

My metagenomic assembly has low coverage. What upstream steps can I adjust to increase coverage, and at what possible cost? Low coverage must be a common problem with metagenomic assemblies, but I've been unable to find a "quick start guide", i.e. an outline of recommended remedies, barring "do more sequencing".

Off the top of my head, I speculate that relaxing the quality cutoff score during the trimming step could help, since it would give the assembler more data. Any thoughts on this? Here's my pipeline outline.

reads -> Trimmomatic (cutoff 30) -> HQ reads -> megahit -> assembly -> bowtie2 + samtools -> bam files, from which I've determined the coverage is low
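For concreteness, the outline above might look something like this as shell commands. This is a sketch only: the file names, the `SLIDINGWINDOW:4:30` approximation of "cutoff 30", and the `MINLEN:135` value (mentioned later in the thread) are assumptions, not taken verbatim from the poster's scripts.

```shell
# Quality-trim paired-end reads (hypothetical file names; window trim at Q30).
trimmomatic PE -phred33 sample_R1.fq.gz sample_R2.fq.gz \
    R1.paired.fq.gz R1.unpaired.fq.gz R2.paired.fq.gz R2.unpaired.fq.gz \
    SLIDINGWINDOW:4:30 MINLEN:135

# Assemble the surviving high-quality pairs with megahit.
megahit -1 R1.paired.fq.gz -2 R2.paired.fq.gz -o megahit_out

# Map the reads back to the assembly, then sort/index and inspect coverage.
bowtie2-build megahit_out/final.contigs.fa asm_index
bowtie2 -x asm_index -1 R1.paired.fq.gz -2 R2.paired.fq.gz \
    | samtools sort -o aln.sorted.bam -
samtools index aln.sorted.bam
samtools coverage aln.sorted.bam
```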

metagenomic assembly coverage trimming megahit

1. Make the reference assembly from high-quality (>Q30) reads (using megahit)
2. Get coverage information (using bowtie2) by mapping a larger read set, trimmed at a relaxed >Q20 cutoff, back to that assembly
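That two-step split could be sketched as follows, with hypothetical file and index names: the Q30 assembly stays fixed, and only the read set being mapped back changes.

```shell
# Looser trim (Q20) used only for mapping, not for assembly (hypothetical names).
trimmomatic PE sample_R1.fq.gz sample_R2.fq.gz \
    q20_R1.P.fq.gz q20_R1.U.fq.gz q20_R2.P.fq.gz q20_R2.U.fq.gz \
    SLIDINGWINDOW:4:20 MINLEN:50

# Map the larger Q20 read set against the unchanged Q30 assembly.
bowtie2 -x asm_index -1 q20_R1.P.fq.gz -2 q20_R2.P.fq.gz \
    | samtools sort -o q20_vs_q30asm.bam -
samtools coverage q20_vs_q30asm.bam
```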

My current assembly, made from >Q30 reads, recruits 80% of those reads, so it must be a decent assembly. Can I keep it but expand coverage with additional lower-quality reads, relying on the mapper (bowtie2) to score alignments appropriately and output better coverage values? Would this approach add any value to the workflow?


I don't have good suggestions regarding what you could or should do. If you relax the alignment too much, the mapping rate will increase, but reads may map to the wrong place. Consider, for example, a low-abundance species with no contigs in your assembly because its reads were filtered out. When you map, those reads may now map to another species' contigs, and the problem will be worse if you use sensitive (permissive) mapping settings.


I played around with re-trimming. What seems to have made the most difference is the MINLEN parameter. I used to have it set at 135, which I thought was fine for a 2 × 250 bp kit. Changing it to 50 now gives me >90% of reads passing the filter.

3.8 years ago
h.mon 33k

There is no way to increase coverage, barring doing more sequencing, or trimming your data less. You could try error-correcting your reads (maybe with BayesHammer or khmer), and I would definitely apply a less stringent quality cutoff; this is particularly true given that you have low coverage. If you want a number, I would use 10. But how much are you discarding anyway?
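A sketch of those two suggestions, with placeholder file names: BayesHammer ships inside SPAdes and can be run as a correction-only step, and the "how much are you discarding" question can be answered by comparing read counts before and after trimming (FASTQ stores four lines per read).

```shell
# BayesHammer error correction only (no assembly), via SPAdes.
spades.py --only-error-correction \
    -1 R1.paired.fq.gz -2 R2.paired.fq.gz -o corrected/

# How much is the Q30 trim discarding? Compare read counts before/after
# trimming; each FASTQ record is exactly 4 lines.
echo $(( $(zcat sample_R1.fq.gz | wc -l) / 4 )) reads before trimming
echo $(( $(zcat R1.paired.fq.gz | wc -l) / 4 )) read pairs surviving
```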


Thanks, @h.mon! I hadn't considered going as low as a cutoff of 10 on trimming. According to FastQC, excellent quality is >28 and good quality is >20, so maybe I'll try 20 first. Indeed, with the cutoff at 30, I'm discarding about 73% of my sequences.

Also, I'll try the --very-sensitive-local flag in bowtie2, as per @Devon's advice here.
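For reference, that flag is one of bowtie2's built-in presets; in local mode the aligner can soft-clip low-quality read ends instead of forcing an end-to-end alignment. A sketch with the same hypothetical index and file names as above:

```shell
# Local alignment with bowtie2's most sensitive preset; low-quality read
# ends are soft-clipped rather than dragging down the alignment score.
bowtie2 --very-sensitive-local -x asm_index \
    -1 R1.paired.fq.gz -2 R2.paired.fq.gz \
    | samtools sort -o aln.local.sorted.bam -
```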


What sequencing platform? If you used Illumina, discarding 73% of your reads at Q30 suggests the run fell short of Illumina specs; e.g., the HiSeq SBS v4 and TruSeq SBS v3 kits promise greater than 80% of bases above Q30 at 2 × 100 bp.
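One quick way to check a run against those specs is to count the fraction of bases at or above Q30 straight from the FASTQ quality strings. A small awk sketch, assuming phred+33 encoding (so Q30 corresponds to the character '?', ASCII 63) and a hypothetical input file `reads.fastq`:

```shell
# Every 4th FASTQ line is the quality string; with phred+33, Q30 == '?'.
# Single-character ASCII comparisons order by code point here.
awk 'NR % 4 == 0 {
    for (i = 1; i <= length($0); i++) {
        total++
        if (substr($0, i, 1) >= "?") q30++
    }
} END { printf "%.1f%% of bases >= Q30\n", 100 * q30 / total }' reads.fastq
```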


It was a HiSeq Rapid SBS Kit v2, dual flow cell, paired-end 250. The page you linked says greater than 75% of bases above Q30 at 2 × 250 bp. The reverse reads were a little shoddy; the enzyme probably died early. Alas, I no longer have the DNA samples for additional sequencing.


If everything was done using Illumina kits and following their protocols, Illumina has a policy of providing new reagents for a run that does not reach spec. If you hired a sequencing center, they may still have DNA; they usually ask for more than necessary, in case bad stuff happens.


Is there any harm in setting the trimming cutoff score too low? Would an aligner like bowtie2 be able to discard bad reads that don't align well?
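On that second point: bowtie2 does not discard poorly-supported reads outright, but it does report a mapping quality (MAPQ) for every alignment, so low-confidence placements can be counted or filtered downstream. A sketch, with a hypothetical BAM name and an illustrative MAPQ threshold of 10:

```shell
# Alignments placed with reasonable confidence (MAPQ >= 10).
samtools view -c -q 10 aln.sorted.bam
# All alignments, for comparison.
samtools view -c aln.sorted.bam
# Keep only the confident alignments for coverage calculations.
samtools view -b -q 10 aln.sorted.bam > aln.mapq10.bam
```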