Question: metagenomic assembly has low coverage; what are my options?
20 months ago by
willnotburn40
United States, Michigan State University
willnotburn40 wrote:

My metagenomic assembly has low coverage. What upstream steps can I adjust to increase coverage? At what possible cost? Low coverage of metagenomic assemblies must be a common problem, but I've been unable to find a "quick start guide" i.e. an outline of recommendations to remedy this, barring "do more sequencing".

Off the top of my head, I'd speculate that relaxing the quality cutoff score during the trimming step could help, since it would give the assembler more data. Any thoughts on this? Here's my pipeline outline.

reads -> Trimmomatic (cutoff 30) -> HQ reads -> megahit -> assembly -> bowtie2 + samtools -> bam files, from which I've determined the coverage is low
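
For readers following along, the coverage check at the end of this pipeline could be summarized per contig roughly as below. This is only a minimal sketch in Python, assuming per-base depth in the three-column, tab-separated layout (contig, position, depth) that `samtools depth` writes. The function name, contig names, and numbers are made up for illustration and are not from the original post.

```python
from collections import defaultdict


def mean_depth_per_contig(depth_lines, contig_lengths):
    """Mean depth per contig from 'contig<TAB>position<TAB>depth' lines.

    Positions absent from the input are treated as depth 0, which is why
    the summed depth is divided by the full contig length.
    """
    totals = defaultdict(int)
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        contig, _pos, depth = line.split("\t")
        totals[contig] += int(depth)
    return {name: totals[name] / length for name, length in contig_lengths.items()}


# Toy input (hypothetical values): position 3 of k141_1 and position 2 of
# k141_2 are missing, so they count as zero depth.
example = ["k141_1\t1\t4", "k141_1\t2\t6", "k141_1\t4\t2", "k141_2\t1\t1"]
lengths = {"k141_1": 4, "k141_2": 2}
coverage = mean_depth_per_contig(example, lengths)
print(coverage)
```

Dividing by contig length rather than by the number of reported positions matters when zero-depth positions are left out of the depth file; otherwise the mean is inflated.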

modified 20 months ago by h.mon29k • written 20 months ago by willnotburn40

An idea that incorporates @h.mon's comments! How about

  1. Make the reference assembly with high quality (>Q30) reads (using megahit)
  2. Get coverage information (using bowtie2) by mapping an increased number of reads by dropping quality cutoff to >Q20

My current assembly, made with >Q30 reads, recruits 80% of those reads. Must be a good assembly! Can I keep that but expand coverage with more lower quality reads and rely on the mapper bowtie2 to score alignments appropriately and output better coverage values? Would this approach add any value to the workflow?

modified 20 months ago • written 20 months ago by willnotburn40

I don't have good suggestions regarding what you could or should do. If you relax the alignment settings too much, the mapping rate will increase, but reads may map to the wrong place. Consider, for example, a low-abundance species with no contigs in your assembly because its reads were filtered out. When you map, those reads may now map to another species' contigs, and the problem will be worse if you use sensitive (permissive) mapping settings.

written 20 months ago by h.mon29k

I played around with re-trimming. What seems to have made the most difference is the MINLEN parameter. I used to have it set at 135, which I thought was fine for a 250 kit. Changing it to 50 now gives me >90% of reads passing the filter.
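
To see why the minimum-length setting can dominate, here is a toy sketch in Python. It assumes a simplified trailing-quality trim followed by a length filter, which is not Trimmomatic's actual implementation. The read qualities are invented, and `trim_trailing` and `fraction_passing` are hypothetical helper names.

```python
def trim_trailing(quals, cutoff):
    """Drop bases from the 3' end while their quality is below the cutoff."""
    end = len(quals)
    while end > 0 and quals[end - 1] < cutoff:
        end -= 1
    return quals[:end]


def fraction_passing(reads, cutoff, minlen):
    """Fraction of reads still at least `minlen` long after trailing trimming."""
    kept = sum(1 for q in reads if len(trim_trailing(q, cutoff)) >= minlen)
    return kept / len(reads)


# Toy 250 bp reads as lists of Phred scores: a high-quality start followed
# by a low-quality tail of varying length (made-up values).
toy_reads = [
    [35] * 250,
    [35] * 140 + [12] * 110,
    [35] * 100 + [12] * 150,
    [35] * 60 + [12] * 190,
    [35] * 40 + [12] * 210,
]

strict = fraction_passing(toy_reads, cutoff=30, minlen=135)
relaxed = fraction_passing(toy_reads, cutoff=30, minlen=50)
print(strict, relaxed)
```

With the quality cutoff held at 30, lowering the length threshold from 135 to 50 doubles the fraction of toy reads retained, because reads with a good first 60 to 100 bases are no longer thrown away whole.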

written 20 months ago by willnotburn40
20 months ago by
h.mon29k
Brazil
h.mon29k wrote:

There is no way to increase coverage, barring doing more sequencing - or trimming your data less. You could try to error-correct your reads (maybe BayesHammer or khmer), and I would definitely apply a less stringent quality cut-off - this is particularly true as you have low coverage. If you want a number, I would use 10. But how much are you discarding anyway?

written 20 months ago by h.mon29k

Thanks, @h.mon! I hadn't considered going as low as a cutoff of 10 for trimming. According to FastQC, excellent quality is >28 and good quality is >20. Maybe I'll try 20 first. Indeed, with the cutoff at 30, I'm discarding about 73% of my seqs.
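
As a reminder of what these cutoffs mean, a Phred score Q corresponds to a per-base error probability of 10^(-Q/10). The short Python sketch below just evaluates that standard definition for the cutoffs mentioned in this thread; the function names are made up for illustration.

```python
import math


def phred_to_error_prob(q):
    """Per-base error probability implied by Phred quality score q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality score implied by per-base error probability p."""
    return -10 * math.log10(p)


# Cutoffs discussed in the thread: 10, 20, 28, and 30.
for q in (10, 20, 28, 30):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error probability {p:.4f}")
```

So Q10 means roughly 1 error in 10 bases, Q20 is 1 in 100, and Q30 is 1 in 1000.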

Also, I'll try the --very-sensitive-local flag in bowtie2, as per @Devon's advice here.

modified 20 months ago • written 20 months ago by willnotburn40

What sequencing platform? If it was Illumina, discarding 73% of your reads at Q30 suggests the run fell short of Illumina specs: e.g., the HiSeq SBS v4 and the TruSeq SBS v3 promise greater than 80% of bases above Q30 at 2 × 100 bp.
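
One way to compare a run against a "percent of bases above Q30" spec is to count bases directly from the FASTQ quality strings. The Python sketch below assumes Phred+33 encoding and uses made-up quality strings; `fraction_bases_at_least` is a hypothetical helper, not part of any tool mentioned here. Note that the spec is stated per base, while the 73% figure above is per read, so the two numbers are not directly comparable.

```python
def fraction_bases_at_least(quality_strings, threshold=30, offset=33):
    """Fraction of bases whose Phred score is >= threshold.

    Assumes each string is a FASTQ quality line encoded with the given
    ASCII offset (33 is assumed here).
    """
    total = 0
    passing = 0
    for qs in quality_strings:
        for ch in qs:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    return passing / total if total else 0.0


# Toy quality lines: 'I' = Q40, '?' = Q30, '5' = Q20, '#' = Q2 under Phred+33.
toy_quals = ["IIII?5", "II##"]
q30_fraction = fraction_bases_at_least(toy_quals)
print(q30_fraction)
```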

written 20 months ago by h.mon29k

It was a HiSeq Rapid SBS Kit V2 dual-flow cell paired-end 250. The page you linked says greater than 75% of bases above Q30 at 2 × 250 bp. The reverse reads were a little shoddy. The enzyme probably died early. Alas, I no longer have the DNA samples for additional sequencing.

modified 20 months ago • written 20 months ago by willnotburn40

If everything was done using Illumina kits and following their protocols, Illumina has a policy of providing new reagents for a run that does not reach the specs. If you hired a sequencing center, they may still have DNA; they usually ask for more than necessary, in case bad stuff happens.

written 20 months ago by h.mon29k

Is there harm in going too low on the trimming cutoff score? Would the aligner, like bowtie2, be able to discard bad reads that wouldn't align well?

modified 20 months ago • written 20 months ago by willnotburn40