Question

Assembly for metagenomics

24 months ago

valentinavan ▴ 50

Hello,

I am working with prokaryotic metagenomes (modern sediment samples) obtained by WGS: some were sequenced on an Illumina machine and others on the Nanopore MinION. What I usually do is trim and clean the data, align to a database, and then do the quantification. Should I also assemble before aligning?

1) I have always assumed that assembly is not necessary for the Nanopore data, since these are long reads (3-5 kb in my case). However, reading around, I have seen that there are assembly tools such as Canu, Unicycler, or Flye that can handle long reads. Would assembly make sense for WGS reads (my understanding is that it makes sense only when you know what you are sequencing)? Or could I use these assembly tools just to reduce the complexity of my data (Lapidus & Korobeynikov 2021, Crusoe et al. 2015)?

2) My Illumina reads are all paired-end and 150 bp long each. Still, being WGS, does it really make sense to do the assembly, or will it be risky and create wrong contigs?

Thanks in advance for clarifying this for me.

metagenomics assembly WGS • 639 views

score 0 · Answer 1 · 2022-05-11

What I usually do is trim and clean the data, align to a database, and then do the quantification. Should I also assemble before aligning?

Align to what database? What if your organism is novel and not in the database? The answer to your question is yes. Most people in your position would assemble from the long reads and then polish that assembly with the short reads (the short reads are more accurate). Then bin the assembly, map the reads back to the bins, and determine the abundance from that mapping.
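To make the last step concrete, here is a minimal Python sketch of one way to turn a read mapping into per-bin abundance. It is an illustration, not a standard tool: the function name `bin_abundance` and the three input dictionaries are hypothetical, and it assumes you have already extracted, for each contig, its bin assignment, its length, and the total number of read bases mapped to it. Abundance is taken here as each bin's share of the summed mean depth across bins, so unbinned contigs are ignored.

```python
from collections import defaultdict


def bin_abundance(contig_to_bin, contig_length, contig_mapped_bases):
    """Return each bin's share of total mean depth (values sum to 1.0).

    All three arguments are plain dicts keyed by contig name; they are
    hypothetical inputs prepared from your binning and mapping results.
    """
    bases = defaultdict(int)   # mapped read bases summed per bin
    length = defaultdict(int)  # contig lengths summed per bin
    for contig, bin_id in contig_to_bin.items():
        bases[bin_id] += contig_mapped_bases.get(contig, 0)
        length[bin_id] += contig_length[contig]

    # Mean depth normalizes for bin size, so a large genome does not
    # look more abundant just because it attracts more reads.
    depth = {b: bases[b] / length[b] for b in length if length[b] > 0}

    total = sum(depth.values())
    if total == 0:
        return {b: 0.0 for b in depth}
    return {b: d / total for b, d in depth.items()}


# Toy example with made-up numbers.
contig_to_bin = {"c1": "bin1", "c2": "bin1", "c3": "bin2"}
contig_length = {"c1": 1000, "c2": 1000, "c3": 2000}
contig_mapped_bases = {"c1": 30000, "c2": 30000, "c3": 20000}

abundance = bin_abundance(contig_to_bin, contig_length, contig_mapped_bases)
print(abundance)
```

In the toy example bin1 has a mean depth of 30x and bin2 of 10x, so they come out at 0.75 and 0.25. Normalizing by length matters because raw read counts would otherwise favor bins with larger genomes.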

My Illumina reads are all paired-end and 150 bp long each. Still, being WGS, does it really make sense to do the assembly, or will it be risky and create wrong contigs?

There is nothing particularly risky about assembling from short paired-end reads. How do you think people made metagenomic assemblies before long-read technologies were available? I gave my take above on how I would assemble given your data types.