Assembly for metagenomics
24 months ago
valentinavan ▴ 50

Hello,

I am working with prokaryotic metagenomes (modern sediment samples) obtained by WGS: some sequenced on an Illumina machine and others on the Nanopore MinION. What I usually do is trim and clean the data, align to a database, and then quantify. Should I also assemble before aligning?

1) I have always assumed that for the Nanopore data, since these are long reads (3-5 kb in my case), assembly is not necessary. However, reading around, I have seen that assembly tools such as Canu, Unicycler, or Flye can handle long reads. Would assembly make sense for WGS reads (my understanding is that it makes sense only when you know what you are sequencing)? Or could I use these assembly tools just to reduce the complexity of my data (Lapidus & Korobeynikov 2021, Crusoe et al. 2015)?

2) My Illumina reads are all paired-end, 150 bp each. Still, since this is WGS data, does it really make sense to assemble, or is it risky and likely to create wrong contigs?

Thanks in advance for clarifying this for me.

metagenomics assembly WGS • 639 views
24 months ago
Mensur Dlakic ★ 27k

What I usually do is trim and clean the data, align to a database, and then quantify. Should I also assemble before aligning?

Align to what database? What if your organism is novel and not in the database? The answer to your question is yes. Most people in your position would assemble from the long reads and use the short reads for polishing (short reads are more accurate). Then bin the assembly, map the reads to the bins, and determine the abundance from that mapping.
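To make the last step concrete, here is a minimal Python sketch of turning a read-to-contig mapping into per-bin relative abundance. It is only an illustration under stated assumptions: the function name `bin_relative_abundance`, the input dictionaries, and the toy numbers are all hypothetical, and the normalization used (mapped bases divided by bin length, then scaled to sum to 1) is one common choice rather than the only way to do it. In practice the inputs would come from whatever mapper and binner you use.

```python
from collections import defaultdict


def bin_relative_abundance(contig_lengths, contig_to_bin, mapped_bases):
    """Estimate relative abundance per bin from mean coverage.

    contig_lengths: contig name -> contig length in bp (assumed > 0)
    contig_to_bin:  contig name -> bin name (hypothetical binning result)
    mapped_bases:   contig name -> total read bases mapped to that contig
    """
    bin_length = defaultdict(int)
    bin_bases = defaultdict(int)
    for contig, bin_name in contig_to_bin.items():
        bin_length[bin_name] += contig_lengths[contig]
        bin_bases[bin_name] += mapped_bases.get(contig, 0)

    # Mean coverage corrects for bin size: a large genome attracts more
    # reads than a small one at the same cell abundance.
    coverage = {b: bin_bases[b] / bin_length[b] for b in bin_length}

    total = sum(coverage.values())
    if total == 0:
        return {b: 0.0 for b in coverage}
    return {b: cov / total for b, cov in coverage.items()}


# Toy data, invented purely for illustration.
contig_lengths = {"c1": 1000, "c2": 3000, "c3": 2000}
contig_to_bin = {"c1": "binA", "c2": "binA", "c3": "binB"}
mapped_bases = {"c1": 10000, "c2": 30000, "c3": 60000}

abundance = bin_relative_abundance(contig_lengths, contig_to_bin, mapped_bases)
for bin_name, fraction in sorted(abundance.items()):
    print(f"{bin_name}\t{fraction:.2f}")
```

With these toy numbers binA has a mean coverage of 10x and binB 30x, so binB accounts for three quarters of the binned community. Reads that map to no bin are ignored here, which is itself a simplification.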

My Illumina reads are all paired-end, 150 bp each. Still, since this is WGS data, does it really make sense to assemble, or is it risky and likely to create wrong contigs?

There is nothing particularly risky about assembling from short paired-end reads. How do you think people made metagenomic assemblies before long-read technologies were available? I gave my take above on how I would assemble given your data types.


Thanks Mensur.

I usually align against the nr, nt, or RefSeq databases. If I have any novel organisms I will never know, and at the moment I am not interested in investigating this. I just want to do comparative analyses of microbial composition. Since I do not know a priori what is in my samples, I just want to make sure I do everything I can to end up with the most accurate compositional data I can get from each sample. So far, I have skipped the assembly step and gone straight to alignment using k-mer-based or score-based alignment tools such as Kaiju, Kraken, or Bowtie2...

Your last comment makes sense to me now too. Thanks.
