Question: shotgun metagenomics assembly
luyang1005 wrote (9 months ago):

Hi, community,

I am working with shotgun metagenomics data: 100 bp paired-end reads from a HiSeq.

I have assembled the reads with metaSPAdes, IDBA and MEGAHIT using their default settings, but I do not know which assembly is better. Any suggestions? Are there indicators I can use to evaluate them? I know tools like QUAST can do this, but they report so many metrics that I don't know which ones to look at. Does the percentage of raw reads that can be assembled matter? How can I find out how many reads were assembled?

Thanks in advance.


How to know how many reads can be assembled

You can map the reads back to the assembly with any read mapper. Bowtie2, for example, directly reports how many reads align.
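A minimal sketch of pulling those numbers out of Bowtie2's report, assuming Python 3.9+ and that the alignment summary Bowtie2 writes to stderr has already been captured (for example with something like `bowtie2-build contigs.fasta assembly_index` followed by `bowtie2 -x assembly_index -1 reads_R1.fq -2 reads_R2.fq -S mapped.sam 2> bowtie2.log`). The file names, the index name and the `parse_bowtie2_summary` helper are hypothetical, and the summary text in the example is made up for illustration.

```python
import re

# Illustrative (made-up) summary in the layout Bowtie2 prints for unpaired reads.
EXAMPLE_SUMMARY = """\
1000 reads; of these:
  1000 (100.00%) were unpaired; of these:
    500 (50.00%) aligned 0 times
    400 (40.00%) aligned exactly 1 time
    100 (10.00%) aligned >1 times
50.00% overall alignment rate
"""


def parse_bowtie2_summary(text: str) -> tuple[int, float]:
    """Return (input read count, overall alignment rate in percent).

    For paired-end runs the first count refers to read pairs, not single mates.
    """
    reads = re.search(r"^(\d+) reads; of these:", text, re.MULTILINE)
    rate = re.search(r"^([\d.]+)% overall alignment rate", text, re.MULTILINE)
    if reads is None or rate is None:
        raise ValueError("input does not look like a Bowtie2 alignment summary")
    return int(reads.group(1)), float(rate.group(1))


if __name__ == "__main__":
    n_reads, pct = parse_bowtie2_summary(EXAMPLE_SUMMARY)
    print(f"{n_reads} reads, {pct:.2f}% aligned back to the assembly")
```

The overall alignment rate is the single figure that answers "how many reads were assembled"; the indented lines only break it down further.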

Reply by Carambakaracho, 9 months ago
dllopezr wrote (9 months ago):

Hi

Have a look at this article, it may be helpful: van der Walt AJ, van Goethem MW, Ramond J-B, Makhalanyane TP, Reva O, Cowan DA. Assembling metagenomes, one community at a time. BMC Genomics. 2017;18:521.

Check out this one too: Papudeshi B, Haggerty JM, Doane M, Morris MM, Walsh K, Beattie DT, et al. Optimizing and evaluating the reconstruction of Metagenome-assembled microbial genomes. BMC Genomics.


Yes, the first paper is helpful, thanks. But I still don't know whether my assembly is OK. My FASTQ reads after Trimmomatic come to 35G, while the assembly is only 200M. Is something wrong?

Reply by luyang1005, 9 months ago

Hi luyang1005, that is nearly impossible to tell from the sparse information you provided. All is probably fine if your three assemblies show similar trends, and probably not if one assembly is several times larger than another. If you expect a microbiome of moderate complexity, this is probably about right; if you expect a highly complex microbiome, something might have gone wrong.
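One quick way to check whether the three assemblies show similar trends is to compute the same basic statistics for each of them: contig count, total length and N50 (QUAST reports these as well). A minimal sketch in plain Python 3.9+, assuming each assembly is a FASTA file; the function names and the `min_length` cutoff are illustrative choices, not part of any assembler's output. On their own these numbers say little about correctness; they only show whether the assemblies are in the same ballpark.

```python
from typing import Iterable


def contig_lengths(fasta_lines: Iterable[str]) -> list[int]:
    """Return the length of every record in a FASTA file (sequences may span lines)."""
    lengths: list[int] = []
    current = None  # length of the record being read; None before the first header
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: list[int]) -> int:
    """Contig length at which the running total of sorted lengths reaches half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def assembly_stats(lengths: list[int], min_length: int = 0) -> dict[str, int]:
    """Contig count, total length and N50, ignoring contigs shorter than min_length."""
    kept = [length for length in lengths if length >= min_length]
    return {"contigs": len(kept), "total_length": sum(kept), "n50": n50(kept)}


if __name__ == "__main__":
    toy_fasta = [">c1", "ACGTA", "CGTAC", ">c2", "ACGTAC", ">c3", "ACGT"]
    print(assembly_stats(contig_lengths(toy_fasta)))
    # {'contigs': 3, 'total_length': 20, 'n50': 10}
```

For a real assembly, pass an open file handle, e.g. `with open("contigs.fasta") as fh: stats = assembly_stats(contig_lengths(fh), min_length=1000)`; the file name and the 1000 bp cutoff are only examples. Using the same cutoff for all three assemblies keeps the comparison fair, since assemblers differ in how many very short contigs they emit.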

Reply by Carambakaracho, 9 months ago

Hi Carambakaracho, thanks for your reply. My goal is to find out which taxa are present and to get some functional characterization of the samples, which are anaerobic manure samples. I got 25,000 contigs from IDBA and MEGAHIT, and 44,000 contigs from metaSPAdes. Each assembler has some weak indicators and some areas where it performs well, so I am confused: (1) Are these assemblies suitable for the next step, binning or annotation? (2) Is binning a mandatory step, or is it OK to go straight to annotation? Any suggestions? Many thanks in advance.

Reply by luyang1005, 9 months ago

Unfortunately, directly comparing fragmented assemblies is not trivial, even for single genomes, and it is harder still for metagenomes. Neither contig length nor contig number is a good measure of assembly quality.

In any case, you can go ahead with protein prediction and annotation. You can then check which proteins are found in both assemblies and see how big the difference really is. DIAMOND is a good BLASTP substitute. An excellent resource for functional annotation is the eggNOG database with eggNOG-mapper, though other people might have different opinions. If you have more than one condition and assembled all reads together, you can bin your contigs with something like MetaBAT or CONCOCT.
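Because different assemblers produce different contigs and protein IDs, one way to see how big the difference really is would be to compare the sets of functional labels (e.g. orthologous group IDs) assigned to each assembly's predicted proteins. A minimal sketch in Python 3.9+, assuming each annotation has been reduced to a hypothetical two-column, tab-separated table of `protein_id` and `annotation_id`; this is not the native eggnog-mapper output layout, and all IDs below are made up.

```python
from typing import Iterable


def annotation_set(tsv_lines: Iterable[str]) -> set[str]:
    """Collect annotation IDs from hypothetical 'protein_id<TAB>annotation_id' lines."""
    ids: set[str] = set()
    for line in tsv_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1]:
            ids.add(fields[1])
    return ids


def compare_annotations(a: set[str], b: set[str]) -> dict[str, float]:
    """Shared and assembly-specific annotation counts plus the Jaccard index."""
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    megahit = annotation_set(["p1\tOG1", "p2\tOG2", "p3\tOG3"])
    spades = annotation_set(["q1\tOG2", "q2\tOG3", "q3\tOG4", "q4\tOG4"])
    print(compare_annotations(megahit, spades))
    # {'shared': 2, 'only_a': 1, 'only_b': 1, 'jaccard': 0.5}
```

Counting distinct labels rather than proteins means a gene split across two contigs in one assembly is not counted twice (as with `q3` and `q4` above); the trade-off is that copy-number differences between assemblies are ignored.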

Reply by Carambakaracho, 9 months ago

Thanks so much for the suggestions. I will go ahead with protein prediction and annotation and have a look. Yesterday I binned one sample and checked the bins with CheckM: binning each sample separately, 30 of the 76 bins have completeness > 90% and contamination < 5%. However, the lowest taxonomic level the bins are classified to is class, which I don't think is a good sign. Right? I also mapped the cleaned raw reads back to my assemblies with Bowtie2, and 50% of the reads map. Is that enough, or should I go back and adjust some assembly parameters to get a better assembly? Sorry for so many questions; your help is really appreciated. Thanks a lot.
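The completeness/contamination filter described above can be sketched as follows, assuming Python 3.9+ and that the CheckM values have already been collected into a dictionary keyed by bin name (parsing CheckM's own report is left out). The bin names and numbers are made up; the default thresholds are the ones used in the post (completeness > 90%, contamination < 5%).

```python
from typing import Mapping


def high_quality_bins(
    stats: Mapping[str, tuple[float, float]],
    min_completeness: float = 90.0,
    max_contamination: float = 5.0,
) -> list[str]:
    """Names of bins with completeness > min_completeness and contamination < max_contamination (%)."""
    return sorted(
        name
        for name, (completeness, contamination) in stats.items()
        if completeness > min_completeness and contamination < max_contamination
    )


if __name__ == "__main__":
    # Hypothetical CheckM results: bin -> (completeness %, contamination %)
    checkm = {
        "bin.1": (95.2, 1.3),
        "bin.2": (88.0, 2.0),
        "bin.3": (97.5, 6.1),
        "bin.4": (91.0, 4.9),
    }
    keep = high_quality_bins(checkm)
    print(f"{len(keep)} of {len(checkm)} bins pass: {keep}")
```

Keeping the thresholds as parameters makes it easy to also count a looser tier of bins (for example by lowering `min_completeness`) without changing the code.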

Reply by luyang1005, 9 months ago

the lowest level is class level

Classification depends heavily on how well your microbiome's composition is represented in the database. For example, I could classify less than 5% of my sequences against the nt database, but more than 50% against a custom database built from a study by colleagues.

50% of reads can be mapped

This seems rather low. Without filtering the assembly, you should get more than that, especially with the metaSPAdes assembly.

Reply by Carambakaracho, 9 months ago

I see. Thanks for your answers.

Reply by luyang1005, 9 months ago