Truncated/duplicated genes in Metagenome Assembled Genomes (MAGs) from ONT+Illumina hybrid assembly
11 months ago

Hello there,

I have performed a metagenome assembly on a single sample using both Illumina and ONT reads (average q-score 10) with opera-ms. The contigs were then binned using maxbin2, metabat2 and concoct, and metaWRAP was used for bin refinement. From this sample I have obtained 22 good quality MAGs with completeness > 70% and contamination < 5%.

The problem that I am facing right now is the presence of truncated/duplicated genes which are probably caused by false frameshifts introduced by ONT reads. I tried pilon to fix these sequencing errors but because of the low coverage, pilon did not solve the problem.
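To make the failure mode concrete, here is a minimal sketch of how a single-base deletion shifts the reading frame and produces a premature stop codon. The toy sequence and the helper name `orf_length_codons` are made up for illustration; they are not taken from my data.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def orf_length_codons(seq, start=0):
    """Count codons from `start` up to (not including) the first in-frame stop."""
    count = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return count
        count += 1
    return count  # no in-frame stop found

# Toy gene (hypothetical): ATG GCT GAT AAC GGT CTG AAA TAA
gene = "ATGGCTGATAACGGTCTGAAATAA"

# Simulate a sequencing error: drop the single base at index 4.
gene_del = gene[:4] + gene[5:]

print(orf_length_codons(gene))      # intact reading frame
print(orf_length_codons(gene_del))  # shifted frame hits TGA early
```

In this toy case the intact frame has 7 codons before the stop, while the shifted frame stops after 5, so a gene caller would report a truncated product and may call a second fragment downstream.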

Also, it seems that annotation pipelines can be more or less sensitive to this kind of problem. For example, the same MAG was annotated with two different pipelines: RAST and prokka (metagenome mode). With RAST, the rubisco gene (rbcL) was split into 4 fragments, while with prokka the same gene was represented by 2 fragments.
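One rough way to compare pipelines (or polishing rounds) is to count runs of neighbouring features that share a product name, contig and strand. The sketch below assumes the annotation has already been parsed into `(contig, start, end, strand, product)` tuples; the function name `find_split_candidates`, the `max_gap` threshold and the example coordinates are hypothetical.

```python
def find_split_candidates(features, max_gap=200):
    """Return runs of adjacent features that look like fragments of one gene.

    features: list of (contig, start, end, strand, product) tuples.
    Two neighbours are grouped when they share contig, strand and product
    and the gap between them is at most `max_gap` bases.
    """
    ordered = sorted(features, key=lambda f: (f[0], f[1]))
    runs = []
    current = ordered[:1]
    for prev, feat in zip(ordered, ordered[1:]):
        same_gene = (
            feat[0] == prev[0]
            and feat[3] == prev[3]
            and feat[4] == prev[4]
            and feat[1] - prev[2] <= max_gap
        )
        if same_gene:
            current.append(feat)
        else:
            if len(current) > 1:
                runs.append(current)
            current = [feat]
    if len(current) > 1:
        runs.append(current)
    return runs

# Made-up coordinates for illustration only.
rubisco = "ribulose bisphosphate carboxylase large chain"
features = [
    ("contig_1", 100, 520, "+", rubisco),
    ("contig_1", 530, 1500, "+", rubisco),
    ("contig_1", 2000, 2900, "-", "hypothetical protein"),
    ("contig_2", 10, 400, "+", rubisco),
]
runs = find_split_candidates(features)
print(len(runs), "candidate split gene(s)")
```

This is only a heuristic: genuine tandem duplicates and neighbouring "hypothetical protein" entries would also be flagged, so the candidates still need to be checked by hand or against a reference.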

My question is: are you aware of any annotation pipeline for MAGs that is less sensitive to false frameshifts introduced by sequencing errors? Perhaps an annotation pipeline that makes use of a user-defined database of reference genomes (lucky for me, my MAGs belong to taxa with a good number of genomes from isolated strains).

Thanks

hybrid annotation metagenome assembly • 687 views
10 months ago
oschwengers ▴ 80

Hi, before turning to other gene prediction tools and annotation pipelines, you could try other long- and short-read polishing tools. As each polishing tool has its own strengths and weaknesses with respect to certain error types, I'd suggest combining them in sequential order: 1) long reads: Racon, Medaka 2) short reads: Pilon, NextPolish, Polypolish

Polypolish in particular might help, as it exploits multiple mappings of the same read instead of just the single best one.
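As a rough sketch of the Polypolish step, the snippet below only prints the shell commands rather than running them. It assumes bwa and a recent Polypolish release that provides the `polish` subcommand; the file names and the helper `polypolish_plan` are placeholders, so please check the current Polypolish documentation before using it. The relevant detail is `bwa mem -a`, which reports all alignments of each read and not only the best one.

```python
import shlex

def polypolish_plan(draft, reads_1, reads_2, threads=8, out="polished.fasta"):
    """Build (but do not run) the shell commands for one Polypolish round."""
    d, r1, r2, o = (shlex.quote(p) for p in (draft, reads_1, reads_2, out))
    return [
        f"bwa index {d}",
        # -a: output all alignments, aligning each read file separately
        f"bwa mem -t {threads} -a {d} {r1} > aln_1.sam",
        f"bwa mem -t {threads} -a {d} {r2} > aln_2.sam",
        # assumed syntax of recent Polypolish versions
        f"polypolish polish {d} aln_1.sam aln_2.sam > {o}",
    ]

# Placeholder file names for illustration.
plan = polypolish_plan("bin.7.fa", "R1.fastq.gz", "R2.fastq.gz")
for cmd in plan:
    print(cmd)
```

Keeping this as a dry run makes it easy to review the commands for each MAG before submitting them to a cluster.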

Since annotation pipelines like Prokka, RAST, PGAP, DFAST and Bakta utilize gene prediction tools, they depend on the result of these tools which in turn depend on the assembly quality.


Hi, the assembly pipeline (opera-ms) already includes some of those tools, i.e. racon and pilon, but Polypolish sounds interesting so I will give it a try.

I will update this post if I get better results.

Thanks