How to choose a metagenomics functional annotation approach: reads-based vs. contig-based
12 months ago
JyiYeung • 0

Hi, I'm new to the field of environmental metagenomics and I want to do some functional annotation of my shotgun metagenome data. In the literature I came across two types of pipeline for metagenomic functional annotation: reads-based vs. contig-based. Both have pros and cons, and I could not decide which one to use. In a reads-based pipeline, BLAST is run directly on the clean reads, which probably means more data is used. But the reads are short, mostly 150 bp, which is shorter than many target genes. Will this affect the accuracy or efficiency of the BLAST search?

In a contig-based pipeline, there may be a big loss after assembly, because many sequences from natural samples cannot be assembled. If I use it, how should I evaluate the assembly result? With something like CheckM? What level of quality is a reasonable standard?

I would really appreciate any responses. Sorry if this is a naive question. Thank you!

12 months ago

Hi,

It really depends on your data and on what you are working with (eukaryotes or prokaryotes). However, I will try to answer your question.

Actually, you have already answered this yourself in your question. The read-based approach is not very robust, especially for functional annotation, because the reads are very short. Think about it: the nucleotide sequence has to be translated into amino acids to be matched against the database. You have ~150 bp, and when you translate it you get 150/3 = 50 aa at most, and you do not know which part of the protein the read comes from. Do you think that is enough?
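
To make the arithmetic concrete, here is a minimal sketch in plain Python of what a translated search has to do with a short read: translate it in all six reading frames and hope one of them overlaps a database protein. The function names (`six_frame_translate`, `max_residues`) are made up for illustration and are not part of blastx or any other tool; the codon table is the standard genetic code.

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame_translate(read):
    """Translate a read in all six frames (hypothetical helper).

    Keys are '+1', '+2', '+3' (forward) and '-1', '-2', '-3' (reverse).
    Codons containing ambiguous bases such as N are translated as 'X'.
    """
    read = read.upper()
    frames = {}
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for offset in range(3):
            # Only complete codons are translated; trailing bases are dropped.
            codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
            frames[f"{strand}{offset + 1}"] = "".join(
                CODON_TABLE.get(codon, "X") for codon in codons
            )
    return frames


def max_residues(read_length):
    """Upper bound on amino acids obtainable from one read."""
    return read_length // 3


print(max_residues(150))  # 50
```

So a 150 bp read covers at most 50 residues (49 in the shifted frames), which is only a fragment of a typical protein of a few hundred residues.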

However, if you have ultra-low-coverage data and de novo assembly does not give a reasonable result, then you might not be able to go on to the downstream analysis with contigs.

For the read-based approach, the options that I know of are:

1- blastx

2- mmseqs

3- FragGeneScan, for gene prediction on short reads

For the contig-based approach, you can assess your assembly as follows:

1- Map the reads back to the assembly and look at the coverage. The fraction of reads that map tells you how much information you lost.

2- QUAST, for basic assembly statistics.

3- Single-copy orthologs can be used to evaluate MAGs (for example with CheckM, which you mention), but not the assembly as a whole. You cannot really evaluate a metagenomic assembly with single-copy genes, because you will see many duplicates, which is completely normal when several genomes are assembled together.
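
As a rough illustration of points 1 and 2, the sketch below computes two numbers you would normally read off the tools' reports: the fraction of reads that map back to the assembly, and N50, one of the basic contiguity statistics QUAST reports. The function names and the example numbers are hypothetical; the read counts would come from your read mapper's summary and the contig lengths from your assembly FASTA.

```python
def mapped_fraction(mapped_reads, total_reads):
    """Fraction of reads that map back to the assembly (0.0 to 1.0)."""
    if total_reads == 0:
        return 0.0
    return mapped_reads / total_reads


def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


# Made-up example values, for illustration only.
print(mapped_fraction(750, 1000))   # 0.75
print(n50([400, 300, 200, 100]))    # 300
```

A low mapped fraction means that much of the community is missing from the contigs, so any contig-based annotation will describe only the part that assembled.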

Overall, I would say that the contig-based approach is safer than the read-based approach for functional annotation.

Hope it helps.


Thank you so much for your kindness and your time! That's really helpful! I will try the contig-based approach first and look for information about single-copy orthologs for MAGs, which is new to me. :) Best wishes!
