Question

gene abundance profiling from metagenomes

22 months ago

v.berriosfarias ▴ 140

Hello, this is the first time I have had to do this, so I don't know what tools to use.

I have 19 metagenomic samples, each represented by a set of contigs, and a database of proteins of interest in FASTA format. I need to produce a gene abundance profile for each metagenomic sample.

So far I have performed a gene copy number analysis with MMseqs2, running mmseqs search with the contigs as query and the gene database as target. The output is a BLAST-style tabular file (output format 6): https://www.metagenomics.wiki/tools/blast/blastn-output-format-6
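For reference, a table in this layout can be tallied per gene with a few lines of Python. This is only a minimal sketch: the function name `count_hits_per_gene`, the e-value cutoff, and the example rows are assumptions for illustration, and the result is a hit count per gene, not an abundance.

```python
import csv
import io
from collections import Counter

# Column order of BLAST tabular format 6 (the layout described at the link above).
COLUMNS = ["query", "target", "pident", "alnlen", "mismatch", "gapopen",
           "qstart", "qend", "tstart", "tend", "evalue", "bits"]

def count_hits_per_gene(handle, max_evalue=1e-5):
    """Count alignments per target gene (hypothetical helper, assumed cutoff)."""
    counts = Counter()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < len(COLUMNS):
            continue  # skip blank or malformed lines
        record = dict(zip(COLUMNS, row))
        if float(record["evalue"]) <= max_evalue:
            counts[record["target"]] += 1
    return counts

# Made-up rows for illustration only; a real run would open the search output file.
example_table = (
    "contig_1\tgeneA\t98.0\t300\t6\t0\t100\t999\t1\t300\t1e-50\t500\n"
    "contig_2\tgeneA\t95.0\t280\t14\t0\t50\t889\t1\t280\t1e-40\t400\n"
    "contig_2\tgeneB\t40.0\t100\t60\t0\t1000\t1299\t1\t100\t0.5\t30\n"
)
counts = count_hits_per_gene(io.StringIO(example_table))
print(dict(counts))
```

The weak geneB alignment is dropped by the e-value filter, so only geneA is counted here, once per contig it hit.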

At this point I don't know whether parsing this table will give me abundance information. I think that to estimate gene abundance I must use the read information rather than the contig information. Is that right?

The contigs represent the consensus of the reads, so the proper way to calculate gene abundance may be to map the reads against the contigs that the genes hit. Is that correct? Are there any tools for that?
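To make the idea concrete, here is a rough sketch of weighting each gene hit by the read depth of the contig it sits on. It assumes the reads have already been mapped back to the contigs with some read aligner and that a mean depth per contig is available; `depth_weighted_abundance`, the contig names, and the depth values are all hypothetical.

```python
from collections import defaultdict

def depth_weighted_abundance(hits, contig_depth):
    """Sum the mean read depth of every contig on which a gene was found.

    hits: iterable of (contig_id, gene_id) pairs, e.g. taken from the search table.
    contig_depth: dict of contig_id -> mean read depth (assumed to be precomputed).
    """
    totals = defaultdict(float)
    for contig, gene in hits:
        # Contigs without depth information contribute nothing.
        totals[gene] += contig_depth.get(contig, 0.0)
    return dict(totals)

# Made-up values for illustration only.
hits = [("contig_1", "geneA"), ("contig_2", "geneA"), ("contig_3", "geneB")]
contig_depth = {"contig_1": 20.0, "contig_2": 5.0, "contig_3": 25.0}

abundance = depth_weighted_abundance(hits, contig_depth)
print(abundance)
```

In this toy case geneA is found on two contigs and geneB on one, yet both end up with the same depth-weighted value, which is why a copy count from contigs alone is not the same thing as an abundance.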

Can you recommend any tools or papers to read on gene profiling analysis for metagenomic samples?

I also have the metagenomic reads from which the contigs were assembled.

Thanks for your time :)

gene-abundance mapping metagenomics • 876 views

score 1 · Answer 1 · 2022-06-04

Not sure at all what you are trying to do, despite the lengthy explanation. What you seem to suggest either cannot be done or would not yield useful information.

Let's say that organisms XX and YY both have a gene called geneA. You can assess the presence of that gene in both genomes, but raw read abundance will only tell you about the relative abundance of the organisms and nothing else. That is to say, if XX is 10x more abundant than YY, its geneA will in theory have 10x the reads mapping to it.

If you are trying to ascertain the abundance of geneA in the whole community, that again will likely be a function of organismal abundance rather than anything else. Let's say that you have one community where geneA is present in a single organism which comprises 50% of the total community. You will get a larger abundance of mapping reads in that community than in a different community where geneA is present in 3 organisms, each of which comprises only 10% of the total community. I don't think that knowing gene copy numbers will help you much without knowing the organism(s) they belong to, and their relative community abundance as well.
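The two-community example can be put into numbers with a small sketch. It assumes one copy of geneA per genome and comparable genome sizes, neither of which is stated above; `expected_gene_signal` is a hypothetical helper and the percentages are the ones from the example.

```python
def expected_gene_signal(carrier_percentages, copies_per_genome=1):
    """Expected geneA signal, as a percentage of the community carrying it.

    carrier_percentages: relative abundance (%) of each organism that has the gene.
    copies_per_genome: assumed gene copies per genome (1 unless known otherwise).
    """
    return sum(percent * copies_per_genome for percent in carrier_percentages)

# Community 1: geneA in a single organism making up 50% of the community.
community_1 = expected_gene_signal([50])

# Community 2: geneA in three organisms, each making up 10% of the community.
community_2 = expected_gene_signal([10, 10, 10])

print(community_1, community_2)
```

Under these assumptions the first community gives the larger signal (50 versus 30) even though the gene occurs in fewer organisms there, so the read signal tracks organism abundance rather than how many genomes carry the gene.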