identification of catabolic genes in soil metagenome shotgun data
3.0 years ago
mewgia • 0

Hello.

I have shotgun sequencing data from 4 soil microbial communities. I compiled a custom database of genes involved in the catabolism of polycyclic aromatic hydrocarbons (PAHs) and mapped my raw shotgun reads to this database. So for each sample I obtained the reads corresponding to PAH catabolic genes. For example:

sample 1 - 100 reads

sample 2 - 200 reads

sample 3 - 300 reads

sample 4 - 150 reads

The question is: can I draw conclusions about the number of genes in the samples based on the number of reads? I.e., can I say that sample 3 contains more PAH genes than sample 1? I have also carried out a taxonomic analysis, but for the genes it is not so obvious.

Thanks.

shotgun metagenome • 886 views

based on the number of reads, make conclusions about the number of genes in the samples?

Do you have multimapping reads? If you do, then you really can't conclude anything. If the reads map uniquely, then you can maybe say that the subset of genes with reads aligning to them appears to be present, but you can't conclude anything about the number of genes.

You are also aligning to a custom (reduced-representation) database, so the aligner will try to place reads on genes in your database even if they did not originate from those genes.
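To make the multimapping point concrete, here is a minimal sketch that counts only primary alignments above a mapping-quality cutoff, reading plain SAM text. The cutoff of 10, the function name, and the gene names `nahAc` and `pahA` are assumptions for illustration. How MAPQ encodes multimapping depends on the aligner, so check its documentation before picking a threshold.

```python
from collections import Counter

# SAM FLAG bits (from the SAM specification)
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_confident_hits(sam_lines, min_mapq=10):
    """Count primary alignments per reference with MAPQ >= min_mapq.

    min_mapq=10 is an arbitrary illustrative cutoff, not a recommendation.
    Each aligned mate of a paired-end read is counted separately.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, ref, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # keep only primary alignments of mapped reads
        if mapq < min_mapq:
            continue  # low MAPQ usually signals an ambiguous placement
        counts[ref] += 1
    return counts


# Toy SAM records with hypothetical gene names, not real data.
example_sam = [
    "@SQ\tSN:nahAc\tLN:1350",
    "r1\t0\tnahAc\t10\t42\t100M\t*\t0\t0\t*\t*",
    "r2\t0\tnahAc\t55\t0\t100M\t*\t0\t0\t*\t*",
    "r3\t256\tpahA\t5\t0\t100M\t*\t0\t0\t*\t*",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r5\t16\tpahA\t7\t30\t100M\t*\t0\t0\t*\t*",
]
confident = count_confident_hits(example_sam)
print(dict(confident))
```

Filtering like this does not solve the reduced-database problem: a read can still align confidently to the closest gene in the custom database simply because its true source is absent from it.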


Assuming that you have biological replicates, the best way to answer your question is to literally count how many PAH catabolic genes you have in your samples. For each sample: assemble your reads into contigs with a metagenomic assembler of your choice, run Prodigal in metagenome mode to detect the protein-coding genes, annotate the protein sequences with eggNOG to identify which proteins have a KO (KEGG Orthology) ID associated with PAH degradation, and count how many PAH KOs you have in each sample:

KO      Description                                Sample 1  Sample 2  Sample 3  Sample 4
K14579  naphthalene 1,2-dioxygenase subunit alpha  7         8         9         15
K11943  PAH dioxygenase large subunit              32        8         13        8
...

The numbers indicate how many CDS were assigned to each KO (e.g. K11943) in each of your samples.
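The final counting step could be sketched as follows. This assumes the annotation step yields, for each predicted gene, a comma-separated KO field such as `ko:K14579,ko:K11943` (the layout I would expect from eggNOG-mapper's KO column, but verify it against your own output). The `PAH_KOS` set, the function name, and the toy annotations are hypothetical; a real analysis would need a curated list of PAH-degradation KOs.

```python
from collections import Counter

# Hypothetical target set: only the two KOs shown in the table above.
PAH_KOS = {"K14579", "K11943"}


def count_pah_kos(annotations, pah_kos=PAH_KOS):
    """Count CDS per PAH KO.

    annotations: iterable of (gene_id, ko_field) pairs, where ko_field is a
    comma-separated string like "ko:K14579,ko:K11943" or "-" for no hit
    (an assumed layout, adjust to your annotation table).
    """
    counts = Counter()
    for gene_id, ko_field in annotations:
        # Strip an optional "ko:" prefix and deduplicate within one gene.
        kos = {ko.strip().split(":")[-1] for ko in ko_field.split(",")}
        for ko in kos & pah_kos:
            counts[ko] += 1
    return counts


# Toy annotations, invented for illustration only.
samples = {
    "sample 1": [("g1", "ko:K14579"), ("g2", "ko:K11943,ko:K14579"), ("g3", "-")],
    "sample 2": [("g1", "ko:K11943"), ("g2", "ko:K00001")],
}
table = {name: count_pah_kos(rows) for name, rows in samples.items()}

for ko in sorted(PAH_KOS):
    print(ko, *(table[name][ko] for name in samples), sep="\t")
```

A gene carrying two PAH KOs is counted once under each, which matches counting CDS per KO rather than CDS overall.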

3.0 years ago

I think that if you knew, for example, how many colony-forming units you sequenced in each sample, you could scale these values, e.g. to genes per 100 CFU. But since the numbers here are quite close together, I don't think you would find a statistically significant difference. To test that, you would need replicates for each sample.
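As a rough illustration of the scaling idea, the sketch below divides each sample's PAH read count by a per-sample denominator. The read counts are the ones from the question. The denominators are made up: they could be CFU estimates as suggested here, or total sequenced reads per sample, which is my assumption and not something the question provides.

```python
def scale_counts(read_counts, denominators, per=1_000_000):
    """Express each sample's count per `per` units of its denominator."""
    return {
        sample: read_counts[sample] * per / denominators[sample]
        for sample in read_counts
    }


# PAH read counts as given in the question.
pah_reads = {"sample 1": 100, "sample 2": 200, "sample 3": 300, "sample 4": 150}

# Hypothetical total reads per sample, invented to show the effect of depth.
total_reads = {
    "sample 1": 1_000_000,
    "sample 2": 2_000_000,
    "sample 3": 3_000_000,
    "sample 4": 1_500_000,
}

per_million = scale_counts(pah_reads, total_reads)
for sample, value in per_million.items():
    print(sample, value)
```

With these invented depths, every sample comes out at 100 PAH reads per million, so the apparent threefold difference between sample 1 and sample 3 would be entirely due to sequencing depth. Real data will not be this tidy, and replicates are still needed for any statistical test.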


The OP also does not seem to describe a case/control design?
