Hi, I have a set of predicted ORFs from metagenomic bins after assembly. From these I have extracted a set of ORFs that are single-copy genes, reflecting the diversity of my sample.

How can I get the copy number for each of these genes (actually 139 genes) in my assembly?

My first idea is to map my single-copy genes to my assembly and get the coverage for each. Is that a suitable approach, or is there a better way to get the copy number for each of my single-copy genes?
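To make the coverage idea concrete, here is a minimal sketch of the "get coverage for each gene" step. It assumes the mapping has already been done and that per-position depth was exported as a three-column, tab-separated table (reference name, position, depth), which is the layout `samtools depth` writes. The function name `mean_depth_per_gene` and the `gene_lengths` dictionary are hypothetical names used only for illustration.

```python
from collections import defaultdict


def mean_depth_per_gene(depth_lines, gene_lengths):
    """Return {gene_id: mean depth} from per-position depth records.

    depth_lines  : iterable of 'gene_id<TAB>position<TAB>depth' strings.
    gene_lengths : {gene_id: length in bp}. Positions that are missing
                   from depth_lines are counted as depth 0, so the sum
                   is divided by the full gene length.
    """
    totals = defaultdict(int)
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        gene_id, _position, depth = line.split("\t")
        totals[gene_id] += int(depth)
    # Genes with no mapped positions at all get a mean depth of 0.0
    return {gene: totals[gene] / length for gene, length in gene_lengths.items()}


if __name__ == "__main__":
    # Toy records, not real data
    records = ["geneA\t1\t10", "geneA\t2\t20", "geneB\t1\t5"]
    lengths = {"geneA": 2, "geneB": 5, "geneC": 3}
    for gene, depth in mean_depth_per_gene(records, lengths).items():
        print(gene, depth)
```

Dividing by the gene length rather than by the number of reported positions matters when zero-depth positions are left out of the table; otherwise partially covered genes would look deeper than they are.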


I'm quite confused by your question. If you claim those genes are single-copy genes, then why would you look for the copy number? To me, single copy means copy number = 1.

Yes, sorry. I isolated them from a metagenomic sample containing different bacterial species. Each marker is unique to one species, but I want to know how many copies of that species' genome I have in my sample. For instance, one single-copy gene may be carried by, let's say, 40 copies of the same bacterium contained in my metagenomic sample.

The workflow is:

1 - Assemble the metagenome

2 - Bin the contigs into bins (sets of contigs)

3 - Predict ORFs from each bin

4 - Run hmmsearch for 140 specific marker genes against my predicted ORFs

5 - Get the DNA sequences of these ORFs
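As an illustration of steps 4 and 5, here is a rough sketch of how the ORF identifiers could be pulled out of the hmmsearch results. It assumes hmmsearch was run with `--tblout`, where each non-comment line is whitespace-separated with the target (ORF) name in the first column, the query (marker HMM) name in the third and the full-sequence bit score in the sixth. Keeping only the best-scoring ORF per marker is my assumption, not part of the workflow above, and `best_hit_per_marker` is a hypothetical helper name.

```python
def best_hit_per_marker(tblout_lines):
    """Return {marker_name: (orf_id, bit_score)} for one tblout file.

    Only the highest-scoring ORF is kept for each marker HMM.
    Comment lines (starting with '#') and blank lines are ignored.
    """
    best = {}
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        orf_id = fields[0]        # target name: the predicted ORF
        marker = fields[2]        # query name: the marker HMM
        score = float(fields[5])  # full-sequence bit score
        if marker not in best or score > best[marker][1]:
            best[marker] = (orf_id, score)
    return best


if __name__ == "__main__":
    # Toy lines in tblout layout, not real results
    lines = [
        "# target name  accession  query name ...",
        "orf_12 - markerA - 1e-50 170.2 0.1 1e-50 170.0 0.1 1.0 1 0 0 1 1 1 1 -",
        "orf_40 - markerA - 1e-10 40.5 0.0 1e-10 40.1 0.0 1.0 1 0 0 1 1 1 1 -",
    ]
    print(best_hit_per_marker(lines))
```

The ORF identifiers returned here can then be used to pull the matching DNA sequences from the ORF FASTA file of each bin.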

How do you find out how many copies of each bacterial genome you have?
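One possible way to frame this, as a sketch only: if a mean read depth is available for every single-copy marker in a bin, the median of those depths can serve as a proxy for how often that genome occurs in the sample. Note that depth is proportional to the number of genome copies rather than equal to it, so the values are mainly useful for comparing bins with each other. The input layout and the names `genome_depth_per_bin` and `relative_abundance` are assumptions made for this example.

```python
from statistics import median


def genome_depth_per_bin(marker_depths):
    """Return {bin_id: median depth of its single-copy markers}.

    marker_depths: {bin_id: [mean depth of each marker gene in that bin]}.
    Bins without any marker depth are skipped.
    """
    return {
        bin_id: median(depths)
        for bin_id, depths in marker_depths.items()
        if depths
    }


def relative_abundance(bin_depths):
    """Scale per-bin depths so that they sum to 1 across all bins."""
    total = sum(bin_depths.values())
    if total == 0:
        return {bin_id: 0.0 for bin_id in bin_depths}
    return {bin_id: depth / total for bin_id, depth in bin_depths.items()}


if __name__ == "__main__":
    # Toy depths, not real data
    depths = {"bin1": [38.0, 40.0, 45.0], "bin2": [4.0, 6.0]}
    per_bin = genome_depth_per_bin(depths)
    print(per_bin)
    print(relative_abundance(per_bin))
```

The median is used instead of the mean so that a single marker with unusual depth (for example one that attracts reads from a related species) does not dominate the estimate.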

