What are some drawbacks of using highly conserved genes to determine genomic completeness?
8.1 years ago
Tom

For bacteria, for example, draft assemblies and draft genomes are often assessed for completeness based on how well the raw reads cover certain conserved genes that are assumed to be present in all bacteria. What are the limitations and weaknesses of this approach? The only one I can think of is that some regions may be covered better than others, which could give false positives or false negatives.
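As a rough illustration of the uneven-coverage concern, here is a minimal Python sketch, assuming each marker is called "present" when its mean read coverage reaches a fixed threshold. The marker names, coverage values and thresholds are all made up; real tools define their marker sets and detection rules differently.

```python
"""Sketch: marker-gene completeness estimate from per-marker read coverage.

Assumptions (hypothetical, for illustration only): the marker names, the
coverage values, and the minimum-coverage threshold used to call a marker
"present" are invented; real tools define markers and detection differently.
"""


def completeness(marker_coverage: dict[str, float], min_coverage: float) -> float:
    """Return the fraction of markers whose mean coverage meets the threshold."""
    if not marker_coverage:
        raise ValueError("no markers supplied")
    present = sum(1 for cov in marker_coverage.values() if cov >= min_coverage)
    return present / len(marker_coverage)


# Hypothetical mean read coverage over ten single-copy marker genes.
# All ten markers are really in the genome, but two sit in poorly
# covered regions, so their coverage is low.
coverage = {
    "marker_01": 48.0, "marker_02": 51.5, "marker_03": 3.2,
    "marker_04": 47.1, "marker_05": 50.3, "marker_06": 45.8,
    "marker_07": 4.9, "marker_08": 49.6, "marker_09": 52.0,
    "marker_10": 46.4,
}

if __name__ == "__main__":
    for threshold in (2.0, 5.0, 10.0):
        est = completeness(coverage, threshold)
        print(f"min coverage {threshold:>4}: estimated completeness {est:.0%}")
```

With these made-up numbers, a 2x threshold reports 100%, while 5x or 10x reports 80% even though all ten markers are in the genome: the two low-coverage markers become false negatives.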

https://peerj.com/preprints/554.pdf

genome bacteria sequencing Assembly

I think it was someone at the JGI who observed that many of these conserved genes are located close to each other, so if you miss one such region in your assembly, the completeness estimate can be way off.
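To see why clustering matters, here is a minimal Python sketch with a hypothetical toy genome: the same 50 kb gap in the assembly is scored against two made-up marker layouts. The genome size, gap coordinates and marker positions are all assumptions chosen for illustration, not data from any real genome.

```python
"""Sketch: why clustered marker genes can skew a completeness estimate.

Assumptions (hypothetical): a 1,000 kb linear toy genome, 40 marker genes
at invented positions, and a single 50 kb region missing from the assembly.
Only the contrast between the two marker layouts matters.
"""

GENOME_KB = 1000
GAP_START, GAP_END = 100, 150  # hypothetical 50 kb region absent from the assembly


def estimated_completeness(marker_positions: list[int]) -> float:
    """Fraction of markers lying outside the missing region [GAP_START, GAP_END)."""
    found = [p for p in marker_positions if not GAP_START <= p < GAP_END]
    return len(found) / len(marker_positions)


# Layout A: 40 markers spread evenly, one every 25 kb.
spread = list(range(0, GENOME_KB, 25))

# Layout B: 20 markers packed into the missing region (e.g. ribosomal protein
# genes, which in many bacteria sit together in operons), plus 20 markers
# spread over the rest of the genome.
clustered = list(range(100, 140, 2)) + list(range(200, GENOME_KB, 40))

if __name__ == "__main__":
    true_fraction = (GENOME_KB - (GAP_END - GAP_START)) / GENOME_KB
    print(f"true fraction assembled:     {true_fraction:.0%}")
    print(f"estimate, spread markers:    {estimated_completeness(spread):.0%}")
    print(f"estimate, clustered markers: {estimated_completeness(clustered):.0%}")
```

Here the assembly lacks 5% of the toy genome. With evenly spread markers the estimate comes out at 95%, but when half the markers sit inside the missing region it drops to 50%. The same clustering can also cut the other way: if the cluster is assembled but other regions are missing, the estimate overstates completeness.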

That's a really cool observation. Could you give me a reference for what you're talking about? I'm preparing a discussion on this topic, and it would be very helpful!

Do you have a name for this guy?

Just a guess. Perhaps Nikos C. Kyrpides.

I think I heard it in person at the JGI, but it definitely wasn't Kyrpides. Sorry, this was a few years ago.

If not the person specifically, do you know of any papers that illustrate this? I'm unable to find anything.

Have a look for example here.

8.1 years ago

I would assume that if one wants to assess the quality of the mappings, he/she may want to use highly conserved genes (because you know there 'should' not be a wrong call there). But if by genomic completeness you mean what percentage of the genome is covered, then I would say ubiquitously expressed genes might be more accurate. Most of the time one would assume that highly conserved genes are highly expressed, but this may not always be the case. Therefore, the ideal gene set might differ from cell type to cell type. For practical reasons, I assume there is a consensus gene set that works more or less OK. In any case, you need to check what the state of the art is; my reasoning might be wrong.

I am not sure I understand your point. If you're looking at genomes, expression doesn't matter; the genes just have to be present or absent.

Dear Jean, just before this post I answered another question about gene expression, and somehow I read this question in the context of RNA-seq. So my point here does not make sense; I don't understand it myself either :). I will keep this post in case people ask a similar question about RNA-seq.
