Question: Annotation convention when dealing with sequencing error/ambiguity
Ghoti wrote (10 weeks ago):

I'm annotating sequences for use in a database. When dealing with a sequence that has obvious errors at one or both terminal ends, is it reasonable to salvage and annotate as much biologically plausible information as possible, or does any sequencing error cast doubt on the reliability of the sequence as a whole? I do not have access to the sequencing technique used or the quantitative output, only the interpreted nucleotides.

I've got three options, which I will demonstrate with the following example:

I have a sequence with five genes (1-5). Genes 1-3 and the first half of gene 4 appear plausible. However, midway through gene 4 and carrying through to the end of the sequence (which includes gene 5), stop codons are abundant in all three reading frames. This portion of the sequence appears to be in error (biologically implausible).

  1. Annotate genes 1-3 and half of 4
  2. Annotate genes 1-3
  3. Annotate nothing
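
To make the "stop codons in all three reading frames" observation concrete, here is a minimal Python sketch of how such a region might be located automatically. It is only an illustration: the function names, the window size, the step, and the `min_stops` threshold are assumptions chosen for the example, not tuned or recommended values, and only the three forward frames are checked.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def stops_per_frame(seq):
    """Count stop codons in each of the three forward reading frames."""
    seq = seq.upper()
    counts = []
    for frame in range(3):
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        counts.append(sum(codon in STOP_CODONS for codon in codons))
    return counts

def first_implausible_window(seq, window=300, step=150, min_stops=2):
    """Return the start of the first window in which every forward frame
    contains at least `min_stops` stop codons, or None if no window does.
    The defaults are illustrative assumptions, not calibrated values."""
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        if min(stops_per_frame(seq[start:start + window])) >= min_stops:
            return start
    return None
```

The returned position is the start of a window, so it is only a coarse estimate of where the sequence stops being plausible; a smaller `step` would narrow it down at the cost of more work.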
lieven.sterck (Belgium, Ghent, VIB) wrote (10 weeks ago):

I would generally advise annotating as much as possible (as long as it makes sense, of course). It's always possible to add 'questionable' genes as pseudogenes or similar, so they will not be seen as part of the protein-coding gene set, but at least there will be a record of them.

In this specific case, however, we have far too little info to make a solid suggestion.


I'm trying to develop a predominantly automatic annotation tool. The biological interpretation of sequences can be influenced by a variety of factors, which can be explored in more detail manually. I was hoping for a generic approach that could fit most or all sequences without special treatment. I felt that options 2 and 3 were cop-outs, but I don't want to mislead anyone who might use the data. Recategorizing questionable data should fix this problem.

Edit1 - I'll also look more into how questionable data might be stored within Chado (a biological database schema I'm using).
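
As a rough sketch of the recategorization idea discussed above, an automatic tool could keep every gene but label those that overlap the implausible region instead of dropping them. Everything here is hypothetical: the `Gene` class, the `categorize` function, the label strings, and the example coordinates are made up for illustration, and nothing is implied about how such labels would be stored in Chado.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive

def categorize(genes, cutoff):
    """Label each gene relative to `cutoff`, the position where the
    sequence stops looking plausible (None means no such position)."""
    labels = {}
    for gene in genes:
        if cutoff is None or gene.end <= cutoff:
            labels[gene.name] = "gene"          # lies entirely in the plausible part
        else:
            labels[gene.name] = "questionable"  # overlaps or lies past the cutoff
    return labels

# Hypothetical coordinates mirroring the five-gene example in the question.
genes = [
    Gene("gene1", 0, 900),
    Gene("gene2", 1000, 1900),
    Gene("gene3", 2000, 2900),
    Gene("gene4", 3000, 3900),
    Gene("gene5", 4000, 4900),
]
labels = categorize(genes, cutoff=3450)
```

With these made-up coordinates, genes 1-3 are labeled as ordinary genes, while genes 4 and 5 are kept on record as questionable, which corresponds to a labeled variant of option 1.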

(Reply written 10 weeks ago by Ghoti, modified 10 weeks ago)