I'm annotating sequences for use in a database. When a sequence has obvious errors at one or both ends, is it reasonable to salvage and annotate as much biologically plausible information as possible, or does any sequencing error cast doubt on the reliability of the sequence as a whole? I have no information about the sequencing technique used or its quantitative output, only the interpreted nucleotides.
To illustrate, consider a sequence with five genes (1-5). Genes 1-3 and the first half of gene 4 appear plausible. However, from midway through gene 4 to the end of the sequence (which includes gene 5), stop codons are abundant in all three reading frames, so this portion appears to be erroneous (biologically implausible).
I see three options:
- Annotate genes 1-3 and half of 4
- Annotate genes 1-3
- Annotate nothing
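
To make the "stop codons in all three reading frames" observation reproducible, here is a minimal Python sketch of how the implausible region could be located. It assumes the standard genetic code (TAA, TAG, TGA as stops), looks only at the three forward frames, and uses an arbitrary window size; the function names are hypothetical and not from any library. A hit is only a hint, since intergenic regions and gene boundaries can also lack an open frame.

```python
# Assumption: standard genetic code; adjust for other translation tables.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_positions(seq):
    """Map each forward frame (0, 1, 2) to the 0-based starts of its stop codons."""
    seq = seq.upper()
    positions = {0: [], 1: [], 2: []}
    for i in range(len(seq) - 2):
        if seq[i:i + 3] in STOP_CODONS:
            positions[i % 3].append(i)
    return positions


def first_window_without_open_frame(seq, window=90, step=3):
    """Return the start of the first window in which every forward frame
    contains at least one complete stop codon, or None if there is no such window.

    The window size is an illustrative choice, not a recommended value.
    """
    positions = stop_codon_positions(seq)
    for start in range(0, len(seq) - window + 1, step):
        last_codon_start = start + window - 3
        if all(
            any(start <= p <= last_codon_start for p in positions[frame])
            for frame in positions
        ):
            return start
    return None


if __name__ == "__main__":
    # Toy data: a stop-free stretch followed by a repeat that puts stops in every frame.
    plausible = "ATG" + "GCT" * 20
    suspect = "TAGA" * 15
    print(stop_codon_positions(plausible + suspect))
    print(first_window_without_open_frame(plausible + suspect, window=30))
```

The returned index is the start of the first window with no open frame, so the actual breakdown point lies somewhere inside that window and still needs to be checked by eye.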