What Are The Important Considerations When Merging Annotations From Separate Contigs Into A Single Sequence?
13.1 years ago
Michael Barton ★ 1.8k

If I have contigs A, B and C, and the annotations for each in a GFF3 file, what is the algorithm for merging their annotations when the contigs are merged into the single sequence ABC?

For instance, the start and stop coordinates and the phase should be adjusted accordingly. E.g. the start and stop positions for gene x on contig B should be incremented by the length of contig A, and the phase should also be updated by length(A) % 3.
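
A minimal sketch of the coordinate shift described above, assuming plain nine-column, tab-separated GFF3 feature lines. The function name `shift_gff3_line` and its parameters are hypothetical, chosen only for illustration. One caveat: the GFF3 specification defines phase relative to the start of the CDS feature itself rather than to the contig, so this sketch leaves the phase column unchanged instead of adding length(A) % 3. Directives such as `##sequence-region` are passed through untouched and would need separate handling.

```python
def shift_gff3_line(line, offset, new_seqid):
    """Shift one GFF3 feature line by `offset` bases and rename its seqid.

    `offset` is the number of bases that precede the contig in the merged
    sequence (e.g. the length of contig A for features on contig B).
    """
    line = line.rstrip("\n")
    if line.startswith("#") or not line.strip():
        return line  # comments, directives and blank lines pass through
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    cols[0] = new_seqid                   # features now live on the merged sequence
    cols[3] = str(int(cols[3]) + offset)  # start (1-based, inclusive)
    cols[4] = str(int(cols[4]) + offset)  # end (1-based, inclusive)
    # cols[7] (phase) is deliberately untouched: it is counted from the
    # feature's own start, so a uniform shift does not change it.
    return "\t".join(cols)


# Gene x on contig B, with contig A assumed to be 1000 bases long.
print(shift_gff3_line("B\tsrc\tCDS\t10\t99\t.\t+\t0\tID=x", 1000, "ABC"))
```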

What other considerations are there?

genomics assembly contigs annotation • 2.4k views

How do you do the merging? With some padding in the gaps (A-NNN-B---C), or with the ends touching (ABC)?


Either case. I'm asking with respect to the scaffold software I'm writing - http://next.gs.
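
Both cases reduce to the same bookkeeping: each contig's offset is the total length of everything placed before it, padding included. The sketch below is one way this might be computed. `contig_offsets`, `lengths` and `gaps` are hypothetical names, and the gap sizes are assumed to be known already.

```python
def contig_offsets(lengths, gaps=None):
    """Return the offset to add to each contig's coordinates after merging.

    lengths: contig lengths in merge order, e.g. [len(A), len(B), len(C)]
    gaps:    number of padding Ns between each pair of neighbouring contigs;
             None means the ends touch (ABC).
    """
    expected = max(len(lengths) - 1, 0)
    if gaps is None:
        gaps = [0] * expected
    if len(gaps) != expected:
        raise ValueError("need exactly one gap length between each pair of contigs")
    offsets = []
    position = 0  # bases already placed in the merged sequence
    for i, length in enumerate(lengths):
        offsets.append(position)
        position += length
        if i < len(gaps):
            position += gaps[i]  # padding that follows this contig
    return offsets


print(contig_offsets([100, 50, 70]))          # ends touching
print(contig_offsets([100, 50, 70], [3, 3]))  # three Ns between neighbours
```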

13.1 years ago
Darked89 4.6k

One problem that is hard to handle:

A large coding exon (or an ORF in bacteria) runs to the end of contig A (or close to it *) and picks up again close to the edge of contig B. Let's assume that this was annotated using exonerate and some protein X.

Joining the contigs without padding still does not solve the deletion of part of the exon/ORF.

Semi-accurate padding requires that the number of Ns corresponds to the distance on X between the matches.

* Ends of contigs may be of bad quality, have indels, or contain less conserved parts of the protein.
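
The padding estimate mentioned above could be sketched roughly as follows, assuming three bases per missing residue of protein X, no intron inside the gap, and 1-based inclusive residue positions for the two matches. The function `estimate_padding` and its parameters are hypothetical, and the result is only as good as the alignments at the contig ends.

```python
def estimate_padding(query_end_a, query_start_b, bases_after_a=0, bases_before_b=0):
    """Estimate the number of Ns between two contigs from protein match positions.

    query_end_a:    last residue of protein X aligned on contig A
    query_start_b:  first residue of protein X aligned on contig B
    bases_after_a:  unaligned bases between the match and the end of contig A
    bases_before_b: unaligned bases between the start of contig B and the match
    """
    missing_residues = query_start_b - query_end_a - 1
    if missing_residues < 0:
        raise ValueError("matches overlap on the protein; no gap can be estimated")
    missing_bases = 3 * missing_residues  # one codon per missing residue
    # Unaligned contig ends already account for part of the distance.
    return max(missing_bases - bases_after_a - bases_before_b, 0)


# Match on A ends at residue 120 and match on B starts at residue 131,
# so 10 residues of X are unaccounted for.
print(estimate_padding(120, 131))
```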

Possibly these instances will have to be resolved manually on a case-by-case basis?
