Comparing annotations between genomes
4.4 years ago
Ric ▴ 390

Hi, I used flo to map annotations from one genome assembly to another. The flo developer reported the following on their page:

For an ant genome (~350 Mb) we saw 90% annotations map identically to the new assembly (unpublished result).

How did they calculate the above percentage?

annotation gff liftOver CrossMap

Thank you, but I am still not sure which files I should use.

ls
input.cds.fa  input.gff  lifted_cleaned.cds.fa  lifted_cleaned.gff  lifted.gff3  unlifted.gff3  unmapped.txt

> grep "ID=" unlifted.gff3 | wc -l
19233
> grep "ID=" lifted_cleaned.gff | wc -l
33639
> wc -l unmapped.txt
43632 unmapped.txt
> grep "ID=" input.gff | wc -l
45857
> python
>>> float(33639*100)/45857
73.35630329066446


Do you think I chose the correct ones?
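One thing worth checking before picking files: `grep "ID=" | wc -l` counts every line that carries an `ID=` attribute, which in many GFF3 files includes gene, mRNA, exon and CDS lines, so it may not be a gene count. Likewise, `wc -l unmapped.txt` counts lines, which may not map one-to-one onto genes. The lifted and unlifted counts above sum to 52872, more than the 45857 in `input.gff`, which suggests they are not counting the same units. The sketch below (a minimal example, assuming standard 9-column tab-separated GFF3; the script name `count_types.py` is hypothetical) tallies features per type so you can compare genes with genes, or mRNAs with mRNAs.

```python
import sys
from collections import Counter


def count_feature_types(lines):
    """Tally GFF3 feature lines by type (column 3), skipping comments and any ##FASTA section."""
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if line.startswith("#") or not line.strip():
            continue  # comment, directive or blank line
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:  # assumption: well-formed 9-column GFF3 lines only
            counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python count_types.py input.gff unlifted.gff3 lifted_cleaned.gff
    for path in sys.argv[1:]:
        with open(path) as fh:
            print(path, dict(count_feature_types(fh)))
```

Comparing, for example, the `gene` (or `mRNA`) counts across the three files gives a like-for-like denominator and numerator.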

18 months ago
Priyam ▴ 20

The coding sequences of the lifted gene models were obtained and checked for exact identity with the coding sequence of the corresponding input gene model. I do not remember whether I reported that number as a percentage of the gene models that were lifted or as a percentage of the input gene models. We can be more confident that a gene model was lifted correctly if the coding sequences of the input and lifted gene models are exactly identical. If they are not identical, the lifted gene model may still be correct, but there is a higher chance that it was mapped to a duplicate. I believe the gff_compare.rb script in flo gives you the IDs of gene models that were either not lifted or had a non-identical coding sequence.
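The exact-CDS-identity check described above can be sketched roughly as follows. This is not flo's gff_compare.rb, just a minimal illustration assuming that `input.cds.fa` and `lifted_cleaned.cds.fa` use the same identifier (the first word of each FASTA header) for an input model and its lifted counterpart; the script name `compare_cds.py` is hypothetical. It reports the identical fraction both ways, as a percentage of input models and of lifted models, since the answer notes either denominator may have been used.

```python
import sys


def read_fasta(lines):
    """Parse FASTA lines into {id: sequence}; the ID is the first word after '>'."""
    seqs = {}
    current, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                seqs[current] = "".join(chunks)
            current, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if current is not None:
        seqs[current] = "".join(chunks)
    return seqs


def compare_cds(input_seqs, lifted_seqs):
    """Count input models whose lifted CDS is exactly identical (case-insensitive)."""
    lifted_ids = [i for i in input_seqs if i in lifted_seqs]  # assumption: shared IDs
    identical = [i for i in lifted_ids
                 if input_seqs[i].upper() == lifted_seqs[i].upper()]
    n = len(identical)
    return {
        "input": len(input_seqs),
        "lifted": len(lifted_ids),
        "identical": n,
        "pct_of_input": 100.0 * n / len(input_seqs) if input_seqs else 0.0,
        "pct_of_lifted": 100.0 * n / len(lifted_ids) if lifted_ids else 0.0,
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_cds.py input.cds.fa lifted_cleaned.cds.fa
    with open(sys.argv[1]) as fh:
        input_seqs = read_fasta(fh)
    with open(sys.argv[2]) as fh:
        lifted_seqs = read_fasta(fh)
    print(compare_cds(input_seqs, lifted_seqs))
```

If the lifted file renames models, the IDs would first need to be matched back to the input IDs before this comparison is meaningful.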

4.4 years ago

Flo gives you a file of unlifted genes: those without mappings, those that had to be split up, etc. I guess they took the number of unlifted genes and subtracted it from the original number of genes?
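If you want to try the subtraction approach guessed at above, a minimal sketch might look like this. It assumes that `unlifted.gff3` reuses the `ID=` values from `input.gff` and that you count one feature type (default `gene`; pass `mRNA` if that matches your annotation better). The script name `percent_lifted.py` is hypothetical. Using a set difference rather than subtracting raw line counts avoids double-counting IDs that appear on several lines.

```python
import sys


def ids_of_type(lines, feature_type="gene"):
    """Collect ID= values of GFF3 features whose column 3 equals feature_type."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # sequence section; no more features
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != feature_type:
            continue
        for attr in fields[8].split(";"):
            key, _, value = attr.strip().partition("=")
            if key == "ID":
                ids.add(value)
    return ids


def percent_lifted(input_ids, unlifted_ids):
    """Percentage of input IDs that are not listed as unlifted."""
    if not input_ids:
        return 0.0
    return 100.0 * len(input_ids - unlifted_ids) / len(input_ids)


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Hypothetical usage: python percent_lifted.py input.gff unlifted.gff3 [gene|mRNA]
    ftype = sys.argv[3] if len(sys.argv) > 3 else "gene"
    with open(sys.argv[1]) as fh:
        input_ids = ids_of_type(fh, ftype)
    with open(sys.argv[2]) as fh:
        unlifted_ids = ids_of_type(fh, ftype)
    print(f"{ftype}: {len(input_ids)} input, "
          f"{len(input_ids & unlifted_ids)} unlifted, "
          f"{percent_lifted(input_ids, unlifted_ids):.2f}% lifted")
```

Note that this measures how many models were lifted at all, which is not the same as the stricter "map identically" figure based on identical coding sequences.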