Reannotate genomic data with new Ensembl release
4.7 years ago
mgdias.jose ▴ 10

I have transposon data for mouse organised in the following manner (txt):

chr start end hit 
1 111 130 geneXXX 
2 1546 1867 intergenic 
3 123 234 geneYYY

This is for assembly 38, Ensembl release 73.

I now want to compare it with Ensembl release 90 (which is still mouse assembly 38) to confirm whether the transposon hits are still accurate in the most up-to-date annotation. For example, is the transposon on chr 2 still hitting an intergenic region, or is that region now attributed to a gene? Is the region on chr 3 where my transposon hit still annotated as geneYYY, and so on?

Are there any (semi-)automated tools to do this? What would be the best approach?

Thank you!

Ensembl Annotation transposon gene data • 1.1k views

I would suggest just re-annotating your data from scratch (starting with the first three columns), assuming that you have code to do the annotation in the first place. The changes in the Ensembl genes will not be captured by "coordinate conversion", so do not spend any time looking at tools that do that. Instead, look for approaches to annotate genomic regions from a GTF/GFF file.
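As a rough illustration of annotating regions from a GTF file, here is a minimal Python sketch. It is not an established tool, and it rests on several assumptions: the GTF contains `gene` feature lines with `gene_id` (and usually `gene_name`) attributes, chromosome names match those in your table (e.g. `1`, not `chr1`), and your start/end columns are 1-based and inclusive, as GTF coordinates are. The function names `load_genes`, `annotate` and `reannotate` are made up for this example.

```python
import re
from collections import defaultdict

GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')
GENE_NAME_RE = re.compile(r'gene_name "([^"]+)"')


def load_genes(gtf_lines):
    """Collect (start, end, label) gene intervals per chromosome from GTF lines."""
    genes = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep only gene-level features
        # Prefer the gene name; fall back to the gene ID if no name is given.
        label = GENE_NAME_RE.search(fields[8]) or GENE_ID_RE.search(fields[8])
        if label is None:
            continue
        genes[fields[0]].append((int(fields[3]), int(fields[4]), label.group(1)))
    return genes


def annotate(chrom, start, end, genes):
    """Return comma-joined labels of overlapping genes, or 'intergenic'."""
    hits = sorted({label for g_start, g_end, label in genes.get(chrom, ())
                   if g_start <= end and start <= g_end})
    return ",".join(hits) if hits else "intergenic"


def reannotate(rows, genes):
    """Yield (chrom, start, end, old_hit, new_hit) for 'chr start end hit' rows."""
    for row in rows:
        fields = row.split()
        if len(fields) < 4 or not fields[1].isdigit():
            continue  # skip blank lines and the header row
        chrom, start, end, old_hit = fields[0], int(fields[1]), int(fields[2]), fields[3]
        yield chrom, start, end, old_hit, annotate(chrom, start, end, genes)


# Example usage (both file paths are placeholders, not real file names):
# with open("release90.gtf") as gtf, open("transposons.txt") as table:
#     genes = load_genes(gtf)
#     for chrom, start, end, old, new in reannotate(table, genes):
#         if old != new:
#             print(chrom, start, end, old, new, sep="\t")
```

The linear scan over each chromosome's genes is fine for a few thousand regions; for larger inputs an interval tree or a dedicated tool would be a better fit. Whether the old and new labels are directly comparable depends on whether your original `hit` column holds gene names or gene IDs.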

4.7 years ago

You can use the Assembly Converter in Ensembl to convert coordinates from one genome assembly to another: http://www.ensembl.org/Homo_sapiens/Tools/AssemblyConverter?db=core

I assume you work with human data, but the principle is the same for other species.

Your data from Ensembl release 73 used the assembly GRCh37.p12, while Ensembl 90 is based on GRCh38.p10: http://www.ensembl.org/info/website/archives/assembly.html

I.e. you need to do a GRCh37 -> GRCh38 assembly mapping with the Assembly Converter.

Your input file needs to be in one of these formats: BED, GFF, GTF, WIG or VCF.


Hello,

My data are from mouse, so the only conversion available there is between the Ensembl and NCBI assemblies, and my file is in txt format.

Do you know of any other platform or tool? Is there any way to convert my file to one of those formats without losing information?

Thanks
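On the format question: a table like the one in the original post can be turned into BED with a few lines of Python. This is only a sketch under stated assumptions: the txt coordinates are taken to be 1-based and inclusive (BED is 0-based, half-open, so the start is shifted by one), the `hit` column is carried over as the BED name column, and `txt_to_bed` and its `add_chr_prefix` option are hypothetical names invented here.

```python
def txt_to_bed(lines, add_chr_prefix=False):
    """Convert 'chr start end hit' rows to tab-separated BED lines."""
    bed = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or not fields[1].isdigit():
            continue  # skip blank lines and the header row
        chrom, start, end, hit = fields[0], int(fields[1]), int(fields[2]), fields[3]
        if add_chr_prefix:
            chrom = "chr" + chrom  # some tools expect UCSC-style names
        # Assumed 1-based inclusive input -> 0-based half-open BED interval.
        bed.append(f"{chrom}\t{start - 1}\t{end}\t{hit}")
    return bed


# Example usage (file names are placeholders):
# with open("transposons.txt") as src, open("transposons.bed", "w") as dst:
#     dst.write("\n".join(txt_to_bed(src)) + "\n")
```

If the coordinates in the txt file are already 0-based, drop the `start - 1` adjustment. Because the hit label travels in the name column, no information from the four original columns is lost.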

4.7 years ago

I see you have already found the comprehensive post/thread on various options for converting genome coordinates: Converting Genome Coordinates From One Genome Version To Another (Ucsc Liftover, Ncbi Remap, Ensembl Api)
