Question

Retrieving gene names from their genomic location in a .gff3 annotation file

Asked 8.4 years ago by user31888

Rookie question. I have a tab-delimited .gff3 annotation file of human alternative splicing events, obtained here, that looks like this (first record only):

chr1    A3SS    gene    15796    16765    .    -    .    ID=chr1:6470:6628:-@chr1:5805|5810:5659:-;Name=chr1:6470:6628:-@chr1:5805|5810:5659:-;gid=chr1:6470:6628:-@chr1:5805|5810:5659:-
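Columns 1, 4 and 5, plus the `ID` attribute, can be pulled out of such a record with a few lines of Python. This is a minimal sketch assuming the file really is tab-separated, as GFF3 requires (the spaces in the record above may just be a display artifact); `parse_gff3_line` is an illustrative name, not part of any library.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Split one GFF3 record into named columns plus a dict of attributes."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = fields
    # Column 9 holds ';'-separated key=value pairs (here ID, Name and gid);
    # GFF3 percent-encodes reserved characters, so values are decoded.
    attributes = {}
    for pair in attrs.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = unquote(value)
    return {
        "seqid": seqid,        # column 1, e.g. "chr1"
        "source": source,      # column 2, here the event type (A3SS)
        "type": ftype,         # column 3
        "start": int(start),   # column 4, 1-based
        "end": int(end),       # column 5, 1-based and inclusive
        "strand": strand,      # column 7
        "attributes": attributes,
    }
```

For the record above, `parse_gff3_line(line)["attributes"]["ID"]` returns the event ID string unchanged.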

I am trying to annotate the genes (i.e. get a gene ID, name, etc.) from the information in this annotation file, so that I can check whether they are associated with particular disease states.

I thought about extracting the chromosomal coordinates (columns 1, 4 and 5) and converting them to a simple five-column input file (chromosome, start, end, reference allele, alternative allele) that I could use with ANNOVAR or a similar program:

1   15796   16765   0   0
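That conversion can be sketched as below. It reproduces the layout of the example line exactly (chromosome without the `chr` prefix, start, end, then `0` and `0` in the allele columns); whether ANNOVAR treats those placeholders as intended for region-based annotation is an assumption worth checking against its input-format documentation. The function names are illustrative.

```python
def to_annovar_region(seqid, start, end):
    """Format one interval as a five-column line: chr, start, end, ref, alt."""
    # Drop a leading "chr" so "chr1" becomes "1", matching the example line.
    chrom = seqid[3:] if seqid.startswith("chr") else seqid
    # "0" placeholders in the ref/alt columns, copied from the example line.
    return "\t".join([chrom, str(start), str(end), "0", "0"])


def gff3_to_annovar(lines):
    """Yield one five-column line per GFF3 record, using columns 1, 4 and 5."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, "##gff-version 3" and other comments
        fields = line.rstrip("\n").split("\t")
        yield to_annovar_region(fields[0], int(fields[3]), int(fields[4]))
```

Passing an open file handle to `gff3_to_annovar` and writing each yielded line to a new file would produce one row per event.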

However, I am not sure what the event ID means (i.e. ID=chr1:6470:6628:-@chr1:5805|5810:5659:-).
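Whatever its exact meaning, the ID can at least be split mechanically. The sketch below assumes, from the single record shown and not from any documented specification, that exons are joined by `@`, that each exon is written as four `:`-separated fields (chromosome, two coordinates, strand), and that `|` separates alternative values for one coordinate (as in `5805|5810`). The helpers `parse_event_id` and `id_span` are illustrative names. Note that the coordinates inside this ID (5659 to 6628) do not match columns 4 and 5 of the same record (15796 to 16765), so the two should not be assumed to share a coordinate system without checking.

```python
def parse_event_id(event_id):
    """Split an ID like 'chr1:6470:6628:-@chr1:5805|5810:5659:-' into exon tuples.

    Assumed layout (inferred from one record, not a published spec):
    exons joined by '@', each 'chrom:first:second:strand', with '|'
    separating alternative values for a single coordinate.
    """
    exons = []
    for part in event_id.split("@"):
        chrom, first, second, strand = part.split(":")
        exons.append((
            chrom,
            [int(x) for x in first.split("|")],
            [int(x) for x in second.split("|")],
            strand,
        ))
    return exons


def id_span(event_id):
    """Return (chrom, lowest, highest) over every coordinate in the ID."""
    exons = parse_event_id(event_id)
    coords = [c for _chrom, first, second, _strand in exons for c in first + second]
    return exons[0][0], min(coords), max(coords)
```

For the ID above, `parse_event_id` returns `[('chr1', [6470], [6628], '-'), ('chr1', [5805, 5810], [5659], '-')]` and `id_span` returns `('chr1', 5659, 6628)`.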

(a) Could someone explain the format of the ID?

(b) Are the parts highlighted in yellow the chromosomal coordinates, and could they be used to retrieve the gene name and other information?

(c) Is there a more straightforward way to annotate the genes based on these coordinates?
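One common route for (c) is to overlap the event coordinates with a gene annotation for the same genome assembly and read off the gene names; `bedtools intersect` does this at scale. The pure-Python sketch below shows the idea with a simple linear scan. It assumes a GTF whose `gene` lines carry a `gene_name` attribute (GENCODE and Ensembl GTFs do), that chromosome names match between the two files (GENCODE uses `chr1`, Ensembl uses `1`), and that events and genes are on the same assembly. `Gene`, `load_genes_from_gtf` and `overlapping_genes` are illustrative names.

```python
import re
from typing import NamedTuple


class Gene(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive (GTF convention)
    end: int    # 1-based, inclusive
    name: str


GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def load_genes_from_gtf(lines):
    """Collect 'gene' records from GTF lines (e.g. a GENCODE or Ensembl GTF)."""
    genes = []
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = GENE_NAME.search(fields[8])
        name = match.group(1) if match else "unknown"
        genes.append(Gene(fields[0], int(fields[3]), int(fields[4]), name))
    return genes


def overlapping_genes(chrom, start, end, genes):
    """Return genes whose closed interval [start, end] overlaps the query."""
    # Two closed intervals overlap iff each one starts before the other ends.
    return [g for g in genes
            if g.chrom == chrom and g.start <= end and start <= g.end]
```

Calling `overlapping_genes("chr1", 15796, 16765, genes)` would then list any genes spanning columns 4 and 5 of the record above. A linear scan is fine for a few thousand events; for whole-genome files an interval tree or `bedtools intersect` is the usual choice.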

Tags: gene, UCSC, annotation

Updated 20 months ago by Ram.