Question

Where Can I Get Annovar 'Refgene' Format?

4


13.3 years ago

jessada ▴ 150

I used the ANNOVAR command line

annotate_variation.pl -downdb -buildver hg19 refGene humandb

to download hg19_refGene.txt from UCSC. I want to use this database to create an input file for ANNOVAR in the generic format described at http://www.openbioinformatics.org/annovar/annovar_filter.html#generic , but the only description of the refGene format I can find is the one at http://genome.ucsc.edu/FAQ/FAQformat :

(
string  geneName;           "Name of gene as it appears in Genome Browser."
string  name;               "Name of gene"
string  chrom;              "Chromosome name"
char[1] strand;             "+ or - for strand"
uint    txStart;            "Transcription start position"
uint    txEnd;              "Transcription end position"
uint    cdsStart;           "Coding region start"
uint    cdsEnd;             "Coding region end"
uint    exonCount;          "Number of exons"
uint[exonCount] exonStarts; "Exon start positions"
uint[exonCount] exonEnds;   "Exon end positions"
)

This is not sufficient, because the downloaded refGene file has more columns than that description lists. For example:

1475    NM_000039    chr11    -    116706468    116708338    116706523    116708103    4    116706468,116707716,116708060,116708320,    116707127,116707873,116708123,116708338,    0    APOA1    cmpl    cmpl    2,1,0,-1,

I have looked in many places for the meaning of the last 6 columns without success. Can anyone point me to a site that explains what those columns mean?

annovar ucsc • 14k views


0


The format description has been updated: http://genome.ucsc.edu/FAQ/FAQformat#format9

But it is still wrong: before name there is a non-unique field called bin, and the field described as uint id is actually the score.

Looks like there was no primary key for the data then.

Comment written 10.3 years ago by gresserT ▴ 50 • updated 3.1 years ago by Ram 45k

Ram · Answer 1 · 2012-03-12

As far as I can see from curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refGene.sql", the table is defined with these columns:

  `bin` smallint(5) unsigned NOT NULL,
  `name` varchar(255) NOT NULL,
  `chrom` varchar(255) NOT NULL,
  `strand` char(1) NOT NULL,
  `txStart` int(10) unsigned NOT NULL,
  `txEnd` int(10) unsigned NOT NULL,
  `cdsStart` int(10) unsigned NOT NULL,
  `cdsEnd` int(10) unsigned NOT NULL,
  `exonCount` int(10) unsigned NOT NULL,
  `exonStarts` longblob NOT NULL,
  `exonEnds` longblob NOT NULL,
  `score` int(11) default NULL,
  `name2` varchar(255) NOT NULL,
  `cdsStartStat` enum('none','unk','incmpl','cmpl') NOT NULL,
  `cdsEndStat` enum('none','unk','incmpl','cmpl') NOT NULL,
  `exonFrames` longblob NOT NULL,

All the fields you need are present in http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz

You just need to remove some columns (bin...)
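To make the column mapping concrete, here is a minimal Python sketch that labels the example row from the question with the column names from the refGene.sql schema above and shows one way to drop the leading bin column. The helper names parse_refgene_line and drop_bin are hypothetical, and the sketch assumes the file is tab-separated and that a NULL score appears as \N in the dump; treat it as an illustration, not as ANNOVAR's or UCSC's own tooling.

```python
# Column names in the order given by the refGene.sql schema quoted above.
REFGENE_COLUMNS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd",
    "exonCount", "exonStarts", "exonEnds", "score", "name2",
    "cdsStartStat", "cdsEndStat", "exonFrames",
]

INT_COLUMNS = {"bin", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "score"}
LIST_COLUMNS = {"exonStarts", "exonEnds", "exonFrames"}


def parse_refgene_line(line):
    """Split one tab-separated refGene row into a dict keyed by column name.

    Hypothetical helper for illustration only.
    """
    values = line.rstrip("\n").split("\t")
    if len(values) != len(REFGENE_COLUMNS):
        raise ValueError(
            f"expected {len(REFGENE_COLUMNS)} columns, got {len(values)}"
        )
    record = {}
    for column, value in zip(REFGENE_COLUMNS, values):
        if column in INT_COLUMNS:
            # Assumption: a NULL score is written as \N in the dump.
            record[column] = None if value == "\\N" else int(value)
        elif column in LIST_COLUMNS:
            # Comma-separated with a trailing comma, e.g. "2,1,0,-1,"
            record[column] = [int(v) for v in value.split(",") if v]
        else:
            record[column] = value
    return record


def drop_bin(line):
    """Return the row without its leading bin column (hypothetical helper)."""
    return line.split("\t", 1)[1]


# The example row from the question, rebuilt with tab separators.
example = "\t".join([
    "1475", "NM_000039", "chr11", "-", "116706468", "116708338",
    "116706523", "116708103", "4",
    "116706468,116707716,116708060,116708320,",
    "116707127,116707873,116708123,116708338,",
    "0", "APOA1", "cmpl", "cmpl", "2,1,0,-1,",
])

record = parse_refgene_line(example)
for column in REFGENE_COLUMNS:
    print(f"{column}: {record[column]}")
```

Read this way, the leading 1475 in the example row is bin, and the trailing fields are score, name2 (the gene symbol), cdsStartStat, cdsEndStat and exonFrames.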

score 1 · Answer 2 · 2015-03-14


10.3 years ago

gresserT ▴ 50

The complete and correct description is under "describe table schema":

http://genome.ucsc.edu/cgi-bin/hgTables?hgta_track=refGene


score 1 · Answer 3 · 2018-08-01


6.9 years ago

biock ▴ 70

http://rohsdb.cmb.usc.edu/GBshape/cgi-bin/hgTables?hgsid=583_AkEae6dMkhjf5kd9BxNksFo9ySiK&hgta_doSchemaDb=mm10&hgta_doSchemaTable=refGene


score 1 · Answer 4 · 2018-10-11


6.7 years ago

lffu_0032 ▴ 90

You can visit http://genome.ucsc.edu/cgi-bin/hgTables and click the "describe table schema" button; you will then see the RefSeq gene predictions format.
