Question: Where Can I Get Annovar 'Refgene' Format?
4
8.9 years ago by
jessada140
jessada140 wrote:

I used the ANNOVAR command line

annotate_variation.pl -downdb -buildver hg19 refGene humandb

to download hg19_refGene.txt from UCSC, and I will use this database to create the input file for ANNOVAR in the generic format (http://www.openbioinformatics.org/annovar/annovar_filter.html#generic). However, the only description of the refGene format I can find is at http://genome.ucsc.edu/FAQ/FAQformat:

(
string  geneName;           "Name of gene as it appears in Genome Browser."
string  name;               "Name of gene"
string  chrom;              "Chromosome name"
char[1] strand;             "+ or - for strand"
uint    txStart;            "Transcription start position"
uint    txEnd;              "Transcription end position"
uint    cdsStart;           "Coding region start"
uint    cdsEnd;             "Coding region end"
uint    exonCount;          "Number of exons"
uint[exonCount] exonStarts; "Exon start positions"
uint[exonCount] exonEnds;   "Exon end positions"
)

This is not sufficient, because the downloaded refGene file has more columns. For example:

1475    NM_000039    chr11    -    116706468    116708338    116706523    116708103    4    116706468,116707716,116708060,116708320,    116707127,116707873,116708123,116708338,    0    APOA1    cmpl    cmpl    2,1,0,-1,

I have looked in many places for the meaning of the last 6 columns. Can anyone point me to a site that explains what those columns mean?

annovar ucsc • 10k views

The format description has been updated: http://genome.ucsc.edu/FAQ/FAQformat#format9
But it is still wrong: before name there is a non-unique column called bin, and the field it lists as uint id is actually the score.
So it looks like the data had no primary key.

written 5.9 years ago by gresserT
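
For context on the bin column mentioned in the comment above: as far as I know, it is an index from UCSC's standard binning scheme, used to speed up range queries, which would explain why many rows share the same value. The sketch below is a Python port of that scheme under that assumption. The function name ucsc_bin is hypothetical, and the extended scheme for coordinates beyond 512 Mb is not covered. For the example row in the question it reproduces the value in the first column.

```python
# Offsets of the five bin levels, smallest (128 kb) bins first.
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # 2**17 bases = 128 kb, the size of the smallest bins
BIN_NEXT_SHIFT = 3    # each level up covers 8 times more sequence


def ucsc_bin(start, end):
    """Return the smallest bin that fully contains the half-open range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        # The range straddles two bins at this level, so try the next larger level.
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("range does not fit the standard binning scheme")


# txStart and txEnd of NM_000039 from the example row in the question.
print(ucsc_bin(116706468, 116708338))  # prints 1475
```

Any other transcript that falls entirely inside the same 128 kb window gets the same bin, so the column cannot serve as a unique key.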
5
8.9 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum133k wrote:

As far as I can see from curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refGene.sql", the table is defined as:

  `bin` smallint(5) unsigned NOT NULL,
  `name` varchar(255) NOT NULL,
  `chrom` varchar(255) NOT NULL,
  `strand` char(1) NOT NULL,
  `txStart` int(10) unsigned NOT NULL,
  `txEnd` int(10) unsigned NOT NULL,
  `cdsStart` int(10) unsigned NOT NULL,
  `cdsEnd` int(10) unsigned NOT NULL,
  `exonCount` int(10) unsigned NOT NULL,
  `exonStarts` longblob NOT NULL,
  `exonEnds` longblob NOT NULL,
  `score` int(11) default NULL,
  `name2` varchar(255) NOT NULL,
  `cdsStartStat` enum('none','unk','incmpl','cmpl') NOT NULL,
  `cdsEndStat` enum('none','unk','incmpl','cmpl') NOT NULL,
  `exonFrames` longblob NOT NULL,

All the fields you need are present in http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz

You just need to remove some columns (bin...).
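
A minimal sketch of how the 16 columns in the schema above line up with the example row from the question, and of dropping the leading bin column. The field names come from refGene.sql; the helper names parse_refgene_line and drop_bin are hypothetical. The sketch assumes the input is tab-separated, as in refGene.txt.gz, and that score is never NULL in the dump.

```python
# Column names in the order given by refGene.sql.
REFGENE_FIELDS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd",
    "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds",
    "score", "name2", "cdsStartStat", "cdsEndStat", "exonFrames",
]
# Assumption: score is always an integer (never NULL) in the dump.
INT_FIELDS = {"bin", "txStart", "txEnd", "cdsStart", "cdsEnd", "exonCount", "score"}
LIST_FIELDS = {"exonStarts", "exonEnds", "exonFrames"}


def parse_refgene_line(line):
    """Split one tab-separated refGene row into a dict keyed by column name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(REFGENE_FIELDS):
        raise ValueError(f"expected {len(REFGENE_FIELDS)} columns, got {len(values)}")
    record = {}
    for field, value in zip(REFGENE_FIELDS, values):
        if field in INT_FIELDS:
            record[field] = int(value)
        elif field in LIST_FIELDS:
            # These columns are comma-separated lists with a trailing comma.
            record[field] = [int(x) for x in value.split(",") if x]
        else:
            record[field] = value
    return record


def drop_bin(line):
    """Return the row without its first (bin) column."""
    return line.split("\t", 1)[1]


# The example row from the question, with tabs between the columns.
example = "\t".join([
    "1475", "NM_000039", "chr11", "-", "116706468", "116708338",
    "116706523", "116708103", "4",
    "116706468,116707716,116708060,116708320,",
    "116707127,116707873,116708123,116708338,",
    "0", "APOA1", "cmpl", "cmpl", "2,1,0,-1,",
])

record = parse_refgene_line(example)
print(record["name2"], record["score"], record["exonFrames"])
print(drop_bin(example).split("\t")[0])  # first column is now the transcript name
```

For this row, the columns after exonEnds are score (0), name2 (APOA1), cdsStartStat and cdsEndStat (both cmpl), and exonFrames (one value per exon).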


Hi Pierre,

How can I create a refGene file for a virus using ANNOVAR?

written 5.3 years ago by bioinforesearchquestions
1
5.9 years ago by
gresserT50
Germany
gresserT50 wrote:

The complete and correct description is under "describe table schema":

http://genome.ucsc.edu/cgi-bin/hgTables?hgta_track=refGene

1
2.3 years ago by
lffu_003260
lffu_003260 wrote:

You can visit http://genome.ucsc.edu/cgi-bin/hgTables and click the "describe table schema" button; then you can see the RefSeq gene predictions format.
