Question: Where Can I Get Annovar 'Refgene' Format?
6.4 years ago by
jessada100
jessada100 wrote:

I used the ANNOVAR command line

annotate_variation.pl -downdb -buildver hg19 refGene humandb

to download hg19_refGene.txt from UCSC. I want to use this database to create an input file for ANNOVAR in the generic format (http://www.openbioinformatics.org/annovar/annovar_filter.html#generic), but the only description of the refGene format I can find is the one at http://genome.ucsc.edu/FAQ/FAQformat:

(
string  geneName;           "Name of gene as it appears in Genome Browser."
string  name;               "Name of gene"
string  chrom;              "Chromosome name"
char[1] strand;             "+ or - for strand"
uint    txStart;            "Transcription start position"
uint    txEnd;              "Transcription end position"
uint    cdsStart;           "Coding region start"
uint    cdsEnd;             "Coding region end"
uint    exonCount;          "Number of exons"
uint[exonCount] exonStarts; "Exon start positions"
uint[exonCount] exonEnds;   "Exon end positions"
)

That description is not sufficient, because the downloaded refGene file has more columns. For example:

1475    NM_000039    chr11    -    116706468    116708338    116706523    116708103    4    116706468,116707716,116708060,116708320,    116707127,116707873,116708123,116708338,    0    APOA1    cmpl    cmpl    2,1,0,-1,

I have looked in many places for the meaning of the last 6 columns. Can anyone point me to a site that explains those columns?

annovar ucsc • 7.4k views
modified 20 days ago by biock0 • written 6.4 years ago by jessada100

The format description has been updated: http://genome.ucsc.edu/FAQ/FAQformat#format9
But it is still wrong: before name there is a non-unique column called bin, and the field described as uint id is actually the score.
Looks like there was no primary key for the data then.
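
For what it's worth, bin appears to be the index from UCSC's standard genomic binning scheme, computed from txStart and txEnd, which would explain why it is not unique: every transcript falling in the same window gets the same bin. A minimal Python sketch, assuming the standard scheme (first shift of 17 bits, 3 bits per level, five levels); the function name ucsc_bin is mine and not part of any UCSC or ANNOVAR tool:

```python
# Assumption: the standard UCSC binning scheme (128 kb smallest bins,
# each level 8 times larger, five levels, covering up to 512 Mb).
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # 2**17 = 128 kb, size of the smallest bin
BIN_NEXT_SHIFT = 3    # each level up is 2**3 = 8 times larger


def ucsc_bin(start, end):
    """Return the smallest bin that fully contains [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("interval %d-%d is out of range for this scheme" % (start, end))


# txStart and txEnd of NM_000039 from the example row in the question.
print(ucsc_bin(116706468, 116708338))  # 1475, the first column of that row
```

Since the value depends only on the coordinates, it cannot serve as an identifier for the transcript.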

modified 3.4 years ago • written 3.5 years ago by gresserT40
6.4 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum111k wrote:

As far as I can see from curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refGene.sql", the columns are:

  `bin` smallint(5) unsigned NOT NULL,
  `name` varchar(255) NOT NULL,
  `chrom` varchar(255) NOT NULL,
  `strand` char(1) NOT NULL,
  `txStart` int(10) unsigned NOT NULL,
  `txEnd` int(10) unsigned NOT NULL,
  `cdsStart` int(10) unsigned NOT NULL,
  `cdsEnd` int(10) unsigned NOT NULL,
  `exonCount` int(10) unsigned NOT NULL,
  `exonStarts` longblob NOT NULL,
  `exonEnds` longblob NOT NULL,
  `score` int(11) default NULL,
  `name2` varchar(255) NOT NULL,
  `cdsStartStat` enum('none','unk','incmpl','cmpl') NOT NULL,
  `cdsEndStat` enum('none','unk','incmpl','cmpl') NOT NULL,
  `exonFrames` longblob NOT NULL,

All the fields you need are present in http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz

You just need to remove some columns (bin...)
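
As an illustration, the column order from refGene.sql can be used to label each field of a line from the file. This is only a sketch: the helper names parse_refgene_line and drop_bin are hypothetical, the file is assumed to be tab-delimited, and only bin is dropped here because which other columns to remove depends on the target format.

```python
# Column names in the order listed in refGene.sql (hg19).
REFGENE_COLUMNS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd",
    "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds",
    "score", "name2", "cdsStartStat", "cdsEndStat", "exonFrames",
]
INT_COLUMNS = {"bin", "txStart", "txEnd", "cdsStart", "cdsEnd",
               "exonCount", "score"}
# These columns hold comma-terminated lists, e.g. "2,1,0,-1,".
LIST_COLUMNS = {"exonStarts", "exonEnds", "exonFrames"}


def parse_refgene_line(line):
    """Turn one tab-delimited refGene line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(REFGENE_COLUMNS):
        raise ValueError("expected %d fields, got %d"
                         % (len(REFGENE_COLUMNS), len(fields)))
    record = {}
    for column, value in zip(REFGENE_COLUMNS, fields):
        if column in INT_COLUMNS:
            record[column] = int(value)
        elif column in LIST_COLUMNS:
            # Skip the empty string left by the trailing comma.
            record[column] = [int(x) for x in value.split(",") if x]
        else:
            record[column] = value
    return record


def drop_bin(line):
    """Return the line without its first (bin) column."""
    return line.split("\t", 1)[1]


# The example row from the question, with tabs written out explicitly.
example = "\t".join([
    "1475", "NM_000039", "chr11", "-", "116706468", "116708338",
    "116706523", "116708103", "4",
    "116706468,116707716,116708060,116708320,",
    "116707127,116707873,116708123,116708338,",
    "0", "APOA1", "cmpl", "cmpl", "2,1,0,-1,",
])
record = parse_refgene_line(example)
print(record["name2"], record["score"], record["exonFrames"])
print(drop_bin(example).split("\t")[0])
```

Read this way, the trailing columns of the example row are score (0), name2 (APOA1), cdsStartStat and cdsEndStat (both cmpl), and exonFrames (2,1,0,-1).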

written 6.4 years ago by Pierre Lindenbaum111k

Hi Pierre,

How can I create a refGene file for a virus using ANNOVAR?

written 2.8 years ago by bioinforesearchquestions150
3.4 years ago by
gresserT40
Germany
gresserT40 wrote:

The complete and correct description is available under "describe table schema":

http://genome.ucsc.edu/cgi-bin/hgTables?hgta_track=refGene

modified 2.8 years ago • written 3.4 years ago by gresserT40