I used the ANNOVAR command line
annotate_variation.pl -downdb -buildver hg19 refGene humandb
to download hg19_refGene.txt from UCSC. I plan to use this database to create an input file for ANNOVAR in the generic format described at http://www.openbioinformatics.org/annovar/annovar_filter.html#generic , but the only description of the refGene format I can find is the one at http://genome.ucsc.edu/FAQ/FAQformat
(
    string geneName;           "Name of gene as it appears in Genome Browser."
    string name;               "Name of gene"
    string chrom;              "Chromosome name"
    char strand;               "+ or - for strand"
    uint txStart;              "Transcription start position"
    uint txEnd;                "Transcription end position"
    uint cdsStart;             "Coding region start"
    uint cdsEnd;               "Coding region end"
    uint exonCount;            "Number of exons"
    uint[exonCount] exonStarts; "Exon start positions"
    uint[exonCount] exonEnds;   "Exon end positions"
)
which is not sufficient, because the downloaded refGene file has more columns than that. For example:
1475 NM_000039 chr11 - 116706468 116708338 116706523 116708103 4 116706468,116707716,116708060,116708320, 116707127,116707873,116708123,116708338, 0 APOA1 cmpl cmpl 2,1,0,-1,
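To make the mismatch concrete, here is a minimal Python sketch (my own illustration, not part of ANNOVAR) that splits the example row above and numbers each field. It assumes the fields are separated by tabs or spaces and that the two long comma-separated fields are the exon start and end lists; the variable names are placeholders I made up.

```python
# Example row copied from the downloaded hg19_refGene.txt.
row = (
    "1475 NM_000039 chr11 - 116706468 116708338 116706523 116708103 4 "
    "116706468,116707716,116708060,116708320, "
    "116707127,116707873,116708123,116708338, "
    "0 APOA1 cmpl cmpl 2,1,0,-1,"
)

# Assumption: fields are separated by tabs or spaces.
fields = row.split()

# Print each field with its 1-based column number.
for column_number, value in enumerate(fields, start=1):
    print(column_number, value)

print("total columns:", len(fields))

# Assumption: columns 10 and 11 are the exon start/end lists.
# Each list ends with a trailing comma, so drop empty pieces.
exon_starts = [int(x) for x in fields[9].split(",") if x]
exon_ends = [int(x) for x in fields[10].split(",") if x]
print("exon starts:", exon_starts)
print("exon ends:", exon_ends)
```

Running this on the row above reports more columns than the FAQ description lists, which is exactly the part I cannot interpret.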
I have looked in many places for the meaning of the last 6 columns but could not find it. Can anyone point me to a site that explains what those columns mean?