Question: Downloading human gene symbols with duplicates
jiwpark00 wrote, 3.9 years ago:

I've looked on Google and Biostars before but I can't quite seem to find this information.

I've tried both the UCSC Table Browser and HUGO, but both lists seem to have their own problems. I'm basically trying to:

  • Download a list of human gene symbols for all protein-coding genes
  • Along with their duplicate names (aliases)

I've seen posts about Ensembl and BioMart, but I can't find the right link to do this. Thank you.

EagleEye (Sweden) wrote, 3.9 years ago:

Download Homo_sapiens.gene_info.gz from ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Mammalia/ and run:

zcat Homo_sapiens.gene_info.gz | grep -w "protein-coding" | cut -f2,3,5,9,10 > output_table.txt
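A stricter variant, sketched as a hypothetical shell function `protein_coding_table`: `grep -w "protein-coding"` matches that word anywhere on the line, so it would also keep a row where "protein-coding" appears only in another column, such as the description. Testing the type_of_gene column (column 10) directly avoids that. The column positions assume the standard gene_info layout, the same layout that `cut -f2,3,5,9,10` above relies on.

```shell
# Sketch: keep only rows whose type_of_gene (column 10 of the gene_info
# layout) is exactly "protein-coding", then print GeneID, Symbol, Synonyms,
# description and type_of_gene (columns 2, 3, 5, 9, 10), as in the command above.
# Reads gene_info lines on stdin.
protein_coding_table() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                                  # skip the "#tax_id ..." header
    $10 == "protein-coding" { print $2, $3, $5, $9, $10 }
  '
}

# Usage (assumes the file was downloaded to the current directory):
#   zcat Homo_sapiens.gene_info.gz | protein_coding_table > output_table.txt
```

The output has the same five columns as the original command, so the sample below applies to either version.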

Sample output (columns: GeneID, Symbol, Synonyms, description, type_of_gene):

1   A1BG    A1B|ABG|GAB|HYST2477    alpha-1-B glycoprotein  protein-coding
2   A2M A2MD|CPAMD5|FWP007|S863-7   alpha-2-macroglobulin   protein-coding
9   NAT1    AAC1|MNAT|NAT-1|NATI    N-acetyltransferase 1   protein-coding
10  NAT2    AAC2|NAT-2|PNAT N-acetyltransferase 2   protein-coding
12  SERPINA3    AACT|ACT|GIG24|GIG25    serpin family A member 3    protein-coding
13  AADAC   CES5A1|DAC  arylacetamide deacetylase   protein-coding
14  AAMP    -   angio associated migratory cell protein protein-coding
15  AANAT   DSPS|SNAT   aralkylamine N-acetyltransferase    protein-coding
16  AARS    CMT2N|EIEE29    alanyl-tRNA synthetase  protein-coding
18  ABAT    GABA-AT|GABAT|NPD009    4-aminobutyrate aminotransferase    protein-coding
The files in this directory are updated daily.
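For the "duplicate names" part of the question, the Synonyms column holds all aliases of a gene in one pipe-separated field. A minimal sketch (the function names `symbol_alias_pairs` and `shared_aliases` are hypothetical) that splits that field into one symbol/alias pair per line and lists aliases attached to more than one gene, assuming the five-column output_table.txt from the command above as input:

```shell
# Sketch: turn the pipe-separated Synonyms column (column 3 of output_table.txt)
# into one "Symbol<TAB>alias" row per alias. Reads the table on stdin.
# Genes whose Synonyms field is "-" (no aliases) produce no rows.
symbol_alias_pairs() {
  awk -F'\t' -v OFS='\t' '
    $3 != "-" {
      n = split($3, aliases, "|")        # aliases are separated by "|"
      for (i = 1; i <= n; i++) print $2, aliases[i]
    }
  '
}

# Aliases listed for more than one gene, i.e. names that are ambiguous.
shared_aliases() {
  symbol_alias_pairs |
    awk -F'\t' '{ genes[$2]++ } END { for (a in genes) if (genes[a] > 1) print a }' |
    sort
}
```

For example, `symbol_alias_pairs < output_table.txt > symbol_aliases.txt` writes the pairs, and `shared_aliases < output_table.txt` prints the ambiguous aliases.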

Thank you, that's very helpful. I did something similar with HUGO before, and HUGO lists 19008 genes whereas the NIH version gives 20731 genes. Is it because HUGO is "outdated"?

It seems like I'm getting different counts each time so that's why I was wondering.

(reply by jiwpark00, 3.9 years ago)

Annotations change all the time, as knowledge about the genome is updated, so it would not be surprising to have a slight change in the number of genes from one release of the annotation files to the next.

The number of genes in the NCBI and HUGO lists may also differ because they each have their own annotation methods.

(reply by mastal511, 3.9 years ago)
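Rather than only comparing totals, one way to see where two lists disagree is to compare the symbol sets directly. A sketch, assuming each list has already been reduced to one symbol per line (for the NCBI table, `cut -f2 output_table.txt`); the HGNC symbol file and the function name `compare_symbol_lists` are placeholders. A symbol present in only one list may reflect a naming difference rather than a gene missing from the other source.

```shell
# Sketch: compare two gene-symbol lists (one symbol per line) and report
# how many symbols are unique to each file and how many are shared.
compare_symbol_lists() {
  local left right
  left=$(mktemp)
  right=$(mktemp)
  sort -u "$1" > "$left"      # comm needs sorted input; -u drops repeated symbols
  sort -u "$2" > "$right"
  printf 'only in %s: %s\n' "$1" "$(comm -23 "$left" "$right" | awk 'END { print NR }')"
  printf 'only in %s: %s\n' "$2" "$(comm -13 "$left" "$right" | awk 'END { print NR }')"
  printf 'in both: %s\n' "$(comm -12 "$left" "$right" | awk 'END { print NR }')"
  rm -f "$left" "$right"
}
```

For example, `compare_symbol_lists ncbi_symbols.txt hgnc_symbols.txt`; replacing the counts with the `comm` output itself lists the differing symbols.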

HUGO names are "official" names for human genes.

(reply by genomax, 3.9 years ago)

What do you mean by "official" in quotes? Are they better than the NCBI list?

(reply by jiwpark00, 3.9 years ago)

The HUGO Gene Nomenclature Committee is the only worldwide authority that assigns standardised nomenclature to human genes.

From this page; also check the second question and answer there.

(reply by genomax, 3.9 years ago)
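Related to the HGNC versus NCBI question: the NCBI gene_info file also carries the symbol assigned by the nomenclature authority (HGNC for human genes) in a separate column, `Symbol_from_nomenclature_authority`, with `-` where there is none. A sketch (the function name `authority_symbols` is hypothetical) that prints it next to the NCBI symbol for protein-coding genes. Columns are looked up by name from the header line, so check that your copy of the file uses these column names.

```shell
# Sketch: for protein-coding rows of a gene_info file (stdin), print GeneID,
# the NCBI Symbol and the symbol from the nomenclature authority.
# Column positions are taken from the header line instead of being hard-coded.
authority_symbols() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 {
      sub(/^#/, "", $1)                  # the header line starts with "#tax_id"
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("Symbol_from_nomenclature_authority" in col) || !("type_of_gene" in col)) {
        print "expected gene_info header not found" > "/dev/stderr"
        exit 1
      }
      next
    }
    $(col["type_of_gene"]) == "protein-coding" {
      print $(col["GeneID"]), $(col["Symbol"]), $(col["Symbol_from_nomenclature_authority"])
    }
  '
}
```

For example, `zcat Homo_sapiens.gene_info.gz | authority_symbols | awk -F'\t' '$3 == "-"' | wc -l` would count protein-coding entries that have no HGNC symbol, which is one possible contributor to the difference between the two counts.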