Question: TAIR Gene Symbols
pthom010 wrote (23 days ago):

Does anybody know where I can acquire a table of TAIR gene symbols? I have a file of TAIR locus IDs and would like to get the gene symbols that correspond to them. I have some coding experience in Linux and R, so a solution based on either of those would be even better.

Kevin Blighe wrote (23 days ago):

Hey,

There are two approaches here.

1, org.At.tair.db

You can use the annotation DB packages from Bioconductor, specifically org.At.tair.db.

Copying my own answer from another thread ("Biomart query returns NA when searching for entrez_id, while manual search works"):

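library(org.At.tair.db)

# example TAIR locus IDs
genes <- c("AT2G14610","AT4G23700","AT3G26830",
  "AT3G15950","AT3G54830","AT5G24105")

# list the ID types that can be passed as 'keytype'
keytypes(org.At.tair.db)

# one symbol per locus; where several match, mapIds() returns the first by default
mapIds(org.At.tair.db, keys = genes,
  column = c('SYMBOL'), keytype = 'TAIR')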
 AT2G14610  AT4G23700  AT3G26830  AT3G15950  AT3G54830  AT5G24105 
 "AtCAPE9"  "ATCHX17" "CYP71B15"     "NAI2"         NA    "AGP41"

# all matches per locus (one row per symbol / RefSeq combination)
select(org.At.tair.db, keys = genes,
  columns = c('ENTREZID', 'SYMBOL', 'REFSEQ'), keytype = 'TAIR')

       TAIR ENTREZID   SYMBOL       REFSEQ
1  AT2G14610   815949  AtCAPE9    NM_127025
2  AT2G14610   815949  AtCAPE9    NP_179068
3  AT2G14610   815949    ATPR1    NM_127025
4  AT2G14610   815949    ATPR1    NP_179068
5  AT2G14610   815949       PR    NM_127025
6  AT2G14610   815949       PR    NP_179068
7  AT2G14610   815949      PR1    NM_127025
8  AT2G14610   815949      PR1    NP_179068
9  AT4G23700   828470  ATCHX17 NM_001341626
10 AT4G23700   828470  ATCHX17    NM_118501
11 AT4G23700   828470  ATCHX17 NP_001328705
12 AT4G23700   828470  ATCHX17    NP_194101
13 AT4G23700   828470    CHX17 NM_001341626
14 AT4G23700   828470    CHX17    NM_118501
15 AT4G23700   828470    CHX17 NP_001328705
16 AT4G23700   828470    CHX17    NP_194101
17 AT3G26830   822298 CYP71B15    NM_113595
18 AT3G26830   822298 CYP71B15    NP_189318
19 AT3G26830   822298     PAD3    NM_113595
20 AT3G26830   822298     PAD3    NP_189318
21 AT3G15950   820839     NAI2 NM_001035631
22 AT3G15950   820839     NAI2 NM_001338191
23 AT3G15950   820839     NAI2 NM_001338192
24 AT3G15950   820839     NAI2 NM_001338193
25 AT3G15950   820839     NAI2    NM_112465
26 AT3G15950   820839     NAI2 NP_001030708
27 AT3G15950   820839     NAI2 NP_001326807

2, biomaRt

library(biomaRt)
tair_mart <- useMart(biomart = 'plants_mart',
  host = 'https://plants.ensembl.org', dataset = 'athaliana_eg_gene')

head(listAttributes(tair_mart), 15)

annot <- getBM(
  values = genes,
  mart = tair_mart,
  attributes = c('ensembl_gene_id', 'entrezgene_id',
    'description', 'external_gene_name'),
  filters = 'ensembl_gene_id')

  ensembl_gene_id entrezgene_id
1       AT2G14610        815949
2       AT3G15950        820839
3       AT3G26830        822298
4       AT3G54830            NA
5       AT4G23700        828470
6       AT5G24105       2745995
                                                                                          description
1                             Pathogenesis-related protein 1 [Source:UniProtKB/Swiss-Prot;Acc:P33154]
2                                          TSA1-like protein [Source:UniProtKB/Swiss-Prot;Acc:Q9LSB4]
3 Bifunctional dihydrocamalexate synthase/camalexin synthase [Source:UniProtKB/Swiss-Prot;Acc:Q9LW27]
4                                                                                                    
5                                  Cation/H(+) antiporter 17 [Source:UniProtKB/Swiss-Prot;Acc:Q9SUQ7]
6                                 Arabinogalactan protein 41 [Source:UniProtKB/Swiss-Prot;Acc:Q8L9T8]
  external_gene_name
1                PR1
2               NAI2
3           CYP71B15
4                   
5              CHX17
6              AGP41

If you want a complete table from biomaRt, run the same query without the values and filters arguments:

annotComplete <- getBM(
  mart = tair_mart,
  attributes = c('ensembl_gene_id', 'entrezgene_id',
    'description', 'external_gene_name'))

dim(annotComplete)
[1] 33528     4

Kevin

genomax wrote (23 days ago):

Using Entrez Direct (EDirect):

$ esearch -db gene -query "Arabidopsis thaliana [ORGN]" | efetch -format tabular | head -5
tax_id  Org_name    GeneID  CurrentID   Status  Symbol  Aliases description other_designations  map_location    chromosome  genomic_nucleotide_accession.version    start_position_on_the_genomic_accession end_position_on_the_genomic_accession   orientation exon_count  OMIM
3702    Arabidopsis thaliana    816394  0   live    PHYB    AT2G18790, HY3, MSF3.17, MSF3_17, OOP1, OUT OF PHASE 1, PHYTOCHROME B, phytochrome B    phytochrome B   phytochrome B       2   NC_003071.7 8139756 8144461 plus    3
3702    Arabidopsis thaliana    830878  0   live    FLC AT5G10140, AGAMOUS-like 25, AGL25, FLF, FLOWERING LOCUS C, FLOWERING LOCUS F, MADS BOX PROTEIN FLOWERING LOCUS F, REDUCED STEM BRANCHING 6, RSB6, T31P16.130, T31P16_130    K-box region and MADS-box transcription factor family protein   K-box region and MADS-box transcription factor family protein       5   NC_003076.8 3173382 3179448 minus   8
3702    Arabidopsis thaliana    817857  0   live    COP1    AT2G32950, ARABIDOPSIS THALIANA CONSTITUTIVE PHOTOMORPHOGENIC 1, ATCOP1, CONSTITUTIVE PHOTOMORPHOGENIC 1, DEETIOLATED MUTANT 340, DET340, EMB168, EMBRYO DEFECTIVE 168, FUS1, FUSCA 1, T21L14.11, T21L14_11 Transducin/WD40 repeat-like superfamily protein Transducin/WD40 repeat-like superfamily protein     2   NC_003071.7 13977881    13983609    plus    13
3702    Arabidopsis thaliana    842859  0   live    FT  AT1G65480, F5I14.3, F5I14_3, FLOWERING LOCUS T, REDUCED STEM BRANCHING 8, RSB8  PEBP (phosphatidylethanolamine-binding protein) family protein  PEBP (phosphatidylethanolamine-binding protein) family protein      1   NC_003070.9 24331373    24333999    plus    4
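
The wide table above can be cut down to locus ID / symbol pairs. What follows is only a sketch, under these assumptions: the efetch output has been saved to a tab-delimited file (the name genes.tsv is hypothetical), Symbol is column 6 and Aliases is column 7 as in the header above, and the locus ID is whichever alias looks like an AGI code (AT, then 1-5, C or M, then G and five digits). The sample rows are trimmed copies of the output above, and the helper name locus_symbol is made up for illustration.

```shell
# Hypothetical sample file: first seven columns of the tabular output, tab-separated.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  tax_id Org_name GeneID CurrentID Status Symbol Aliases \
  3702 'Arabidopsis thaliana' 816394 0 live PHYB 'AT2G18790, HY3, MSF3.17' \
  3702 'Arabidopsis thaliana' 830878 0 live FLC 'AT5G10140, AGAMOUS-like 25, AGL25' \
  > genes.tsv

# Print "locus<TAB>symbol" for every row whose Aliases column holds an AGI-style ID.
locus_symbol() {
  awk -F '\t' -v OFS='\t' '
    NR == 1 { next }                      # skip the header line
    {
      n = split($7, alias, /, */)         # aliases are comma-separated
      for (i = 1; i <= n; i++) {
        id = toupper(alias[i])
        if (id ~ /^AT[1-5CM]G[0-9][0-9][0-9][0-9][0-9]$/) {
          print id, $6                    # locus ID, then the Symbol column
          break                           # keep the first matching alias only
        }
      }
    }' "$1"
}

locus_symbol genes.tsv
```

Matching on the ID pattern rather than on position means a row is still handled if the locus ID is not the first alias; rows with no AGI-style alias are skipped silently.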

OR

$ esearch -db gene -query "Arabidopsis thaliana [ORGN]" | esummary | xtract -pattern DocumentSummary -element Name,OtherAliases | head -5 | awk -F "\t|," '{OFS="\t"}{print $2,$1}'
AT2G18790   PHYB
AT5G10140   FLC
AT2G32950   COP1
AT1G65480   FT
AT1G09570   PHYA

Remove the head command to get them all.
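
Since the question starts from a file of TAIR locus IDs, the two-column table can then be used as a lookup. This is a minimal sketch, not a tested pipeline: the file names locus_symbol.tsv and my_ids.txt and the helper annotate_ids are hypothetical, the lookup table is assumed to be tab-separated with the locus ID first, and the ID file is assumed to hold one locus ID per line. The sample rows reuse IDs that appear in this thread.

```shell
# Hypothetical inputs: a locus/symbol table and the asker's list of locus IDs.
printf 'AT2G18790\tPHYB\nAT5G10140\tFLC\nAT2G32950\tCOP1\n' > locus_symbol.tsv
printf 'AT5G10140\nAT3G54830\nAT2G18790\n' > my_ids.txt

# Print each ID from the second file with its symbol, or NA when there is none.
annotate_ids() {
  awk -F '\t' -v OFS='\t' '
    NR == FNR { sym[toupper($1)] = $2; next }   # first file: build the lookup
    {
      key = toupper($1)                         # tolerate At2g18790-style case
      print $1, ((key in sym) ? sym[key] : "NA")
    }' "$1" "$2"
}

annotate_ids locus_symbol.tsv my_ids.txt
```

The output keeps the order of the ID file and reports NA for loci without a symbol, which mirrors how the R approaches above return NA for AT3G54830.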
