Question: match gives me all NA in annotating my genes
A (3.9k) wrote, 2.7 years ago:

Hi, I am doing a GO analysis in R.

I downloaded this annotation:

https://www.affymetrix.com/analysis/downloads/na33/wtgene-32_2/HuGene-1_0-st-v1.na33.2.hg19.probeset.csv.zip

    annot = read.csv(file = "HuGene-1_0-st.csv", header = TRUE)
    dim(annot)
    probes = names(datExpr)
> head(probes)
[1] "MKL2"    "MAST2"   "KAT5"    "WWC2"    "UBE2Z"   "PHYHIPL"

    probes2annot = match(probes, annot$transcript_cluster_id)

Gives me all NA

    sum(is.na(probes2annot))

Should return 0 but returns 7243

What am I doing wrong?

> head(annot)
  probeset_id seqname strand  start   stop probe_count
1     7896739    chr1      +  63033  63649          31
2     7896741    chr1      +  69109  70008          24
3     7896743    chr1      + 334144 334272           6
4     7896745    chr1      + 367693 368597          36
5     7896747    chr1      + 564951 565019          28
6     7896751    chr1      + 568069 568136          28
  transcript_cluster_id  exon_id   psr_id
1               7896738 96595544 97686467
2               7896740 96595546 97686470
3               7896742 96595548 97686473
4               7896744 96595550 97686476
5               7896746 96595552 97686479
6               7896750 96595556 97686485
                                                                                                                                                        gene_assignment
1                                                                                                                                            ENST00000492842 // OR4G11P
2                                                                 BC136848 // OR4F17 /// NM_001005240 // OR4F17 /// NM_001004195 // OR4F4 /// ENST00000318050 // OR4F17
3                                                                                                                                                                   ---
4 NM_001005277 // OR4F16 /// NM_001005221 // OR4F29 /// NM_001005504 // OR4F21 /// ENST00000456475 // OR4F29 /// ENST00000456475 // OR4F16 /// ENST00000456475 // OR4F3
5                                                                                                                                                                   ---
6                                                                                                                                                                   ---
                                                                                                                                                                                    mrna_assignment
1                                                                                                                                                   ENST00000492842 // chr1 // 100 // 31 // 31 // 0
2    BC136848 // chr1 // 100 // 24 // 24 // 0 /// NM_001005240 // chr1 // 100 // 24 // 24 // 0 /// NM_001004195 // chr1 // 100 // 24 // 24 // 0 /// ENST00000318050 // chr1 // 100 // 24 // 24 // 0
3               ENST00000455207 // chr1 // 100 // 6 // 6 // 0 /// TCONS_l2_00002387-XLOC_l2_000726 // chr1 // 100 // 6 // 6 // 0 /// TCONS_l2_00002388-XLOC_l2_000726 // chr1 // 100 // 6 // 6 // 0
4 NM_001005277 // chr1 // 100 // 36 // 36 // 0 /// NM_001005221 // chr1 // 100 // 36 // 36 // 0 /// NM_001005504 // chr1 // 89 // 32 // 36 // 0 /// ENST00000456475 // chr1 // 100 // 36 // 36 // 0
5                                                                                                                                                           AK074482 // chr1 // 79 // 22 // 28 // 0
6                                                                                                                                                         NC_001807 // chr1 // 100 // 24 // 24 // 0
  crosshyb_type number_independent_probes number_cross_hyb_probes
1             3                         0                       0
2             3                         0                       0
3             3                         0                       0
4             3                         0                       0
5             3                         0                       0
6             3                         0                       0
  number_nonoverlapping_probes level bounded noBoundedEvidence
1                            4   ---       0                 0
2                            7   ---       0                 0
3                            0   ---       0                 0
4                            6   ---       0                 0
5                            0   ---       0                 0
6                            0   ---       0                 0
  has_cds fl mrna est vegaGene vegaPseudoGene ensGene sgpGene
1       0  0    0   0        0              0       1       0
2       0  1    0   0        0              0       1       0
3       0  0    0   0        0              0       1       0
4       0  3    0   0        0              0       1       0
5       0  0    0   0        0              0       1       0
6       0  0    0   0        0              0       1       0
  exoniphy twinscan geneid genscan genscanSubopt mouse_fl
1        0        0      0       0             0        0
2        0        0      0       0             0        0
3        0        0      0       0             0        0
4        0        0      0       0             0        0
5        0        0      0       0             0        0
6        0        0      0       0             0        0
  mouse_mrna rat_fl rat_mrna microRNAregistry rnaGene mitomap
1          0      0        0                0       0       0
2          0      0        0                0       0       0
3          0      0        0                0       0       0
4          0      0        0                0       0       0
5          0      0        0                0       0       0
6          0      0        0                0       0       0
  probeset_type
1          main
2          main
3          main
4          main
5          main
6          main
>
Tags: annotation, R, gene
michael.ante (3.6k, Austria/Vienna) wrote, 2.7 years ago:

Hi,

In your annot table, the column transcript_cluster_id contains numeric IDs, whereas your probes are gene symbols, so there cannot be any match. In this case the match function returns the value given by the parameter 'nomatch', which is NA by default.
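The behaviour described here can be reproduced on a toy example. This is only a minimal sketch: the three symbols and three IDs are copied from the question, and the variable names `idx`, `n_unmatched` and `idx0` are made up for illustration.

```r
# Toy data mirroring the question: gene symbols vs. numeric cluster IDs
probes <- c("MKL2", "MAST2", "KAT5")
transcript_cluster_id <- c(7896738, 7896740, 7896742)

# No symbol equals any ID, so every position gets the 'nomatch' value,
# which is NA unless you set it to something else
idx <- match(probes, transcript_cluster_id)
n_unmatched <- sum(is.na(idx))

# Same lookup with an explicit 'nomatch' value instead of NA
idx0 <- match(probes, transcript_cluster_id, nomatch = 0L)
```

So a count of NAs equal to the number of probes means that none of the values in the two vectors are comparable, not that match is broken.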

I guess you can try match on the gene assignment. As far as I remember, there are also a lot of Affymetrix-specific annotation packages provided in R (see here).
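Note that gene_assignment does not hold bare symbols but strings such as "ENST00000492842 // OR4G11P", so a direct match on that column will also return NA. A rough sketch of one way to pull the symbols out first is shown below. It assumes the "accession // symbol" layout separated by " /// " that appears in the head(annot) output of the question; the helper `extract_symbols` and the table `symbol_map` are hypothetical names, not part of any package.

```r
# Toy annotation rows copied from the head(annot) output in the question
annot <- data.frame(
  transcript_cluster_id = c(7896738, 7896740, 7896742),
  gene_assignment = c(
    "ENST00000492842 // OR4G11P",
    "BC136848 // OR4F17 /// NM_001005240 // OR4F17 /// NM_001004195 // OR4F4 /// ENST00000318050 // OR4F17",
    "---"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: unique gene symbols found in one gene_assignment string
extract_symbols <- function(x) {
  if (is.na(x) || x == "---") return(character(0))
  # Entries are separated by " /// ", fields inside an entry by " // "
  entries <- strsplit(x, " /// ", fixed = TRUE)[[1]]
  fields <- strsplit(entries, " // ", fixed = TRUE)
  # The symbol is assumed to be the second field of each entry
  unique(vapply(fields, function(f) trimws(f[2]), character(1)))
}

symbols_per_row <- lapply(annot$gene_assignment, extract_symbols)

# Long lookup table: one row per (transcript cluster, symbol) pair
symbol_map <- data.frame(
  transcript_cluster_id = rep(annot$transcript_cluster_id,
                              lengths(symbols_per_row)),
  symbol = unlist(symbols_per_row),
  stringsAsFactors = FALSE
)

# Matching symbols against symbols now works; unknown symbols stay NA
probes <- c("OR4F17", "MKL2")
probes2annot <- match(probes, symbol_map$symbol)
```

Symbols that are not in the annotation still come back as NA, which is expected if the expression data were not produced on this array.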


Thank you. My data is on GPL16791, Illumina HiSeq 2500 (Homo sapiens).

I also tried matching on gene_assignment as you suggested, but that gives NA as well.

(reply written 2.7 years ago by A)
Satyajeet Khare (1.6k, Pune, India) wrote, 2.7 years ago:

For Affy ST arrays, you can read the CEL files with the read.celfiles function from the oligo package, like this:

    rawData <- read.celfiles(celFiles)

Then normalize the data:

    Data <- rma(rawData)

And finally annotate the normalized data (annotateEset is in the affycoretools package):

    Data <- annotateEset(Data, hugene10sttranscriptcluster.db)

You may have to change the annotation database; I am not sure about that.


Thank you, my data is Illumina HiSeq 2500

(reply written 2.7 years ago by A)