Should Affymetrix probe annotation include only specific gene biotypes?
3.9 years ago
Aspire ▴ 300

I am converting probe names from Affymetrix to Ensembl.

In the process, I noticed that the Affymetrix annotation file (36 MB) has only one annotation for the probe set 117_at. The corresponding version of Ensembl in BioMart, in addition to listing HSPA6 as the gene symbol for 117_at, also lists an unprocessed pseudogene, ENSG00000225217 (HSPA7), for this probe set.

My questions are (1) Why was the HSPA7 omitted from Affymetrix annotations? Is it due to the biotype? (2) I assume that it is better to work with the latest version of Ensembl (please correct me if my assumption is wrong) rather than with the Affymetrix annotations, which are based on Ensembl 82. Should I select the Affymetrix probes that correspond to specific gene biotypes only?

microarrays affymetrix ensembl • 870 views
3.9 years ago

(1) Why was the HSPA7 omitted from Affymetrix annotations? Is it due to the biotype?

That would be a question for Affymetrix. However, the primary target for the probe set is HSPA6, which is [I presume] why only that gene is listed. biomaRt actually lists all of them:

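library("biomaRt")  # Bioconductor package
# Connect to the Ensembl genes mart and select the human dataset
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)
# Retrieve every gene annotated to probe set 117_at on the HG-U133 Plus 2 array
getBM(
  mart = mart,
  attributes = c(
    "affy_hg_u133_plus_2",
    "ensembl_gene_id",
    "gene_biotype",
    "external_gene_name"),
  filters = "affy_hg_u133_plus_2",
  values = "117_at",
  uniqueRows = TRUE)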

  affy_hg_u133_plus_2 ensembl_gene_id           gene_biotype external_gene_name
1              117_at ENSG00000225217 unprocessed_pseudogene              HSPA7
2              117_at ENSG00000173110         protein_coding              HSPA6
3              117_at ENSG00000273112                 lncRNA         AL590385.2
4              117_at ENSG00000244682 polymorphic_pseudogene             FCGR2C
5              117_at ENSG00000143226         protein_coding             FCGR2A

If you look at the target region at the UCSC Genome Browser, you can begin to see what's happening:

[Image: UCSC Genome Browser view of the 117_at target region]

So:

  • HSPA6 is the target
  • HSPA7 is included because, as a pseudogene, it is likely also targeted by the probe sequence. However, since it is an unprocessed pseudogene, it may not even be expressed
  • The FCGR2A and FCGR2C genes are included because there is a 'rogue' non-coding RNA, AL590385.2, that is transcribed across all of these genes in this region

(2) I assume that it is better to work with the latest version of Ensembl (please correct me if my assumption is wrong) rather than with the Affymetrix annotations, which are based on Ensembl 82. Should I select the Affymetrix probes that correspond to specific gene biotypes only?

Yes and no: these are design choices that you must make as the analyst. When annotating, you could write your code so that the protein-coding target, if present, is used in preference to other biotypes. Regardless, in this case, HSPA6 is the target.
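As a rough illustration of the "prefer protein coding" idea, here is a minimal base R sketch. The rows are copied from the getBM() output above so that it runs without a network connection; the helper name prefer_biotype and the priority order (protein_coding first, then lncRNA, everything else last) are my own assumptions, not part of biomaRt or of the Affymetrix annotation.

```r
# Annotation rows for probe set 117_at, copied from the getBM() output above.
annot <- data.frame(
  affy_hg_u133_plus_2 = "117_at",
  ensembl_gene_id = c("ENSG00000225217", "ENSG00000173110", "ENSG00000273112",
                      "ENSG00000244682", "ENSG00000143226"),
  gene_biotype = c("unprocessed_pseudogene", "protein_coding", "lncRNA",
                   "polymorphic_pseudogene", "protein_coding"),
  external_gene_name = c("HSPA7", "HSPA6", "AL590385.2", "FCGR2C", "FCGR2A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each probe set, keep only the rows whose biotype
# ranks best in `priority`. Biotypes not listed in `priority` rank last.
prefer_biotype <- function(annot, priority = c("protein_coding", "lncRNA")) {
  biotype_rank <- match(annot$gene_biotype, priority,
                        nomatch = length(priority) + 1L)
  # Best (lowest) rank within each probe set, repeated for every row
  best_rank <- ave(biotype_rank, annot$affy_hg_u133_plus_2, FUN = min)
  annot[biotype_rank == best_rank, , drop = FALSE]
}

preferred <- prefer_biotype(annot)
print(preferred[, c("affy_hg_u133_plus_2", "external_gene_name", "gene_biotype")])
```

For 117_at this still leaves two protein-coding genes (HSPA6 and FCGR2A), so a biotype preference alone does not pick a single target here; you would still need another rule, or manual inspection, to settle on HSPA6.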

Kevin
