Question: Annotation for Affymetrix probe ID "241838_at"
4
8.8 years ago by
Manhattan, NY
Khader Shameer18k wrote:

241838_at is a significant hit in a gene expression analysis that I am currently working on. Affymetrix annotation provides Gene Symbol for this probe as

"chr6:167330486-167330903 (-)" with additional notes "This probe set was annotated using the Accession mapped clusters based pipeline to a UniGene identifier using 5 transcripts.".

No further annotation is available for this probe in ADAPT, GATExplorer or AILUN. As this particular probe is a significant hit, I would like to know how I should report it. How is the community dealing with results based on such ambiguous probes? What could be the reason for Affymetrix to keep such a non-specific probe on the chip (GATExplorer says no genes are mapped to it)?

ADD COMMENTlink modified 8.8 years ago by Laurent Gautier810 • written 8.8 years ago by Khader Shameer18k

All the answers are helpful and gave me new insight into the problem. Next week I will select the answer with the most votes as the best answer.

ADD REPLYlink written 8.8 years ago by Khader Shameer18k
6
8.8 years ago by
hurfdurf460
United States
hurfdurf460 wrote:

Why not look at the probe alignment itself on Ensembl? In this case the probe is intronic to the processed but noncoding transcript RP1-167A14.2. There are ESTs overlapping the probeset which are likely the source sequence used as evidence for inclusion of the probeset.

Affy tends to put every possible exon on the probesets and let the users puzzle out which ones are real rather than stick to a minimal canonical set of genes which may be proven wrong in the future.

You may also want to check the individual probe values for this probeset and reconcile them with any spurious mismatch alignments with other RNA species that could be causing off-target signal before proceeding further.
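One rough way to look at individual probe values is to ask whether each probe tracks the rest of its probeset across samples. The sketch below is only an illustration of that idea, not a standard method: it swaps in a simple Pearson correlation against the per-sample median of the probeset, and the probe names, intensities and the `min_r` cutoff are all made-up assumptions.

```python
from statistics import mean, median


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def flag_discordant_probes(probe_values, min_r=0.5):
    """probe_values maps probe id -> list of log intensities, one per sample.

    Returns the ids of probes whose profile correlates poorly with the
    per-sample median of all probes in the probeset (hypothetical cutoff).
    """
    n_samples = len(next(iter(probe_values.values())))
    consensus = [
        median(values[i] for values in probe_values.values())
        for i in range(n_samples)
    ]
    return sorted(
        pid for pid, values in probe_values.items()
        if pearson(values, consensus) < min_r
    )


# Toy, invented log2 intensities for four probes across four samples.
toy = {
    "p1": [5.0, 6.0, 9.0, 10.0],
    "p2": [5.2, 6.1, 8.8, 10.3],
    "p3": [4.9, 5.8, 9.2, 9.9],
    "p4": [9.0, 9.5, 5.0, 4.8],  # runs against the other probes
}
discordant = flag_discordant_probes(toy)
print(discordant)
```

A probe flagged this way is only a candidate for a closer look, for example by checking whether its sequence also aligns to another RNA species.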

ADD COMMENTlink written 8.8 years ago by hurfdurf460

Thanks for the insights.

ADD REPLYlink written 8.8 years ago by Khader Shameer18k
5
8.8 years ago by
Laurent Gautier810 wrote:

As Daniel points out, there has been a drift between the "transcriptome as we thought we knew it" when arrays were designed and "the transcriptome as we know it today" (shameless plug to an early reference where this was called a "Dorian Gray effect").

If you are using Bioconductor for the analysis, do consider repeating it with remapped probes (the MBNI provides regular updates of mappings built against RefSeq and other databases; the latest is from July 2010).

ADD COMMENTlink written 8.8 years ago by Laurent Gautier810

Igautier, Thanks for this. This is very useful.

ADD REPLYlink written 8.8 years ago by Khader Shameer18k

I have also found the "customCDFs" (linked above) to be extremely useful. In a recent study I used both the current standard Affymetrix annotations and custom annotations to identify ~100 probesets useful for a specific classification problem. Manual validation of these probesets by alignment to reference genome found that ~10% of the standard probesets no longer work given our current understanding of the transcriptome (the problem is usually ambiguous assignment of probes to multiple loci). CustomCDF annotations had an almost perfect validation rate (unambiguous alignment to expected locus).

ADD REPLYlink written 7.4 years ago by Obi Griffith17k

One caveat - occasionally the customCDF probesets do not perform as expected. For example, U133A probesets for ESR1. From the standard CDF, only a single probeset out of nine (205225_at) works well for distinguishing ESR1+ from ESR1- patient samples (PMID:17329190). The single customCDF probe set for ESR1 doesn't work either, although alignment to genome doesn't reveal obvious problems. So, in this case, using customCDF will have poor results for an important gene. This experience has led me to use both custom/standard probeset annotations and sort out best probesets downstream.

ADD REPLYlink written 7.4 years ago by Obi Griffith17k
4
8.8 years ago by
Tim_Yates110
Tim_Yates110 wrote:

I've got a mapping for the plus2 probes to Ensembl v58 (not the latest v59 though), and the stats I have on that probeset are:

11 probes
10 probes hit the human genome (1 misses):
  chr6:167410826-167410850 (-)
  chr6:167410818-167410842 (-)
  chr6:167410788-167410812 (-)
  chr6:167410774-167410798 (-)
  chr6:167410720-167410744 (-)
  chr6:167410653-167410677 (-)
  chr6:167410637-167410661 (-)
  chr6:167410601-167410625 (-)
  chr6:167410556-167410580 (-)
  chr6:167410543-167410567 (-)

This means that all the probes that hit are in the 5' intronic region of ENSG00000227598 (ENST00000444102) and also in the 5' intronic region of ENSESTG00007278250 (ENSESTT00007324270).
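As a quick sanity check, the hits listed above can be parsed and summarized to confirm they are all 25-mers on one strand within a single small window. This is just a sketch; the helper names `parse_hits` and `summarize` are made up for illustration, and the coordinates are copied from the list above.

```python
import re

# Probe alignments copied from the list above (Ensembl v58 mapping).
HITS = """
chr6:167410826-167410850 (-)
chr6:167410818-167410842 (-)
chr6:167410788-167410812 (-)
chr6:167410774-167410798 (-)
chr6:167410720-167410744 (-)
chr6:167410653-167410677 (-)
chr6:167410637-167410661 (-)
chr6:167410601-167410625 (-)
chr6:167410556-167410580 (-)
chr6:167410543-167410567 (-)
"""

COORD = re.compile(r"(chr\w+):(\d+)-(\d+) \(([+-])\)")


def parse_hits(text):
    """Return a list of (chrom, start, end, strand) tuples, 1-based inclusive."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)), m.group(4))
        for m in COORD.finditer(text)
    ]


def summarize(hits):
    """Collapse the hits into chromosomes, strands, probe lengths and overall span."""
    start = min(h[1] for h in hits)
    end = max(h[2] for h in hits)
    return {
        "chroms": {h[0] for h in hits},
        "strands": {h[3] for h in hits},
        "lengths": {h[2] - h[1] + 1 for h in hits},
        "start": start,
        "end": end,
        "span": end - start + 1,
    }


hits = parse_hits(HITS)
summary = summarize(hits)
print(len(hits), summary)
```

The resulting start and end can then be compared by hand against the transcript coordinates of the same genome build in Ensembl.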

ADD COMMENTlink modified 8.8 years ago • written 8.8 years ago by Tim_Yates110
1

Basically, I ran the HG-U133_Plus_2.probe_tab file (downloaded from Affy) through my X:Map pipeline to get probe-to-genomic-location mappings. (This is the same approach I used for ADAPT, except that ADAPT only scanned cDNA sequences.) I take the probe tab file, extract the probes, and then run them all through Bowtie (after generating the Bowtie index for the reference genome of interest).
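The "extract the probes" step could look roughly like the sketch below, which turns a probe_tab file into FASTA that an aligner can read. This is not the X:Map code. It assumes the file is tab-delimited with a header row containing "Probe Set Name" and "Probe Sequence" columns (check your copy of the file), and the function name, the `probeset|index` naming scheme and the toy sequences are made up for illustration.

```python
import csv
import io


def probe_tab_to_fasta(handle, out,
                       name_col="Probe Set Name", seq_col="Probe Sequence"):
    """Write one FASTA record per probe; returns the number of probes written.

    Column names are assumptions about the probe_tab header; pass different
    ones if your file differs.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    seen = {}  # probeset id -> number of probes written so far
    written = 0
    for row in reader:
        probeset = row[name_col]
        seen[probeset] = seen.get(probeset, 0) + 1
        # Record id is "<probeset>|<running index within the probeset>".
        out.write(f">{probeset}|{seen[probeset]}\n{row[seq_col].strip()}\n")
        written += 1
    return written


# Toy in-memory input; the probeset id and sequences are invented, not real probes.
header = ["Probe Set Name", "Probe X", "Probe Y",
          "Probe Interrogation Position", "Probe Sequence",
          "Target Strandedness"]
rows = [
    ["demo_at", "1", "1", "100", "ACGTACGTACGTACGTACGTACGTA", "Antisense"],
    ["demo_at", "2", "1", "120", "TTGCATTGCATTGCATTGCATTGCA", "Antisense"],
]
toy_tab = "\n".join("\t".join(r) for r in [header] + rows) + "\n"

fasta_out = io.StringIO()
n_written = probe_tab_to_fasta(io.StringIO(toy_tab), fasta_out)
print(n_written)
print(fasta_out.getvalue())
```

With a real file you would open it with `open(path, newline="")` and write to a `.fa` file, then align that file against the Bowtie index built for your reference genome.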

ADD REPLYlink written 8.8 years ago by Tim_Yates110

Thanks for this information, Tim. Can you tell me how, and with what tool, you did this search and obtained the mapping results?

ADD REPLYlink written 8.8 years ago by Khader Shameer18k
2
8.8 years ago by
Daniel Swan13k
Aberdeen, UK
Daniel Swan13k wrote:

I think the point is that when the U133plus2 chips were designed (I think this probe is from that chip, from a quick look at NetAffx) there were a number of cDNA transcripts, in this case a cluster of them, potentially of unknown function, that the probesets were designed against. Over the course of time, this cluster hasn't become a 'gene' or indeed any particular feature that we would find mapped onto a genome build.

So this really boils down to a couple of options: either you check your probes against a new build of the genome to make sure each one maps to something we recognise as 'real', or you use a remapped CDF file for your analysis (discussed in answers passim).

You could check the original IMAGE clones (etc. listed on NetAffx) to see whether they have been quietly sidelined, or indeed map to where you think the probeset should on a genome build.

Personally I report Affy accessions rather than gene names when reporting data. It's up to somebody else (perhaps) to disambiguate the situation. Sometimes these arrays throw up things you would spend more time chasing down than is useful or practical.

ADD COMMENTlink written 8.8 years ago by Daniel Swan13k

Thanks Daniel. This probe is significantly expressed in two different sets of experiments with 10 replicates. I am reporting both genes and probe IDs in the results page. For this particular probe I am planning to report the probe ID.

ADD REPLYlink written 8.8 years ago by Khader Shameer18k