Sequence corresponding to each probe ID of an Affymetrix gene chip
5.9 years ago

How can I get the sequence corresponding to each probe ID of an Affymetrix gene chip?

R • 3.8k views

Which chip? Affymetrix markets several chips.


[HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array. I need probe-level data for further analysis; is this publicly available? I want to analyse cell-line gene expression data together with the probe-level sequences for that particular cell line. I can get the gene expression from GEO, but where can I get the probe-level input sequences for each experiment? Is there a dataset? I can collect the perfect match sequences from the Affymetrix / ThermoFisher website, but where can I get the mismatch sequences?


Hello, I am using a gene expression dataset from the [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array. I downloaded the CEL files from GEO and normalised them using RMA. Some genes appear multiple times in the results: for example, the gene 'DDR1' has 3 expression values. How should I choose one expression value for DDR1 from these 3? Using biomaRt I can get the chromosome details of these genes, but the result obtained from getBM contains many duplicate entries. Is there any other way to connect the genes with their chromosome details?


Please use ADD COMMENT or ADD REPLY to respond to previous reactions, so that this thread remains logically structured and easy to follow. I have now moved your reaction but, as you can see, it's not optimal. Adding an answer should only be used for providing a solution to the question asked.


Yes, more than one probe can map to the same gene. When you normalise the data from the CEL file stage via the rma() function, you can typically 'summarise' expression values over the individual probes or over full transcripts by modifying the target parameter that is passed to rma().


Can you write an example? I didn't understand what you mean by 'summarise' expression. Averaging is not a good option for getting the expression of the full transcript of a gene, is it?


At which GEO record are you looking? They usually provide the expression 'summarised' over each transcript.

When we say 'summarised', we refer to the way in which the expression values are calculated. The usual options are:

  • summarised per probe-set (a probe-set usually targets a single exon of a gene)
  • summarised per gene / transcript, in which case expression values will be gathered from numerous probe-sets
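For a case like DDR1 with 3 values, one common heuristic (not the only one, and not something rma() does for you) is to keep the probe-set with the highest mean expression across samples as the representative for each gene. A minimal sketch of that idea follows; the probe-set IDs and expression values are hypothetical and only illustrate the bookkeeping.

```python
from statistics import mean

# Hypothetical RMA output: probe-set ID -> (gene symbol, log2 values per sample).
# IDs and numbers are made up for illustration only.
expression = {
    "probeset_A": ("DDR1", [10.2, 10.5, 9.9]),
    "probeset_B": ("DDR1", [7.1, 7.4, 6.8]),
    "probeset_C": ("DDR1", [8.0, 8.3, 8.1]),
    "probeset_D": ("RFC2", [6.5, 6.7, 6.6]),
}


def pick_representative(expression):
    """Return {gene: probe-set ID} keeping the probe-set with the highest mean."""
    best = {}
    for probeset_id, (gene, values) in expression.items():
        score = mean(values)
        # Keep this probe-set if the gene is new or it beats the current best.
        if gene not in best or score > best[gene][1]:
            best[gene] = (probeset_id, score)
    return {gene: probeset_id for gene, (probeset_id, _) in best.items()}


representative = pick_representative(expression)
print(representative)
```

Other choices, such as taking the median across probe-sets or keeping the most variable probe-set, fit the same structure by swapping the scoring line.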
5.9 years ago

These are [thankfully] available on the Affymetrix / ThermoFisher website. The documentation for Affymetrix arrays is comprehensive.

For example, for HuGene ST 1.0 / 2.0, the available documentation can be found here: Human Gene ST Arrays - Support Materials

The specific files that you'll want are the Sequence Files. For example:

head HuGene-2_0-st-v1.hg19.probe.fa 

>probe:HuGene-2_0-st-v1:1909182-16657436;573:1184; ProbeID=1909182; TranscriptClusterID=16657436; Assembly=build-GRCh37/hg19; Seqname=chr1; Start=12200; Stop=12224; Strand=+; Sense; category=main
CCTAGGTTGTGAGAGAAGTTGATGC

>probe:HuGene-2_0-st-v1:1481686-16657436;257:919; ProbeID=1481686; TranscriptClusterID=16657436; Assembly=build-GRCh37/hg19; Seqname=chr1; Start=12616; Stop=12640; Strand=+; Sense; category=main
GAAGGGCATGCCTGGCATCACCACA

>probe:HuGene-2_0-st-v1:2398055-16657436;1010:1487; ProbeID=2398055; TranscriptClusterID=16657436; Assembly=build-GRCh37/hg19; Seqname=chr1; Start=12644; Stop=12668; Strand=+; Sense; category=main
TCTGCAGCTCTGGAGACCTGATGCT

>probe:HuGene-2_0-st-v1:403478-16657436;477:250; ProbeID=403478; TranscriptClusterID=16657436; Assembly=build-GRCh37/hg19; Seqname=chr1; Start=12668; Stop=12692; Strand=+; Sense; category=main
TGTGATCCAAGTCGGCCGTCGTCTT

>probe:HuGene-2_0-st-v1:1579074-16657436;925:979; ProbeID=1579074; TranscriptClusterID=16657436; Assembly=build-GRCh37/hg19; Seqname=chr1; Start=12669; Stop=12693; Strand=+; Sense; category=main
GTGTGATCCAAGTCGGCCGTCGTCT

You should be able to link these back to the original gene via the TranscriptClusterID. If you have done your annotation via some automated R package, then you may still have this ID.
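If you would rather do this linking yourself, here is a rough sketch that reads a probe FASTA file in the format shown above and groups the probe sequences by TranscriptClusterID. It assumes every header carries `ProbeID=` and `TranscriptClusterID=` fields separated by semicolons, as in the records above; the function names are my own, and the two records embedded in the example are copied from the `head` output.

```python
import io
from collections import defaultdict


def _record(header, sequence):
    """Split a FASTA header into Key=Value fields and return the IDs plus sequence."""
    fields = {}
    for part in header.split(";"):
        part = part.strip()
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key] = value
    return fields["ProbeID"], fields["TranscriptClusterID"], sequence


def parse_probe_fasta(handle):
    """Yield (probe_id, transcript_cluster_id, sequence) for each FASTA record."""
    header, seq = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _record(header, "".join(seq))
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield _record(header, "".join(seq))


# Two records copied from the output above; for a real run, replace this with
# open("HuGene-2_0-st-v1.hg19.probe.fa").
example = io.StringIO(
    ">probe:HuGene-2_0-st-v1:1909182-16657436;573:1184; ProbeID=1909182; "
    "TranscriptClusterID=16657436; Assembly=build-GRCh37/hg19; Seqname=chr1; "
    "Start=12200; Stop=12224; Strand=+; Sense; category=main\n"
    "CCTAGGTTGTGAGAGAAGTTGATGC\n"
    "\n"
    ">probe:HuGene-2_0-st-v1:1481686-16657436;257:919; ProbeID=1481686; "
    "TranscriptClusterID=16657436; Assembly=build-GRCh37/hg19; Seqname=chr1; "
    "Start=12616; Stop=12640; Strand=+; Sense; category=main\n"
    "GAAGGGCATGCCTGGCATCACCACA\n"
)

by_cluster = defaultdict(list)
for probe_id, cluster_id, sequence in parse_probe_fasta(example):
    by_cluster[cluster_id].append((probe_id, sequence))

print(dict(by_cluster))
```

The resulting dictionary can then be joined to whatever annotation table you already have, keyed on the transcript cluster ID.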

By the way, if you have already used an R package for annotation, then it may already have a function that provides the sequences - check for that.

Kevin


Thank you Kevin for your quick response, it really helps me, but I still have some doubts. I am using a gene expression dataset from the [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array, and I need probe-level data for further analysis; is this publicly available? I want to analyse cell-line gene expression data together with the probe-level sequences for that particular cell line. I can get the gene expression from GEO, but where can I get the probe-level input sequences for each experiment? Is there a dataset? I can collect the perfect match sequences from the Affymetrix / ThermoFisher website, but where can I get the mismatch sequences?


To get probe-level expression values, you should download the raw data CEL files from GEO and then re-process them with the oligo package. When background correcting, normalising, and transforming the data with the rma() function, specify target="probeset". This will give you virtual probe-level expression values. See here: [HuEx-1_0-st] Affymetrix Human Exon 1.0 ST Array [transcript (gene) version]

Other information can be found on the page linked by cpad.
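On the mismatch sequences: on arrays that carry PM/MM probe pairs (HG-U133_Plus_2 is one), the mismatch probe is the perfect match 25-mer with its middle (13th) base replaced by the complementary base, so the MM sequences can be derived from the PM sequences. A small sketch of that rule follows; the function name is my own, and the input 25-mer is just reused from the example output earlier in the thread to show the mechanics (it is not an HG-U133 probe).

```python
# Complement table for the substitution at the middle base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mismatch_from_perfect_match(pm_sequence):
    """Derive the MM 25-mer by complementing the 13th base of the PM 25-mer."""
    pm_sequence = pm_sequence.upper()
    if len(pm_sequence) != 25:
        raise ValueError("expected a 25-mer perfect match sequence")
    middle = 12  # 13th base, using zero-based indexing
    return (
        pm_sequence[:middle]
        + COMPLEMENT[pm_sequence[middle]]
        + pm_sequence[middle + 1:]
    )


# Illustrative 25-mer only, used to demonstrate the substitution.
pm = "CCTAGGTTGTGAGAGAAGTTGATGC"
mm = mismatch_from_perfect_match(pm)
print(pm)
print(mm)
```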
