Question

Sequence corresponding to each probe ID of an Affymetrix gene chip

1

5.9 years ago

sujasubramanian ▴ 70

How can I get the sequence corresponding to each probe ID of an Affymetrix gene chip?

R • 3.8k views

ADD COMMENT • link updated 5.8 years ago by zx8754 11k • written 5.9 years ago by sujasubramanian ▴ 70

0

Which chip? Affymetrix markets several chips.

ADD REPLY • link 5.9 years ago by cpad0112 21k

0

[HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array. I need probe-level data for further analysis. Is this publicly available? I want to analyse gene expression data for a cell line together with the probe-level sequences for that cell line. I can get the gene expression from GEO, but where can I get the probe-level input sequences for each experiment? Is there a dataset? I can collect the perfect match sequences from the Affymetrix / ThermoFisher website, but where can I get the mismatch sequences?

ADD REPLY • link 5.9 years ago by sujasubramanian ▴ 70

0

Try here: http://www.affymetrix.com/support/technical/byproduct.affx?product=hg-u133-plus and the R/Bioconductor package: https://bioconductor.org/packages/release/data/annotation/html/pd.hg.u133.plus.2.html

ADD REPLY • link 5.9 years ago by cpad0112 21k

0

Hello, I am using a gene expression dataset from the [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array. I downloaded the CEL files from GEO and normalised them using RMA. Some genes appear multiple times when analysing the data; for example, the gene 'DDR1' has 3 expression values. How can I choose one expression value for DDR1 from these 3? Using biomaRt I can get the chromosome details of these genes, but the result returned by getBM contains many duplicate entries. Is there any other way to connect the genes with their chromosome details?

ADD REPLY • link 5.8 years ago by sujasubramanian ▴ 70

0

Please use ADD COMMENT or ADD REPLY to respond to previous reactions, so that this thread remains logically structured and easy to follow. I have now moved your reaction, but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

ADD REPLY • link 5.8 years ago by WouterDeCoster 47k

0

Yes, more than 1 probe can map to the same gene. When you normalise the data from the CEL file stage via the rma() function, you can typically 'summarise' expression values over the individual probes or over full transcripts by modifying the target parameter that is passed to rma()

ADD REPLY • link 5.8 years ago by Kevin Blighe 87k

0

Can you write an example? I didn't understand what you mean by 'summarise' expression. Averaging is not a good option for getting the full transcript of a gene, is it?

ADD REPLY • link 5.8 years ago by sujasubramanian ▴ 70

0

At which GEO record are you looking? They usually provide the expression 'summarised' over each transcript.

When we say 'summarised', we refer to the way in which the expression values are calculated. The usual options are:

summarised per probe-set (a probe-set usually targets a single exon of a gene)
summarised per gene / transcript, in which case expression values will be gathered from numerous probe-sets
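To make the per-gene option concrete, here is a minimal Python sketch of one common convention for collapsing several probe-sets to a single value per gene: keep the probe-set with the highest mean expression across samples. This is only an illustration and is not what rma() does internally; the function name, the probe-set IDs, and the log2 values are all made up for the example.

```python
from statistics import mean


def collapse_to_gene(expr, probeset_to_gene):
    """For each gene, keep the probe-set with the highest mean expression.

    expr: dict mapping probe-set ID -> list of per-sample expression values
    probeset_to_gene: dict mapping probe-set ID -> gene symbol
    Returns a dict mapping gene -> (chosen probe-set ID, its values).
    """
    best = {}
    for probeset, values in expr.items():
        gene = probeset_to_gene.get(probeset)
        if gene is None:
            continue  # skip probe-sets without a gene annotation
        if gene not in best or mean(values) > mean(best[gene][1]):
            best[gene] = (probeset, values)
    return best


# Hypothetical probe-set IDs and made-up log2 expression values (3 samples)
expr = {
    "probeset_A": [7.1, 7.3, 6.9],
    "probeset_B": [9.8, 10.1, 9.9],
    "probeset_C": [5.2, 5.0, 5.4],
    "probeset_D": [8.0, 8.2, 7.9],
}
probeset_to_gene = {
    "probeset_A": "DDR1",
    "probeset_B": "DDR1",
    "probeset_C": "DDR1",
    "probeset_D": "GENE2",  # placeholder gene symbol
}

gene_expr = collapse_to_gene(expr, probeset_to_gene)
print(gene_expr["DDR1"][0])
```

Other conventions (median across probe-sets, or the probe-set with the highest variance) fit the same structure; only the comparison inside the loop changes.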

ADD REPLY • link 5.8 years ago by Kevin Blighe 87k

score 2 · Answer 1 · 2018-06-01

These are [thankfully] available on the Affymetrix / ThermoFisher website. The documentation for Affymetrix arrays is comprehensive.

For example, for HuGene ST 1.0 / 2.0, the available documentation can be found here: Human Gene ST Arrays - Support Materials

The specific files that you'll want are the Sequence Files:

head HuGene-2_0-st-v1.hg19.probe.fa 

>probe:HuGene-2_0-st-v1:1909182-16657436;573:1184; ProbeID=1909182; TranscriptClusterID=16657436; Assembly=build-GRCh37/hg19; Seqname=chr1; Start=12200; Stop=12224; Strand=+; Sense; category=main
CCTAGGTTGTGAGAGAAGTTGATGC

>probe:HuGene-2_0-st-v1:1481686-16657436;257:919; ProbeID=1481686; TranscriptClusterID=16657436; Assembly=build-GRCh37/hg19; Seqname=chr1; Start=12616; Stop=12640; Strand=+; Sense; category=main
GAAGGGCATGCCTGGCATCACCACA

>probe:HuGene-2_0-st-v1:2398055-16657436;1010:1487; ProbeID=2398055; TranscriptClusterID=16657436; Assembly=build-GRCh37/hg19; Seqname=chr1; Start=12644; Stop=12668; Strand=+; Sense; category=main
TCTGCAGCTCTGGAGACCTGATGCT

>probe:HuGene-2_0-st-v1:403478-16657436;477:250; ProbeID=403478; TranscriptClusterID=16657436; Assembly=build-GRCh37/hg19; Seqname=chr1; Start=12668; Stop=12692; Strand=+; Sense; category=main
TGTGATCCAAGTCGGCCGTCGTCTT

>probe:HuGene-2_0-st-v1:1579074-16657436;925:979; ProbeID=1579074; TranscriptClusterID=16657436; Assembly=build-GRCh37/hg19; Seqname=chr1; Start=12669; Stop=12693; Strand=+; Sense; category=main
GTGTGATCCAAGTCGGCCGTCGTCT

You should be able to link these back to the original gene via the TranscriptClusterID. If you have done your annotation via some automated R package, then you may still have this ID.
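As a rough sketch of that linking step, the Python below parses probe FASTA records like the ones shown above and groups the probe sequences by TranscriptClusterID. It assumes the header layout matches the excerpt (semicolon-separated `key=value` fields); the function names are made up for this example, and the two records are copied from the excerpt so that the snippet runs without the real file.

```python
from collections import defaultdict


def header_fields(header):
    """Split a probe FASTA header into its key=value fields."""
    fields = {}
    for part in header.split(";"):
        part = part.strip()
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key] = value
    return fields


def parse_probe_fasta(lines):
    """Yield (fields, sequence) for each FASTA record in an iterable of lines."""
    header, seq = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header_fields(header), "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header_fields(header), "".join(seq)


# Two records copied from the excerpt above; with the real file, pass
# open("HuGene-2_0-st-v1.hg19.probe.fa") instead of this list.
sample = [
    ">probe:HuGene-2_0-st-v1:1909182-16657436;573:1184; ProbeID=1909182; "
    "TranscriptClusterID=16657436; Assembly=build-GRCh37/hg19; Seqname=chr1; "
    "Start=12200; Stop=12224; Strand=+; Sense; category=main",
    "CCTAGGTTGTGAGAGAAGTTGATGC",
    "",
    ">probe:HuGene-2_0-st-v1:1481686-16657436;257:919; ProbeID=1481686; "
    "TranscriptClusterID=16657436; Assembly=build-GRCh37/hg19; Seqname=chr1; "
    "Start=12616; Stop=12640; Strand=+; Sense; category=main",
    "GAAGGGCATGCCTGGCATCACCACA",
]

# Map each transcript cluster to its (ProbeID, sequence) pairs
by_cluster = defaultdict(list)
for fields, sequence in parse_probe_fasta(sample):
    by_cluster[fields["TranscriptClusterID"]].append((fields["ProbeID"], sequence))

for cluster_id, probes in by_cluster.items():
    print(cluster_id, probes)
```

The resulting dictionary can then be joined to whatever annotation table carries the TranscriptClusterID alongside the gene symbol.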

By the way, if you have already used an R package for annotation, then it may already have a function that provides the sequences - check for that.

Kevin