Question: How can I combine different Affymetrix platforms?
lur_murad0 wrote:

I would like to preprocess the microarray dataset GSE9006, which contains gene expression in PBMCs from children with diabetes. There are two array platforms, Affymetrix Human Genome HG-U133A and Affymetrix Human Genome HG-U133B. I need to combine the two platforms to increase the number of samples (n) and then analyse them to get differential expression for each gene. Is it enough to download the data, normalize the expression matrix for each platform, and then merge them by Entrez Gene ID? Thanks a lot for your cooperation.

ADD COMMENTlink modified 6 days ago • written 6 days ago by lur_murad0

Have a look at this previous question,

what is difference between HG-U133A and HG-U133B array ? which one to use ?,

and at the Affymetrix documentation for the arrays.

The Affymetrix HG-U133A and HG-U133B arrays were a set, not different platforms, so you might want to check whether samples from each patient were run on both arrays.

See also this page:

https://www.ncbi.nlm.nih.gov/gds?term=GSE9006

ADD REPLYlink modified 6 days ago • written 6 days ago by mastal5111.7k

@mastal511 I want to use both of them. Each platform contains 117 samples (80 T1D, 24 Normal, 12 T2D). I want to use all the T1D vs Normal samples, which is 160 T1D vs 48 Normal. Each platform has different probes, with only 4478 common genes between them, and the expression levels are different as well.

I am new to bioinformatics and I need a large number of samples to run my approach. How can I combine this set, as you called it?

ADD REPLYlink written 6 days ago by lur_murad0

Please take a look at some of the comments here: How to integrate multiple data sets from microarray platform prior meta-analysis?

The ideal situation would be to use just the common genes and then include 'ArrayVersion' as a covariate in all downstream statistical analyses. I'm not sure there is any ideal way to use genes that don't overlap: where they don't overlap, the values would just have to be NA in samples where there's no data.
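
A minimal sketch of the merge described above, in plain Python. The gene names, sample labels and values are made up for illustration, and `merge_arrays` is a hypothetical helper, not a function from any package. It keeps the common genes by default, records an 'ArrayVersion' label per sample, and uses `None` as the NA value when non-overlapping genes are kept.

```python
# Hypothetical toy data: gene -> {sample: log2 expression}, one table per array.
u133a = {
    "GENE1": {"A_s1": 7.1, "A_s2": 6.8},
    "GENE2": {"A_s1": 5.0, "A_s2": 5.4},
}
u133b = {
    "GENE2": {"B_s1": 4.2, "B_s2": 4.6},
    "GENE3": {"B_s1": 8.3, "B_s2": 8.0},
}


def merge_arrays(a, b, common_only=True):
    """Combine two gene-by-sample tables and label each sample's array."""
    samples_a = sorted({s for row in a.values() for s in row})
    samples_b = sorted({s for row in b.values() for s in row})
    samples = samples_a + samples_b
    # ArrayVersion covariate: one label per sample, in column order.
    array_version = ["U133A"] * len(samples_a) + ["U133B"] * len(samples_b)
    if common_only:
        genes = sorted(set(a) & set(b))
    else:
        genes = sorted(set(a) | set(b))
    merged = {}
    for gene in genes:
        row = {**a.get(gene, {}), **b.get(gene, {})}
        # None stands in for NA where a gene is absent from one array.
        merged[gene] = [row.get(s) for s in samples]
    return samples, array_version, merged


samples, array_version, merged = merge_arrays(u133a, u133b)
_, _, merged_all = merge_arrays(u133a, u133b, common_only=False)
```

With `common_only=True` only GENE2 survives. With `common_only=False`, GENE1 and GENE3 are kept but carry `None` for the samples of the array that does not measure them.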

ADD REPLYlink written 6 days ago by Kevin Blighe6.7k

Thank you, Kevin. What do you mean by merging the data? I have 117 samples in each of them and 4478 common genes. Do you mean simply combining the samples, which will give 4478 genes x 224 samples?

%%%%%%%%%%%%%

Thank you, Kevin. Yeah, I realised that some genes have reverse fold-changes. Do you think using one platform would be better than merging them?

My problem is the limited number of normal samples: only 24 vs 80 T1D.

ADD REPLYlink modified 6 days ago • written 6 days ago by lur_murad0

Yes, and then create a new categorical variable that records the array from which each sample was obtained (and include this as a covariate in all downstream analyses).

However, it looks like you have major issues with this data, as some genes have reverse fold-changes.
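
To illustrate what including the array as a covariate means, here is a rough sketch for a single gene, assuming log2 expression values and a simple additive model (expression ~ intercept + group + array). The data are invented, and `solve` and `fit_with_array_covariate` are hypothetical helpers written for this example. In practice a dedicated linear-modelling package would normally be used instead.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_with_array_covariate(expr, is_t1d, is_u133b):
    """Least-squares fit of expr ~ intercept + group + array for one gene."""
    design = [[1.0, float(g), float(v)] for g, v in zip(is_t1d, is_u133b)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(design, expr)) for i in range(3)]
    return solve(xtx, xty)


# Hypothetical values: a T1D effect of +1 and an array shift of +2.
is_t1d = [0, 0, 1, 1, 0, 0, 1, 1]
is_u133b = [0, 0, 0, 0, 1, 1, 1, 1]
expr = [5.0, 5.2, 6.0, 6.2, 7.0, 7.2, 8.0, 8.2]

intercept, group_effect, array_effect = fit_with_array_covariate(
    expr, is_t1d, is_u133b
)
```

The group coefficient is the T1D vs Normal difference after the array shift has been accounted for, so a constant offset between arrays does not get mistaken for a disease effect.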

ADD REPLYlink written 6 days ago by Kevin Blighe6.7k

The 2 arrays form a set. The U133A array contained more well-known genes, and the U133B more probesets based on evidence from ESTs, and each array contained some 22K probesets.

Essentially there are 117 samples, presumably each run on both U133A and U133B, and in total, you have intensity measurements from some 44K or so probesets. So half the information from each sample is on the A array, and the other half is on the B array. This is not the same as trying to combine information from different experiments run on different technologies, like for example, Affymetrix expression arrays and Illumina expression arrays, which are designed in different ways.

Some genes will have several probesets assigned to them, either on the same array, or some on the A array and some on the B array, but the probesets will probably be looking at different parts of the gene (although in general most of the probes on those types of Affymetrix arrays targeted the 3' ends of the genes), or have been designed to target alternative transcripts produced from the same gene.

What you should do is look at which probesets are differentially expressed. The annotations (genes the probesets are assigned to) for the probesets may change from time to time, especially for probesets on the B array that were designed based on information from ESTs.
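
The probeset-level view described above can be sketched as follows, assuming each patient really was hybridised to both arrays. The probeset IDs, patient labels and values are made up, and `stack_probesets` is a hypothetical helper. The point is that the two arrays are stacked as extra rows (probesets) for the same columns (patients), so the number of samples does not grow.

```python
# Hypothetical toy data: probeset -> {patient: log2 intensity}.
u133a = {
    "probeA_1": {"P1": 7.0, "P2": 6.5},
    "probeA_2": {"P1": 4.1, "P2": 4.4},
}
u133b = {
    "probeB_1": {"P1": 5.2, "P2": 5.9},
}


def stack_probesets(a, b):
    """Stack probesets from both arrays as rows; columns are shared patients."""
    patients_a = {p for row in a.values() for p in row}
    patients_b = {p for row in b.values() for p in row}
    # Keep only patients that have data on both arrays.
    shared = sorted(patients_a & patients_b)
    combined = {}
    for array_name, table in (("U133A", a), ("U133B", b)):
        for probeset, row in table.items():
            # Prefix with the array name so IDs stay unique and traceable.
            combined[f"{array_name}:{probeset}"] = [row.get(p) for p in shared]
    return shared, combined


patients, combined = stack_probesets(u133a, u133b)
```

Differential expression would then be tested per row of `combined`, with one column per patient.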

ADD REPLYlink modified 6 days ago • written 6 days ago by mastal5111.7k
Gene    U133A       U133B
'ABCC4' 1.316116395     1.099865145
'ADH1B' 1.327648636     -1.008188273
'ANGPTL4'   -2.350192037        -1.025937412
'ANK2'  1.558927203     -1.278292427
'ANXA1' -1.742008276        -1.01118217
'APOB'  -1.990832086        -1.189922907
'ARG1'  -1.318281938        1.238138398
'ARGLU1'    -1.636577742        -1.208285498
'ATP6V1D'   1.491767021     1.163026149
'BCL2L10'   -1.363365385        -1.033854088
'BCL2L14'   1.562237198     -1.02335048
'BHMT2' 1.797793045     -1.136313447
'BNC2'  -1.424195868        1.000070992
'BNC2'  -1.424195868        1.000070992
'CADM1' -1.384649004        1.042315341
'CADPS' 1.46785031      -1.041100165
'CBS'   -1.520996842        -1.066793757
'CD48'  1.410695857     1.096595355
'CD48'  1.410695857     1.096595355
'CHRM3' 1.35536998      1.003250995
'CLEC2D'    -1.302040283        -1.083768888
'CNDP2' -1.373034767        1.00867911
'CNTN5' 1.683794551     -1.270780239
'COL11A1'   1.32174493      1.194591417
'COL3A1'    1.303236594     -1.189626784
'COL4A5'    -1.306929948        -1.07094705
'CYBA'  -1.328410667        -1.128989273
'CYP19A1'   -1.341314108        -1.102602861
'CYP3A4'    1.634722747     1.056378251
'DCC'   1.510835634     -1.603497887
'DGCR14'    -1.342113648        -1.139748612
'DGUOK' 1.362356981     1.047008968
'DLC1'  1.497617851     1.025559915
'DMD'   1.365260185     1.146638213
'DPF3'  1.38986442      1.032422698
'EIF2AK3'   1.514038543     -1.275649019
'EPHA1' 1.428691574     1.064797524
'ERBB3' 1.556130395     -1.118611358
'ERBB4' 1.474139318     -1.223614286
'EVC'   1.744292145     -1.216207301
'F7'    -1.456596317        -1.256995954
'FADS2' 1.479012346     -1.052877335
'FAF1'  -1.314893414        1.019691453
'FGF12' 1.448485833     1.098273878
'FOXO1' -1.464185608        1.299814419
'GCLM'  -1.314824337        -1.052402075
'GJA5'  -1.610810766        -1.174838446
'GJC1'  1.317340693     -1.109040698
'GORASP1'   -1.369677308        1.084044212
'GPR98' -1.396324739        1.396612693
'GRIA4' 2.629273123     -1.162611226
'HDAC2' 1.458966128     -1.208953456
'HIVEP3'    1.339655881     -1.003643941
'HLA-DPB1'  -1.592675725        -1.02829585
'HSPA1L'    -1.322954357        1.14511081
'IGFBP1'    -1.334367668        -1.370205268
'IRF5'  1.392265225     -1.607073609
'ITGA1' 1.376487971     1.253676413
'ITGA2' 1.376399306     1.450818904
'ITGB6' -1.411455964        -1.181282596
'KLF15' 1.313649682     -1.2925
'KNG1'  1.746125086     -1.112399583
'LIPA'  -1.353394451        1.103174951
'LMNA'  -1.322405043        -1.204981404
'LMO7'  1.348071991     1.123337533
'MCM10' -1.330606937        1.673279525
'MGP'   -1.453242397        -1.787837838
'MYH7B' -2.387075147        1.058239767
'NAMPT' -1.334271765        -1.182607211
'NCOR1' 1.845354126     -1.037591636
'NFKBIB'    -1.450484045        -1.804063082
'NOS3'  -1.847852825        1.181066383
'NRXN3' -1.713424793        -1.027124238
'NTNG1' 1.459206108     1.160738621
'NUP133'    -1.367441332        -1.028757889
'OSM'   -1.469012887        -1.936409991
'PCYT1B'    1.419738334     1.102407891
'PDE4A' -1.956840391        -1.088125842
'PLD1'  -1.363600956        1.069971161
'PPP1R9A'   1.566093718     -1.002639916
'PRCP'  1.312576772     1.067534384
'PTGER1'    -1.815607567        -1.129215561
'PTPN2' 2.170804719     1.070092057
'PTPRD' 1.410802605     1.176360691
'PTPRG' -1.538189625        -1.302427304
'RAB6B' 1.400380196     -1.007393603
'RALGPS1'   -1.976895078        1.403315697
'RBMS1' 1.382796022     1.168304383
'RBMS1' 1.382796022     1.168304383
'RFC3'  1.302815625     1.082265432
'RFX2'  1.314235465     -1.5897521
'RNF17' -1.417139519        -1.532896813
'ROR1'  1.547575034     -1.208169794
'SASH1' -1.341710284        -1.084923833
'SCARB1'    -1.407964198        -1.113158737
'SCNN1G'    1.433211374     -1.137954715
'SLC27A5'   -2.103063728        -1.228554995
'SLC6A13'   1.571661605     1.131365114
'SLIT2' 2.205648092     -1.038132911
'SMARCB1'   -1.311952145        -1.012119746
'SOCS3' -2.774464415        -1.994889526
'SORBS1'    -1.562808615        -1.12105425
'SORBS2'    1.321248741     -1.306857616
'ST8SIA4'   -1.314179272        -1.237198388
'TCF7L2'    1.509933423     1.016953742
'THBS1' -1.35321437     1.161802987
'TRAF6' 1.477840872     -1.036330087
'TUB'   -1.301811754        1.093728112
'UPB1'  1.315262645     1.192613904
'WWOX'  -1.313117593        -1.033460492
'ZFAND5'    1.320821218     -1.022301519

I am interested in these 112 genes. As you can see, their FC values are different on each platform. How can I combine U133A + U133B into one expression matrix?
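
As a quick check on the table above, the genes whose direction of change disagrees between the two arrays can be listed with a few lines of Python. This sketch assumes the values are signed fold-changes (negative meaning down-regulated), which the thread does not state explicitly. Only four rows are copied here to keep it short, and `discordant_genes` is a hypothetical helper.

```python
# A few (U133A, U133B) fold-change pairs copied from the table above.
fold_changes = {
    "ABCC4": (1.316116395, 1.099865145),
    "ADH1B": (1.327648636, -1.008188273),
    "DCC": (1.510835634, -1.603497887),
    "SOCS3": (-2.774464415, -1.994889526),
}


def discordant_genes(fc):
    """Return genes whose fold-change sign differs between the two arrays."""
    return sorted(g for g, (a, b) in fc.items() if (a > 0) != (b > 0))


flipped = discordant_genes(fold_changes)
print(flipped)
```

Under that convention a value close to 1 or -1 means almost no change, so a flip such as ADH1B (1.33 vs -1.01) is much weaker evidence of disagreement than DCC (1.51 vs -1.60).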

ADD REPLYlink modified 6 days ago • written 6 days ago by lur_murad0