Question: Multiple transcripts for same gene in array gene expression profile?
13 months ago, ravelarvargas wrote:

I am analysing microarray data from GEO for the first time. The data have already been processed and normalized, but the Illumina beads have (I assume) captured multiple transcripts for the same gene, so some of my genes appear more than once. For example, MCL1, a gene I am looking at, has two different Illumina IDs associated with it. I want to look at total gene expression and at expression differences between disease states, so I need to aggregate the data to one value per gene. How can I do this when some genes appear multiple times?

modified 13 months ago by Kevin Blighe • written 13 months ago by ravelarvargas

Maybe you could take the average expression across all transcripts of a gene as that gene's expression.

written 13 months ago by Prakash

13 months ago, Kevin Blighe wrote:

The ideal situation would be to obtain the raw data files (CEL or TXT; edit: or whatever BeadArrays use) and then summarise expression over genes using RMA with median polish.

Given the data that you've got, you can just summarise it yourself as follows:

df <- data.frame(
  c("gene1","gene1","gene2","gene2","gene2","gene3"),
  c(1,2,3,10,20,30),
  c(60,50,40,3,2,1),
  c(7,8,9,100,110,120),
  c(12,11,10,9,8,7)
)

colnames(df) <- c("gene","sample1","sample2","sample3","sample4")

df
   gene sample1 sample2 sample3 sample4
1 gene1       1      60       7      12
2 gene1       2      50       8      11
3 gene2       3      40       9      10
4 gene2      10       3     100       9
5 gene2      20       2     110       8
6 gene3      30       1     120       7

Summarise by mean:

aggregate(df[,2:ncol(df)], by=df[1], mean)
   gene sample1 sample2 sample3 sample4
1 gene1     1.5      55     7.5    11.5
2 gene2    11.0      15    73.0     9.0
3 gene3    30.0       1   120.0     7.0

Summarise by median:

aggregate(df[,2:ncol(df)], by=df[1], median)
   gene sample1 sample2 sample3 sample4
1 gene1     1.5      55     7.5    11.5
2 gene2    10.0       3   100.0     9.0
3 gene3    30.0       1   120.0     7.0

Summarise by sum:

aggregate(df[,2:ncol(df)], by=df[1], sum)
   gene sample1 sample2 sample3 sample4
1 gene1       3     110      15      23
2 gene2      33      45     219      27
3 gene3      30       1     120       7
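
The same aggregate() call can also be applied when the expression values sit in a matrix keyed by probe ID and the gene symbols live in a separate annotation, which is closer to the situation in the question. The following is only a sketch: the probe IDs, the gene name geneB, the values, and the object names expr, probe2gene and gene_expr are all invented for illustration and do not come from any real array annotation.

```r
# Hypothetical probe-level matrix: rows are probe IDs, columns are samples.
expr <- matrix(
  c( 1, 60,   7, 12,
     2, 50,   8, 11,
     3, 40,   9, 10,
    10,  3, 100,  9),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("ILMN_0001", "ILMN_0002", "ILMN_0003", "ILMN_0004"),  # made-up IDs
    c("sample1", "sample2", "sample3", "sample4")
  )
)

# Hypothetical annotation: the gene each probe maps to.
probe2gene <- c(ILMN_0001 = "MCL1", ILMN_0002 = "MCL1",
                ILMN_0003 = "geneB", ILMN_0004 = "geneB")

# Look up the gene for every row of the matrix, in row order.
genes <- probe2gene[rownames(expr)]

# Collapse probes to one row per gene; swap mean for median or sum as needed.
gene_expr <- aggregate(as.data.frame(expr), by = list(gene = genes), FUN = mean)
gene_expr
```

Looking genes up by rownames(expr) keeps the probe-to-gene mapping aligned with the matrix even if the annotation is stored in a different order.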

Kevin

modified 8 months ago • written 13 months ago by Kevin Blighe

Dear @Kevin Blighe,

How should I handle the FDR for each gene? For example, if I take the mean expression across the probes of the same gene, should I also take the mean of the FDR values? Do you have a reference you could point me to?

Best, Leite

written 8 months ago by Leite

You could just fit your own linear or logistic model to the data with lm() or glm(), and then adjust the p-values with p.adjust(), or use my package: https://bioconductor.org/packages/release/data/experiment/html/RegParallel.html

Better to have the raw data CEL, TXT, GAL, or DAT files, though.
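
The lm() and p.adjust() suggestion can be sketched roughly as follows. One reading of it is that p-values are recomputed on the gene-level values and then adjusted, rather than averaging existing FDR values across probes. The assignment of samples to "control" and "disease" is an assumption made purely for illustration, as are the names gene_expr, group, pvals and fdr; the expression values reuse the toy data frame from the answer above.

```r
# Toy probe-level data from the answer above.
df <- data.frame(
  gene    = c("gene1", "gene1", "gene2", "gene2", "gene2", "gene3"),
  sample1 = c(1, 2, 3, 10, 20, 30),
  sample2 = c(60, 50, 40, 3, 2, 1),
  sample3 = c(7, 8, 9, 100, 110, 120),
  sample4 = c(12, 11, 10, 9, 8, 7)
)

# Collapse to one row per gene by taking the mean across probes.
gene_expr <- aggregate(df[, -1], by = df[1], FUN = mean)

# Assumed design (hypothetical): first two samples control, last two disease.
group <- factor(c("control", "control", "disease", "disease"),
                levels = c("control", "disease"))

# Fit one linear model per gene and keep the p-value of the group effect.
pvals <- sapply(seq_len(nrow(gene_expr)), function(i) {
  y <- unlist(gene_expr[i, -1])
  fit <- lm(y ~ group)
  summary(fit)$coefficients["groupdisease", "Pr(>|t|)"]
})
names(pvals) <- gene_expr$gene

# Benjamini-Hochberg adjustment across genes gives the gene-level FDR.
fdr <- p.adjust(pvals, method = "BH")
fdr
```

With only two samples per group, the p-values from this toy example carry no real meaning; the point is only the order of operations: aggregate first, test second, adjust last.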

written 8 months ago by Kevin Blighe