Question: Multiple transcripts for same gene in array gene expression profile?
ravelarvargas wrote 21 months ago:

I am analysing microarray data from GEO for the first time. The data have already been processed and normalized, but the Illumina beads appear to have captured multiple transcripts for the same gene (I assume), so some of my genes appear twice (e.g. MCL1, a gene I am looking at, has two different Illumina IDs associated with it). I want to look at total gene expression and at expression differences between disease states, so I need to aggregate the data. How can I do this when some genes appear multiple times?

modified 21 months ago by Kevin Blighe (60k) • written 21 months ago by ravelarvargas (0)

Maybe you can take the average expression of all the transcripts as the gene expression.

written 21 months ago by Prakash (1.9k)
Kevin Blighe wrote 21 months ago:

The ideal situation would be to obtain the raw data CEL or TXT files (edit: or whatever BEAD arrays use) and then summarise expression over genes using RMA and the median polish.

Given the data that you've got, you can just summarise it yourself as follows:

# toy data: one row per probe, first column is the gene each probe maps to
df <- data.frame(
  gene    = c("gene1","gene1","gene2","gene2","gene2","gene3"),
  sample1 = c(1,2,3,10,20,30),
  sample2 = c(60,50,40,3,2,1),
  sample3 = c(7,8,9,100,110,120),
  sample4 = c(12,11,10,9,8,7)
)

df
   gene sample1 sample2 sample3 sample4
1 gene1       1      60       7      12
2 gene1       2      50       8      11
3 gene2       3      40       9      10
4 gene2      10       3     100       9
5 gene2      20       2     110       8
6 gene3      30       1     120       7

summarise by mean

aggregate(df[,2:ncol(df)], by=df[1], mean)
   gene sample1 sample2 sample3 sample4
1 gene1     1.5      55     7.5    11.5
2 gene2    11.0      15    73.0     9.0
3 gene3    30.0       1   120.0     7.0

summarise by median

aggregate(df[,2:ncol(df)], by=df[1], median)
   gene sample1 sample2 sample3 sample4
1 gene1     1.5      55     7.5    11.5
2 gene2    10.0       3   100.0     9.0
3 gene3    30.0       1   120.0     7.0

summarise by sum

aggregate(df[,2:ncol(df)], by=df[1], sum)
   gene sample1 sample2 sample3 sample4
1 gene1       3     110      15      23
2 gene2      33      45     219      27
3 gene3      30       1     120       7
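
Applied to a probe-level table like the one described in the question, the same idea might look like the sketch below. The probe IDs, column names and expression values are made up for illustration, and a real GEO table will be laid out differently. The sketch uses the formula interface of aggregate() instead of by=, and assumes the values are already normalised.

```r
# Hypothetical probe-level table: two Illumina probes map to MCL1.
# Probe IDs and expression values are invented for illustration only.
expr <- data.frame(
  probe   = c("ILMN_A", "ILMN_B", "ILMN_C"),
  symbol  = c("MCL1", "MCL1", "TP53"),
  sample1 = c(8.0, 10.0, 6.5),
  sample2 = c(9.0, 11.0, 7.5)
)

# Drop the probe ID column, then average all probes that share a symbol.
# The formula '. ~ symbol' means "every other column, grouped by symbol".
gene_level <- aggregate(. ~ symbol, data = expr[, -1], FUN = mean)

gene_level
```

Swapping FUN = mean for median or sum gives the other two summaries. Note that the formula interface drops rows containing NA by default, which may or may not be what you want.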

Kevin

modified 17 months ago • written 21 months ago by Kevin Blighe (60k)

Dear @Kevin Blighe,

How should I handle the FDR for each gene? For example, if I take the mean expression across the probes of the same gene, should I also take the mean of the FDR values? Do you have any reference you could point me to?

Best, Leite

written 17 months ago by Leite (980)

You could just fit your own linear or logistic model to the data with lm() or glm(), and then adjust the p-values with p.adjust(), or use my package: https://bioconductor.org/packages/release/data/experiment/html/RegParallel.html

Better to have the raw data CEL, TXT, GAL, or DAT files, though.
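
One way to read the lm() / p.adjust() suggestion is sketched below: compute a p-value per gene on the summarised gene-level values, then adjust across genes, so the FDR is derived from the aggregated data instead of being averaged over probes. The expression matrix and group labels are simulated and purely hypothetical, and a simple two-group design is assumed.

```r
set.seed(1)  # simulated data, for illustration only

# Hypothetical gene-level matrix: 5 genes (rows) x 8 samples (columns).
expr <- matrix(rnorm(5 * 8, mean = 8), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("sample", 1:8)))

# Hypothetical disease status for each sample.
group <- factor(rep(c("control", "disease"), each = 4))

# Fit one linear model per gene and extract the p-value for the group term.
pvals <- apply(expr, 1, function(y) {
  fit <- lm(y ~ group)
  summary(fit)$coefficients["groupdisease", "Pr(>|t|)"]
})

# Adjust across genes; "BH" gives Benjamini-Hochberg adjusted p-values.
fdr <- p.adjust(pvals, method = "BH")

data.frame(gene = names(pvals), p.value = pvals, FDR = fdr)
```

For a binary outcome modelled as the response, glm() with family = binomial would take the place of lm() in the same loop.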

written 17 months ago by Kevin Blighe (60k)