Question: Multiple transcripts for the same gene in an array gene expression profile?
ravelarvargas wrote (8 months ago):

I am analysing microarray data from GEO for the first time. The data have already been processed and normalized, but the Illumina bead probes seem to capture multiple transcripts of the same gene (I assume), so some of my genes appear more than once (e.g., MCL1, a gene I am looking at, has two different Illumina IDs associated with it). I want to look at total gene expression and at expression differences between disease states, so I need to aggregate the data. How can I do this when some genes appear multiple times?


Maybe you could take the average expression of all the transcripts (probes) of a gene as that gene's expression.

(Reply by Prakash, 8 months ago)
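A minimal base-R sketch of that averaging, assuming the processed GEO data come as a numeric matrix with probe IDs as row names plus a separate probe-to-gene annotation (for example from the platform table). The probe IDs, the second gene, and all values here are hypothetical toy data:

```r
# Hypothetical toy data: rows are probes, columns are samples
expr <- matrix(
  c(5.1, 5.3, 7.0, 7.2,
    4.9, 5.1, 6.8, 7.0,
    8.0, 8.2, 8.1, 8.3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("probe_1", "probe_2", "probe_3"),
                  c("ctrl1", "ctrl2", "dis1", "dis2"))
)

# Hypothetical probe-to-gene annotation: two probes map to MCL1
probe2gene <- c(probe_1 = "MCL1", probe_2 = "MCL1", probe_3 = "GENE2")

# Gene symbol for each matrix row, looked up by probe ID
genes <- unname(probe2gene[rownames(expr)])

# Number of probes per gene
print(table(genes))

# Sum the probes of each gene per sample, then divide by the probe count
gene_sums   <- rowsum(expr, group = genes)
probe_count <- as.vector(table(genes)[rownames(gene_sums)])
gene_expr   <- gene_sums / probe_count
print(gene_expr)
```

By default rowsum() sorts the genes alphabetically, so the row order of the result differs from the probe order.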
Kevin Blighe wrote (8 months ago):

The ideal situation would be to obtain the raw data files (CEL or TXT files, or whatever Illumina BeadArrays use, e.g. IDAT) and then summarise expression over genes using RMA and its median polish step.
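For intuition, the median polish step can be sketched with base R's stats::medpolish() on a small probe-by-sample matrix of log2 values for one gene. This is only the summarisation step, not full RMA (which also does background correction and quantile normalisation on raw probe-level data), and applying it to several Illumina probe IDs of one gene is an assumption made here for illustration. The values are invented toy data:

```r
# Toy log2 values for one gene: rows = probes, columns = samples.
# Each value is a probe effect plus a sample effect (no noise).
probe_effect  <- c(0, 1, 2)
sample_effect <- c(5, 6, 8, 7)
x <- outer(probe_effect, sample_effect, `+`)
dimnames(x) <- list(paste0("probe", 1:3), paste0("sample", 1:4))

# Median polish fits x[i, j] = overall + row[i] + col[j] + residual[i, j]
fit <- medpolish(x, trace.iter = FALSE)

# Per-sample summary for the gene: overall effect plus each sample's column effect
gene_summary <- fit$overall + fit$col
print(gene_summary)
```

With real probe data the residuals are not zero; because the fit uses medians, a single outlying probe has limited influence on the summary.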

Given the data that you've got, you can just summarise it yourself as follows:

# Toy example: one row per probe, gene symbol in the first column
df <- data.frame(
  gene    = c("gene1","gene1","gene2","gene2","gene2","gene3"),
  sample1 = c(1,2,3,10,20,30),
  sample2 = c(60,50,40,3,2,1),
  sample3 = c(7,8,9,100,110,120),
  sample4 = c(12,11,10,9,8,7)
)

df
   gene sample1 sample2 sample3 sample4
1 gene1       1      60       7      12
2 gene1       2      50       8      11
3 gene2       3      40       9      10
4 gene2      10       3     100       9
5 gene2      20       2     110       8
6 gene3      30       1     120       7

Summarise by mean:

aggregate(df[,2:ncol(df)], by=df[1], mean)
   gene sample1 sample2 sample3 sample4
1 gene1     1.5      55     7.5    11.5
2 gene2    11.0      15    73.0     9.0
3 gene3    30.0       1   120.0     7.0

Summarise by median:

aggregate(df[,2:ncol(df)], by=df[1], median)
   gene sample1 sample2 sample3 sample4
1 gene1     1.5      55     7.5    11.5
2 gene2    10.0       3   100.0     9.0
3 gene3    30.0       1   120.0     7.0

Summarise by sum:

aggregate(df[,2:ncol(df)], by=df[1], sum)
   gene sample1 sample2 sample3 sample4
1 gene1       3     110      15      23
2 gene2      33      45     219      27
3 gene3      30       1     120       7

Kevin
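One caveat on the summaries above: processed GEO expression values are often already log2-transformed (check the series description). If so, summing log2 values gives the log of the product of the probe intensities rather than of their sum, and the mean of log2 values corresponds to a geometric mean on the linear scale. A minimal sketch, assuming log2-scale input (the data frame and values are hypothetical), that sums on the linear scale before converting back:

```r
# Toy log2 values: one row per probe, gene symbol in column 1
logdf <- data.frame(
  gene    = c("gene1", "gene1", "gene2"),
  sample1 = c(3, 3, 5),
  sample2 = c(4, 2, 6)
)

# Back-transform to the linear scale, sum probes per gene, return to log2
linear <- 2^logdf[, -1]
summed <- aggregate(linear, by = logdf["gene"], FUN = sum)
summed[, -1] <- log2(summed[, -1])
print(summed)
```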


Dear @Kevin Blighe,

How should I handle the FDR of each gene? For example, if I take the mean expression across the probes of the same gene, should I also take the mean of their FDR values? Could you point me to a reference?

Best, Leite

(Reply by Leite, 3 months ago)

Rather than combining the probe-level FDR values, you could just fit your own linear or logistic model to the gene-level (summarised) data with lm() or glm(), and then adjust the p-values with p.adjust(), or use my package: https://bioconductor.org/packages/release/data/experiment/html/RegParallel.html

Having the raw data (CEL, TXT, GAL, or DAT files) would be better, though.

(Reply by Kevin Blighe, 3 months ago)
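The lm()/p.adjust() route suggested in the reply above can be sketched as follows, assuming a gene-level matrix that has already been summarised (for example by the mean) and a simple two-group design. The simulated data, group labels, and choice of the Benjamini-Hochberg method ("BH") are assumptions for illustration. The p-values are recomputed on the gene-level values and adjusted once across genes, instead of averaging probe-level FDRs:

```r
set.seed(1)

# Toy gene-level matrix (e.g. after averaging probes): rows = genes, columns = samples
gene_expr <- matrix(rnorm(5 * 6, mean = 8), nrow = 5,
                    dimnames = list(paste0("gene", 1:5), paste0("s", 1:6)))
# Shift one gene in the last three samples to simulate a difference
gene_expr["gene1", 4:6] <- gene_expr["gene1", 4:6] + 3

# Hypothetical design: three control samples, then three disease samples
group <- factor(rep(c("control", "disease"), each = 3))

# Per gene: fit expression ~ group and keep the p-value of the disease coefficient
pvals <- apply(gene_expr, 1, function(y) {
  fit <- lm(y ~ group)
  summary(fit)$coefficients["groupdisease", "Pr(>|t|)"]
})

# Adjust across all genes for multiple testing (Benjamini-Hochberg FDR)
fdr <- p.adjust(pvals, method = "BH")
print(data.frame(p = pvals, fdr = fdr))
```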