Question

GEO Affymetrix data for differential expression analysis?

3 months ago

sativus ▴ 20

Hi! I am writing because I am facing a somewhat niche situation. I am working with a cohort of samples from a fairly rare phenotype, sequenced on Illumina's RNA-seq platform. Through extensive searches I have found several cohorts on GEO that can serve as external validation (i.e. running the same analysis on external data as I did on my "local" data, to see whether trends present in my local data can also be observed externally).

Unfortunately, these are all Affymetrix-based, but to my knowledge they are the only options available. I will therefore not combine the datasets or cohorts. Instead, I will run the analysis separately on each GEO dataset, which avoids the (severe) batch effects that combining multiple external datasets from different platforms would introduce. I plan to use limma for the DEG analysis of the arrays, and I have used DESeq2 for my own count data.

My issue is that the array expression data has already been processed, specifically with a "quantile-normalized trimmed-mean" method, which to my knowledge is similar to what EBSeq and edgeR use. So my question is: can this data be used for DEG analysis directly, given that it is already normalized and will only be compared within its own cohort? Or would I be better off downloading the raw .cel files and processing them myself to ensure consistency?
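For reference, here is a minimal sketch of what the quantile step does to a probes × arrays matrix. It is written in Python/NumPy rather than the R/Bioconductor tools normally used, and the function name and toy matrix are hypothetical. It covers only quantile normalization, not any trimmed-mean scaling. Real implementations (e.g. in Bioconductor's preprocessCore) may handle ties differently.

```python
import numpy as np

def quantile_normalize(x: np.ndarray) -> np.ndarray:
    """Quantile-normalize a probes x arrays matrix (hypothetical helper).

    Each array (column) is replaced by the mean of the sorted columns,
    placed back in that array's own rank order, so every array ends up
    with an identical distribution of values. Ties are broken by argsort
    position, which is a simplification.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)          # rank order of probes within each array
    reference = np.sort(x, axis=0).mean(axis=1)  # average sorted distribution
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        # lowest-ranked probe gets the lowest reference value, and so on
        out[order[:, j], j] = reference
    return out

if __name__ == "__main__":
    # Hypothetical log2 intensities: 4 probes x 3 arrays
    raw = np.array([[5.0, 4.0, 3.0],
                    [2.0, 1.0, 4.0],
                    [3.0, 4.0, 6.0],
                    [4.0, 2.0, 8.0]])
    qn = quantile_normalize(raw)
    print(qn)
    # Every column now holds the same set of values
    print(np.sort(qn, axis=0))
```

The point relevant to the question is that after this step all arrays in a dataset share one distribution. The values are therefore already comparable across samples within that dataset, which is the setting a within-cohort comparison needs.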

affymetrix GEO DEG

It would be better to download the raw .cel files and process them yourself. Most standard approaches for DE analysis expect the data in raw form (e.g. raw counts for RNA-seq), without any prior normalization or preprocessing. That way, the processing applied to your own data and to the external data will be consistent.

3 months ago by bk11 ★ 2.4k

Thank you for the quick response!

I originally considered this, but after reading replies from limma's authors stating that the package can handle log-normalized values, it seems unnecessary. The only major difference would be that processing the data myself would show exactly which parameters were used.

Since all datasets will be used for validation through within-cohort phenotype comparisons, the normalized array expression values (provided as ExpressionSets) should be usable for DEG analysis with limma, if I have understood the package correctly?
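One practical caveat: limma expects log2-scale values, and GEO series matrices are not always log-transformed. Below is a quick check, loosely modeled on the auto-log heuristic in the R script that GEO2R generates. The thresholds are from memory and should be treated as an assumption. It is ported to Python/NumPy with hypothetical function names.

```python
import numpy as np

def looks_unlogged(ex: np.ndarray) -> bool:
    """Guess whether expression values are NOT yet log2-transformed.

    Heuristic (assumed, GEO2R-style): a 99th percentile above 100, or a
    range above 50 with a positive lower quartile, suggests linear scale.
    """
    qx = np.nanquantile(ex, [0.0, 0.25, 0.5, 0.75, 0.99, 1.0])
    return bool(qx[4] > 100 or (qx[5] - qx[0] > 50 and qx[1] > 0))

def to_log2_if_needed(ex) -> np.ndarray:
    """Return log2 values, transforming only if the data look linear-scale."""
    ex = np.asarray(ex, dtype=float).copy()
    if looks_unlogged(ex):
        ex[ex <= 0] = np.nan  # log2 is undefined for non-positive intensities
        ex = np.log2(ex)
    return ex

if __name__ == "__main__":
    # Hypothetical probes x samples matrix already on log2 scale
    logged = np.array([[7.2, 8.1], [10.5, 3.9], [12.0, 6.4]])
    print(looks_unlogged(logged))           # log-scale input
    print(looks_unlogged(2.0 ** logged))    # same data on linear scale
```

If a dataset's values already look log-scale, they can go into limma as-is. If not, log-transforming them first keeps the analysis in the scale limma assumes.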

3 months ago by sativus ▴ 20