Question

GEO Accession transcriptome data comparison with custom gene expression sets

3.3 years ago

singlecell_bio • 0

Hello, I recently downloaded transcriptome datasets from the GEO archive. I am trying to compare them to custom experimental datasets to find overlapping genes and differentially expressed genes. I tried the supplementary files, the .soft files, and the GEOquery package to access the transcriptome data, but none of them seem to contain the data. I did find that the .csv files list all the gene names and their corresponding expression levels for days 0-22 (neuroectodermal differentiation). I am not sure how to use the .csv files to compare the expression data in R. Thanks in advance.

RNA-Seq R DESeq2 GEO Transcriptome • 1.2k views

ADD COMMENT • link 3.3 years ago by singlecell_bio • 0

Can you please provide an example GSE number so that I can take a look.

ADD REPLY • link 3.3 years ago by Kevin Blighe 87k

GSE107552 GSE103715 GSE147270

ADD REPLY • link 3.3 years ago by singlecell_bio • 0

This is what the data in the .csv file looks like:

https://ibb.co/Cw56j8c

ADD REPLY • link updated 3.3 years ago by Kevin Blighe 87k • written 3.3 years ago by singlecell_bio • 0

Hi, those studies are:

GSE107552: RNA-seq; data available as raw counts and FPKM expression levels
GSE103715: RNA-seq; data available as FPKM expression levels
GSE147270: Affymetrix U133 Plus 2.0 microarray; data available via GEOquery

It will be difficult to compare samples and genes across these studies. Even for the 2 RNA-seq studies, where the data are both available in FPKM expression units, batch effects will exist.

Can you elaborate more on what your ultimate goal was?

ADD REPLY • link 3.3 years ago by Kevin Blighe 87k

Thank you for the insight, Kevin! The ultimate goal is to compare these datasets to another experimental transcriptome dataset (given to me by my professor). The goal is to find new candidate genes that overlap with genes expressed in ectoderm or neuroectoderm development, and to establish whether they are co-expressed with neural crest progenitors.

ADD REPLY • link 3.3 years ago by singlecell_bio • 0

I see. You just have to be conscious of how each dataset is normalised. For example, you cannot compare RNA-seq FPKM versus RMA-normalised microarray data without first attempting to 'standardise' each dataset and deal with any batch effects.
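
To make 'standardise' a bit more concrete, here is a minimal sketch in plain Python (not R; only the logic is the point). It assumes one common approach, which may not be what was meant above: log2-transform the FPKM values with a pseudocount (RMA output is already on a log2 scale), then z-score each gene within its own dataset. The gene symbols are real, but every expression value is invented, and the function names are hypothetical. Note that this only puts the datasets on a comparable scale; it does not remove batch effects.

```python
import math
from statistics import mean, pstdev


def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount avoids log2(0) for unexpressed genes."""
    return [math.log2(v + pseudocount) for v in values]


def zscore(values):
    """Centre on the mean and scale by the population standard deviation."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A gene that never changes carries no information on this scale.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def standardise_dataset(expr, log_transform):
    """expr maps gene -> list of expression values across that dataset's samples."""
    out = {}
    for gene, values in expr.items():
        if log_transform:
            values = log2_transform(values)
        out[gene] = zscore(values)
    return out


# Toy data (invented values), one dict per study.
rnaseq_fpkm = {"SOX10": [0.0, 3.0, 15.0], "PAX6": [1.0, 7.0, 31.0]}
microarray_rma = {"SOX10": [4.0, 6.0, 8.0], "PAX6": [5.0, 5.0, 5.0]}

rnaseq_z = standardise_dataset(rnaseq_fpkm, log_transform=True)
array_z = standardise_dataset(microarray_rma, log_transform=False)

# Only genes measured in both studies can be compared.
shared = sorted(set(rnaseq_z) & set(array_z))
for gene in shared:
    print(gene, rnaseq_z[gene], array_z[gene])
```

Each dataset is standardised on its own, so no values are ever mixed across platforms before the per-gene profiles are compared.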

If it's impossible to directly compare datasets, we can process them independently and 'meta-analyse' the p-values from each.
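
As a rough illustration of 'meta-analysing' p-values, the sketch below uses Fisher's combined probability test. That is my assumption of one possible method, not necessarily the one intended here. It treats the studies as independent and ignores the direction of each effect, so it is only a starting point. The gene labels and p-values are made up, and the code is plain Python rather than R.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the upper tail has a closed form:
        P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical per-gene p-values, one per study (invented numbers).
per_study_p = {
    "GENE_A": [0.01, 0.04, 0.03],
    "GENE_B": [0.50, 0.60, 0.40],
}

combined = {gene: fisher_combined_p(ps) for gene, ps in per_study_p.items()}
for gene, p in sorted(combined.items(), key=lambda item: item[1]):
    print(f"{gene}\t{p:.4g}")
```

The combined p-values would still need multiple-testing correction across genes before any candidates are called.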

ADD REPLY • link 3.3 years ago by Kevin Blighe 87k

I believe it's going to be more like a meta-analysis. The p-values and the fold-change data would be more than enough to identify new candidates. However, I am not sure how each dataset can be normalized independently before comparison. I tried using fold-change functions but was unsure whether there is a package that can normalize the data, compute p-values, and calculate fold changes.
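
For reference, the fold-change and overlap step on its own can be sketched as below. This is plain Python rather than R and assumes a CSV laid out with one gene per row and one column per time point. The column names (`gene`, `day0`, `day22`), the expression values, the cutoff of 1.0, and the `custom_candidates` set are all hypothetical. A fold change computed this way carries no p-value; that would need replicates and a proper statistical test.

```python
import csv
import io
import math

# Stand-in for a downloaded .csv file (invented values, hypothetical columns).
csv_text = """gene,day0,day11,day22
SOX10,1.0,3.0,15.0
PAX6,2.0,9.0,31.0
GAPDH,100.0,101.0,99.0
"""


def log2_fold_changes(handle, baseline, endpoint, pseudocount=1.0):
    """Return gene -> log2((endpoint + pc) / (baseline + pc)) for each CSV row."""
    result = {}
    for row in csv.DictReader(handle):
        start = float(row[baseline]) + pseudocount
        end = float(row[endpoint]) + pseudocount
        result[row["gene"]] = math.log2(end / start)
    return result


lfc = log2_fold_changes(io.StringIO(csv_text), "day0", "day22")

# Keep genes that at least double or halve between the two time points.
changed = {gene for gene, value in lfc.items() if abs(value) >= 1.0}

# Hypothetical gene list from the custom experimental dataset.
custom_candidates = {"SOX10", "FOXD3", "GAPDH"}

overlap = sorted(changed & custom_candidates)
print(overlap)
```

With a real file, `io.StringIO(csv_text)` would be replaced by an open file handle.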

ADD REPLY • link 3.3 years ago by singlecell_bio • 0