Question: GEO Accession transcriptome data comparison with custom gene expression sets
sujitsilas wrote (8 weeks ago):

Hello, I recently downloaded transcriptome datasets from the GEO archive. I am trying to compare them with custom experimental datasets to find overlapping genes and differentially expressed genes. I tried the supplementary files, the .soft files, and the GEOquery package to access the transcriptome data, but none of them seemed to contain the expression data. I found that the .csv files contain the names of all the genes and their corresponding expression levels for days 0-22 (neuroectodermal differentiation). I am not sure how to use the .csv file to compare the expression data in R. Thanks in advance.
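
As a starting point for the .csv route, here is a minimal Python sketch (the thread itself works in R) that reads a gene-by-timepoint table and intersects its gene names with a custom gene list. It assumes a hypothetical layout: a header row, gene names in the first column, and one numeric column per day. The example values and the helpers `load_expression` and `overlapping_genes` are made up for illustration and do not come from the GEO series.

```python
import csv
import io


def load_expression(handle):
    """Read a gene x timepoint table (assumed layout: first column = gene name,
    remaining columns = numeric expression values, one column per day)."""
    reader = csv.reader(handle)
    header = next(reader)
    timepoints = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, values = row[0].strip(), row[1:]
        table[gene] = {t: float(v) for t, v in zip(timepoints, values)}
    return timepoints, table


def overlapping_genes(table, custom_genes):
    """Return genes present in both the GEO table and the custom set, sorted."""
    return sorted(set(table) & set(custom_genes))


# Hypothetical example data in the assumed layout (not taken from any GEO series).
example_csv = io.StringIO(
    "gene,day0,day6,day22\n"
    "PAX6,1.2,15.8,40.1\n"
    "SOX10,0.3,0.9,12.5\n"
    "GAPDH,900.0,880.5,910.2\n"
)
timepoints, table = load_expression(example_csv)
shared = overlapping_genes(table, ["SOX10", "PAX6", "FOXD3"])
print(timepoints)  # ['day0', 'day6', 'day22']
print(shared)      # ['PAX6', 'SOX10']
```

If the GEO table and the custom dataset use different identifier types (for example Ensembl IDs in one and gene symbols in the other), map them to a common type before intersecting; otherwise the overlap will be empty or incomplete.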

Tags: transcriptome, rna-seq, deseq2, R, geo

Can you please provide an example GSE number so that I can take a look?

Reply by Kevin Blighe, 8 weeks ago

GSE107552, GSE103715, GSE147270

Reply by sujitsilas, 8 weeks ago

This is what the data in the .csv file looks like: https://ibb.co/Cw56j8c

Reply by sujitsilas, 8 weeks ago

Hi, those studies are:

  • GSE107552: RNA-seq; data available as raw counts and FPKM expression levels
  • GSE103715: RNA-seq; data available as FPKM expression levels
  • GSE147270: Affymetrix U133 Plus 2.0 microarray; data available via GEOquery

It will be difficult to compare samples and genes across these studies. Even for the two RNA-seq studies, where the data are available in the same units (FPKM), batch effects will exist.
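
For reference, FPKM rescales each gene's count by gene length and by the sample's total mapped reads, so it is a within-sample normalisation; nothing in the formula removes differences between studies. A minimal Python sketch of the standard definition, assuming the library size is the sum of the counts supplied (real pipelines may use the total number of mapped fragments instead); the function name `fpkm` and the gene names are hypothetical:

```python
def fpkm(counts, lengths_bp):
    """FPKM = count * 1e9 / (gene length in bp * total mapped reads).

    counts: dict gene -> raw read count for ONE sample
    lengths_bp: dict gene -> gene (or transcript) length in base pairs
    """
    total = sum(counts.values())  # assumption: library size = sum of supplied counts
    return {g: c * 1e9 / (lengths_bp[g] * total) for g, c in counts.items()}


# GENE_B has 3x the reads of GENE_A but is also 3x longer, so FPKM is equal.
example = fpkm({"GENE_A": 500, "GENE_B": 1500}, {"GENE_A": 1000, "GENE_B": 3000})
print(example)  # {'GENE_A': 250000.0, 'GENE_B': 250000.0}
```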

Can you elaborate on what your ultimate goal is?

Reply by Kevin Blighe, 8 weeks ago

Thank you for the insight, Kevin! The ultimate goal is to compare these datasets with another experimental transcriptome dataset (given to me by my professor). I want to find new candidate genes that overlap with genes expressed during ectoderm or neuroectoderm development, and establish whether they are co-expressed with neural crest progenitors.

Reply by sujitsilas, 8 weeks ago

I see. You just have to be conscious of how each dataset is normalised. For example, you cannot directly compare RNA-seq FPKM values with RMA-normalised microarray data without first attempting to 'standardise' each dataset and deal with any batch effects.
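
One simple (and admittedly crude) way to 'standardise' is to put each dataset on a log2 scale and then z-score every gene across the samples of that dataset, so values express relative change within a study rather than absolute level. This Python sketch shows that idea only; it is not a substitute for proper batch-effect correction, and the function name `standardise` and the example values are hypothetical. RMA output is already on a log2 scale, hence the `already_log2` flag.

```python
import math
import statistics


def standardise(dataset, already_log2=False):
    """Per-gene z-scores within ONE dataset.

    dataset: dict gene -> list of expression values (one per sample)
    already_log2: True for data already on a log2 scale (e.g. RMA output);
                  otherwise values such as FPKM are transformed with log2(x + 1).
    """
    out = {}
    for gene, values in dataset.items():
        logged = values if already_log2 else [math.log2(v + 1) for v in values]
        mean = statistics.fmean(logged)
        sd = statistics.pstdev(logged)
        if sd == 0:
            continue  # constant genes carry no information about relative change
        out[gene] = [(x - mean) / sd for x in logged]
    return out


# Hypothetical FPKM values for one study (four samples).
fpkm_study = {"PAX6": [0.0, 1.0, 3.0, 7.0], "FLAT": [5.0, 5.0, 5.0, 5.0]}
z = standardise(fpkm_study)
print(sorted(z))  # ['PAX6']  (the constant gene is dropped)
```

Because each gene is scaled within its own study, the shape of its profile (for example, a rise across days 0-22) can be compared between studies, but absolute expression levels can no longer be compared.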

If it's impossible to directly compare datasets, we can process them independently and 'meta-analyse' the p-values from each.
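
Fisher's method is one common way to combine per-gene p-values from independent analyses; the thread does not say which method to use, so treat this as one option (Stouffer's method is another). Under the null hypothesis, X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees of freedom, and for even degrees of freedom its tail probability has a closed form, so this Python sketch needs only the standard library. The function name is hypothetical.

```python
import math


def fisher_combined_p(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square(2k) under H0.

    For df = 2k the chi-square survival function is
    P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if not pvalues or any(not 0 < p <= 1 for p in pvalues):
        raise ValueError("p-values must be in (0, 1]")
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is x / 2
    term, total = 1.0, 1.0  # the i = 0 term of the series
    for i in range(1, k):
        term *= half_x / i
        total += term
    return min(1.0, math.exp(-half_x) * total)


# The same gene with p = 0.05 in two independent studies.
print(round(fisher_combined_p([0.05, 0.05]), 4))  # 0.0175
```

Fisher's method ignores the direction of change, so a gene that is up in one study and down in another can still receive a small combined p-value; check the fold-change signs separately.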

Reply by Kevin Blighe, 8 weeks ago

I believe it's going to be more of a meta-analysis. The p-values and fold-change data would be more than enough to identify new candidates. However, I am not sure how each dataset can be normalized independently before comparison. I have tried using fold-change functions, but I was unsure whether there is a package that can normalize the data, compute p-values, and identify fold changes.
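
For raw counts, DESeq2 (listed in the tags) normalises, tests, and reports log2 fold changes in one workflow, and limma is commonly used for microarray data such as GSE147270. The Python sketch below only illustrates two of the underlying calculations: a log2 fold change with a pseudocount, and Benjamini-Hochberg adjustment of p-values (the same adjustment R's `p.adjust(method = "BH")` performs). The pseudocount of 1 and the function names are illustrative assumptions, and this is not how DESeq2 estimates fold changes.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2((mean(treated) + pc) / (mean(control) + pc)); pc avoids log of zero."""
    t = sum(treated) / len(treated)
    c = sum(control) / len(control)
    return math.log2((t + pseudocount) / (c + pseudocount))


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (FDR) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p downwards so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(log2_fold_change([7.0, 7.0], [1.0, 1.0]))  # 2.0
print([round(q, 4) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.5])])
# [0.04, 0.0533, 0.0533, 0.5]
```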

Reply by sujitsilas, 8 weeks ago
Powered by Biostar version 2.3.0