Question: How to get the gene expression matrix from GEO when getGEO returns 0 features?
bioyas (USA), 8 months ago, wrote:

Hi,

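I would like to download the gene expression data with GEO accession number "GSE104075" from the GEO repository.

Here is my code:

library(GEOquery)  # provides getGEO(); exprs() comes from Biobase, which GEOquery depends on
gset <- getGEO("GSE104075", GSEMatrix = TRUE, getGPL = TRUE, AnnotGPL = TRUE)
# If the series spans several platforms, pick the GPL21298 entry
if (length(gset) > 1) idx <- grep("GPL21298", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

ex <- exprs(gset)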


> str(ex)
logi[0 , 1:26] 
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:26] "GSM2789021" "GSM2789022" "GSM2789023" "GSM2789024" ..

Apparently, the submitters have not deposited the gene expression data on GEO, so I do not know how to download it.

Do you have any idea how I can get the gene expression matrix?

Kevin Blighe (Porto Alegre / London), 8 months ago, wrote:

That is not a microarray study, so there is no series matrix file with expression values for getGEO to download. It is a next-generation sequencing study comprising ATAC-seq and RNA-seq. If you want to use the data, you should check the SRA accession page: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA408158

They may only have made the raw data available, though.
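A minimal sketch of how one might detect this situation in R and then look at what the submitters did upload. The helper names `has_expression_values` and `list_supp_files` are hypothetical, and the sample names in the mock matrix are placeholders. The second helper assumes a GEOquery version in which `getGEOSuppFiles()` accepts `fetch_files = FALSE` to list files without downloading them.

```r
# Hypothetical helper: TRUE only if the matrix has at least one feature (row)
# and at least one sample (column).
has_expression_values <- function(ex) {
  is.matrix(ex) && nrow(ex) > 0 && ncol(ex) > 0
}

# Mock of the empty result from the question: 0 features x 26 samples.
# The column names are placeholders, not the real GSM accessions.
empty_ex <- matrix(logical(0), nrow = 0, ncol = 26,
                   dimnames = list(NULL, paste0("sample", 1:26)))
print(has_expression_values(empty_ex))

# Hypothetical wrapper: list the supplementary files of a series without
# downloading them (assumes GEOquery is installed and network access works).
list_supp_files <- function(accession) {
  if (!requireNamespace("GEOquery", quietly = TRUE)) {
    stop("The GEOquery package is required for this step.")
  }
  GEOquery::getGEOSuppFiles(accession, fetch_files = FALSE)
}
```

Calling `list_supp_files("GSE104075")` should then show whether anything beyond the RAW archive is attached to the series; it is not run here because it needs network access.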

Kevin


Thanks for your response. On the GEO page I can see that there is a file "GSE104075_RAW.tar" that contains the BED files. I have extracted it and I see files with the extensions ".bed" and ".bedGraph". Do you think I can use these files to get the gene expression data?

Do you think I can convert the BED files to BAM files and use featureCounts to get the gene expression counts?

Thanks

Reply written 8 months ago by bioyas

You should clarify what is contained within the BED and bedGraph files. Gene expression data is not typically stored in these formats. The BED and bedGraph files most likely contain the ATAC-seq data, which is typically stored in these formats.
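One quick way to check what such a file contains is to look at its first few lines and count the tab-separated columns. The following is only a sketch in base R: `peek_track_file` is a hypothetical helper, and the demo file is a made-up bedGraph-like example written to a temporary location, not one of the files from the archive.

```r
# Hypothetical helper: read the first n lines of a BED/bedGraph file and
# report how many tab-separated columns the first data line has.
peek_track_file <- function(path, n = 5) {
  lines <- readLines(path, n = n)
  # Skip header lines (track/browser definitions and comments)
  data_lines <- lines[!grepl("^(track|browser|#)", lines)]
  n_cols <- if (length(data_lines) > 0) {
    lengths(strsplit(data_lines, "\t"))[1]
  } else {
    NA_integer_
  }
  list(head = lines, n_columns = n_cols)
}

# Demo on a made-up file: a track line followed by two intervals with a score
demo_path <- tempfile(fileext = ".bedGraph")
writeLines(c("track type=bedGraph",
             "chr1\t0\t100\t1.5",
             "chr1\t100\t200\t3.0"), demo_path)
demo_result <- peek_track_file(demo_path)
print(demo_result)
```

Four columns (chromosome, start, end, value) would be consistent with a bedGraph coverage track rather than a per-gene count table.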

You may also want to contact the authors directly to see if they can share the expression matrix with you. I looked briefly and could not find it.

Reply written 8 months ago by Kevin Blighe
ATpoint (Germany), 7 months ago, wrote:

As Kevin Blighe says, this is RNA-seq. It is often not obvious what exactly the uploaded files in the RAW section are. I personally never trust them (not saying the authors are incompetent, but one simply cannot reproduce the processing without the exact commands, which are often not available). The simplest approach is to download the raw data (see the post "Fast download of FASTQ files from the European Nucleotide Archive (ENA)") and then use a lightweight quantifier such as salmon to quantify the reads against a reference transcriptome. You can then use tximport to summarize the transcript counts to the gene level. Please use Google and the search function, and read the manuals of the tools. Many posts on this are available.
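The tximport step could look roughly like the sketch below. It assumes salmon has already been run once per sample and that each sample has its own output directory containing a `quant.sf` file; the directory layout, the helper names `salmon_quant_paths` and `summarize_to_gene`, and the `tx2gene` table (a data frame with transcript IDs in the first column and gene IDs in the second) are assumptions for illustration.

```r
# Hypothetical helper: build a named vector of salmon output files, assuming
# one directory per sample under `salmon_dir`, each holding a quant.sf file.
salmon_quant_paths <- function(salmon_dir, samples) {
  files <- file.path(salmon_dir, samples, "quant.sf")
  names(files) <- samples
  files
}

# Hypothetical wrapper: summarize transcript-level estimates to gene level.
# `tx2gene` is a two-column data frame: transcript ID, then gene ID.
summarize_to_gene <- function(files, tx2gene) {
  if (!requireNamespace("tximport", quietly = TRUE)) {
    stop("The tximport package is required for this step.")
  }
  missing_files <- files[!file.exists(files)]
  if (length(missing_files) > 0) {
    stop("Missing salmon output: ", paste(missing_files, collapse = ", "))
  }
  tximport::tximport(files, type = "salmon", tx2gene = tx2gene)
}

# Placeholder directory and sample names, used only to show the path layout
example_files <- salmon_quant_paths("quants", c("sampleA", "sampleB"))
print(example_files)
```

The list returned by `tximport()` has a `counts` element, a gene-by-sample matrix that can be passed on to downstream differential expression tools.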

The BED files you mention are the ATAC-seq peak summits, and the bedGraph files are browser tracks for visualizing the RNA-seq data in a genome viewer such as IGV. None of these will reliably or meaningfully give you raw gene expression counts.
