I'm quite new to the bioinformatics world and have just started working with the GEO data set GSE119600. My intention is to extract differentially expressed genes. I will start by using the lumi package to import, read, and normalize the raw data in R, and then use the limma package to identify the differentially expressed genes. My questions are:
1- Which file is suitable for importing the raw data? There are 2 types of files: RAW.tar and non-normalized.txt.gz.
2- What is the general workflow for the lumi package?
Thanks!
Hi Zelda, I'm having the same problem with raw data from an Illumina BeadChip in GEO. How did you solve your problem?
Hi Zelda, I'm having the same problem... How did you proceed?
Please elaborate on the problem. Please show what you have already tried, and share any warning and / or error messages that have appeared.
I'm quite new to the bioinformatics world and have started working with the GEO dataset GSE42023. Our goal is to extract differentially expressed genes. My question is:
GSE42023 provides 2 types of files: RAW.tar and non-normalized.txt.gz. I know that I need to normalize this data to proceed with the differential expression analysis, but I am really confused by this data.
In particular, in non-normalized.txt I have the genes (rows) and the samples and detection p-values (columns). In this case, how should I proceed?
Thanks! Example of non_normalized.txt
Unfortunately, you have the same problem as many people.
For some reason, for the Illumina microarray studies, GEO requires that authors upload data in this non-standard format. I am not sure that you can use the standard Bioconductor package, lumi, for this. Instead, you may have to process this manually, and I provide an advanced workflow here: A: illumina Arrays Illumina HumanHT-12 V3.0 expression beadchip reading data
Thanks for your response, Kevin! I am following your workflow, but I have a question: how do I normalize this data? Now I have a matrix with ID_REF and detection p-values for all samples. Please forgive me for so many questions.
Hey, you need to remove the detection p-value columns. I explain this in the other post (but I do not provide the code):
"You should then extract out the Detection PVal columns and save them for later, and also set the rownames of the object to be equal to ID_REF. The final x object should be just the expression levels, and it should be a data-matrix."
The normalisation is then performed with the neqc() function, also mentioned in the other post. Sorry, this is a very non-standard workflow.
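A rough sketch of the two steps described above (splitting off the detection p-values, then normalising) might look like the following. It is not the code from the other post. It assumes the table has an ID_REF column and that the p-value columns contain "Detection" in their names; column names vary between GEO submissions, so check yours. The toy data frame and the helper names split_illumina_table and normalise_illumina are made up for illustration; neqc() itself comes from the limma package.

```r
# Toy stand-in for a GEO non-normalized table (hypothetical values).
# With real data you would read the file first, e.g. with read.delim(),
# and inspect the actual column names.
raw <- data.frame(
  ID_REF = c("ILMN_1", "ILMN_2", "ILMN_3"),
  SampleA = c(120.5, 98.1, 2050.3),
  SampleA.Detection.Pval = c(0.21, 0.48, 0.00),
  SampleB = c(110.2, 101.7, 1980.9),
  SampleB.Detection.Pval = c(0.30, 0.41, 0.00),
  stringsAsFactors = FALSE
)

# Hypothetical helper: separate expression columns from detection p-value
# columns and use ID_REF as the rownames of both matrices.
split_illumina_table <- function(tab, id_col = "ID_REF",
                                 pval_pattern = "Detection") {
  is_pval <- grepl(pval_pattern, colnames(tab), ignore.case = TRUE)
  is_id <- colnames(tab) == id_col
  expr <- as.matrix(tab[, !is_pval & !is_id, drop = FALSE])
  detp <- as.matrix(tab[, is_pval, drop = FALSE])
  rownames(expr) <- tab[[id_col]]
  rownames(detp) <- tab[[id_col]]
  list(x = expr, detectionpvalues = detp)
}

parts <- split_illumina_table(raw)
x <- parts$x                                # expression levels only
detectionpvalues <- parts$detectionpvalues  # saved for later filtering

# Hypothetical wrapper: check that both matrices have the same shape,
# then hand them to limma's neqc(), which background-corrects,
# quantile-normalises, and log2-transforms the intensities.
normalise_illumina <- function(x, detectionpvalues) {
  stopifnot(identical(dim(x), dim(detectionpvalues)))
  if (!requireNamespace("limma", quietly = TRUE)) {
    stop("the limma package is required for neqc()")
  }
  limma::neqc(x, detection.p = detectionpvalues)
}
```

On a real data set the last step would be something like x_norm <- normalise_illumina(x, detectionpvalues). The three-probe toy table is only there to show the splitting; it is far too small to give meaningful normalised values.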
So, do I normalize this data (ID_REF and the gene counts)?
Or this data (ID_REF and just the detection p-values)?
You need to set ID_REF as rownames, and then remove that first column (ID_REF) from the data (in both cases). You later use the detection p-values in the following section in my other post: 'filter out control probes, those with no symbol, and those that failed' (these detection p-values will be contained in the object detectionpvalues). The columns of both objects should also be aligned perfectly.
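To illustrate how the detection p-values might be used for the "failed probes" part of that filtering, here is a minimal sketch on toy matrices. The cutoff of 0.05 and the minimum of 2 samples are assumptions for the example, not values taken from the other post, and the sketch assumes that small p-values mean "detected" (some files store 1 - p instead). The object names x_norm and x_filtered are hypothetical.

```r
# Toy normalised expression values and detection p-values (hypothetical);
# rows are probes (ID_REF), columns are samples.
x_norm <- matrix(c(7.1, 7.3, 7.0,
                   11.2, 11.0, 11.4,
                   6.9, 9.8, 9.9),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("ILMN_1", "ILMN_2", "ILMN_3"),
                                 c("S1", "S2", "S3")))
detectionpvalues <- matrix(c(0.40, 0.35, 0.52,
                             0.00, 0.00, 0.00,
                             0.20, 0.01, 0.02),
                           nrow = 3, byrow = TRUE,
                           dimnames = dimnames(x_norm))

# Both objects must describe the same probes and samples in the same order.
stopifnot(identical(dimnames(x_norm), dimnames(detectionpvalues)))

# Assumed rule: keep probes detected (p < 0.05) in at least 2 samples.
p_cutoff <- 0.05
min_samples <- 2
detected <- rowSums(detectionpvalues < p_cutoff) >= min_samples
x_filtered <- x_norm[detected, , drop = FALSE]
```

With real GEO files the p-value columns usually carry different names from the expression columns, so you would rename them to the sample names before running the alignment check.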