I'm a new PhD student and I'm struggling to analyse microarray data in R.
If anyone could help, it would be much appreciated.
That is a very open-ended question, I'm afraid. What did you try? Did you read, for example, the user guide for limma, which is the standard package for differential analysis of arrays? You can find it at https://bioconductor.org/packages/release/bioc/html/limma.html
Thank you for your response.
Yes, I have read through it, but I only understand parts of it. So far I can only filter out the lowly expressed genes and make the MDS plot showing distances between expression profiles.
When I try to run some of the other commands in R, they won't run.
I suggest working your way through it step by step. If something fails, google the error message. If that doesn't lead you anywhere, post a more specific question, giving the exact commands you ran and the exact error messages you got.
More immediate help might come from local R user groups or bioinformatics chat groups.
If you are working mainly with Bioconductor packages, then https://support.bioconductor.org/ is better.
Thank you for your help. I will try to post a more specific question.
I've been working through the workflow below:
Do you happen to know what this command means?
ann <- select(org.Mm.eg.db,keys=rownames(results.ordered),columns=c("ENTREZID","SYMBOL","GENENAME"))
It comes up with this error:
Error in .testForValidKeys(x, keys, keytype, fks) :
None of the keys entered are valid keys for 'ENTREZID'. Please use the keys method to see a listing of valid arguments.
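A note on that error, as a sketch under assumptions rather than a diagnosis: the message suggests the row names were looked up as Entrez IDs, so it is worth checking what the row names actually are and passing a matching keytype to select(). The keys below are hypothetical stand-ins for rownames(results.ordered); whether your row names are gene symbols, probe IDs, or something else is something only you can check, and probe IDs would need the platform's own annotation package instead.

```r
# Hypothetical row names standing in for rownames(results.ordered).
keys <- c("Sox9", "Col2a1", "Acan")

# Entrez IDs are plain integers; these keys are not, so an ENTREZID lookup fails.
looks_like_entrez <- grepl("^[0-9]+$", keys)
print(looks_like_entrez)

# Run the lookup only if the annotation package is installed.
if (requireNamespace("org.Mm.eg.db", quietly = TRUE)) {
  library(org.Mm.eg.db)
  print(keytypes(org.Mm.eg.db))  # identifier types this package can look up
  # Assumption: the keys are mouse gene symbols, hence keytype = "SYMBOL".
  ann <- AnnotationDbi::select(org.Mm.eg.db, keys = keys, keytype = "SYMBOL",
                               columns = c("ENTREZID", "SYMBOL", "GENENAME"))
  print(ann)
}
```

Running head(rownames(results.ordered)) on your own object shows which identifier type you have.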
Again, very open-ended statement. "it won't run", what does that mean? Here is an end-to-end workflow for Affy arrays, maybe that helps: https://www.bioconductor.org/packages/release/workflows/vignettes/maEndToEnd/inst/doc/MA-Workflow.html
If you are new and want to teach yourself, it takes time and dedication, but it is doable. Many people here have no formal bioinformatics background, including myself, and there are many resources on the internet.
I meant that when I run many of the commands in R, they come up with an error.
Sorry, I'm very new to this. I have spent four days trying to work on it with no luck.
Thank you for your advice and for sharing the workflow.
It may help to describe in detail the data that you have (file extension; source), to show the R commands that you have used, and also to show the error messages. Otherwise, how can anybody help you?
It is quite challenging because I can't upload the data on this forum. Does anyone have an email address I could send it to, and could help me through it?
I have microarray data on mesenchymal stem cells (MSCs) that are differentiating into chondrocytes, given to me as an Excel file. The treatment conditions are day 0, day 7, day 14, and day 21, with 3 replicates for each condition.
There are 32,407 probenames/cells.
I want to determine the top 20 genes in MSC differentiation and which genes are differentially expressed across the different conditions.
What would be the best way to start? I have imported the Excel spreadsheet and used the following commands:
seqdata <- MSCs  # MSCs is the dataset
countdata <- seqdata[,-c(1,14)]
rownames(countdata) <- genes
rownames(countdata) <- seqdata[,1]
y <- DGEList(countdata)
group <-c( "Day 14", "Day 14","Day 14", "Day 7", "Day 7", "Day 7", "Day 21", "Day 21", "Day 21","Day 0", "Day 0", "Day 0")
y$samples$group <- group
myCPM <- cpm(countdata)
thresh <- myCPM > 0.5
keep <- rowSums(thresh) >= 2
(the plot didn't work and came up as an error)
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'plot': subscript out of bounds
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
v <- voom(y,design,plot = TRUE)
fit <- lmFit(v)
cont.matrix <- makeContrasts(day0Vsday21=Day 0- Day 21,levels=design)
(came up as error)
Error: unexpected numeric constant in "cont.matrix <- makeContrasts(day0V
OK, that is not a big deal. You cannot have unquoted spaces in variable or factor names. Try:
cont.matrix <- makeContrasts(day0Vsday21=`Day 0` - `Day 21`, levels=design)
and see what happens.
To get better advice, maybe you could use built-in data; most packages come with some. People might not want to download random Excel files due to malware concerns, but you could share text files via GitHub, or use public files on e.g. Google Drive. But first, try to work with example data.
It still came up with an error:
Error in makeContrasts(day0Vsday21 = Day 14 - Day 21, levels = design) :
The levels must by syntactically valid names in R, see help(make.names).
Someone sent me an RNA-seq workflow and I've been using it on microarray data. I'll have to start again.
Just to finalize this, I think you need makeContrasts(day0Vsday21=Day 0 - Day 21, levels=colnames(design))
But as you are using the wrong workflow for the data, it doesn't really matter.
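For anyone who hits the same makeContrasts error, here is a minimal sketch of one way around it, assuming the group labels from the question. It converts the labels to syntactically valid names with make.names() before building the design matrix. Note also that levels() returns NULL for a plain character vector, so group needs to be a factor before colnames(design) <- levels(group) does anything useful. The limma call is guarded so the base R part runs even without limma installed.

```r
# Group labels as given in the question (they contain spaces).
group_raw <- c("Day 14", "Day 14", "Day 14", "Day 7", "Day 7", "Day 7",
               "Day 21", "Day 21", "Day 21", "Day 0", "Day 0", "Day 0")

# make.names() turns "Day 0" into "Day.0", which R can parse as a name.
group <- factor(make.names(group_raw),
                levels = c("Day.0", "Day.7", "Day.14", "Day.21"))

# One column per time point, no intercept.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)  # works because group is now a factor

# Build the contrast only if limma is installed.
if (requireNamespace("limma", quietly = TRUE)) {
  cont.matrix <- limma::makeContrasts(day0Vsday21 = Day.0 - Day.21,
                                      levels = design)
  print(cont.matrix)
}
```

Renaming the levels to something without spaces or dots, such as D0 and D21, would work equally well; the only requirement is that the names are valid in R.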
Btw, this is RNA-seq data, not microarray.
Do you know what the first steps of a microarray workflow are if I'm using an Excel spreadsheet that has these 14 columns?
ProbeName, Day 14-1, Day 14-2, Day 14-3, Day 7-1, Day 7-2, Day 7-3, Day 21-1, Day 21-2, Day 21-3, Day 0-1, Day 0-2, Day 0-3, GeneSymbol
This depends on a lot of things. The only thing I know so far is that you have a sort of time series, but the right approach also depends on the platform and on the normalization that was applied. You can likely still use limma (without voom), so try to work your way through the limma user guide. That is all I can say without having the data and its provenance information.
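As a rough starting point, here is a sketch of what "limma without voom" could look like for a sheet with that column layout. It rests on assumptions that need checking against the real data: that the values are already normalized log2 intensities, and that the columns are in the order listed. The data frame below is simulated random noise that only mimics the layout, so the resulting table is meaningless; replace MSCs with the imported spreadsheet.

```r
# Simulated stand-in for the imported spreadsheet (random numbers, NOT real data).
set.seed(1)
days <- c("Day 14", "Day 7", "Day 21", "Day 0")
sample_cols <- paste0(rep(days, each = 3), "-", 1:3)
MSCs <- data.frame(ProbeName = paste0("probe", 1:100),
                   matrix(rnorm(100 * 12, mean = 8), nrow = 100,
                          dimnames = list(NULL, sample_cols)),
                   GeneSymbol = paste0("gene", 1:100),
                   check.names = FALSE)

# Expression matrix: drop the two annotation columns, keep probes as row names.
expr <- as.matrix(MSCs[, 2:13])
rownames(expr) <- MSCs$ProbeName

# Derive the group from the column names: "Day 14-1" becomes "Day.14".
group <- factor(make.names(sub("-[0-9]+$", "", colnames(expr))),
                levels = c("Day.0", "Day.7", "Day.14", "Day.21"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Fit the linear model and test day 21 against day 0, if limma is installed.
if (requireNamespace("limma", quietly = TRUE)) {
  fit  <- limma::lmFit(expr, design)
  cont <- limma::makeContrasts(day21VsDay0 = Day.21 - Day.0, levels = design)
  fit2 <- limma::eBayes(limma::contrasts.fit(fit, cont))
  print(limma::topTable(fit2, coef = "day21VsDay0", number = 20))
}
```

topTable with number = 20 returns the 20 top-ranked probes for that one contrast; other comparisons between time points can be added as further arguments to makeContrasts.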