Question

Bioinformatics tutoring

2.6 years ago

bioinformatics ▴ 40

Hi everyone,

I'm a new PhD student and I'm struggling to analyse microarray data in R.

If anyone could help it would be so appreciated.

Thank you!

differentially genes expressed microarray • 2.2k views

ADD COMMENT • link updated 2.6 years ago by Michael 54k • written 2.6 years ago by bioinformatics ▴ 40

This is a very open-ended question, I am afraid. What did you try? Have you read, for example, the limma user guide? limma is the standard package for differential analysis of arrays. Find it at https://bioconductor.org/packages/release/bioc/html/limma.html

ADD REPLY • link 2.6 years ago by ATpoint 81k

Thank you for your response.

Yes, I have read through it, but I only understand parts of it. So far I can only filter out the lowly expressed genes and make the MDS plot showing distances between expression profiles.

When I try to run some of the commands in R, they won't run.

ADD REPLY • link 2.6 years ago by bioinformatics ▴ 40

I suggest working your way through step by step; if something fails, google the error message. If that doesn't lead you anywhere, post a slightly more specific question, giving the exact commands you tried and the error messages you got.

More immediate help might come from local R user groups or bioinformatics chat groups.

If you are working mainly with Bioconductor packages, then https://support.bioconductor.org/ is the better place to ask.

ADD REPLY • link 2.6 years ago by Michael 54k

Thank you for your help. I will try to post a more specific question.

I've been working through the workflow below:

https://combine-australia.github.io/RNAseq-R/06-rnaseq-day1.html#References

Do you happen to know what this command means?

ann <- select(org.Mm.eg.db,keys=rownames(results.ordered),columns=c("ENTREZID","SYMBOL","GENENAME"))

It comes up with an error:

Error in .testForValidKeys(x, keys, keytype, fks) : None of the keys entered are valid keys for 'ENTREZID'. Please use the keys method to see a listing of valid arguments.

Thank you

ADD REPLY • link 2.6 years ago by bioinformatics ▴ 40
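
For readers hitting the same error: `select()` here is the AnnotationDbi lookup function, and the message indicates that the keys were checked against 'ENTREZID', so row names that are probe names or gene symbols are rejected. A minimal sketch of stating the key type explicitly, assuming the row names are mouse gene symbols (an assumption; the thread does not say which identifier type or organism the data uses) and that AnnotationDbi and org.Mm.eg.db are installed. The three symbols are hypothetical placeholders.

```r
# Hypothetical example keys; in the post these would be rownames(results.ordered).
symbols <- c("Sox9", "Acan", "Col2a1")

if (requireNamespace("AnnotationDbi", quietly = TRUE) &&
    requireNamespace("org.Mm.eg.db", quietly = TRUE)) {
  db <- org.Mm.eg.db::org.Mm.eg.db
  # List the identifier types this annotation database accepts as keys.
  print(AnnotationDbi::keytypes(db))
  # Name the key type explicitly instead of relying on the ENTREZID default.
  ann <- AnnotationDbi::select(db, keys = symbols, keytype = "SYMBOL",
                               columns = c("ENTREZID", "SYMBOL", "GENENAME"))
  print(ann)
}
```

Calling the function as `AnnotationDbi::select()` also avoids picking up a `select()` from another attached package by accident.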

Again, a very open-ended statement. What does "it won't run" mean? Here is an end-to-end workflow for Affy arrays, maybe that helps: https://www.bioconductor.org/packages/release/workflows/vignettes/maEndToEnd/inst/doc/MA-Workflow.html

If you are new and want to teach yourself, it takes time and dedication, but it is doable. Many people here, including myself, have no formal bioinformatics background, and there are many resources on the internet.

ADD REPLY • link 2.6 years ago by ATpoint 81k

I meant that when I run many of the commands in R, they come up with errors. Sorry, I'm very new to this; I have spent four days trying to work on it with no luck.

Thank you for your advice and for sharing the workflow.

ADD REPLY • link 2.6 years ago by bioinformatics ▴ 40

It may help to describe in detail the data that you have (file extension; source), to show the R commands that you have used, and also to show the error messages. Otherwise, how can anybody help you?

ADD REPLY • link 2.6 years ago by Kevin Blighe 87k
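
One way to follow this advice without uploading a file is to paste a small text summary of the data and the session. A minimal sketch in base R; `expr_table` and its values are a made-up stand-in for the imported spreadsheet, not the poster's data.

```r
# Toy stand-in for the imported spreadsheet (simulated values, not real data).
expr_table <- data.frame(
  ProbeName = c("probe_1", "probe_2", "probe_3"),
  `Day 0-1` = c(7.1, 9.4, 5.2),
  `Day 0-2` = c(7.3, 9.1, 5.0),
  check.names = FALSE  # keep the column names exactly as in the spreadsheet
)

str(expr_table)            # column names and types
dim(expr_table)            # number of rows and columns
dput(head(expr_table, 3))  # copy-pasteable text version of the first rows
sessionInfo()              # R version and loaded package versions
```

The output of `dput()` can be pasted into a post and read back into R by anyone, which sidesteps the need to share an Excel file.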

It is quite challenging because I can't upload the data to this forum. Does anyone have an email address so that I could get help that way? I have microarray data on mesenchymal stem cells (MSCs) that are differentiating into chondrocytes, given to me in Excel. The treatment conditions are day 0, day 7, day 14, and day 21, with 3 replicates for each condition. There are 32,407 probenames/cells. I want to determine the top 20 genes in MSC differentiation and which genes are differentially expressed across the conditions.

What would be the best way to start? I have imported the Excel spreadsheet and used the following commands:

library(edgeR)
library(limma)
library(Glimma)
library(org.Mm.eg.db)
library(gplots)
library(RColorBrewer)
library(NMF)

seqdata <- MSCs  # MSCs is the dataset
head(seqdata)
dim(seqdata)

countdata <- seqdata[,-c(1,14)]
head(countdata)
rownames(countdata) <- genes
rownames(countdata) <- seqdata[,1] 

y <- DGEList(countdata)
y
y$samples

group <-c( "Day 14", "Day 14","Day 14", "Day 7", "Day 7", "Day 7", "Day 21", "Day 21", "Day 21","Day 0", "Day 0", "Day 0")
group
y$samples$group <- group
y$samples

myCPM <- cpm(countdata)
head(myCPM)
thresh <- myCPM > 0.5
head(thresh)
table(rowSums(thresh))
keep <- rowSums(thresh) >= 2
summary(keep)

plot(myCPM[,100],countdata[,100]) 
plot(myCPM[,1],countdata[,1],ylim=c(0,50),xlim=c(0,3)) 
(the plot didn't work and came up as an error)

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'plot': subscript out of bounds

group
design <- model.matrix(~ 0 + group)
design

colnames(design) <- levels(group)
design

par(mfrow=c(1,1))
v <- voom(y,design,plot = TRUE)
v


fit <- lmFit(v)
names(fit)

cont.matrix <- makeContrasts(day0Vsday21=Day 0- Day 21,levels=design)
(came up as error)
Error: unexpected numeric constant in "cont.matrix <- makeContrasts(day0Vsday21=Day 0"

ADD REPLY • link updated 2.6 years ago by Michael 54k • written 2.6 years ago by bioinformatics ▴ 40
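
Two details in the commands above can be checked in base R. The plot asks for column 100 of a table that has only 12 sample columns, which gives "subscript out of bounds", and `levels(group)` returns `NULL` because `group` is a character vector rather than a factor. A small sketch with simulated numbers; the matrix is a stand-in, not the poster's data.

```r
# Simulated stand-in for a 12-sample expression matrix.
set.seed(1)
countdata <- matrix(rpois(5 * 12, lambda = 20), nrow = 5, ncol = 12)

ncol(countdata)  # 12 columns, so column 100 does not exist
res <- tryCatch(countdata[, 100], error = function(e) conditionMessage(e))
res              # the error message raised by the out-of-range index

group <- c("Day 14", "Day 14", "Day 14", "Day 7", "Day 7", "Day 7",
           "Day 21", "Day 21", "Day 21", "Day 0", "Day 0", "Day 0")
levels(group)    # NULL: a character vector has no levels

# Converting to a factor gives levels, here in time order.
group_f <- factor(group, levels = c("Day 0", "Day 7", "Day 14", "Day 21"))
levels(group_f)
```

With a factor, `colnames(design) <- levels(group_f)` assigns real names instead of wiping them.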

OK, that is not a big deal: you cannot have unquoted spaces in variable or factor names. Try:

cont.matrix <- makeContrasts(day0Vsday21=`Day 0` - `Day 21`, levels=design)

and see what happens.

To get better advice, maybe you could use built-in data; most packages come with some. People might not want to download random Excel files due to malware concerns, but you could share text files via GitHub, or use public files on e.g. Google Drive. But first, try to work with example data.

ADD REPLY • link 2.6 years ago by Michael 54k

It still came up with an error:

Error in makeContrasts(day0Vsday21 = Day 14 - Day 21, levels = design) : The levels must by syntactically valid names in R, see help(make.names).

Someone sent me an RNA-seq workflow and I've been using it on microarray data. I'll have to start again.

ADD REPLY • link 2.6 years ago by bioinformatics ▴ 40
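
The error message points at the level names themselves: `makeContrasts()` wants names without spaces, so one option is to relabel the groups before building the design matrix. A minimal sketch, reusing the `group` vector from the earlier post; the labels `day0`, `day7`, `day14` and `day21` are chosen here for illustration, and the limma call only runs if limma is installed.

```r
group <- c("Day 14", "Day 14", "Day 14", "Day 7", "Day 7", "Day 7",
           "Day 21", "Day 21", "Day 21", "Day 0", "Day 0", "Day 0")

make.names("Day 0")  # shows how R itself would turn the label into a valid name

# Shorter valid labels chosen for illustration: day0, day7, day14, day21.
group_f <- factor(gsub("Day ", "day", group),
                  levels = c("day0", "day7", "day14", "day21"))

design <- model.matrix(~ 0 + group_f)
colnames(design) <- levels(group_f)
colnames(design)

if (requireNamespace("limma", quietly = TRUE)) {
  # The contrast can now be written without any quoting.
  cont.matrix <- limma::makeContrasts(day0Vsday21 = day0 - day21,
                                      levels = design)
  print(cont.matrix)
}
```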

Just to finalize this, I think you need:

makeContrasts(day0Vsday21=`Day 0` - `Day 21`, levels=colnames(design))

But as you are using the wrong workflow for the data, it doesn't really matter.

ADD REPLY • link 2.6 years ago by Michael 54k

Btw, this is RNA-seq data, not microarray.

ADD REPLY • link 2.6 years ago by Michael 54k

Do you know what the first steps of a microarray workflow are if I'm using an Excel spreadsheet that has 14 columns: ProbeName, Day 14-1, Day 14-2, Day 14-3, Day 7-1, Day 7-2, Day 7-3, Day 21-1, Day 21-2, Day 21-3, Day 0-1, Day 0-2, Day 0-3, GeneSymbol? Thank you.

ADD REPLY • link 2.6 years ago by bioinformatics ▴ 40

This really depends on a lot of things. The only thing I know so far is that you have a sort of time series, but it also depends on the platform and on the normalization that was applied. You can likely still use limma (without voom); try to work your way through the limma user guide. That is all I can say without having the data and provenance information.

ADD REPLY • link 2.6 years ago by Michael 54k
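
As a rough illustration of "limma without voom", the sketch below fits a linear model directly to an expression matrix. It assumes the spreadsheet values are already background-corrected, normalized, log2-scale intensities, which the thread does not confirm; the matrix is simulated, the probe names are hypothetical, and the limma calls only run if limma is installed.

```r
# Simulated log2 intensities standing in for the spreadsheet (not real data).
set.seed(42)
samples <- c(paste0("Day 14-", 1:3), paste0("Day 7-", 1:3),
             paste0("Day 21-", 1:3), paste0("Day 0-", 1:3))
expr <- matrix(rnorm(200 * 12, mean = 8, sd = 1), nrow = 200,
               dimnames = list(paste0("probe_", 1:200), samples))

# Derive the time point from the column names: "Day 14-1" becomes "day14".
day <- sub("^Day ([0-9]+)-[0-9]+$", "day\\1", colnames(expr))
group <- factor(day, levels = c("day0", "day7", "day14", "day21"))

design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

if (requireNamespace("limma", quietly = TRUE)) {
  # No voom step: the values are treated as log2 intensities, not counts.
  fit <- limma::lmFit(expr, design)
  cm <- limma::makeContrasts(day7 - day0, day14 - day0, day21 - day0,
                             levels = design)
  fit2 <- limma::eBayes(limma::contrasts.fit(fit, cm))
  # Top 20 probes for day 21 vs day 0, then ranked across all three contrasts.
  print(limma::topTable(fit2, coef = 3, number = 20))
  print(limma::topTable(fit2, number = 20))
}
```

With real data, the numeric sample columns of the spreadsheet would take the place of `expr`, with ProbeName as row names; whether any further normalization is needed depends on the platform and on how the file was produced.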