Question

RNA Seq Analysis in R

0

Entering edit mode

3.7 years ago

anasjamshed ▴ 120

I choose microarray data GSE75693 of 30 patients with stable kidney transplantation and 15 with BKVN to identify differentially expressed genes (DEGs). I performed this in GEO2R and find R script there and Runs R script Successfully on R studio as well. The R script is :

#   Differential expression analysis with limma
library(Biobase)
library(GEOquery)
library(limma)

# load series and platform data from GEO

gset <- getGEO("GSE75693", GSEMatrix =TRUE, AnnotGPL=TRUE)
if (length(gset) > 1) idx <- grep("GPL570", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

# make proper column names to match toptable 
fvarLabels(gset) <- make.names(fvarLabels(gset))

# group names for all samples
gsms <- paste0("000000000000000000000000000000XXXXXXXXXXXXXXX11111",
        "1111111111XXXXXXXXXXXXXXXXXXX")
sml <- c()
for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }

# eliminate samples marked as "X"
sel <- which(sml != "X")
sml <- sml[sel]
gset <- gset[ ,sel]

# log2 transform
exprs(gset) <- log2(exprs(gset))

# set up the data and proceed with analysis
sml <- paste("G", sml, sep="")    # set group names
fl <- as.factor(sml)
gset$description <- fl
design <- model.matrix(~ description + 0, gset)
colnames(design) <- levels(fl)
fit <- lmFit(gset, design)
cont.matrix <- makeContrasts(G1-G0, levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, 0.01)
tT <- topTable(fit2, adjust="fdr", sort.by="B", number=1250)

tT <- subset(tT, select=c("ID","adj.P.Val","P.Value","t","B","logFC","Gene.symbol","Gene.title"))
write.table(tT, file=stdout(), row.names=F, sep="\t")

But I want to select DEGs based on the threshold P < 0.01 and fold change >2.0. How can I do this through R script?

RNA-Seq R MicroArray DEGs • 2.2k views

ADD COMMENT • link updated 3.4 years ago by DareDevil ★ 4.3k • written 3.7 years ago by anasjamshed ▴ 120

0

Entering edit mode

Hello anasjamshed1994!

It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/14070/deg-analysis-in-r-through-bioconductor

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLY • link 3.7 years ago by Ram 43k

0

Entering edit mode

Your cross-posting on a different site, especially after someone has already started helping you here, is disrespectful and unprofessional.

ADD REPLY • link 3.7 years ago by Ram 43k

0

Entering edit mode

and Kevin is my brother and I don't want to disrespect him at any cost

ADD REPLY • link 3.7 years ago by anasjamshed ▴ 120

0

Entering edit mode

brother, what can I do if I don't find a solution from the last 10 days? I already contacted authors

ADD REPLY • link 3.7 years ago by anasjamshed ▴ 120

0

Entering edit mode

In that case, there is nothing you can do but wait. You could send the authors a second email, but beyond that, there's nothing more you can do.

ADD REPLY • link 3.7 years ago by Ram 43k

score 3 · Answer 1 · 2020-12-09

3

Entering edit mode

3.4 years ago

DareDevil ★ 4.3k

Hi, First of all this is microarray data not RNAseq data. As per the code the data it takes data from here which is already normalized.

Then I don't know why do they perform normalization on the same data which is already normalized.

# log2 transform
exprs(gset) <- log2(exprs(gset))

It is always better to download the raw data and preprocess it using affy and limma packages.

A sample of the analyis can be:

#set the working directory where you have the cel files are stored
setwd(path/to/directory/)
#load required packges
library(affy)
library(limma)
library(oligo)
library(pd.hg.u133.plus.2)

#list the cel files
list.celfiles()
names = list.celfiles()

#create array with sample names
array = read.celfiles(names)

#perform rma algorithm to normalize the data
eset = rma(array)
write.exprs(eset, file = "data_normalized.txt") #this will be your normalized data by rma

#Load the target files which the information about the sample and their corresponding group
targets<-read.delim(file="targets.txt", header=T)
#create design and fit the design
design <- model.matrix(~0+ conditions)
fit <- lmFit(eset, design)
#create contrast
contrast.matrix <- makeContrasts(group1=condition1-control, group2=condition2-control, levels = design)

#fit the contrast
fit.cont <- contrasts.fit(fit, contrast.matrix)
fit.cont<- eBayes(fit.cont)

results<-decideTests(fit.cont,adjust.method="fdr",p=0.05)
summary(results)

ADD COMMENT • link 3.4 years ago by DareDevil ★ 4.3k

0

Entering edit mode

how can we make graphs(heatmaps,histogram, bar ) after this analysis?

ADD REPLY • link 3.4 years ago by anasjamshed ▴ 120

2

Entering edit mode

you need the normalized data which you will get from the command

write.exprs(eset, file = "data_normalized.txt")

you can use pheatmap for making the heatmap from the matrix, which is quite simple and easy

boxplot and histogram are inbuilt in R

boxplot(expression_matrix, main="Signal Normalized",
        xlab="Sample",
        ylab="Log2 Intensity",cex=0.8, las=2)

for volcanoplot you can use EnhancedVolcano which is developed by Kevin

ADD REPLY • link 3.4 years ago by DareDevil ★ 4.3k

0

Entering edit mode

I'd recommend ComplexHeatmap over pheatmap - it is a LOT better in terms of how the entire concept is structured.

ADD REPLY • link 3.4 years ago by Ram 43k

0

Entering edit mode

what is expression matrix?

ADD REPLY • link 3.4 years ago by anasjamshed ▴ 120

2

Entering edit mode

You can load the expression_matrix from the data_normalized.txt.

expression_matrix = read.delim("data_normalized.txt", sep="\t", header=TRUE)

ADD REPLY • link 3.4 years ago by DareDevil ★ 4.3k

0

Entering edit mode

That would be your dataset. You should know what it is if you're working on Differential Expression.

ADD REPLY • link 3.4 years ago by Ram 43k

0

Entering edit mode

Use ggplot2, ComplexHeatmap, etc. Invest some effort in learning these tools, they'll be really useful.

ADD REPLY • link 3.4 years ago by Ram 43k

score 1 · Answer 2 · 2020-07-31

1

Entering edit mode

3.7 years ago

Kevin Blighe 87k

Hi Anas, please take a look at the output of ?topTable and you will be able to figure this out.

Kevin

ADD COMMENT • link 3.7 years ago by Kevin Blighe 87k

0

Entering edit mode

no brother I can't find any method to select threshold P < 0.01 and fold change >2.0. plz help me

ADD REPLY • link 3.7 years ago by anasjamshed ▴ 120

0

Entering edit mode

I am pretty sure that the parameters that you need are p.value and lfc.

If you cannot filter directly with toptable(), then just do:

subset(tT, adj.P.Val < 0.05 & abs(logFC) > 2)

ADD REPLY • link 3.7 years ago by Kevin Blighe 87k

0

Entering edit mode

after runnning this subset(tT, adj.P.Val < 0.05 & abs(logFC) > 2) no genes are found . plz help me

ADD REPLY • link 3.7 years ago by anasjamshed ▴ 120

1

Entering edit mode

Then nothing is statistically significantly, and there is nothing that I can do to help in relation to this. Try to reduce these thresholds (0.05 and absolute 2)

ADD REPLY • link 3.7 years ago by Kevin Blighe 87k

0

Entering edit mode

DEGs were selected based on the threshold P < 0.01 and fold change >2.0. This is written in the paper so I want to select like this and output should show 504 genes

ADD REPLY • link 3.7 years ago by anasjamshed ▴ 120

0

Entering edit mode

Email the authors or re-read the paper and make sure you're classifying samples accurately.

ADD REPLY • link 3.7 years ago by Ram 43k

3

Entering edit mode

I already did all the things that's why I am asking here AND Ram if you don't want to help others then plz not criticize other

ADD REPLY • link 3.7 years ago by anasjamshed ▴ 120