Question: RNA Seq Analysis in R
0
gravatar for anasjamshed1994
7 months ago by
anasjamshed199460 wrote:

I choose microarray data GSE75693 of 30 patients with stable kidney transplantation and 15 with BKVN to identify differentially expressed genes (DEGs). I performed this in GEO2R and find R script there and Runs R script Successfully on R studio as well. The R script is :

#   Differential expression analysis with limma
library(Biobase)
library(GEOquery)
library(limma)

# load series and platform data from GEO

gset <- getGEO("GSE75693", GSEMatrix =TRUE, AnnotGPL=TRUE)
if (length(gset) > 1) idx <- grep("GPL570", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

# make proper column names to match toptable 
fvarLabels(gset) <- make.names(fvarLabels(gset))

# group names for all samples
gsms <- paste0("000000000000000000000000000000XXXXXXXXXXXXXXX11111",
        "1111111111XXXXXXXXXXXXXXXXXXX")
sml <- c()
for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }

# eliminate samples marked as "X"
sel <- which(sml != "X")
sml <- sml[sel]
gset <- gset[ ,sel]

# log2 transform
exprs(gset) <- log2(exprs(gset))

# set up the data and proceed with analysis
sml <- paste("G", sml, sep="")    # set group names
fl <- as.factor(sml)
gset$description <- fl
design <- model.matrix(~ description + 0, gset)
colnames(design) <- levels(fl)
fit <- lmFit(gset, design)
cont.matrix <- makeContrasts(G1-G0, levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, 0.01)
tT <- topTable(fit2, adjust="fdr", sort.by="B", number=1250)

tT <- subset(tT, select=c("ID","adj.P.Val","P.Value","t","B","logFC","Gene.symbol","Gene.title"))
write.table(tT, file=stdout(), row.names=F, sep="\t")

But I want to select DEGs based on the threshold P < 0.01 and fold change >2.0. How can I do this through R script?

rna-seq microarray degs R • 624 views
ADD COMMENTlink modified 12 weeks ago by dare_devil1.4k • written 7 months ago by anasjamshed199460

Hello anasjamshed1994!

It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/14070/deg-analysis-in-r-through-bioconductor

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLYlink written 7 months ago by Ram32k

Your cross-posting on a different site, especially after someone has already started helping you here, is disrespectful and unprofessional.

ADD REPLYlink written 7 months ago by Ram32k

and Kevin is my brother and I don't want to disrespect him at any cost

ADD REPLYlink written 7 months ago by anasjamshed199460

brother, what can I do if I don't find a solution from the last 10 days? I already contacted authors

ADD REPLYlink written 7 months ago by anasjamshed199460

In that case, there is nothing you can do but wait. You could send the authors a second email, but beyond that, there's nothing more you can do.

ADD REPLYlink written 7 months ago by Ram32k
3
gravatar for dare_devil
12 weeks ago by
dare_devil1.4k
India
dare_devil1.4k wrote:

Hi, First of all this is microarray data not RNAseq data. As per the code the data it takes data from here which is already normalized.

Then I don't know why do they perform normalization on the same data which is already normalized.

# log2 transform
exprs(gset) <- log2(exprs(gset))

It is always better to download the raw data and preprocess it using affy and limma packages.

A sample of the analyis can be:

#set the working directory where you have the cel files are stored
setwd(path/to/directory/)
#load required packges
library(affy)
library(limma)
library(oligo)
library(pd.hg.u133.plus.2)

#list the cel files
list.celfiles()
names = list.celfiles()

#create array with sample names
array = read.celfiles(names)

#perform rma algorithm to normalize the data
eset = rma(array)
write.exprs(eset, file = "data_normalized.txt") #this will be your normalized data by rma

#Load the target files which the information about the sample and their corresponding group
targets<-read.delim(file="targets.txt", header=T)
#create design and fit the design
design <- model.matrix(~0+ conditions)
fit <- lmFit(eset, design)
#create contrast
contrast.matrix <- makeContrasts(group1=condition1-control, group2=condition2-control, levels = design)

#fit the contrast
fit.cont <- contrasts.fit(fit, contrast.matrix)
fit.cont<- eBayes(fit.cont)

results<-decideTests(fit.cont,adjust.method="fdr",p=0.05)
summary(results)
ADD COMMENTlink modified 12 weeks ago • written 12 weeks ago by dare_devil1.4k

how can we make graphs(heatmaps,histogram, bar ) after this analysis?

ADD REPLYlink written 12 weeks ago by anasjamshed199460
2

you need the normalized data which you will get from the command

write.exprs(eset, file = "data_normalized.txt")

you can use pheatmap for making the heatmap from the matrix, which is quite simple and easy

boxplot and histogram are inbuilt in R

boxplot(expression_matrix, main="Signal Normalized",
        xlab="Sample",
        ylab="Log2 Intensity",cex=0.8, las=2)

for volcanoplot you can use EnhancedVolcano which is developed by Kevin

ADD REPLYlink written 11 weeks ago by dare_devil1.4k

I'd recommend ComplexHeatmap over pheatmap - it is a LOT better in terms of how the entire concept is structured.

ADD REPLYlink written 11 weeks ago by Ram32k

what is expression matrix?

ADD REPLYlink written 11 weeks ago by anasjamshed199460
2

You can load the expression_matrix from the data_normalized.txt.

expression_matrix = read.delim("data_normalized.txt", sep="\t", header=TRUE)
ADD REPLYlink written 11 weeks ago by dare_devil1.4k

That would be your dataset. You should know what it is if you're working on Differential Expression.

ADD REPLYlink written 11 weeks ago by Ram32k

Use ggplot2, ComplexHeatmap, etc. Invest some effort in learning these tools, they'll be really useful.

ADD REPLYlink modified 11 weeks ago • written 11 weeks ago by Ram32k
1
gravatar for Kevin Blighe
7 months ago by
Kevin Blighe71k
Republic of Ireland
Kevin Blighe71k wrote:

Hi Anas, please take a look at the output of ?topTable and you will be able to figure this out.

Kevin

ADD COMMENTlink modified 7 months ago • written 7 months ago by Kevin Blighe71k

no brother I can't find any method to select threshold P < 0.01 and fold change >2.0. plz help me

ADD REPLYlink written 7 months ago by anasjamshed199460

I am pretty sure that the parameters that you need are p.value and lfc.

If you cannot filter directly with toptable(), then just do:

subset(tT, adj.P.Val < 0.05 & abs(logFC) > 2)
ADD REPLYlink written 7 months ago by Kevin Blighe71k

after runnning this subset(tT, adj.P.Val < 0.05 & abs(logFC) > 2) no genes are found . plz help me

ADD REPLYlink written 7 months ago by anasjamshed199460
1

Then nothing is statistically significantly, and there is nothing that I can do to help in relation to this. Try to reduce these thresholds (0.05 and absolute 2)

ADD REPLYlink modified 7 months ago • written 7 months ago by Kevin Blighe71k

DEGs were selected based on the threshold P < 0.01 and fold change >2.0. This is written in the paper so I want to select like this and output should show 504 genes

ADD REPLYlink written 7 months ago by anasjamshed199460

Email the authors or re-read the paper and make sure you're classifying samples accurately.

ADD REPLYlink written 7 months ago by Ram32k
3

I already did all the things that's why I am asking here AND Ram if you don't want to help others then plz not criticize other

ADD REPLYlink written 7 months ago by anasjamshed199460
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2367 users visited in the last hour
_