Question: How to run rlog from Deseq2 in R on HTseq counts from RNAseq data
3.0 years ago by
United States
herman.pappoe.45 wrote:

Hi all,

I have RNA-Seq time-series counts generated with HTSeq. There are, roughly speaking, 2 replicates: the first has 4 time points (day 0, 2, 5, 14), while the second has 5 time points (day 0, 2, 5, 15, 30). The data was loaded into R with read.table. The following is the script used:

setwd("Downloads")
df_A <- read.table(file='Replicate_sample_Raw_RPM_counts-2',header=T,sep='\t')
df_B <- read.table(file='HT-seq_counts.txt-2',header=T,sep='\t')
merged_df <- merge(df_A,df_B,by='geneID')
write.table(merged_df,file='merged_count.txt',row.names=F,quote=F,sep='\t')
counts<-read.csv("merged_count.txt",header=T,sep="\t")
data<-counts[-1]
rlog(data)

head(data)

  day0.x day2.x day5.x day14 day0.y day2.y day5.y day15 day30
1    358    422    241   617    145    508    389   357   594
2     11     31     44    26      8     24     41    49    49
3      7      3     33   392      2      5     25   159   155
4     26     45     74  5624     45    175     94  4604 14238
5      4     10     66   338     19     13     70   229   242
6    477    138     64    21    747    507     98    25    22

But when I try to run rlog or rlogTransformation I get the following error:

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘sizeFactors’ for signature ‘"data.frame"’

How can I improve the script to make rlog run so that I can have my HT-seq counts normalized?

rna-seq deseq2 rlog htseq counts R • 2.5k views
3.0 years ago by
United States
informatics bot wrote:

I would highly recommend you read the DESeq2 vignette.

You first need to create a design matrix (what DESeq2 calls colData). This is a table with one row per sample, i.e. one row per column of your count table, and up to three columns: time, donor/sample, and treatment. The treatment column is only needed if you have control replicates; if all samples were treated with the same stimulation (i.e. no controls), you can leave it out.

You can then create a DESeq data object from your count table and your design table (your data is currently a plain data.frame, which rlog does not accept).

dds <- DESeqDataSetFromMatrix(countData = data, colData = design, design = ~ time + sample + treatment)

Your design formula needs to take into account your time points as well as the two replicated samples.

Then estimate the size factors (their absence is part of the reason for the error message) and proceed to rlog the data set:

norm<-estimateSizeFactors(dds) 
expr<-rlog(norm)

Also, you might want to look into VST-normalizing your data (run it with blind = FALSE). With blind = FALSE, the transformation takes your experimental design into account when estimating dispersions. You might also want to remove the day 30 time point, which only one replicate has; this will make your time-course data set more even.
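
As a rough end-to-end sketch of the steps above, assuming DESeq2 is installed: the counts here are simulated stand-ins for the merged HTSeq table, and the replicate labels "A"/"B", the object names, and the formula ~ replicate + time are illustrative assumptions rather than the only sensible choice for this experiment.

```r
library(DESeq2)

set.seed(1)

# Sample names copied from the head(data) output in the question.
sample_names <- c("day0.x", "day2.x", "day5.x", "day14",
                  "day0.y", "day2.y", "day5.y", "day15", "day30")

# Simulated counts (assumption): 2000 genes x 9 samples, standing in for
# as.matrix() of the real merged count table.
n_genes <- 2000
mu   <- 2^runif(n_genes, 4, 12)   # per-gene mean count
disp <- 0.1 + 4 / mu              # per-gene dispersion
count_matrix <- matrix(
  rnbinom(n_genes * length(sample_names), mu = mu, size = 1 / disp),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), sample_names)
)
storage.mode(count_matrix) <- "integer"

# Design table: one row per sample, in the same order as the count columns.
# "A" and "B" are hypothetical labels for the two replicate series.
col_data <- data.frame(
  row.names = sample_names,
  time = factor(c("day0", "day2", "day5", "day14",
                  "day0", "day2", "day5", "day15", "day30"),
                levels = c("day0", "day2", "day5", "day14", "day15", "day30")),
  replicate = factor(c(rep("A", 4), rep("B", 5)))
)

dds <- DESeqDataSetFromMatrix(countData = count_matrix,
                              colData   = col_data,
                              design    = ~ replicate + time)
dds <- estimateSizeFactors(dds)

# Both transformations return a DESeqTransform object; assay() extracts
# the transformed matrix.
rld <- rlog(dds, blind = FALSE)
vsd <- vst(dds, blind = FALSE)
rlog_matrix <- assay(rld)
```

With the real data, count_matrix would instead come from the merged count file, with the gene IDs as row names.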


So I have tried creating a design matrix, but I am not entirely sure I have designed it correctly for the data I have:

        time    sample
day0.x  time1   sample1
day2.x  time2   sample2
day5.x  time3   sample3
day14.x time4   sample4
day0.y  time1   sample5
day2.y  time2   sample6
day5.y  time3   sample7
day15.y time4   sample8
day30.y time5   sample9

This is the script I used to create the design matrix:

count_table <-read.csv("merged_count.txt", header=TRUE,sep="\t")
count_table <- count_table[-1]
head(count_table)
expt_design1 <- data.frame(row.names = colnames(count_table), subject = c(rep("day0",1),rep("day2",2),rep("day5",3),rep("day14",4),rep("day0",1),rep("day2",2),rep("day5",3),rep("day15",4),rep("day30",5),time = c("t1","t2","t3","t4","t1","t2","t3","t4","t5"))
cds <- newCountDataSet(count_table, expt_design1)
head(counts(cds))
cds = estimateSizeFactors(cds)

This is the error that I am getting when trying to create the design matrix:

> expt_design1 <- data.frame(row.names = colnames(count_table), subject = c(rep("day0",1),rep("day2",2),rep("day5",3),rep("day14",4),rep("day0",1),rep("day2",2),rep("day5",3),rep("day15",4),rep("day30",5),time = c("t1","t2","t3","t4","t1","t2","t3","t4","t5"))
+ cds <- newCountDataSet(count_table, expt_design1)

Error: unexpected symbol in:
"sign1 <- data.frame(row.names = colnames(count_table), subject = c(rep("day0",1),rep("day2",2),rep("day5",3),rep("day14",4),rep("day0",1),rep("day2",2),rep("day5",3),rep("day15",4),rep("day30"
cds"

What am I doing wrong? I hope I have not misunderstood your advice nor how to create a proper matrix.

modified 3.0 years ago • written 3.0 years ago by herman.pappoe.45

You are missing a closing parenthesis at the end of the c() call for subject:

subject=c(rep("day0",1),rep("day2",2),rep("day5",3),rep("day14",4),rep("day0",1),rep("day2",2),rep("day5",3),rep("day15",4),rep("day30",5))
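
One further thing to watch, sketched below in base R: rep("day2", 2) repeats the label twice, so the subject vector above has 25 elements, while the count table has only 9 columns, and data.frame() needs exactly one value per row name. A minimal version with one label per sample might look like this; the replicate labels "A"/"B" and the object names are assumptions made for illustration.

```r
# Column names copied from the head(data) output in the question.
sample_names <- c("day0.x", "day2.x", "day5.x", "day14",
                  "day0.y", "day2.y", "day5.y", "day15", "day30")

# The rep() calls from the thread add up to 1+2+3+4+1+2+3+4+5 labels.
long_subject <- c(rep("day0", 1), rep("day2", 2), rep("day5", 3), rep("day14", 4),
                  rep("day0", 1), rep("day2", 2), rep("day5", 3), rep("day15", 4),
                  rep("day30", 5))
n_labels <- length(long_subject)

# One time label and one replicate label per sample instead.
expt_design <- data.frame(
  row.names = sample_names,
  time = factor(c("day0", "day2", "day5", "day14",
                  "day0", "day2", "day5", "day15", "day30"),
                levels = c("day0", "day2", "day5", "day14", "day15", "day30")),
  replicate = factor(c(rep("A", 4), rep("B", 5)))  # hypothetical labels
)
print(expt_design)
```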
modified 3.0 years ago • written 3.0 years ago by informatics bot

you could also use:

count_table <-read.csv("merged_count.txt", header=TRUE,sep="\t", row.names=1)

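and omit the second line (count_table <- count_table[-1]), since the geneID column then becomes the row names instead of a data column.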
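
A small self-contained illustration of what row.names=1 does, using an inline two-gene table instead of merged_count.txt. The gene IDs geneA and geneB are made up for the example; the counts are the first two rows and columns of the head(data) output in the question.

```r
# Inline tab-separated text standing in for merged_count.txt (assumption).
tsv <- "geneID\tday0.x\tday2.x\ngeneA\t358\t422\ngeneB\t11\t31\n"

# row.names = 1 turns the first column (geneID) into row names,
# so only count columns remain as data.
count_table <- read.csv(text = tsv, header = TRUE, sep = "\t", row.names = 1)

# A matrix with gene IDs as row names, suitable as count input for DESeq2.
count_matrix <- as.matrix(count_table)
print(count_matrix)
```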

written 3.0 years ago by WouterDeCoster