Question

How to run rlog from DESeq2 in R on HTSeq counts from RNA-seq data

8.0 years ago

herman.pappoe.45 ▴ 10

Hi all,

I have RNA-Seq time-series count data generated with HTSeq. There are, in effect, two replicates: the first has 4 time-point samples (day 0, 2, 5, 14), while the second has 5 (day 0, 2, 5, 15, 30). The data were loaded into R with read.table. This is the script I used:

setwd("Downloads")
df_A <- read.table(file='Replicate_sample_Raw_RPM_counts-2',header=T,sep='\t')
df_B <- read.table(file='HT-seq_counts.txt-2',header=T,sep='\t')
merged_df <- merge(df_A,df_B,by='geneID')
write.table(merged_df,file='merged_count.txt',row.names=F,quote=F,sep='\t')
counts<-read.csv("merged_count.txt",header=T,sep="\t")
data<-counts[-1]
rlog(data)

head(data)

  day0.x day2.x day5.x day14 day0.y day2.y day5.y day15 day30
1    358    422    241   617    145    508    389   357   594
2     11     31     44    26      8     24     41    49    49
3      7      3     33   392      2      5     25   159   155
4     26     45     74  5624     45    175     94  4604 14238
5      4     10     66   338     19     13     70   229   242
6    477    138     64    21    747    507     98    25    22

But when I try to run rlog or rlogTransformation, I get the following error:

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘sizeFactors’ for signature ‘"data.frame"’

How can I improve the script to make rlog run so that I can have my HT-seq counts normalized?

RNA-Seq HTseq counts R Deseq2 rlog • 5.8k views

ADD COMMENT • link updated 8.0 years ago by informatics bot ▴ 760 • written 8.0 years ago by herman.pappoe.45 ▴ 10

score 2 · Accepted Answer · 2016-04-21

8.0 years ago

informatics bot ▴ 760

I would highly recommend you read the DESeq2 vignette.

You first need to create a design table describing your samples. This will be a three-column table in which each row represents one sample (one column of your count table) and the columns record time, donor/sample, and treatment. The treatment column is only needed if you have control replicates; if all samples were treated with the same stimulation (i.e. no controls), you can leave it out.

You can then create a DESeq data object from your count table and your design table. Your data is currently a plain data.frame, which is why rlog cannot find a sizeFactors method for it.

dds <- DESeqDataSetFromMatrix(countData = data, colData = design, design = ~ time + sample + treatment)

Your design formula needs to take into account your time points as well as the two replicated samples.

Then estimate the size factors (the failed sizeFactors lookup is what the error message reports) and run rlog on the data set:

norm<-estimateSizeFactors(dds) 
expr<-rlog(norm)
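The steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: the counts are the six rows printed by head(data) in the question, the `replicate` and `time` columns of the sample table are my guess at a sensible layout (no treatment column, since the question mentions no controls), and `run_rlog()` is a hypothetical helper that wraps the DESeq2 calls so the data preparation can be checked without DESeq2 installed.

```r
# Toy counts: the six rows shown by head(data) in the question.
data <- data.frame(
  day0.x = c(358L, 11L, 7L, 26L, 4L, 477L),
  day2.x = c(422L, 31L, 3L, 45L, 10L, 138L),
  day5.x = c(241L, 44L, 33L, 74L, 66L, 64L),
  day14  = c(617L, 26L, 392L, 5624L, 338L, 21L),
  day0.y = c(145L, 8L, 2L, 45L, 19L, 747L),
  day2.y = c(508L, 24L, 5L, 175L, 13L, 507L),
  day5.y = c(389L, 41L, 25L, 94L, 70L, 98L),
  day15  = c(357L, 49L, 159L, 4604L, 229L, 25L),
  day30  = c(594L, 49L, 155L, 14238L, 242L, 22L)
)

# DESeq2 expects a matrix of integer counts, not a data.frame.
count_matrix <- as.matrix(data)
storage.mode(count_matrix) <- "integer"

# Sample table: one row per count column, in the same order.
# The column names and levels here are assumptions for illustration.
col_data <- data.frame(
  row.names = colnames(count_matrix),
  replicate = factor(c(rep("rep1", 4), rep("rep2", 5))),
  time = factor(
    c("day0", "day2", "day5", "day14", "day0", "day2", "day5", "day15", "day30"),
    levels = c("day0", "day2", "day5", "day14", "day15", "day30")
  )
)

# Hypothetical wrapper around the DESeq2 calls from the answer.
run_rlog <- function(count_matrix, col_data) {
  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = count_matrix,
    colData = col_data,
    design = ~ replicate + time
  )
  dds <- DESeq2::estimateSizeFactors(dds)
  DESeq2::rlog(dds)
}
```

With DESeq2 installed, something like `expr <- run_rlog(count_matrix, col_data)` on the full count matrix should return the transformed object. The six-gene toy table is only there to show the shapes involved and is too small for meaningful dispersion estimates.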

Also, you might want to look into a variance-stabilizing transformation (VST) of your data, run with blind = FALSE. With that setting, the transformation takes your experimental design into account when estimating dispersions instead of ignoring the sample information. You might also want to remove the day 30 sample; this will make your time-course data set more even.
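As a rough sketch of that suggestion, the day 30 sample can be dropped from both the counts and the sample table before building the DESeq2 object. The two count rows are taken from the question; the `replicate`/`time` sample table and the helper `run_vst()` are assumptions made for illustration, and `run_vst()` is only defined here, not called, so the subsetting can be checked without DESeq2.

```r
sample_names <- c("day0.x", "day2.x", "day5.x", "day14",
                  "day0.y", "day2.y", "day5.y", "day15", "day30")

# First two genes from head(data) in the question.
count_matrix <- matrix(
  c(358L, 422L, 241L, 617L, 145L, 508L, 389L, 357L, 594L,
     11L,  31L,  44L,  26L,   8L,  24L,  41L,  49L,  49L),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, sample_names)
)

# Assumed sample table, one row per count column.
col_data <- data.frame(
  row.names = sample_names,
  replicate = factor(c(rep("rep1", 4), rep("rep2", 5))),
  time = factor(c("day0", "day2", "day5", "day14",
                  "day0", "day2", "day5", "day15", "day30"))
)

# Drop the day 30 sample from counts and sample table together,
# and remove the now-unused factor level.
keep <- col_data$time != "day30"
count_matrix_sub <- count_matrix[, keep, drop = FALSE]
col_data_sub <- droplevels(col_data[keep, , drop = FALSE])

# Hypothetical wrapper: VST that uses the design (blind = FALSE).
run_vst <- function(count_matrix, col_data) {
  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = count_matrix,
    colData = col_data,
    design = ~ replicate + time
  )
  DESeq2::varianceStabilizingTransformation(dds, blind = FALSE)
}
```

Subsetting the counts and the sample table with the same logical vector keeps their order in sync, which DESeqDataSetFromMatrix relies on.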

ADD COMMENT • link 8.0 years ago by informatics bot ▴ 760

So I have tried creating a design matrix, but I am not entirely sure I have designed it correctly for the data I have:

        time    sample
day0.x  time1   sample1
day2.x  time2   sample2
day5.x  time3   sample3
day14.x time4   sample4
day0.y  time1   sample5
day2.y  time2   sample6
day5.y  time3   sample7
day15.y time4   sample8
day30.y time5   sample9

This is the script I used to create the design matrix:

count_table <-read.csv("merged_count.txt", header=TRUE,sep="\t")
count_table <- count_table[-1]
head(count_table)
expt_design1 <- data.frame(row.names = colnames(count_table), subject = c(rep("day0",1),rep("day2",2),rep("day5",3),rep("day14",4),rep("day0",1),rep("day2",2),rep("day5",3),rep("day15",4),rep("day30",5),time = c("t1","t2","t3","t4","t1","t2","t3","t4","t5"))
cds <- newCountDataSet(count_table, expt_design1)
head(counts(cds))
cds = estimateSizeFactors(cds)

This is the error that I am getting when trying to create the design matrix:

> expt_design1 <- data.frame(row.names = colnames(count_table), subject = c(rep("day0",1),rep("day2",2),rep("day5",3),rep("day14",4),rep("day0",1),rep("day2",2),rep("day5",3),rep("day15",4),rep("day30",5),time = c("t1","t2","t3","t4","t1","t2","t3","t4","t5"))
+ cds <- newCountDataSet(count_table, expt_design1)

Error: unexpected symbol in:
"sign1 <- data.frame(row.names = colnames(count_table), subject = c(rep("day0",1),rep("day2",2),rep("day5",3),rep("day14",4),rep("day0",1),rep("day2",2),rep("day5",3),rep("day15",4),rep("day30"
cds"

What am I doing wrong? I hope I have not misunderstood your advice or how to create a proper design matrix.

ADD REPLY • link 8.0 years ago by herman.pappoe.45 ▴ 10

You are missing a closing parenthesis at the end of the c() call that builds the subject vector:

subject=c(rep("day0",1),rep("day2",2),rep("day5",3),rep("day14",4),rep("day0",1),rep("day2",2),rep("day5",3),rep("day15",4),rep("day30",5))
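Note that closing the parenthesis fixes the syntax error but probably not the table itself: rep("day2", 2) repeats the label twice, so the subject vector ends up with 25 entries for 9 samples. A minimal sketch of a table with one label per sample, assuming the column names printed by head(data) in the question, could look like this. Also, newCountDataSet comes from the older DESeq package; with DESeq2 the table would be passed to DESeqDataSetFromMatrix instead.

```r
# The subject vector as written (with the parenthesis closed):
# each rep(x, n) repeats x n times, so the lengths add up to 25.
subject_as_written <- c(rep("day0", 1), rep("day2", 2), rep("day5", 3),
                        rep("day14", 4), rep("day0", 1), rep("day2", 2),
                        rep("day5", 3), rep("day15", 4), rep("day30", 5))
length(subject_as_written)

# Column names as shown by head(data) in the question.
sample_names <- c("day0.x", "day2.x", "day5.x", "day14",
                  "day0.y", "day2.y", "day5.y", "day15", "day30")

# One label per sample, so every column has exactly 9 entries.
expt_design1 <- data.frame(
  row.names = sample_names,
  subject = c("day0", "day2", "day5", "day14",
              "day0", "day2", "day5", "day15", "day30"),
  time = c("t1", "t2", "t3", "t4", "t1", "t2", "t3", "t4", "t5")
)
```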

ADD REPLY • link 8.0 years ago by informatics bot ▴ 760

You could also read the first column (geneID) in as row names:

count_table <- read.csv("merged_count.txt", header=TRUE, sep="\t", row.names=1)

and then omit the second line (count_table <- count_table[-1]).
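A small self-contained illustration of what row.names=1 does, using a temporary file with made-up gene IDs (geneA, geneB are placeholders, not IDs from the original data):

```r
# Write a tiny tab-separated count file to a temporary location.
tmp <- tempfile(fileext = ".txt")
writeLines(c("geneID\tday0.x\tday2.x",
             "geneA\t358\t422",
             "geneB\t11\t31"), tmp)

# row.names = 1 turns the first column (geneID) into row names,
# so only the count columns remain as data.
count_table <- read.csv(tmp, header = TRUE, sep = "\t", row.names = 1)

# The gene IDs survive as row names when converting to a matrix.
count_matrix <- as.matrix(count_table)
unlink(tmp)
```

Compared with dropping the first column via count_table[-1], this keeps the gene IDs attached to the rows, so they carry through to the transformed output.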

ADD REPLY • link 8.0 years ago by WouterDeCoster 47k