I have generated 8 bam files using STAR from 4 different developmental stages RNA-seq (each with 2 replicates), then I used Rsubread feature counts to get the corresponding counts data from each bam file:
library(Rsubread)
counts <- featureCounts("file.bam", annot.inbuilt="mm10",
nthreads=4, isPairedEnd=T,
countMultiMappingReads=T)
write.table(
cbind(counts$annotation$GeneID,counts$annotation$Length,counts$counts),
"file.txt",
quote=F, sep="\t", row.names=F,
col.names=c("EntrezID", "GeneLength","Count")
)
Now I tried to get these txt files prepared for DEseq2:
files <- c("E11Rep1.txt", "E11Rep2.txt", "E14Rep1.txt", "E14Rep2.txt",
"E18Rep1.txt", "E18Rep2.txt", "P56Rep1.txt", "P56Rep2.txt")
x <- readDGE(files, columns=c(1,3))
cts <- x$counts
head(cts)
Samples Tags E11Rep1 E11Rep2 E14Rep1 E14Rep2 E18Rep1 E18Rep2 P56Rep1 P56Rep2 497097 199 190 617 889
1148 1761 961 884 100503874 82 97 234
281 395 504 607 527 100038431 0 0
1 8 18 25 19 21 19888 6 19
9 17 6 9 13 1 20671 1704 1334
1544 2321 1412 2160 1657 1539 27395 7984
5541 3778 5846 3126 5332 1736 1618
coldata <- data.frame(timepoint = rep(c("E11", "E14", "E18", "P56"), each=2))
dds <- DESeqDataSetFromMatrix(countData=cts, colData=coldata, design=~timepoint)
Then I got error:
dds <- DESeqDataSetFromMatrix(countData=cts, colData=coldata, design=~timepoint) converting counts to integer mode Error in
`assays<-`(`*tmp*`, withDimnames = withDimnames, ..., value =
`*vtmp*`) : please use 'assay(x, withDimnames=FALSE)) <- value' or
'assays(x, withDimnames=FALSE)) <- value' when the dimnames on the
supplied assay(s) are not identical to the dimnames on
RangedSummarizedExperiment object 'x'
Previously this method works, it looks after update my R version into 4.0 I can't get it through today. I leaned this simple followed the instructions here.
based on the error I got I changed my code(simply by guess) to:
dds <- DESeqDataSetFromMatrix(countData=cts, colData=colnames(x), design=~timepoint)
Then I got:
dds <- DESeqDataSetFromMatrix(countData=cts, colData=colnames(x), design=~timepoint) Error in `rownames<-`(`*tmp*`, value =
colnames(countData)) : attempt to set 'rownames' on an object with
no dimensions
Here colnames(x) is:
colnames(x)
[1] "E11Rep1" "E11Rep2" "E14Rep1" "E14Rep2" "E18Rep1" "E18Rep2" "P56Rep1" "P56Rep2"
I have seen similar questions here, but no one further explaination, I have read the DEseq manual and most people advised to put all bamfiles together then do featurecounts using Rsubread. I planed to do that if I have got no answer here. Frankly speaking, I never have truly understand the structure generated from the first step in deseq manual. My way of learning is to replace the object with mine by guessing. So when it does not work, I don't know how to solve the problem. If someone can help me get understand this I appreciate a lot.
I have cleaned up your post. Please invest some more effort intoo making your post readable. Keep an eye on the preview and ensure the post stay readable as you edit it.
OK. Thanks, I will do it in this way.
You can input a vector of bam file names, and featureCounts will output a table that already has all the files as columns. To use these counts as input to edgeR or DESeq2, make the gene nanes the row names, and remove any other extra columns that aren't counts.
Thank you for this advice. I have regenerated the bam files and do something like this:
I want to save the counts and do it locally as in the manual, so that I can see what happened in each step. Can you show me how to write the counts (I ran it on server), so that I can copy it to local disc and run it locally?
Many thanks,
Cai
df <- counts$counts
will get you the counts table. You can then save it to a file like you normally would.Thank you! I see the results, it is exactly as you said.