Question

difficulties in merge multiple txt files from Rsubread feature counts to DESEQ2

0

Entering edit mode

3.7 years ago

Kai_Qi ▴ 130

I have generated 8 bam files using STAR from 4 different developmental stages RNA-seq (each with 2 replicates), then I used Rsubread feature counts to get the corresponding counts data from each bam file:

library(Rsubread)
counts <- featureCounts("file.bam", annot.inbuilt="mm10",
                        nthreads=4, isPairedEnd=T, 
                        countMultiMappingReads=T) 
 write.table(
         cbind(counts$annotation$GeneID,counts$annotation$Length,counts$counts),
         "file.txt",
         quote=F, sep="\t", row.names=F,
         col.names=c("EntrezID", "GeneLength","Count")
         )

Now I tried to get these txt files prepared for DEseq2:

files <- c("E11Rep1.txt", "E11Rep2.txt", "E14Rep1.txt", "E14Rep2.txt", 
           "E18Rep1.txt", "E18Rep2.txt", "P56Rep1.txt", "P56Rep2.txt")
x <- readDGE(files, columns=c(1,3))
cts <- x$counts
head(cts)

        Samples Tags        E11Rep1 E11Rep2 E14Rep1 E14Rep2 E18Rep1 E18Rep2 P56Rep1 P56Rep2   497097        199     190     617     889   
1148    1761     961     884   100503874      82      97     234    
281     395     504     607     527   100038431       0       0      
1       8      18      25      19      21   19888           6      19 
9      17       6       9      13       1   20671        1704    1334 
1544    2321    1412    2160    1657    1539   27395        7984   
5541    3778    5846    3126    5332    1736    1618

coldata <- data.frame(timepoint = rep(c("E11", "E14", "E18", "P56"), each=2))

dds <- DESeqDataSetFromMatrix(countData=cts, colData=coldata, design=~timepoint)

Then I got error:

dds <- DESeqDataSetFromMatrix(countData=cts, colData=coldata, design=~timepoint) converting counts to integer mode Error in
`assays<-`(`*tmp*`, withDimnames = withDimnames, ..., value =
`*vtmp*`) :    please use 'assay(x, withDimnames=FALSE)) <- value' or
'assays(x, withDimnames=FALSE)) <-   value' when the dimnames on the
supplied assay(s) are not identical to the dimnames on  
RangedSummarizedExperiment object 'x'

Previously this method works, it looks after update my R version into 4.0 I can't get it through today. I leaned this simple followed the instructions here.

based on the error I got I changed my code(simply by guess) to:

dds <- DESeqDataSetFromMatrix(countData=cts, colData=colnames(x), design=~timepoint)

Then I got:

dds <- DESeqDataSetFromMatrix(countData=cts, colData=colnames(x), design=~timepoint) Error in `rownames<-`(`*tmp*`, value =
colnames(countData)) :    attempt to set 'rownames' on an object with
no dimensions

Here colnames(x) is:

colnames(x) 

[1] "E11Rep1" "E11Rep2" "E14Rep1" "E14Rep2" "E18Rep1" "E18Rep2" "P56Rep1" "P56Rep2"

I have seen similar questions here, but no one further explaination, I have read the DEseq manual and most people advised to put all bamfiles together then do featurecounts using Rsubread. I planed to do that if I have got no answer here. Frankly speaking, I never have truly understand the structure generated from the first step in deseq manual. My way of learning is to replace the object with mine by guessing. So when it does not work, I don't know how to solve the problem. If someone can help me get understand this I appreciate a lot.

R RNA-Seq rna-seq DEseq2 • 2.2k views

ADD COMMENT • link 3.7 years ago by Kai_Qi ▴ 130

1

Entering edit mode

I have cleaned up your post. Please invest some more effort intoo making your post readable. Keep an eye on the preview and ensure the post stay readable as you edit it.

ADD REPLY • link 3.7 years ago by Ram 43k

0

Entering edit mode

OK. Thanks, I will do it in this way.

ADD REPLY • link 3.7 years ago by Kai_Qi ▴ 130

0

Entering edit mode

You can input a vector of bam file names, and featureCounts will output a table that already has all the files as columns. To use these counts as input to edgeR or DESeq2, make the gene nanes the row names, and remove any other extra columns that aren't counts.

ADD REPLY • link 3.7 years ago by rpolicastro 13k

0

Entering edit mode

Thank you for this advice. I have regenerated the bam files and do something like this:

library(Rsubread)
bamfile <- c("E11Rep1.bam", "E11Rep2.bam", "E14Rep1.bam", "E14Rep2.bam", "E18Rep1.bam", "E18Rep2.bam", "AdultRep1.bam", "AdultRep2.bam")
counts <- featureCounts(bamfile, annot.inbuilt="mm10", nthreads=4, isPairedEnd=T, countMultiMappingReads=T)

I want to save the counts and do it locally as in the manual, so that I can see what happened in each step. Can you show me how to write the counts (I ran it on server), so that I can copy it to local disc and run it locally?

Many thanks,

Cai

ADD REPLY • link 3.7 years ago by Kai_Qi ▴ 130

1

Entering edit mode

df <- counts$counts will get you the counts table. You can then save it to a file like you normally would.

ADD REPLY • link 3.7 years ago by rpolicastro 13k

0

Entering edit mode

Thank you! I see the results, it is exactly as you said.

ADD REPLY • link 3.7 years ago by Kai_Qi ▴ 130

h.mon · Answer 1 · 2020-08-13

0

Entering edit mode

3.7 years ago

swbarnes2 14k

Can you explain why you are using cbind there? If counts is a table, can't you just excerpt the columns you want, instead of ripping them apart and joining them together?

I can't tell from how your post is formatted if your counts file is right, but you should have genes as row names, and samples as column names, and it's not clear to me what those "sample" and "tags" columns are doing there.

readDGE is an edgeR function, not a deseq2 function. Are you sure it's what you should be using?

I don't think your coldata is making a proper data frame; it looks like just one column of data; you haven't connected the values to sample names.

I strongly suggest you stop what you are doing with your data, and pick either edger or DESeq, and work all the way through their sample data, stopping to look one in a while at what the objects you are making look like.

ADD COMMENT • link 3.7 years ago by swbarnes2 14k

1

Entering edit mode

1.

Can you explain why you are using cbind there? If counts is a table, can't you just excerpt the columns you want, instead of ripping them apart and joining them together?

I use cbind to get the counts from each individual bamfiles, and use edgeR and limma to do the differential gene expression, as my friends told me. But in that manual, both me and my friend can not figure out what the design matrix and contr.matrix mean (I mean the number of 0, 1 in the matrix). Also I am thinking analyze isoform level change which the limma manual does not do. Here is the manual I used.

2.

readDGE is an edgeR function, not a deseq2 function. Are you sure it's what you should be using?

I have went over the deseq manual, but always wondered how can I import my individual txt files to deseq at the very first step. During this process, I feel the readDGE function can generate a list (which is x in my above code). So I think using x$count is similar to the countdata in DEseq2 (Actually I checked it, it is in same shape), in this way, I am able to put it into DEseq to do further analysis. All I need is to add colnames and design. So I use readDGE.

In the DEseq manual, the example package already contains the data, so it is hard to simulate my situation.

ADD REPLY • link updated 3.7 years ago by h.mon 35k • written 3.7 years ago by Kai_Qi ▴ 130

0

Entering edit mode

cbind doesn't get anything from anything. And there are no counts in bam files. Are you using edgeR or deseq? You have to pick one. readDGE makes a DGE object, not a list. Look, your code clearly is not working. Copy-pasting more code you do not understand is not going to solve anything. Go through a tutorial, so you can see what working input files look like.

ADD REPLY • link 3.7 years ago by swbarnes2 14k

0

Entering edit mode

Thank you for your prompt reply. But I did not use cbind to get counts. From the code above, I use featurecounts to get a file called "counts", then extract the geneID and counts, then using cbind to make the count-file for each individual sample.

My real question is if there is a way to use individual RSubread generated counts files for DEseq, and if yes, how to import them to Deseq2. My previous trying is to use readDGE and then extract the counts from DGE object.

I will go through DEseq2 if there is answer is no. Sometimes our server capacity is too low (100G) while my bamfiles are about 30G each, so I have to remove them when the grace period is about to end. Under this circumstances, using txt files from Rsubread is more convenient.Now it looks I have to generate bam files again.

ADD REPLY • link 3.7 years ago by Kai_Qi ▴ 130

0

Entering edit mode

Of course you can. Read the tutorial, learn what a count file is supposed to look like, make your count data look like that.

ADD REPLY • link 3.7 years ago by swbarnes2 14k

0

Entering edit mode

Thank you very much! I have looked at the structure of my count matrix. They look like the same. So I reinstall the old version of R (3.6.3). It does not show any error:

> dds <- DESeqDataSetFromMatrix(countData=cts, colData=samples, design= ~ timepoint)
converting counts to integer mode

I am going to go with this object and run the following steps. Thank you!

ADD REPLY • link 3.7 years ago by Kai_Qi ▴ 130