couting rpkm by edgr package
1
1
Entering edit mode
8.4 years ago
zizigolu ★ 4.3k

Hi,

I have a raw counts file extracted by featurecounts tool and also a gene length matrix. I was going to count rpkm for my 76 samples then I did like below

rawcounts <- read.table("rawcounts.txt", header = T, sep = "\t")
head(rawcounts[,1:4])
     Geneid SRR018347 SSR019035 SRR074122
1 AT1G01010        31        37         4
2 AT1G01020       153       211        19
3 AT1G01030        10        13         0
4 AT1G01040        88       106         3
5 AT1G01046         0         0         0
6 AT1G01050       228       272       195

dim(rawcounts)
[1] 33602    77

gene-length <- read.table("gene-length.txt")
head(gene-length[,1:4])
       V1   V2   V3   V4
V1 Length 1688 1774 1905

dim(gene-length )
[1]     1 33603

d <- DGEList(counts=rawcounts)
cpm(d)
cpm(d,log=TRUE)

d$genes$Length <- gene-length
rpkm(d)

Warning message:
In Ops.factor(left, right) : '/' not meaningful for factors

Even I tried with transposed gene length matrix but again I got error

What is my fault please?

RNA-Seq rpkm R • 6.0k views
ADD COMMENT
1
Entering edit mode

The "not meaningful for factors" and the output of head(gene-length[,1:4]) show you what the problem is.

ADD REPLY
0
Entering edit mode

In addition, pay attention to dim(rawcounts) vs dim(gene-length).

ADD REPLY
0
Entering edit mode

Thank you,

I tried many corrections for example I added a column as genes length to my raw counts file the header became like below

 head(rawcounts[,1:4])
      genes Length SRR018347 SSR019035
1 AT1G01010   1688        31        37
2 AT1G01020   1774       153       211
3 AT1G01030   1905        10        13
4 AT1G01040   6254        88       106
5 AT1G01046    207         0         0
6 AT1G01050    976       228       272

> dim(rawcounts)
[1] 32550    78

dim(gene-length)

> dim(mycounts1)
[1] 32550     1

Then I tried

d <- DGEList(counts=rwacounts)
d$genes$Length <- gene-length

rpkm(d)
9036  NA
9037  NA
9038  NA
9039  NA
9040  NA
9998  NA
9999  NA
10000 NA
 [ reached getOption("max.print") -- omitted 22550 rows ]
Warning message:
In Ops.factor(left, right) : '/' not meaningful for factors

I tried with or with adding gene-length column. Then I tried like below

y <- DGEList(counts=rawcounts,genes=data.frame(Length=gene-length))
y <- calcNormFactors(y)
RPKM <- rpkm(y)
Error in rpkm.DGEList(y) : Gene lengths not found

But I read the gene-length already even I tried by adding the length as an additional column to raw counts. Actually I can't figure out what is my fault. For example in d$genes$Length <- c(1000,2000,500,1500,3000), what is genes?

Please help me because this time really it is not a lazy post. I promise I am trying for many hours, I have gene length, I have raw counts but I cant figure out what to do

ADD REPLY
3
Entering edit mode

Suppose you start with:

> head(gene-length[,1:4])

       V1   V2   V3   V4
V1 Length 1688 1774 1905

Note that V1 is "Length", that likely made the whole dataframe a factor rather than numeric. So:

geneLength <- as.numeric(t(gene-length[1, -1]))

or something along those lines.

ADD REPLY
0
Entering edit mode

thanks a lot Devon,

your tip worked well

ADD REPLY
0
Entering edit mode

I just noticed: your variable name gene-length is invalid, and if you did not get an error means you are posting untested code, different from the one you are having real problems. This may result in a wild goose chase.

ADD REPLY
0
Entering edit mode

but gene-length variable only was a name to ask my question clearly, if not what I posted below is the code I did finally in which I named length of gene as mycounts1. if you mean the same mycounts1??

ADD REPLY
1
Entering edit mode
8.4 years ago
zizigolu ★ 4.3k

The short code I used to counts rpkm with edgeR package

# rpkm normalization by edgeR package
# I need a file containing gene length that I extracted from featurecounts tool output
# reading gene length file
mycounts1 <- read.csv("length.txt", stringsAsFactors = FALSE)
# read as numeric
mycounts1 <- as.numeric(mycounts1$Length)
# transposing the gene length file
tmycounts1 <- t(mycounts1)
# setting as numeric
sapply(tmycounts1, class)
# reading the rawcounts file
mycounts <- read.table("rawcounts.txt", header = T, sep = "\t")
# watching the head of uploaded file
head(mycounts[,1:4])
# watching the dimension of matrix
dim(mycounts)
# assigning the column one as mycounts rowname
rownames(mycounts)<- mycounts[,1]
# read as matrix
as.matrix(mycounts)
# viewing mycounts class
class(mycounts)
# setting as numeric
sapply(mycounts, class)
# removing the column on
mycounts <- mycounts[,-1]
# watching the head of rawcounts file
head(mycounts[,1:4])
# applying edgeR function
library(edgeR)
# assigning the rawcounts to a new variable
d <- DGEList(counts=mycounts)
# reading gene length file as a matrix
d$genes$Length <- c(tmycounts1)
# rpkm function
mycounts <- rpkm(d)
# writing the result
write.table(mycounts, file = "rpkm.txt", dec = ".", sep = "\t", quote = FALSE)
ADD COMMENT
0
Entering edit mode

Could I please ask, what does the gene lengths file look like?

Should I have a "rawcounts.txt" file that looks like this:

                 Sample1     Sample2       Sample3
Gene1      10               30                40
Gene2      20               40                50
Gene3      30               60                60

and then what should the "lengths.txt" file look like?

I made two lengths.txt files, one that looks like this:

Length,1000,2000,3000

(where each item in the list corresponds to the corresponding line in the raw counts file (so gene1 length = 1000, gene2 length = 2000, gene3 length = 3000), but I ran into errors, and then I also made a lengths.txt file like this:

Length
1000
2000
3000

Again, every row in this file corresponds to the gene length in the raw counts file (so Gene1 has length 1000, Gene2 has length 2000, Gene3 has length 3000). i'm running into some problems, but before I look more into what I'm doing wrong, I just want to check I have the file formats right.

Thanks.

ADD REPLY
0
Entering edit mode
gene-length <- read.table("gene-length.txt")

head(gene-length[,1:4])

       V1   V2   V3   V4
V1 Length 1688 1774 1905
dim(gene-length )

[1]     1 33603

as you see my read-length file has a row and 33606 columns corresponding to the number of my genes

ADD REPLY

Login before adding your answer.

Traffic: 3296 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6