Question: Warning message: In y/gene.length.kb : longer object length is not a multiple of shorter object length? Any ideas?
1
gravatar for tud55122
3.4 years ago by
tud5512220
United States
tud55122 wrote:

Hi,

I'm new to RNA-seq analysis. I'm using edgeR to generate RPKMs. Everything works fine until the last step, where I get this warning:

Warning message:

In y/gene.length.kb :
  longer object length is not a multiple of shorter object length

When I checked the RPKMs generated, the values look kind of skewed. There is high variation even between samples from the same condition, although the raw counts look fine.

Any idea? Thanks, Hang

edger rna-seq rpkm • 1.7k views

Thanks for your reply, guys. Do you know how to fix the problem?

Here are the scripts I used

d = DGEList(counts=counts, group=samples$condition)
d = calcNormFactors(d)
length.genes=read.table("gene_length_mouse.txt",sep="\t",header=T)
rpkm.gene=rpkm(d, length.genes[length.genes$Gene %in% rownames(d),2],normalized.lib.sizes=TRUE, log=F)

Thanks, Hang

written 3.4 years ago by tud55122

It looks like d contains some rows that are not in your text file. You subset length.genes with %in%, but you also need to subset d the other way round, because you can only get RPKM for genes whose length is in the text file. And for that matter, be extra careful that the two lists are sorted the same way! %in% keeps the order of the text file, not the order of d. Maybe match them up in a separate command and don't use %in% inside the rpkm() call.
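One way the matching step could look is sketched below. The count matrix and length table are made-up stand-ins for the objects in the question, and the names `common`, `counts.matched` and `len.matched` are only illustrative. The edgeR calls at the end run only if the package is installed and mirror the call from the question.

```r
# Toy stand-ins for the objects in the question (all values are made up).
counts <- matrix(
  c(10, 20, 30, 40,
    15, 25, 35, 45),
  nrow = 4,
  dimnames = list(c("GeneA", "GeneB", "GeneC", "GeneD"), c("s1", "s2"))
)
length.genes <- data.frame(
  Gene   = c("GeneC", "GeneA", "GeneX", "GeneB"),  # different order, GeneD missing
  Length = c(3000, 1000, 500, 2000)
)

# Keep only genes present in both objects, in the row order of the count matrix.
gene.ids <- as.character(length.genes$Gene)
common <- intersect(rownames(counts), gene.ids)
idx <- match(common, gene.ids)
counts.matched <- counts[common, , drop = FALSE]
len.matched <- length.genes$Length[idx]

# Sanity checks: one length per row, and the gene names line up.
stopifnot(length(len.matched) == nrow(counts.matched))
stopifnot(identical(gene.ids[idx], rownames(counts.matched)))

# With edgeR installed, the aligned objects could then be passed on like this.
if (requireNamespace("edgeR", quietly = TRUE)) {
  d <- edgeR::DGEList(counts = counts.matched)
  d <- edgeR::calcNormFactors(d)
  rpkm.gene <- edgeR::rpkm(d, gene.length = len.matched,
                           normalized.lib.sizes = TRUE, log = FALSE)
}
```

Using match() rather than %in% fixes both problems at once: genes without a length are dropped from the counts, and the lengths come back in the same order as the rows.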

ADD REPLYlink written 3.4 years ago by karl.stamm3.5k
Shab86 wrote:

Could it be that you are providing a different number of gene lengths than there are genes in your matrix?


That's just what the warning says. The "y" vector has a different length than "gene.length.kb". It talks about multiples because R lets you divide vectors of different lengths by recycling the shorter one, like (1,2,3,4,5,6) / (1,2) = (1/1, 2/2, 3/1, 4/2, 5/1, 6/2). When the longer length is not a multiple of the shorter one, R still recycles, but it warns you.
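The recycling behaviour can be checked with a small base R sketch. The vectors here are arbitrary examples, and the warning is captured with tryCatch() so that it can be inspected.

```r
y <- c(1, 2, 3, 4, 5, 6)

# Length 6 divided by length 2: the shorter vector is recycled three times,
# silently, because 6 is a multiple of 2.
even <- y / c(1, 2)
print(even)  # 1 1 3 2 5 3

# Length 6 divided by length 4: 6 is not a multiple of 4, so R would still
# recycle, but it raises the warning from the question. Capture its message.
msg <- tryCatch(y / c(1, 2, 3, 4), warning = function(w) conditionMessage(w))
print(msg)
```

In the rpkm() case this means the numbers still come out, but some rows get divided by lengths belonging to other genes.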

written 3.4 years ago by karl.stamm

Michael Dondrup wrote:

The problem is this:

length.genes$Gene %in% rownames(d)

and the file "gene_length_mouse.txt".

If some genes cannot be matched between the count matrix and the gene length file, then you end up with an unusable length vector. In addition, I would like to stress that it might be better to have no gene lengths than bad ones. Whole-gene lengths are rather a confounding factor; the combined exon lengths might be better to use here.
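As a rough illustration of what "combined exon length" could mean, here is a base R sketch that merges overlapping exons per gene and sums their widths. The exon table and the helper `union.width` are hypothetical; with real annotation, a Bioconductor package such as GenomicFeatures is the more usual route.

```r
# Hypothetical exon table: one row per exon, coordinates 1-based and inclusive.
exons <- data.frame(
  gene  = c("GeneA", "GeneA", "GeneA", "GeneB"),
  start = c(100, 150, 400, 10),
  end   = c(200, 250, 450, 59)
)

# Sum the widths of one gene's exons after merging overlapping intervals,
# so that bases shared by several transcripts are counted only once.
union.width <- function(start, end) {
  o <- order(start)
  start <- start[o]
  end <- end[o]
  total <- 0
  cur.start <- start[1]
  cur.end <- end[1]
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur.end) {
      cur.end <- max(cur.end, end[i])  # overlap: extend the current block
    } else {
      total <- total + (cur.end - cur.start + 1)  # close the current block
      cur.start <- start[i]
      cur.end <- end[i]
    }
  }
  total + (cur.end - cur.start + 1)
}

exon.length <- sapply(split(exons, exons$gene),
                      function(g) union.width(g$start, g$end))
print(exon.length)  # GeneA 202, GeneB 50
```

The resulting named vector can be matched to the rows of the count matrix by gene name in the same way as any other length table.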
