Question

Unexpected negative correlation between gene-length and counts

5 months ago

Ben ▴ 30

I'm wondering if anyone has come across anything like the following. I'm looking at gene-length bias in the DepMap RNA-seq counts matrix processed by the GTEx pipeline, and noticed a negative relationship between gene length and counts at high gene lengths. At low gene lengths, the expected gene-length bias is seen, with counts increasing with gene length. At high gene lengths (>50k), this relationship inverts and counts start decreasing with increasing gene length. Has anyone seen this before, or have any ideas why this may be?

Thanks :)

Left is before normalisation with EDASeq, right is after normalisation. [Figure: bias]

rna-seq bias normalisation • 826 views

updated 5 months ago by rfran010 ★ 1.7k • written 5 months ago by Ben ▴ 30

What do we see in the plots? What is "gene counts" (and on which scale is this, log2?) and what are the lines? Are these samples? Code for this would help.

5 months ago by ATpoint 89k

Sorry, these were generated with the EDASeq biasPlot() function.

They are lowess lines of log-counts against gene length. Yes, each line is an individual sample. Low-count genes are filtered with edgeR's filterByExpr() function.
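For anyone who wants to reproduce the shape of these curves without the plotting function, here is a minimal sketch in Python (not the R code used above). It is not what biasPlot() does internally: it swaps the smoother for a plain median of log2(count + 1) per bin of log10(gene length), and the function name, the pseudocount of 1 and the default bin count are all assumptions made for illustration.

```python
import math
from statistics import median


def binned_log_count_trend(lengths, counts, n_bins=10):
    """Median log2(count + 1) per equal-width bin of log10(gene length).

    Hypothetical helper: a crude stand-in for a smoothed curve of one sample.
    Returns a list of (bin_centre_log10_length, median_log2_count, n_genes),
    one tuple per non-empty bin, ordered from short to long genes.
    """
    log_len = [math.log10(length) for length in lengths]
    log_cnt = [math.log2(count + 1) for count in counts]

    lo, hi = min(log_len), max(log_len)
    # Fall back to a width of 1.0 if every gene has the same length.
    width = (hi - lo) / n_bins or 1.0

    bins = {}
    for x, y in zip(log_len, log_cnt):
        # Clip so the longest gene falls into the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        bins.setdefault(idx, []).append(y)

    return [
        (lo + (idx + 0.5) * width, median(values), len(values))
        for idx, values in sorted(bins.items())
    ]
```

Calling it once per sample column gives one curve per sample. The third element of each tuple shows how many genes support each point, which is useful when judging the far right of the plot.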

5 months ago by Ben ▴ 30

If these are genomic lengths, you wouldn't expect longer genes to have more reads per se.

If these are transcript lengths, how many genes longer than 50k are you actually measuring? See a couple of reference examples here. For me, there are only a few genes that long, so naturally there's more variability there, which could explain the differences in bias observed. I usually adjust for this with a log transformation.


5 months ago by rfran010 ★ 1.7k