Unexpected negative correlation between gene-length and counts
3 months ago
Ben ▴ 30

Has anyone come across anything like the following? I'm looking at gene-length bias in the DepMap RNA-seq counts matrix processed by the GTEx pipeline, and noticed a negative relationship between gene length and counts at high gene lengths. At low gene lengths, the expected gene-length bias is seen, with counts increasing as gene length increases. At high gene lengths (>50k), this relationship inverts and counts start decreasing as gene length increases. Has anyone seen this before, or does anyone have ideas about why this might happen?

Thanks :)

Left is before normalisation with EDASeq, right is after normalisation. (Figure: bias)

rna-seq bias normalisation • 661 views

What do we see in the plots? What is "gene counts" (and on which scale is this, log2?) and what are the lines? Are these samples? Code for this would help.


Sorry, these were generated with the EDASeq biasPlot() function.

They are loess lines of log-counts against gene length. Yes, each line is an individual sample. Low-count genes are filtered with edgeR's filterByExpr() function.
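For anyone who wants to see what such a curve summarises without running EDASeq, here is a minimal sketch in plain Python. It is not the biasPlot() implementation: the loess fit is swapped for a simple binned mean of log2(count + 1), and the function name `binned_trend` and the toy values are made up for illustration.

```python
import math
from statistics import mean


def binned_trend(lengths, counts, n_bins=4):
    """Mean log2(count + 1) per gene-length bin for a single sample.

    Genes are sorted by length and split into n_bins groups of (nearly)
    equal size. This is a crude stand-in for a loess curve, not loess itself.
    Returns a list of (representative_length, mean_log2_count) tuples, where
    the representative length is the upper-middle length in the bin.
    """
    if len(lengths) != len(counts):
        raise ValueError("lengths and counts must have the same number of genes")
    pairs = sorted(zip(lengths, counts))
    n = len(pairs)
    n_bins = min(n_bins, n)  # never ask for more bins than genes
    trend = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        representative_length = chunk[len(chunk) // 2][0]
        mean_log2 = mean(math.log2(count + 1) for _, count in chunk)
        trend.append((representative_length, mean_log2))
    return trend


# Toy sample (hypothetical values) that rises and then falls with length.
toy_lengths = [500, 1000, 2000, 4000, 8000, 16000, 60000, 120000]
toy_counts = [1, 3, 7, 15, 31, 63, 15, 7]

for length, value in binned_trend(toy_lengths, toy_counts):
    print(f"{length:>7} {value:.2f}")
```

Running this once per sample and overlaying the results gives roughly the same picture as one line per sample in the plots, although with far fewer points than a real loess fit.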


If these are genomic lengths, you wouldn't expect longer genes to have more reads per se.
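To make the two notions of length concrete: the genomic length includes introns, while the transcript (exonic) length only counts bases that end up in the mature RNA. A minimal sketch in plain Python, assuming exons are given as 0-based, half-open (start, end) pairs on one strand; the function names and the toy coordinates are hypothetical.

```python
def genomic_span(exons):
    """Distance from the first exon start to the last exon end (introns included)."""
    return max(end for _, end in exons) - min(start for start, _ in exons)


def exonic_length(exons):
    """Total bases covered by exons after merging overlaps (introns excluded)."""
    total = 0
    block_start, block_end = None, None
    for start, end in sorted(exons):
        if block_end is None or start > block_end:
            # A gap before this exon: close the previous merged block, if any.
            if block_end is not None:
                total += block_end - block_start
            block_start, block_end = start, end
        else:
            # Overlapping or touching exon: extend the current merged block.
            block_end = max(block_end, end)
    if block_end is not None:
        total += block_end - block_start
    return total


# Toy gene (made-up coordinates): two overlapping exons and one distant exon.
toy_exons = [(1000, 1200), (1150, 1400), (60000, 60300)]
print(genomic_span(toy_exons), exonic_length(toy_exons))
```

A gene like this toy one would count as "long" by genomic span but "short" by exonic length, so it is worth checking which of the two was passed to the bias plot.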

If these are transcript lengths, how many genes longer than 50k are you actually measuring? See a couple of reference examples here. For me, there are only a few genes that are longer, so naturally there's more variability, which could explain the differences in bias observed. I usually adjust for this with a log transformation.
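A quick way to check this is to count how many genes sit above the cut-off and compare the spread of log-counts in each group. A minimal sketch in plain Python, assuming per-gene lengths and counts for one sample are available as lists; `summarise_by_length`, the default cut-off of 50,000 and the toy values are illustrative and not taken from the DepMap data.

```python
import math
from statistics import mean, pstdev


def summarise_by_length(lengths, counts, cutoff=50_000):
    """Number of genes and mean/SD of log2(count + 1) below and above a length cut-off."""
    groups = {"short": [], "long": []}
    for length, count in zip(lengths, counts):
        key = "long" if length > cutoff else "short"
        groups[key].append(math.log2(count + 1))
    summary = {}
    for key, values in groups.items():
        summary[key] = {
            "n_genes": len(values),
            # None marks an empty group rather than raising an error.
            "mean_log2": mean(values) if values else None,
            "sd_log2": pstdev(values) if values else None,
        }
    return summary


# Toy sample (hypothetical values): three shorter genes, two longer ones.
toy_lengths = [1000, 2000, 30000, 60000, 200000]
toy_counts = [3, 7, 15, 1, 63]
print(summarise_by_length(toy_lengths, toy_counts))
```

If the "long" group holds only a handful of genes, any trend fitted through that region rests on very few points and should be read with caution.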

