Why does the goseq tutorial use the median of transcript lengths for gene length, not the sum?
9.1 years ago
Parham ★ 1.6k

Hi,

I wonder why the goseq manual calculates gene length as the median of the transcript lengths. It's under section 5.3. I would have expected the sum rather than the median. The manual can be found here.

http://www.bioconductor.org/packages/release/bioc/vignettes/goseq/inst/doc/goseq.pdf

Cheers!

9.1 years ago

The goal is to determine whether there's a bias by gene length. To do this, one needs some measure of gene length, and there are a few ways to derive one:

  1. Union gene model: The total non-redundant exonic length of a gene
  2. Estimated length: Derived using expectation maximization (EM), which gives an estimate of the expected gene length within each sample
  3. Median transcript length: What's used here, which is just the median of the annotated transcript lengths.
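A minimal sketch of measures 1 and 3, assuming each transcript is given as a list of half-open exon intervals `(start, end)` on one chromosome and strand. The function names and input layout are hypothetical and are not goseq's API (goseq itself is an R/Bioconductor package). Measure 2 requires an EM-based quantifier and is not shown.

```python
from statistics import median

# Hypothetical input layout: a gene is a list of transcripts,
# each transcript a list of half-open exon intervals (start, end).
Exon = tuple[int, int]


def transcript_length(exons: list[Exon]) -> int:
    """Spliced length of one transcript: the sum of its exon widths."""
    return sum(end - start for start, end in exons)


def union_gene_length(transcripts: list[list[Exon]]) -> int:
    """Measure 1: total non-redundant exonic length (exons merged across isoforms)."""
    intervals = sorted(ex for tx in transcripts for ex in tx)
    total = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= cur_end:      # overlapping or adjacent: extend the current block
            cur_end = max(cur_end, end)
        else:                     # gap: close the current block, start a new one
            total += cur_end - cur_start
            cur_start, cur_end = start, end
    total += cur_end - cur_start  # close the last block
    return total


def median_transcript_length(transcripts: list[list[Exon]]) -> float:
    """Measure 3: median of the annotated transcript lengths."""
    return median(transcript_length(tx) for tx in transcripts)


# Toy gene with three isoforms that share exons (made-up coordinates)
gene = [
    [(0, 300), (500, 800)],               # 600 bp
    [(0, 300), (500, 800), (900, 1200)],  # 900 bp
    [(100, 300), (900, 1200)],            # 500 bp
]
print(union_gene_length(gene))          # 900
print(median_transcript_length(gene))   # 600
```

For this toy gene, summing the three transcript lengths would give 2000 bp, longer than the 900 bp of exonic sequence the gene actually covers.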

If you actually summed the transcript lengths, as you suggest, you'd get odd results for genes with many isoforms. The result also wouldn't match any biologically plausible length (e.g., if a gene has 20 isoforms of ~1kb each, your method would yield a length of 20kb rather than a more reasonable ~1kb estimate).
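A quick check of that arithmetic, using made-up isoform lengths spread around 1 kb (illustrative numbers, not taken from any annotation):

```python
from statistics import median

# Hypothetical gene with 20 isoforms of roughly 1 kb each (950, 955, ..., 1045 bp)
isoform_lengths = list(range(950, 1050, 5))

summed = sum(isoform_lengths)   # grows with the number of annotated isoforms
med = median(isoform_lengths)   # stays near the typical transcript length
print(summed, med)              # 19950 997.5
```

The sum is nearly 20 times longer than the longest isoform (1045 bp), while the median stays close to the length of any single transcript.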


Thank you!
