3.8 years ago by
United States
Methods that implement effective length correction all avoid generating negative effective lengths, though they do so in different ways depending on the tool.
It may be most useful, however, to think of the "effective length" as a property of both a transcript and a specific read, rather than of a transcript alone. Consider a transcript of length m and a read that maps to it with length n, where n is the total distance between the leftmost and rightmost mapped bases. If n > m (i.e. the read overhangs the transcript), we can set n = m; this case is rare and likely an artifact of mapping or misannotation. The read can then start at m - n + 1 different locations, so from the perspective of this particular read, the effective length of the transcript is m - n + 1.

A transcript will typically have many reads mapping to it, and we can define (as Li et al. do in RSEM) the expected effective length of the transcript as the per-read effective length averaged over all reads that map to that transcript. There are other approximations of effective length that have different properties in terms of e.g. computational convenience, but I find the notion of expected effective length the most straightforward to understand. Moreover, in this case you can see why the quantity is never negative: any read that maps to a transcript must have at least one potential start site (since n ≤ m, we have m - n + 1 ≥ 1), though often there could have been many.

I think this perspective also helps show why it makes sense to consider the effective length rather than the raw length. You can read a slightly longer explanation (with nice math typesetting) at my blog.
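To make the per-read view concrete, here is a minimal Python sketch of the calculation described above. The function names and the example lengths are made up for illustration, and this is a simplification: it averages over the observed mapped lengths only, and is not how RSEM or any other tool actually implements effective length correction.

```python
from statistics import mean


def read_effective_length(transcript_len: int, mapped_len: int) -> int:
    """Number of positions where a read of mapped_len could start on the transcript."""
    # If the read overhangs the transcript (n > m), treat n as m, as in the text.
    n = min(mapped_len, transcript_len)
    return transcript_len - n + 1


def expected_effective_length(transcript_len: int, mapped_lens: list[int]) -> float:
    """Average the per-read effective length over all reads mapped to the transcript."""
    if not mapped_lens:
        raise ValueError("need at least one mapped read")
    return mean(read_effective_length(transcript_len, n) for n in mapped_lens)


# Hypothetical transcript of length 1000 with four mapped reads;
# the last read overhangs the transcript and is clamped to n = m.
per_read = [read_effective_length(1000, n) for n in (200, 250, 300, 1200)]
print(per_read)                                                  # [801, 751, 701, 1]
print(expected_effective_length(1000, [200, 250, 300, 1200]))    # 563.5
```

Because every per-read value is at least 1, the average is at least 1 as well, which is the non-negativity argument in code form.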
written
3.8 years ago by
Rob ♦ 4.6k