Different Gene Lengths and Expected Gene Lengths from Sample to Sample

1


3.8 years ago

jmannhei ▴ 10

Hi all, I have come across something I have never seen before. I am working with data from an outside source that appears to be processed RNA-seq output. Like other processed RNA-seq files I have run into, these are tab-delimited files with columns for gene length, expected gene length, TPM, and counts for each probe identifier. Here is where things get weird: for any two samples, the gene length reported for the same probe set identifier differs, and the difference can be quite large! I have never seen this. In my experience the gene lengths are identical from sample to sample, and only the expected lengths vary a little. This ultimately affects how the TPM is calculated, and it makes me wonder what I am missing. Does anybody have a clue why this might be the case?
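To make the concern concrete, here is a minimal sketch of how a length column feeds into TPM, assuming the usual definition (count divided by length, then rescaled so the values sum to one million). The function name `tpm` and the toy numbers are made up for illustration and are not taken from the files in question.

```python
def tpm(counts, lengths):
    """Return TPM values for per-gene counts and per-gene lengths (same order)."""
    # Length-normalise each count to get a per-base rate.
    rates = [count / length for count, length in zip(counts, lengths)]
    total = sum(rates)
    # Rescale so that the rates sum to one million.
    return [rate / total * 1e6 for rate in rates]


# Hypothetical toy data: identical counts, but gene 1 has a different length
# in the second sample.
counts = [100, 200, 300]
sample_a = tpm(counts, [1000, 2000, 3000])
sample_b = tpm(counts, [500, 2000, 3000])
```

With the same counts, halving the length of the first gene changes the TPM of every gene in that sample, because the rescaling step couples all the genes together.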

rnaseq • 917 views

updated 3.8 years ago by swbarnes2 15k • written 3.8 years ago by jmannhei ▴ 10

0


What are the exact column names in the file? Do you know which tool was used to generate these files? I think I've seen RSEM do this with ENCODE GTF files (different length values for the same gene in different samples) so I am interested in your question as well.
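If the files do turn out to come from RSEM, one possible explanation is that its gene-level length is not a fixed annotation value but an average of the isoform lengths weighted by each isoform's estimated share of the gene's expression in that sample. The sketch below illustrates that idea only; `gene_length` and the numbers are hypothetical, and whether this applies depends on the tool that actually produced the files.

```python
def gene_length(isoform_lengths, isoform_fractions):
    """Abundance-weighted mean of isoform lengths for a single gene."""
    total = sum(isoform_fractions)
    weighted = sum(
        length * fraction
        for length, fraction in zip(isoform_lengths, isoform_fractions)
    )
    return weighted / total


# Hypothetical gene with a short and a long isoform.
isoform_lengths = [1000, 3000]

# Sample 1 expresses mostly the short isoform, sample 2 mostly the long one.
length_sample_1 = gene_length(isoform_lengths, [0.9, 0.1])
length_sample_2 = gene_length(isoform_lengths, [0.2, 0.8])
```

Under these assumed isoform fractions the same gene gets a length of about 1200 in one sample and about 2600 in the other, which is the kind of large sample-to-sample difference described in the question.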