Why doesn't TCGA .rsem.isoforms.results have a length column?
3.8 years ago

Hi,

I'm analyzing TCGA data and need the RSEM output files. I assumed these were the files I downloaded from GDC, which end in .rsem.isoforms.results. However, this is their format (first 5 lines of one of them):

isoform_id  raw_count   scaled_estimate
uc011lsn.1  0.00    0
uc010unu.1  16.09   4.35780241451249e-07
uc010uoa.1  4.00    1.08327337941115e-07
uc002bgz.2  21.91   4.34097970946371e-07
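A minimal sketch of reading such a file with Python's standard csv module, assuming it is tab-delimited with exactly the three columns shown above. It makes the problem concrete: only isoform_id, raw_count and scaled_estimate are available, so there is no length column to read. The TPM conversion is an assumption, not something stated in the files: scaled_estimate is commonly interpreted as RSEM's estimated transcript fraction, so multiplying by 10^6 would give a TPM-like value. Check that against the TCGA pipeline documentation before relying on it.

```python
import csv
import io

# The first rows of a TCGA *.rsem.isoforms.results file, as shown above
# (assumed to be tab-delimited in the real file).
SAMPLE = (
    "isoform_id\traw_count\tscaled_estimate\n"
    "uc011lsn.1\t0.00\t0\n"
    "uc010unu.1\t16.09\t4.35780241451249e-07\n"
    "uc010uoa.1\t4.00\t1.08327337941115e-07\n"
    "uc002bgz.2\t21.91\t4.34097970946371e-07\n"
)


def read_isoform_results(handle):
    """Parse a TCGA isoform results file into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for row in reader:
        rows.append({
            "isoform_id": row["isoform_id"],
            "raw_count": float(row["raw_count"]),
            "scaled_estimate": float(row["scaled_estimate"]),
        })
    return rows


def scaled_estimate_to_tpm(scaled_estimate):
    # ASSUMPTION: scaled_estimate is RSEM's transcript fraction (tau),
    # so scaling by one million yields a TPM-like value.
    return scaled_estimate * 1e6


rows = read_isoform_results(io.StringIO(SAMPLE))
print("columns:", list(rows[0]))  # no "length" column here
for r in rows:
    tpm = scaled_estimate_to_tpm(r["scaled_estimate"])
    print(r["isoform_id"], r["raw_count"], round(tpm, 3))
```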

First, I don't understand why these files carry the RSEM output name (.rsem.isoforms.results) when, according to this post, they are not actual RSEM output, since they lack the length column. Is there a way to download the original RSEM output from the GDC legacy data?

Second, can I fix this manually? As I understand it, the length column is just the length of the transcript, so could I look up the length of each isoform ID (e.g. from UCSC) and add a corresponding column?
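A rough sketch of the manual fix, under several assumptions. The uc… IDs are UCSC known-gene identifiers, so the sketch assumes you have a UCSC knownGene-style table (tab-separated, with exonStarts and exonEnds as the 9th and 10th columns) from the same annotation release that was used to produce the TCGA files; whether your copy matches that release is something to verify. Transcript length is computed as the sum of exon lengths (UCSC coordinates are 0-based, half-open, so each exon contributes end - start bases). The annotation rows below are made-up placeholder coordinates, not real data, and `load_lengths` / `add_length_column` are hypothetical helper names. Note that real RSEM output also has an effective_length column, which depends on the sample's fragment-length distribution and cannot be rebuilt from the annotation alone.

```python
import csv
import io

# HYPOTHETICAL knownGene-style rows with made-up coordinates, for illustration:
# name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount,
# exonStarts, exonEnds (comma-separated, with a trailing comma as in UCSC).
KNOWN_GENE = (
    "uc010unu.1\tchr1\t+\t1000\t3600\t1200\t3500\t2\t1000,3000,\t1500,3600,\n"
    "uc010uoa.1\tchr1\t-\t7000\t9000\t7000\t9000\t1\t7000,\t9000,\n"
)

# Isoform rows as they would come out of the TCGA file (values from the post).
ISOFORMS = [
    {"isoform_id": "uc011lsn.1", "raw_count": 0.00},
    {"isoform_id": "uc010unu.1", "raw_count": 16.09},
    {"isoform_id": "uc010uoa.1", "raw_count": 4.00},
]


def transcript_length(exon_starts, exon_ends):
    """Sum of exon lengths from UCSC-style comma-separated coordinates."""
    starts = [int(x) for x in exon_starts.rstrip(",").split(",")]
    ends = [int(x) for x in exon_ends.rstrip(",").split(",")]
    return sum(end - start for start, end in zip(starts, ends))


def load_lengths(handle):
    """Map transcript name -> length from a knownGene-style table (hypothetical helper)."""
    lengths = {}
    for fields in csv.reader(handle, delimiter="\t"):
        name, exon_starts, exon_ends = fields[0], fields[8], fields[9]
        lengths[name] = transcript_length(exon_starts, exon_ends)
    return lengths


def add_length_column(isoform_rows, lengths):
    """Return copies of the rows with a 'length' key (None if the ID is missing)."""
    return [dict(row, length=lengths.get(row["isoform_id"])) for row in isoform_rows]


lengths = load_lengths(io.StringIO(KNOWN_GENE))
for row in add_length_column(ISOFORMS, lengths):
    print(row["isoform_id"], row["raw_count"], row["length"])
```

IDs missing from the annotation come back as None rather than raising an error, which makes any mismatch between the annotation release and the TCGA files easy to spot.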

I downloaded these files from the GDC API using Python.

I would really appreciate the help of someone more experienced.
