I'm analyzing TCGA data and need the RSEM output files. I assumed these were the files I downloaded from GDC, whose names end in .rsem.isoforms.results. However, this is their format (first 5 lines of one of them):
isoform_id    raw_count    scaled_estimate
uc011lsn.1    0.00         0
uc010unu.1    16.09        4.35780241451249e-07
uc010uoa.1    4.00         1.08327337941115e-07
uc002bgz.2    21.91        4.34097970946371e-07
First, I'm not sure why the files carry the RSEM output name (.rsem.isoforms.results) when, according to this post, they are not actual RSEM outputs because they lack the length column. Is there a way to download the proper RSEM outputs from the GDC legacy data? Second, can I fix this manually? As I understand it, the length column is just the length of the transcript, so could I look up the length of each isoform ID (e.g. from UCSC) and add a corresponding column?
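To make the second question concrete, here is a minimal sketch of the manual fix I have in mind. It assumes the file is whitespace-separated with a single header row, and that I already have a lookup of isoform ID to transcript length built from some annotation source. The function name add_length_column, the lengths dict, and the length value in it are all hypothetical placeholders, not real annotation data.

```python
def add_length_column(results_text, lengths):
    """Append a 'length' column to the rows of an isoforms results file.

    results_text: contents of a *.rsem.isoforms.results file
                  (assumed whitespace-separated, one header row).
    lengths:      dict mapping isoform_id -> transcript length
                  (hypothetical lookup, e.g. built from an annotation table).
    """
    lines = [ln.split() for ln in results_text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    out = [header + ["length"]]
    for row in rows:
        # IDs missing from the lookup get "NA" rather than a guessed length.
        out.append(row + [str(lengths.get(row[0], "NA"))])
    return out


# First two data rows from the file above.
example = (
    "isoform_id\traw_count\tscaled_estimate\n"
    "uc011lsn.1\t0.00\t0\n"
    "uc010unu.1\t16.09\t4.35780241451249e-07\n"
)

# Placeholder length for illustration only, NOT the real transcript length.
placeholder_lengths = {"uc011lsn.1": 1000}

result = add_length_column(example, placeholder_lengths)
for row in result:
    print("\t".join(row))
```

Isoforms that are absent from the lookup are marked NA instead of being dropped, so I can see which IDs fail to match the annotation version.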
These files were downloaded from the GDC API using Python.
I would really appreciate the help of someone more experienced.