28 days ago
cthangav
I'm trying to use this tool: https://github.com/cpreid2/gdc-rnaseq-tool using the provided test file:
python3 gdc-rnaseq-tool.py Test_Manifest.txt
But when I run this command, no files get downloaded. I see that the repository is archived and may not work anymore. Does this tool work for anyone?
This tool was developed 5 years ago. GDC changed its RNA-Seq quantification file format about 2.5 years ago, during the GENCODE v36 update.
Oh I see, thank you. Do you recall what part of the format they changed? Could this python script still be used if it was adjusted to the new format?
The old format had counts, FPKM, and FPKM-UQ in three separate files. The new format puts all three, plus some other normalizations, together in one file as different columns. I am sure you could modify the code. But the task is simple enough that, if you have the bandwidth to read the code, you could probably just cut those columns and aggregate them yourself.
Also, there are some third-party GDC tools such as TCGAbiolinks and GenomicsDataCommons. I am not sure whether any of them already does what you want.
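For what it's worth, here is a rough sketch of the "cut the columns and aggregate" approach in plain Python. It is not taken from gdc-rnaseq-tool. It assumes each per-sample file is a tab-separated table with a `gene_id` column, optional `#` comment lines at the top, STAR summary rows whose ID starts with `N_`, and a count column named `unstranded`; check those names against your own files before relying on it. The function names and the sample-name-to-path mapping are made up for illustration.

```python
import csv


def read_column(path, column="unstranded"):
    """Return {gene_id: value} for one per-sample quantification file."""
    values = {}
    with open(path, newline="") as handle:
        # Skip comment lines (assumed to start with "#").
        lines = (line for line in handle if not line.startswith("#"))
        for row in csv.DictReader(lines, delimiter="\t"):
            gene_id = row["gene_id"]
            # Skip summary rows such as N_unmapped (assumed "N_" prefix).
            if gene_id.startswith("N_"):
                continue
            values[gene_id] = row[column]
    return values


def build_matrix(sample_files, column="unstranded"):
    """Combine files into a genes x samples table.

    sample_files: {sample_name: path} (hypothetical mapping you build
    yourself, e.g. from the manifest and your download directory).
    Returns (genes, samples, rows), where rows[i] holds the values of
    genes[i] in the same order as samples. Values stay as strings.
    """
    per_sample = {name: read_column(path, column)
                  for name, path in sample_files.items()}
    samples = list(per_sample)
    genes = sorted(set().union(*per_sample.values()))
    # Genes missing from a sample are filled with "NA".
    rows = [[per_sample[s].get(g, "NA") for s in samples] for g in genes]
    return genes, samples, rows


def write_matrix(out_path, genes, samples, rows):
    """Write the matrix as a TSV with genes as rows and samples as columns."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["gene_id"] + samples)
        for gene, values in zip(genes, rows):
            writer.writerow([gene] + values)
```

To get a different normalization, pass another column name (for example whatever your files call the FPKM column) as `column`. Only the standard library is used, so it should run without installing anything extra.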
Thank you, this is very helpful. I am trying to combine a large number of samples from a manifest file into an expression matrix, and it looks like TCGAbiolinks has a method for doing this: https://bioconductor.org/packages/release/bioc/vignettes/TCGAbiolinks/inst/doc/download_prepare.html
From what I can see, GenomicsDataCommons has a download tool, but it does not appear to have a way of merging samples into an expression matrix of genes and samples: https://docs.gdc.cancer.gov/Data_Transfer_Tool/Users_Guide/Data_Download_and_Upload/