Hi All,
I am planning to conduct differential gene expression analysis on TCGA-COAD/READ samples. The samples are now hosted on the GDC portal, and I find them difficult to handle there.
My first question: the GDC portal shows ~600 colon samples under data.category = "Transcriptome Profiling", data.type = "Gene expression quantification", workflow.type = "HTSeq - FPKM-UQ". How can I download these samples as a single MATRIX file so that I can run a Normal vs. Tumor comparison?
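For the Normal vs. Tumor split, the sample type is encoded in each TCGA barcode: the first two digits of the fourth field are the sample-type code, where 01 to 09 are tumor samples (01 = primary solid tumor), 10 to 19 are normal samples (11 = solid tissue normal) and 20 to 29 are controls. Below is a minimal Python sketch of that rule, assuming you already know the barcode belonging to each expression file (GDC file names are UUIDs, so that mapping has to come from the download's metadata or sample sheet). The function names are hypothetical helpers, not part of TCGAbiolinks or the GDC tools.

```python
def sample_group(barcode: str) -> str:
    """Return 'Tumor', 'Normal', 'Control' or 'Other' for a TCGA sample barcode."""
    # Fields: TCGA-<tissue source site>-<participant>-<sample type + vial>-...
    fields = barcode.split("-")
    if len(fields) < 4 or not fields[3][:2].isdigit():
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    code = int(fields[3][:2])  # first two digits of the 4th field = sample type
    if 1 <= code <= 9:
        return "Tumor"
    if 10 <= code <= 19:
        return "Normal"
    if 20 <= code <= 29:
        return "Control"
    return "Other"


def split_by_group(barcodes: list[str]) -> dict[str, list[str]]:
    """Group barcodes by sample type, keeping their original order."""
    groups: dict[str, list[str]] = {}
    for barcode in barcodes:
        groups.setdefault(sample_group(barcode), []).append(barcode)
    return groups
```

The resulting "Tumor" and "Normal" lists can then be used to pick the matching columns of the expression matrix for the comparison.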
Secondly, how do I download the clinical data for these samples? Every time I try, a .json file gets downloaded and I really don't know how to handle it. Please keep in mind that I am a biologist and not very good at programming/IT. I would appreciate it if you could share a protocol for this. Thanks a lot for your support!!!
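One way to make a clinical .json file readable in a spreadsheet is to flatten it into a CSV table with one row per record. The Python sketch below assumes the file holds either a list of records or a single record, and it makes no assumption about the actual field names: nested objects become dotted column names (for example `demo.age`), and nested lists get an index (`items.0.x`, `items.1.x`). The function and file names are hypothetical.

```python
import csv
import json
from typing import Any


def flatten(obj: Any, path: tuple[str, ...] = ()) -> dict[str, Any]:
    """Flatten nested dicts/lists into a single level with dotted keys."""
    if isinstance(obj, dict):
        items = ((str(k), v) for k, v in obj.items())
    elif isinstance(obj, list):
        items = ((str(i), v) for i, v in enumerate(obj))
    else:
        return {".".join(path): obj}  # scalar value: one column
    flat: dict[str, Any] = {}
    for key, value in items:
        flat.update(flatten(value, path + (key,)))
    return flat


def clinical_json_to_csv(json_path: str, csv_path: str) -> int:
    """Write one CSV row per JSON record; returns the number of rows written."""
    with open(json_path, encoding="utf-8") as fh:
        data = json.load(fh)
    records = data if isinstance(data, list) else [data]
    rows = [flatten(record) for record in records]
    # Union of all columns; records lacking a column get an empty cell.
    columns = sorted({col for row in rows for col in row})
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Usage, with hypothetical file names: `clinical_json_to_csv("clinical.json", "clinical.csv")`. Records with long nested lists can produce many columns, so you may want to keep only the columns you need afterwards.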
Regards, Dav
Thanks a lot, Efstathios, I will keep that in mind. Moving on: I have successfully downloaded the files. I now have a folder "gdc_download_20171109_051908" containing around 645 sub-files, each of which is a .zip. When I extract one, it contains something like:
ENSG00000242268.2 0.0
ENSG00000270112.3 0.0
ENSG00000167578.15 90864.4084112
Now I have one more problem: how can I combine all 645 files into a single matrix? Should I copy and paste each file manually? Please help. Regards, Dav
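No copy-pasting is needed: each file is a two-column list (gene ID, value), so the files can be merged programmatically into one gene-by-sample matrix. The Python sketch below is independent of TCGAbiolinks and makes some assumptions: the files are either already extracted to plain text or gzip-compressed (.gz), the default glob pattern `*.FPKM-UQ.txt*` matches your file names (adjust it if not), and the part of each file name before the first dot is used as a hypothetical column label. The helper names and the output file name are also hypothetical.

```python
import csv
import gzip
from pathlib import Path


def read_expression_file(path: Path) -> dict[str, float]:
    """Read one two-column file: gene_id <whitespace> value."""
    opener = gzip.open if path.suffix == ".gz" else open
    values: dict[str, float] = {}
    with opener(path, "rt") as fh:
        for line in fh:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            try:
                values[parts[0]] = float(parts[1])
            except ValueError:
                continue  # skip a header line, if any
    return values


def build_matrix(download_dir: str, pattern: str = "*.FPKM-UQ.txt*"):
    """Collect every matching file below download_dir into gene -> value columns."""
    samples: dict[str, dict[str, float]] = {}
    for f in sorted(Path(download_dir).rglob(pattern)):
        samples[f.name.split(".")[0]] = read_expression_file(f)
    genes = sorted({g for column in samples.values() for g in column})
    return genes, samples


def write_matrix_csv(genes, samples, out_path: str) -> None:
    """Write genes as rows and samples as columns; missing values stay empty."""
    names = list(samples)
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["gene_id", *names])
        for gene in genes:
            writer.writerow([gene, *(samples[n].get(gene, "") for n in names)])
```

Usage: `genes, samples = build_matrix("gdc_download_20171109_051908")` followed by `write_matrix_csv(genes, samples, "fpkm_uq_matrix.csv")`. Genes absent from a file are left empty rather than set to 0, so gaps stay visible instead of being silently filled.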
Dear David,
Have you followed the commands above exactly?
If so, you don't have to do anything with the zip files and related stuff: you will have your RangedSummarizedExperiment ready, with the raw counts and the phenotype data.
Dear Efstathios, I did exactly the same, except that I used TCGA-COAD, and I am using CentOS 7 as my OS.
I am also getting the following error. Please help, I am stuck here :(
Which version of the R package TCGAbiolinks do you have? You will probably have to remove any previously installed TCGAbiolinks library first and then install the GitHub version:
Also check:
Hi Efstathios, I uninstalled TCGAbiolinks and reinstalled it, and this time it worked, thanks a lot. But after executing the command I cannot find any matrix file in the folder: it downloaded 521 files, each of them a .zip file. My code is as follows:
Can I save the matrix file?
Thanks, and sorry if I am being annoying.
Can this matrix be saved in .csv format?
But I do not understand why someone would want to inspect a CSV with more than 600 columns and nearly 60,000 rows. Do you have a specific purpose for this? Why not continue manipulating this object directly in R?
I tried my best but couldn't save the matrix file or the clinical data file :(