Question: Download TCGA RNA-Seq Data: RSEM and RPKM
Z.0121 wrote:

Hi, I am new to bioinformatics. Could you tell me how to download TCGA-BLCA RNA-Seq data (RSEM and RPKM) using R, the TCGA Data Portal, or any other effective method?

Thank you!

(Z.0121, 7 days ago; tag: rna-seq; edited by genomax)
Kevin Blighe wrote:

Hello, what have you tried so far? In a nutshell, you need to:

  1. use the filters on the Data Portal to select the files that you want to download
  2. download the file manifest
  3. use the GDC Data Transfer Tool with the file manifest to download the files
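Step 3 can also be launched from inside R. This is a minimal sketch that makes a few assumptions. It assumes the GDC Data Transfer Tool's `gdc-client` executable is installed and on your PATH. It assumes the manifest from step 2 was saved as `gdc_manifest.txt`, which is a hypothetical file name. `download_with_manifest()` is a hypothetical helper and is not part of any package. If `gdc-client` is not found, the sketch only prints the command it would have run.

```r
# Hypothetical helper: build the gdc-client arguments for a manifest-driven
# download and run them only if gdc-client can be found on the PATH.
download_with_manifest <- function(manifest, out_dir = "gdc_data") {
  args <- c("download", "-m", manifest, "-d", out_dir)
  exe <- Sys.which("gdc-client")
  if (!nzchar(exe)) {
    message("gdc-client not found on PATH; command would be: gdc-client ",
            paste(args, collapse = " "))
    return(invisible(args))
  }
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  status <- system2(exe, args)  # returns the exit status of gdc-client
  if (status != 0) stop("gdc-client exited with status ", status)
  invisible(args)
}

# Assumed manifest file name from step 2
download_with_manifest("gdc_manifest.txt")
```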

If this process is too difficult, use a package like TCGAbiolinks to download the data for you. It is a comprehensive package that covers the needs of most basic TCGA users.

(Kevin Blighe, 7 days ago)

I have already used TCGAbiolinks to download the data, but when I run the GDCprepare command, I get this error:

GDCprepare(query.blca.trans.pro)
Error in `[.data.frame`(query$results[[1]], query$results[[1]]$cases %in% : undefined columns selected
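For context, "undefined columns selected" is the message that base R's `[.data.frame` raises when code asks a data frame for a column it does not have. The call in this traceback indexes the query's result table, so a reasonable first check is to see which columns that table actually contains. TCGAbiolinks' `getResults()` shows this. A mismatch between the installed TCGAbiolinks version and the current GDC API is one plausible cause, but it is not confirmed here. The toy data frame and its `barcode` column below are made up. The second part only runs if the thread's `query.blca.trans.pro` object and TCGAbiolinks are available.

```r
# Toy data frame (hypothetical) that reproduces the base-R error message
toy <- data.frame(cases = c("TCGA-AA-0001", "TCGA-AA-0002"),
                  file_name = c("a.htseq.counts", "b.htseq.counts"))
msg <- tryCatch(toy[, "barcode"], error = function(e) conditionMessage(e))
print(msg)  # "undefined columns selected"

# With the real query object (requires TCGAbiolinks), list the columns of the
# result table that GDCprepare() will work with, before calling it
if (exists("query.blca.trans.pro") &&
    requireNamespace("TCGAbiolinks", quietly = TRUE)) {
  print(colnames(TCGAbiolinks::getResults(query.blca.trans.pro)))
}
```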

Could you please tell me what I did wrong?

Thank you!

(Z.0121, 7 days ago)

Errors with TCGAbiolinks should be reported as an Issue on its GitHub page. The package has a few bugs that remain unfixed.

(Kevin Blighe, 7 days ago)

For RSEM and RPKM data with paired normal and tumor tissue, is the command below correct? Do I need any other filters?

query.blca.trans.pro <- GDCquery(project = "TCGA-BLCA", data.category = "Transcriptome Profiling", data.type = "Gene Expression Quantification", experimental.strategy = "RNA-Seq")

(Z.0121, 7 days ago)

In addition, could you check whether the code below is the right way to download RSEM and RPKM data with paired normal and tumor tissue? Thank you!

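library(TCGAbiolinks)

# Query TCGA-BLCA RNA-Seq gene expression files
query.blca.trans.pro <- GDCquery(project = "TCGA-BLCA",
                                 data.category = "Transcriptome Profiling",
                                 data.type = "Gene Expression Quantification",
                                 experimental.strategy = "RNA-Seq")
# Download the matching files through the GDC API, 10 files per request
GDCdownload(query.blca.trans.pro, method = "api", files.per.chunk = 10)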
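The query above does not filter by sample type. Judging by the files listed later in this thread, this data category returns HTSeq counts, FPKM and FPKM-UQ files rather than RSEM or RPKM. Which `GDCquery()` arguments select sample types depends on the installed TCGAbiolinks version. A version-independent alternative is to filter after download using the TCGA sample barcodes. In a barcode such as TCGA-XX-XXXX-01A, characters 14 and 15 encode the sample type: 01 to 09 are tumour samples, 10 to 19 are normal samples, and 20 to 29 are controls. The first 12 characters identify the patient.

This is a minimal sketch. `paired_samples()` is a hypothetical helper and the barcodes are made up. In practice, the barcodes could come from the sample sheet offered by the GDC cart, or from the `cases` column of the query results.

```r
# Hypothetical helper: keep only patients that have both a primary tumour
# (sample type 01) and a solid tissue normal (sample type 11) sample.
paired_samples <- function(barcodes, tumour = "01", normal = "11") {
  patient <- substr(barcodes, 1, 12)   # e.g. "TCGA-AA-0001"
  type    <- substr(barcodes, 14, 15)  # two-digit sample type code
  keep_patients <- intersect(patient[type == tumour], patient[type == normal])
  keep <- patient %in% keep_patients & type %in% c(tumour, normal)
  data.frame(barcode = barcodes[keep],
             patient = patient[keep],
             group   = ifelse(type[keep] == tumour, "tumour", "normal"),
             stringsAsFactors = FALSE)
}

# Made-up barcodes for illustration only
bc <- c("TCGA-AA-0001-01A-11R-A000-07",
        "TCGA-AA-0001-11A-11R-A000-07",
        "TCGA-AA-0002-01A-11R-A000-07",
        "TCGA-AA-0003-11A-11R-A000-07")
pairs <- paired_samples(bc)
print(pairs)
```

Only patient TCGA-AA-0001 has both sample types here, so only its two samples are kept. The filtered barcodes can then be used to subset the columns of a counts matrix.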
(Z.0121, 7 days ago; edited by Kevin Blighe)

Did you get this from a tutorial?

(Kevin Blighe, 7 days ago)

I wrote this code based on the page below, but it does not specify which arguments are needed for RSEM and RPKM data with paired normal and tumor tissue.

https://bioconductor.org/packages/release/bioc/vignettes/TCGAbiolinks/inst/doc/download_prepare.html#examples

(Z.0121, 7 days ago)

I cannot be certain about third-party packages like TCGAbiolinks; to be sure about what you are downloading, you should contact the developers. If you look at the issues page, however, you will see that there are many unresolved issues: https://github.com/BioinformaticsFMRP/TCGAbiolinks/issues

This is why I never use these third-party programs. If I need TCGA data, I obtain it directly from the Genomic Data Commons Data Portal: https://portal.gdc.cancer.gov/

(Kevin Blighe, 7 days ago)

Thank you so much for your help!

(Z.0121, 7 days ago)

You are welcome. I know that working with TCGA data is not easy, so please ask further questions if you have them. If I were you, I would obtain the data directly from the Data Portal using the three-step method I described in my answer above. You can then post a new question if you need more help.

(Kevin Blighe, 7 days ago)

I have already tried downloading the data from the TCGA Data Portal, but I don't know how to import the files into R. Do you have any suggestions? Thank you in advance!

(Z.0121, 6 days ago)

Which files have you obtained?

(Kevin Blighe, 6 days ago)

I got the Transcriptome Profiling RNA-Seq data, with file names like these:

  2d6fc33e-c553-427e-9e1c-8008f694b0ce.htseq.counts
  5b39b3d7-2aa0-4dc8-b814-53e42fcc86fb.FPKM-UQ.txt
  b142177f-f89e-4e5a-834b-a75e7ab0b618.FPKM.txt
  16c74720-66a9-4c6b-9450-3d12a28ca214.htseq.counts
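This note covers the FPKM files listed above and the RPKM the question asks about. The textbook FPKM scales a gene's fragment count by gene length in kilobases and by library size in millions. RPKM uses the same formula but counts reads instead of fragments.

The sketch below implements that textbook definition with made-up numbers. The GDC pipeline's exact denominators are described in the GDC documentation and may differ from total mapped reads; FPKM-UQ, for example, uses an upper-quartile variant. So this sketch is not expected to reproduce the values in the GDC files exactly.

```r
# Textbook FPKM/RPKM: count * 1e9 / (gene length in bp * total mapped count)
fpkm <- function(counts, lengths_bp, total_mapped = sum(counts)) {
  stopifnot(length(counts) == length(lengths_bp), all(lengths_bp > 0))
  counts * 1e9 / (lengths_bp * total_mapped)
}

# Made-up example: three genes, one million mapped fragments in total
counts  <- c(geneA = 100, geneB = 200, geneC = 50)
lengths <- c(geneA = 1000, geneB = 2000, geneC = 500)
print(fpkm(counts, lengths, total_mapped = 1e6))
```

Here geneB has twice the count of geneA but is also twice as long, so both end up with the same FPKM. The same logic applies to geneC.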

Thank you in advance!

(Z.0121, 6 days ago)

The files with htseq.counts in their name contain raw counts. You should be able to bind these together into a single data frame in R, which you can then pass to edgeR or DESeq2 for normalisation.
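Below is a minimal base-R sketch of that binding step. It rests on several assumptions:

- The .htseq.counts files (optionally gzipped) sit somewhere under one download folder. `gdc-client` may put each file in its own subfolder, which is why the search is recursive.
- Each file has two tab-separated columns with no header: gene ID and count.
- HTSeq's summary rows, which start with `__`, should be dropped.

Columns are named after the file names. Mapping those names to TCGA barcodes needs the GDC sample sheet or metadata, which is not shown here. `read_htseq_dir()` is a hypothetical helper, and the demo writes two made-up files to a temporary folder.

```r
# Hypothetical helper: read every *.htseq.counts[.gz] file under `dir` and
# bind the counts into one genes x samples matrix.
read_htseq_dir <- function(dir) {
  files <- list.files(dir, pattern = "\\.htseq\\.counts(\\.gz)?$",
                      recursive = TRUE, full.names = TRUE)
  if (length(files) == 0) stop("no htseq.counts files found in ", dir)
  tabs <- lapply(files, function(f) {
    # read.delim() decompresses .gz files transparently
    x <- read.delim(f, header = FALSE, col.names = c("gene_id", "count"),
                    stringsAsFactors = FALSE)
    x[!startsWith(x$gene_id, "__"), ]  # drop __no_feature, __ambiguous, ...
  })
  genes <- tabs[[1]]$gene_id
  # Every file must list the same genes in the same order before binding
  same <- vapply(tabs, function(x) identical(x$gene_id, genes), logical(1))
  if (!all(same)) stop("files do not share the same gene order")
  counts <- do.call(cbind, lapply(tabs, `[[`, "count"))
  dimnames(counts) <- list(genes,
                           sub("\\.htseq\\.counts(\\.gz)?$", "", basename(files)))
  counts
}

# Demo with two made-up files in a temporary folder
demo_dir <- file.path(tempdir(), "htseq_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("ENSG00000000003.13\t10", "ENSG00000000005.5\t0", "__no_feature\t99"),
           file.path(demo_dir, "sample1.htseq.counts"))
writeLines(c("ENSG00000000003.13\t7", "ENSG00000000005.5\t3", "__no_feature\t42"),
           file.path(demo_dir, "sample2.htseq.counts"))
counts <- read_htseq_dir(demo_dir)
print(counts)
```

The resulting integer matrix is the kind of input that `edgeR::DGEList(counts = counts)` or `DESeq2::DESeqDataSetFromMatrix()` expects. DESeq2 also needs a sample table and a design formula.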

(Kevin Blighe, 6 days ago)