Download TCGA RNA-Seq Data: RSEM and RPKM
5.2 years ago
Z.0121 • 0

Hi, I am new to bioinformatics. Could you tell me how to download TCGA-BLCA RNA-Seq RSEM and RPKM data using R, the TCGA Data Portal, or any other effective method?

Thank you!

rna-seq • 5.6k views
5.2 years ago

Hello, what have you tried so far? In a nutshell, you need to:

  1. use the filters at the Data Portal to select the files that you want to download
  2. download the file manifest
  3. use GDC Data Transfer Tool with the file manifest to download the files
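Steps 2 and 3 can also be driven from R. This is a minimal sketch, not an official GDC or TCGAbiolinks interface: it assumes the manifest is the tab-separated file the portal exports (columns id, filename, md5, size and state) and that the GDC Data Transfer Tool is installed and on your PATH as gdc-client. The helper names summarise_manifest() and download_manifest() are hypothetical.

```r
# Hypothetical helper: summarise a GDC manifest before downloading it.
# Assumes a tab-separated file with a header row: id, filename, md5, size, state.
summarise_manifest <- function(manifest) {
  m <- read.delim(manifest, stringsAsFactors = FALSE)
  list(
    n_files  = nrow(m),
    total_gb = sum(as.numeric(m$size)) / 1e9,  # size column is in bytes
    files    = m$filename
  )
}

# Hypothetical helper: hand the manifest to the GDC Data Transfer Tool.
# Assumes gdc-client is installed and on the PATH; equivalent to running
#   gdc-client download -m <manifest> -d <out_dir>
download_manifest <- function(manifest, out_dir = "gdc_data") {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  # system2() does not quote its arguments, so quote the paths explicitly
  system2("gdc-client",
          c("download", "-m", shQuote(manifest), "-d", shQuote(out_dir)))
}
```

Checking the file count and total size first is cheap and avoids starting a multi-gigabyte transfer by mistake; the download itself is just the gdc-client command line run through system2().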

If this process is too difficult, then use a package like TCGAbiolinks to download the data for you. It is a comprehensive package that covers the needs of most basic TCGA users.

I have already used TCGAbiolinks to download the data, but when I run the GDCprepare command, I get an error like this:

GDCprepare(query.blca.trans.pro)
Error in `[.data.frame`(query$results[[1]], query$results[[1]]$cases %in% : undefined columns selected

Could you please tell me what I did wrong?

Thank you!

Errors with TCGAbiolinks should be reported as an 'Issue' on its GitHub page. There are a few bugs in the package that remain unfixed.

For RSEM and RPKM data from paired normal and tumor tissue, is the command below correct? Do I need any other filters?

query.blca.trans.pro <- GDCquery(project = "TCGA-BLCA", data.category = "Transcriptome Profiling", data.type = "Gene Expression Quantification", experimental.strategy = "RNA-Seq")
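On the "paired normal and tumor" part: TCGA sample barcodes encode the patient in their first 12 characters and the sample type in characters 14 to 15, where 01 is a primary solid tumor and 11 is solid tissue normal. A minimal sketch, assuming you already have the barcodes of your samples (for example from the sample sheet exported with a GDC cart); find_paired_samples() is a hypothetical helper, not part of TCGAbiolinks.

```r
# Hypothetical helper (not part of TCGAbiolinks): keep only patients that have
# both a primary solid tumor (sample type code 01) and a solid tissue normal
# (sample type code 11) sample, based on the TCGA barcode layout.
find_paired_samples <- function(barcodes) {
  patient     <- substr(barcodes, 1, 12)   # participant, e.g. "TCGA-XX-XXXX"
  sample_code <- substr(barcodes, 14, 15)  # two-digit sample type code
  tumor_patients  <- unique(patient[sample_code == "01"])
  normal_patients <- unique(patient[sample_code == "11"])
  paired <- intersect(tumor_patients, normal_patients)
  keep <- patient %in% paired & sample_code %in% c("01", "11")
  data.frame(
    barcode     = barcodes[keep],
    patient     = patient[keep],
    sample_type = ifelse(sample_code[keep] == "01", "tumor", "normal"),
    stringsAsFactors = FALSE
  )
}
```

If a patient has more than one tumor or normal aliquot, all of them are kept, so you may still need to pick one sample of each type per patient.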

In addition, could you check whether this is the right way to download RSEM and RPKM data for paired normal and tumor tissue? Thank you!

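library(TCGAbiolinks)
# Query TCGA-BLCA RNA-Seq gene expression quantification files
query.blca.trans.pro <- GDCquery(project = "TCGA-BLCA", 
                                 data.category = "Transcriptome Profiling", 
                                 data.type = "Gene Expression Quantification",
                                 experimental.strategy = "RNA-Seq")
# Download the files returned by the query via the GDC API, in chunks of 10 files
GDCdownload(query.blca.trans.pro, method = "api", files.per.chunk = 10)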

Did you get this from a tutorial?

I wrote this code based on the website below, but it does not specify which arguments are needed for RSEM and RPKM data from paired normal and tumor tissue.

https://bioconductor.org/packages/release/bioc/vignettes/TCGAbiolinks/inst/doc/download_prepare.html#examples

I cannot be 100% certain about third-party packages like TCGAbiolinks; you should contact the developers to be sure about what you are downloading. If you look at the issues page, however, you can see that there are many unresolved issues: https://github.com/BioinformaticsFMRP/TCGAbiolinks/issues

This is why I never use these third-party programs. If I need TCGA data, I obtain it directly from the Genomic Data Commons (GDC) Data Portal: https://portal.gdc.cancer.gov/

Thank you so much for your help!

You are welcome. I know that working with TCGA data is not easy, so please feel free to ask further questions. If I were you, I would obtain the data directly from the Data Portal using the method I described in my first answer. You can then post a new question if you need more help.

I have downloaded the data from the TCGA Data Portal, but I don't know how to import the files into R. Do you have any suggestions? Thank you in advance!

Which files have you obtained?

I got the Transcriptome Profiling RNA-Seq data, with files named like these: 2d6fc33e-c553-427e-9e1c-8008f694b0ce.htseq.counts; 5b39b3d7-2aa0-4dc8-b814-53e42fcc86fb.FPKM-UQ.txt; b142177f-f89e-4e5a-834b-a75e7ab0b618.FPKM.txt; 16c74720-66a9-4c6b-9450-3d12a28ca214.htseq.counts.

Thank you in advance!

The files with htseq.counts in their name contain raw counts. You should be able to bind these together into a single data frame in R, which you can then input to edgeR or DESeq2 for normalisation.
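A minimal sketch of that binding step in base R, assuming each *.htseq.counts file has the usual HTSeq layout: two tab-separated columns (gene ID, raw count), no header, followed by summary rows whose IDs start with __ (for example __no_feature). The function name read_htseq_counts() is hypothetical, and the directory in the commented usage lines is an assumption about where you saved the files.

```r
# Hypothetical helper: bind HTSeq count files into one genes x samples matrix.
# Assumes the HTSeq layout: no header, tab-separated gene ID and raw count,
# with trailing summary rows whose IDs start with "__".
read_htseq_counts <- function(files) {
  tables <- lapply(files, function(f) {
    # read.delim() also decompresses gzipped (.gz) files when reading
    x <- read.delim(f, header = FALSE, col.names = c("gene_id", "count"),
                    stringsAsFactors = FALSE)
    # Drop HTSeq summary rows such as __no_feature and __ambiguous
    x[!startsWith(x$gene_id, "__"), ]
  })
  genes <- tables[[1]]$gene_id
  # Binding by position is only safe if every file lists the same genes in order
  same_genes <- vapply(tables, function(x) identical(x$gene_id, genes), logical(1))
  if (!all(same_genes)) stop("Files do not share the same gene IDs in the same order")
  counts <- do.call(cbind, lapply(tables, function(x) as.integer(x$count)))
  rownames(counts) <- genes
  # Use each file name without its extension as the column (sample) name
  colnames(counts) <- sub("\\.htseq\\.counts(\\.gz)?$", "", basename(files))
  counts
}

# Usage (assumed download directory):
# files  <- list.files("gdc_data", pattern = "htseq\\.counts(\\.gz)?$",
#                      recursive = TRUE, full.names = TRUE)
# counts <- read_htseq_counts(files)
```

The resulting integer matrix can be passed to edgeR::DGEList(counts = counts) or to DESeq2::DESeqDataSetFromMatrix() as countData; the colData and design arguments depend on your own sample annotation.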
