Question

TCGA data from Xena browser and Broad (GDAC Firehose)

0

4.3 years ago

immunogirl2 ▴ 20

I am an immunologist with very little background in bioinformatics; I can use basic functions in R, so please bear with me.

1. I have classified TCGA breast cancer patient IDs into two groups based on their immune profiles.
2. I have downloaded TCGA breast cancer RNA-seq data from Xena and Firehose (level 3, normalised and non-normalised).

Now I want to arrange the gene expression data into two groups based on my classification in step 1. The simplest approach I could come up with is to open the Firehose data in Excel and copy-paste each patient's gene expression data, one by one, into a new Excel sheet. But, due to the size of the data (the number of cells), I am going crazy. Please help me out and suggest a simple way to do this in R. I already have all the patient IDs copy-pasted into two groups in an Excel sheet. Thanks in advance.

RNA-Seq R TCGA • 1.6k views

ADD COMMENT • link updated 4.3 years ago by Kevin Blighe 87k • written 4.3 years ago by immunogirl2 ▴ 20

score 2 · Accepted Answer · 2020-01-03

2

4.3 years ago

Kevin Blighe 87k

Hey, you just need to do:

1. Save the patient ID lists as TSV or CSV and then read them into R via read.table(), read.csv(), fread(), or something else. Eventually you should have them in R as vectors, e.g., group1IDs and group2IDs.
2. Read the expression data into R. It should already be downloaded as TSV (I think).
3. Subset the expression data based on the patient IDs.
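
The three steps above can be sketched in base R as follows. This is only a minimal illustration, not the exact code used in this thread: the file names, the column headings group1 and group2, the gene names, and the barcodes are all made up, and the toy files are written to a temporary location so that the example runs on its own. It assumes the expression table has genes in rows and samples in columns, and that the first 12 characters of each sample barcode (e.g. TCGA-A1-0001) identify the patient; inspect the first few lines of the real download before reading it, since its layout may differ from this toy table.

```r
# --- Toy stand-ins for the real files (hypothetical names and values) ---
expr_file <- tempfile(fileext = ".tsv")
ids_file  <- tempfile(fileext = ".csv")

toy_expr <- data.frame(
  gene              = c("CD8A", "FOXP3", "GAPDH"),
  `TCGA-A1-0001-01` = c(5.1, 2.0, 12.3),
  `TCGA-A1-0002-01` = c(1.2, 4.4, 12.0),
  `TCGA-A1-0003-01` = c(6.3, 1.9, 11.8),
  `TCGA-A1-0004-01` = c(0.9, 5.0, 12.1),
  check.names = FALSE  # keep the hyphens in the barcodes
)
write.table(toy_expr, expr_file, sep = "\t", quote = FALSE, row.names = FALSE)

toy_ids <- data.frame(
  group1 = c("TCGA-A1-0001", "TCGA-A1-0003"),
  group2 = c("TCGA-A1-0002", "TCGA-A1-0004")
)
write.csv(toy_ids, ids_file, row.names = FALSE)

# --- Step 1: read the patient ID lists (Excel sheet saved as CSV) ---
ids <- read.csv(ids_file, stringsAsFactors = FALSE)

# Drop blanks and NAs, which appear when the two columns differ in length
clean_ids <- function(x) {
  x <- trimws(x)
  x[!is.na(x) & x != ""]
}
group1IDs <- clean_ids(ids$group1)
group2IDs <- clean_ids(ids$group2)

# --- Step 2: read the expression data (genes in rows, samples in columns) ---
expr <- read.table(expr_file, header = TRUE, sep = "\t",
                   row.names = 1, check.names = FALSE)

# --- Step 3: subset the columns by patient ID ---
# Assumption: the first 12 characters of a sample barcode are the patient ID
patient <- substr(colnames(expr), 1, 12)

group1Expr <- expr[, patient %in% group1IDs, drop = FALSE]
group2Expr <- expr[, patient %in% group2IDs, drop = FALSE]

# Patient IDs that have no matching column in the expression data
missingIDs <- setdiff(c(group1IDs, group2IDs), patient)
```

If a patient has more than one sample in the real data (for example, tumour and matched normal tissue), all of that patient's columns are kept by this approach, so it is worth checking dim(group1Expr) and dim(group2Expr) against the lengths of the ID vectors.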

Small queries relating to each step can be found via a search in your search engine of choice.

Kevin

ADD COMMENT • link 4.3 years ago by Kevin Blighe 87k

1

Thanks a lot. I'll try what you suggested.

ADD REPLY • link 4.3 years ago by immunogirl2 ▴ 20

1

It worked fine. Thanks again.

ADD REPLY • link 4.3 years ago by immunogirl2 ▴ 20