Question: Drug resistant vs. Drug sensitive data retrieval from TCGA
4 months ago, fawazfebin wrote:

Hi, can anyone guide me on how to retrieve drug-resistant and drug-sensitive patient data from the TCGA database? Great thanks in advance.

ADD COMMENTlink modified 10 days ago • written 4 months ago by fawazfebin40
4 months ago, Kevin Blighe wrote:

To get you started, please take a look at my previous answers:

You should be able to infer (from the available clinical data) the patients who relapsed while taking therapeutic agents.

Kevin

ADD COMMENTlink written 4 months ago by Kevin Blighe56k

Kevin is right, but I think you need to make sure that you understand the metadata.

For example, I think the clinical data usually comes from after the first biospecimen collection. So, if you were looking for pre-existing changes (before treatment), then that could be an issue. In other words, you might need to think carefully about whether gene expression changes are pre-treatment or post-treatment (and whether patients have had multiple drugs and/or multiple follow-ups, etc.).

ADD REPLYlink modified 4 months ago • written 4 months ago by Charles Warden7.6k

Thank you Charles for the additional guidance.

ADD REPLYlink written 4 months ago by fawazfebin40

Thanks for the guidance, Kevin. I did get the drug information from the clinical_drug file. Information on disease relapse was found in the clinical_follow_up file. How can I combine the information and get the barcodes of the patients who relapsed after treatment with a particular kind of drug?

ADD REPLYlink written 4 months ago by fawazfebin40

You should be able to connect the different clinical data spreadsheets via the UUID and Barcode (?)

ADD REPLYlink written 4 months ago by Kevin Blighe56k

I merged the clinical_drug and follow_up files using the merge function.

merged_data <- merge(follow_up,clinical_drug,by = "bcr_patient_barcode")

Hopefully I can get the corresponding barcodes and do a differential expression analysis with gene expression data? Great thanks!
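For readers following along, here is a minimal sketch of that merge on toy data. Only `bcr_patient_barcode` comes from the thread; the other column names (`drug_name`, `new_tumor_event`) and all values are invented for illustration and should be checked against the actual clinical files. Because one patient can have several drug rows and several follow-up rows, the merge is many-to-many, so the barcodes are de-duplicated at the end.

```r
# Toy stand-ins for the clinical_drug and follow_up tables (all values invented).
clinical_drug <- data.frame(
  bcr_patient_barcode = c("TCGA-AA-0001", "TCGA-AA-0001", "TCGA-AA-0002", "TCGA-AA-0003"),
  drug_name = c("Fluorouracil", "Oxaliplatin", "Fluorouracil", "Fluorouracil"),  # assumed column name
  stringsAsFactors = FALSE
)
follow_up <- data.frame(
  bcr_patient_barcode = c("TCGA-AA-0001", "TCGA-AA-0001", "TCGA-AA-0002", "TCGA-AA-0003"),
  new_tumor_event = c("NO", "YES", "NO", "YES"),  # assumed column name
  stringsAsFactors = FALSE
)

# Many-to-many merge: every follow-up row is paired with every drug row of the same patient.
merged_data <- merge(follow_up, clinical_drug, by = "bcr_patient_barcode")

# Patients with at least one row showing the drug of interest and a new tumour event.
hits <- merged_data$drug_name == "Fluorouracil" & merged_data$new_tumor_event == "YES"
relapsed_barcodes <- unique(merged_data$bcr_patient_barcode[hits])
print(relapsed_barcodes)
```

Note that this only pairs records by patient. It does not check that the relapse happened after the drug was given; that would need the date fields in the clinical files.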

ADD REPLYlink modified 4 months ago • written 4 months ago by fawazfebin40

Yes, that looks good. Can you match this to the expression data? The expression data should have the same barcode (?).

ADD REPLYlink written 4 months ago by Kevin Blighe56k

Yes, I extracted the barcodes for a particular drug and grouped them as 'resistant' and 'sensitive' based on new_tumour_event. There are multiple follow_up barcodes and multiple drugs for the same patient barcode. Also, there are two or three FPKM files for the same barcode when I look for expression data in the GDC portal.

Can you please guide me on how to select the appropriate barcodes and FPKM files? Hope the FPKM files can be further analyzed for differential expression using edgeR (?).

ADD REPLYlink written 3 months ago by fawazfebin40

Please try to obtain the HTSeq raw count files, and then normalise those in edgeR. edgeR models raw counts, so you cannot use FPKM expression values as input for its differential expression analysis.
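On the several files per barcode: in the TCGA barcode scheme, characters 14-15 of the full aliquot barcode encode the sample type (01 is primary solid tumour, 11 is solid tissue normal), so multiple files for one patient often come from different samples or from different workflows. A minimal sketch, using invented barcodes and a hypothetical `files` table, of keeping one primary-tumour file per patient:

```r
# Hypothetical file listing; barcodes and file names are invented.
files <- data.frame(
  aliquot_barcode = c("TCGA-AA-0001-01A-11R-0001-07",
                      "TCGA-AA-0001-11A-11R-0001-07",
                      "TCGA-AA-0002-01A-11R-0002-07",
                      "TCGA-AA-0002-01B-11R-0002-07"),
  file_name = c("a.counts", "b.counts", "c.counts", "d.counts"),
  stringsAsFactors = FALSE
)

files$patient     <- substr(files$aliquot_barcode, 1, 12)   # e.g. TCGA-AA-0001
files$sample_type <- substr(files$aliquot_barcode, 14, 15)  # 01 = primary tumour, 11 = normal

# Keep primary-tumour files only, then the first aliquot per patient in sorted order.
primary <- files[files$sample_type == "01", ]
primary <- primary[order(primary$aliquot_barcode), ]
one_per_patient <- primary[!duplicated(primary$patient), ]
print(one_per_patient[, c("patient", "file_name")])
```

Taking the first aliquot in sorted order is an arbitrary tie-breaker chosen for this sketch; a real analysis should decide that rule deliberately.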

ADD REPLYlink modified 3 months ago • written 3 months ago by Kevin Blighe56k

Ok. Thank you Kevin.

ADD REPLYlink written 3 months ago by fawazfebin40

The raw count files were fed into edgeR and normalised. The two groups are not well separated in the MDS plot. Should I remove the cases that are not separated and then proceed? Great thanks in advance.

[Image: MDS plot]

ADD REPLYlink modified 3 months ago • written 3 months ago by fawazfebin40

No, you should not remove them without major justification. Can you generate a PCA bi-plot for PC1 versus PC2?

ADD REPLYlink modified 3 months ago • written 3 months ago by Kevin Blighe56k

PCA was conducted on the raw counts matrix (matrix_5FU) with 13 different cases.

p.5FU <- pca(matrix_5FU[,2:14], removeVar = 0.1)

-- removing the lower 10% of variables based on variance

screeplot(p.5FU)

Warning messages:
1: Removed 2 rows containing missing values (geom_path).
2: Removed 2 rows containing missing values (geom_point).

[Image: scree plot]

[Image: biplot]

ADD REPLYlink modified 3 months ago • written 3 months ago by fawazfebin40

Okay, that sample on the right is definitely an outlier by PCA. What was your input to the pca() function, though? If you are using edgeR, it should be the log CPM expression values.
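A minimal base-R sketch of the point about input, using simulated counts (not the poster's data) and `prcomp()` in place of PCAtools. The log-CPM here is a simplified version with a fixed pseudocount; `edgeR::cpm(log = TRUE)` scales its prior count by library size, so its values will differ slightly.

```r
set.seed(1)
# Simulated raw counts: 200 genes x 13 samples (invented data).
counts <- matrix(rnbinom(200 * 13, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("S", 1:13)))

# Simplified log2 counts-per-million with a pseudocount of 0.5.
# t(counts) has samples in rows, so each row is divided by its own library size.
lib_size <- colSums(counts)
log_cpm <- log2(t((t(counts) + 0.5) / (lib_size + 1)) * 1e6)

# prcomp() expects samples in rows, so transpose the genes x samples matrix.
pca_fit <- prcomp(t(log_cpm), center = TRUE, scale. = FALSE)
var_explained <- pca_fit$sdev^2 / sum(pca_fit$sdev^2)

# PC1 versus PC2, with each point drawn as its sample name.
plot(pca_fit$x[, 1], pca_fit$x[, 2], type = "n", xlab = "PC1", ylab = "PC2")
text(pca_fit$x[, 1], pca_fit$x[, 2], labels = rownames(pca_fit$x))
```

Running PCA on raw counts instead lets a few highly expressed genes and differences in library size dominate the leading components, which is why the log CPM matrix is the better input.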

ADD REPLYlink written 3 months ago by Kevin Blighe56k

Sorry, it was the raw count matrix that was given as input. I am attaching the screeplot and biplot of the PCA done on log CPM expression values.

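> library(edgeR)        # DGEList(), cpm()
> library(org.Hs.eg.db) # annotation; select() comes from AnnotationDbi
> library(PCAtools)     # pca(), screeplot(), biplot()

> matrix_5FU <- read.delim('5FU.csv',sep = ',',header = TRUE)

> Group <- c(1,1,1,1,1,1,1,2,2,2,2,2,2)

> gns5FU <- select(org.Hs.eg.db, keys=rownames(matrix_5FU),columns=c("SYMBOL","GENENAME"), keytype="ENTREZID")
'select()' returned 1:1 mapping between keys and columns

> y.5FU <- DGEList(counts=matrix_5FU[,2:14], genes=gns5FU,group = Group)

> CPM.5FU.log <- cpm(y.5FU,log = TRUE)

> # The next call was not shown in the original post; it is assumed to mirror the earlier pca() call.
> p.5FU.log <- pca(CPM.5FU.log, removeVar = 0.1)

> screeplot(p.5FU.log)

> biplot(p.5FU.log)

[Image: scree plot]

[Image: biplot]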

ADD REPLYlink modified 3 months ago by genomax80k • written 3 months ago by fawazfebin40

Can you please guide me on the interpretation of the newly created biplot? I wasn't able to label the cases in the plot as well. Thanks!

ADD REPLYlink written 3 months ago by fawazfebin40

It looks like you are using PCAtools, so, you can set labels via the lab parameter.

I did not reply to your earlier comment because I am giving you the opportunity to make your own interpretation. One could argue that the sample on the right is an outlier that may affect your statistical interpretations; however, for now, I would not remove anything from the dataset.

ADD REPLYlink modified 3 months ago • written 3 months ago by Kevin Blighe56k

I didn't want to remove the samples, but needed an expert opinion! Great thanks for your time, Kevin.

Is there any command to fetch raw HTSeq counts from TCGA corresponding to a considerable number of patient barcodes?

ADD REPLYlink modified 3 months ago • written 3 months ago by fawazfebin40

fawazfebin : Please use these directions to post images. How to add images to a Biostars post

ADD REPLYlink written 3 months ago by genomax80k

Sure. Please excuse me for the inconvenience caused.

ADD REPLYlink written 3 months ago by fawazfebin40

@fawazfebin, could you please tell me how you grouped the drugs into 'resistant' and 'sensitive'? Based on which information, and from which data? I have downloaded the follow-up dataset from GDAC, but I can't find this information.

ADD REPLYlink written 4 weeks ago by Chaimaa180

Under the 'Clinical' files you can find different .txt files which give you information about the drug used (clinical_drug .txt file) and disease recurrence (follow_up .txt file). Cases with a 'new_tumor_event' can be grouped as 'resistant', and cases without a 'new_tumor_event' that are 'tumor_free' can be grouped as 'sensitive'.

ADD REPLYlink written 18 days ago by fawazfebin40

@fawazfebin, hi, I'm confused. Do you mean the information in the column "new_tumor_event_after_initial_treatment", or the one that has "WITH TUMOR" and "TUMOR FREE", as shown in the figure below? And after you split the cases into those two groups, how do you label them for further analysis: as 0 and 1, or as categorical variables? Say, for example, I want to use lasso or logistic regression to find the features that relate this clinical information to the gene expression data. I appreciate your help!

Figure

ADD REPLYlink modified 10 days ago • written 10 days ago by Chaimaa180
10 days ago, fawazfebin wrote:

Hi @Chaimaa

I used both of the columns for categorisation. Cases with 'new_tumor_event_indicator' as 'YES' and 'tumor_status' as 'WITH TUMOR' were grouped as 'resistant'. The sensitive cases correspond to the absence of a new tumor event together with tumor-free status. R commands are available to select rows that meet specific criteria in a particular column. After getting the barcodes for each group, you can use TCGAbiolinks for downloading and analysing the data, or you can use edgeR for differential expression analysis. Hope this helps!
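A minimal sketch of that rule on an invented table. The column names `new_tumor_event_indicator` and `tumor_status` are taken from the description above; the exact names and value spellings in a given clinical file may differ and should be checked. Cases that satisfy neither rule are left as NA rather than forced into a group.

```r
# Invented follow-up records, one row per patient.
follow_up <- data.frame(
  bcr_patient_barcode = c("TCGA-AA-0001", "TCGA-AA-0002", "TCGA-AA-0003", "TCGA-AA-0004"),
  new_tumor_event_indicator = c("YES", "NO", "NO", "YES"),
  tumor_status = c("WITH TUMOR", "TUMOR FREE", "WITH TUMOR", "TUMOR FREE"),
  stringsAsFactors = FALSE
)

resistant <- follow_up$new_tumor_event_indicator == "YES" &
  follow_up$tumor_status == "WITH TUMOR"
sensitive <- follow_up$new_tumor_event_indicator == "NO" &
  follow_up$tumor_status == "TUMOR FREE"

follow_up$response <- NA_character_
follow_up$response[resistant] <- "resistant"
follow_up$response[sensitive] <- "sensitive"

# Factor for design matrices; 0/1 coding for logistic or lasso regression.
follow_up$response_factor <- factor(follow_up$response, levels = c("sensitive", "resistant"))
follow_up$response_binary <- as.integer(follow_up$response_factor) - 1L  # sensitive = 0, resistant = 1

print(follow_up[, c("bcr_patient_barcode", "response", "response_binary")])
```

The factor form suits a design matrix for differential expression, while the 0/1 column can serve as the outcome for logistic or lasso regression. Rows with NA would need to be dropped or reviewed before modelling.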

ADD COMMENTlink written 10 days ago by fawazfebin40