Question: Removing replicated samples from TCGA data
7 weeks ago, .0 wrote:

Hi everyone, I'm working on TCGA data. I want to have unique samples, but some of my samples are replicated and I don't know how to handle this. I'm also not sure whether taking the median of the replicated samples is appropriate, because for solid tumors the two samples I would take the median of might differ completely due to spatial heterogeneity.

Tags: rna-seq, tcga, aggregate
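A minimal base-R sketch of how the replicated samples can be found, assuming the full aliquot barcodes are available (for example as the column names of the expression matrix). It treats the first 12 characters of a barcode (project, tissue source site and participant) as the patient identifier. The barcodes are the examples posted later in this thread.

```r
# Full TCGA aliquot barcodes, e.g. colnames() of an expression matrix
barcodes <- c("TCGA-HZ-A9TJ-01A-11R-A41I-07",
              "TCGA-HZ-A9TJ-06A-11R-A41B-07",
              "TCGA-H6-A45N-01A-11R-A26U-07",
              "TCGA-H6-A45N-11A-12R-A26U-07",
              "TCGA-E7-A97Q-01A-11R-A38B-07")

# Patient-level barcode: the first three fields (12 characters)
patient <- substr(barcodes, 1, 12)

# Patients that occur more than once, and every barcode belonging to them
dup_patients <- unique(patient[duplicated(patient)])
replicated <- barcodes[patient %in% dup_patients]

print(table(patient))
print(replicated)
```

This only flags the replicates; deciding which one to keep (or whether to aggregate at all) is a separate step.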

Thanks for answering. But because I want to integrate my data with protein data, I have to use part of the TCGA barcode (the third field, which identifies the participant). For example, in the barcode TCGA-02-0001-01C-01D-0182-01, 0001 is the participant code I need.

(reply by .0, 7 weeks ago)
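A small sketch of pulling the participant field out of a barcode in base R. `barcode_field()` is a hypothetical helper, not part of TCGAbiolinks. Joining on the 12-character patient barcode (TCGA-TSS-participant) rather than on the 4-character participant code alone avoids relying on that code being unique by itself; this assumes the protein (TCPA) sample IDs start with the same 12 characters.

```r
# Hypothetical helper: return field i of each dash-separated TCGA barcode
barcode_field <- function(barcodes, i) {
  vapply(strsplit(barcodes, "-", fixed = TRUE), function(f) f[i], character(1))
}

bc <- "TCGA-02-0001-01C-01D-0182-01"
participant_code <- barcode_field(bc, 3)  # third field only
patient_barcode  <- substr(bc, 1, 12)     # project-TSS-participant

print(participant_code)
print(patient_barcode)
```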

Specifically, what data are you working on? Where did you get the data from? Could you post an example of a duplicated sample ID?

(reply by kristoffer.vittingseerup, 7 weeks ago)
7 weeks ago, MatthewP wrote:

Don't take the median value; you need to select one of the samples. You may need the full TCGA barcode, or other information such as whether the sample is FFPE (is_ffpe), to help you select a single sample.
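One way to pick a single sample per patient, as a sketch: the rule below (sort the barcodes in C-locale order and keep the first one for each patient) is an illustrative assumption, not a TCGA or TCGAbiolinks convention. Because the sample-type code follows the patient ID, it prefers 01 (primary tumor) over 06 and 11, then the lower vial letter and portion number. A metadata-based rule, such as excluding FFPE samples, could replace it.

```r
# Hypothetical selection rule: within each patient keep the barcode that sorts
# first in C-locale order (lowest sample-type code, then vial, then portion, ...)
select_one_per_patient <- function(barcodes) {
  sorted <- sort(barcodes, method = "radix")  # radix sort orders strings as in the C locale
  sorted[!duplicated(substr(sorted, 1, 12))]  # first barcode per 12-character patient ID
}

barcodes <- c("TCGA-HZ-A9TJ-01A-11R-A41I-07",
              "TCGA-HZ-A9TJ-06A-11R-A41B-07",
              "TCGA-H6-A45N-01A-11R-A26U-07",
              "TCGA-H6-A45N-11A-12R-A26U-07")

kept <- select_one_per_patient(barcodes)
print(kept)
```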

This page explains the TCGA barcode. You also need to download related files, such as _MANIFEST.txt_ and the _metadata file_, which contain more information about your samples/data.
Example of a full barcode (entity_submitter_id) from a metadata file:

```json
"associated_entities": [
    {
        "entity_id": "90e6e8a1-98b3-4f38-92ef-df460d78d657",
        "case_id": "ada19f65-5256-4c79-b3b9-7b9da69be437",
        "entity_submitter_id": "TCGA-E7-A97Q-01A-11R-A38B-07",
        "entity_type": "aliquot"
    }
]
```
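The fields of a full barcode such as the entity_submitter_id in the metadata example can be split out in base R. `parse_tcga_barcode()` is a hypothetical helper that assumes a complete seven-field aliquot barcode:

```r
# Hypothetical helper: split a full aliquot barcode into its named fields
parse_tcga_barcode <- function(barcode) {
  f <- strsplit(barcode, "-", fixed = TRUE)[[1]]
  stopifnot(length(f) == 7)  # expects a full aliquot-level barcode
  list(project     = f[1],
       tss         = f[2],                # tissue source site
       participant = f[3],
       sample_type = substr(f[4], 1, 2),  # e.g. 01 = primary solid tumor
       vial        = substr(f[4], 3, 3),
       portion     = substr(f[5], 1, 2),
       analyte     = substr(f[5], 3, 3),  # e.g. R = RNA
       plate       = f[6],
       center      = f[7])
}

x <- parse_tcga_barcode("TCGA-E7-A97Q-01A-11R-A38B-07")
str(x)
```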
6 weeks ago, .0 wrote:

Hi. I got the mRNA data (FPKM) from TCGA with R code and the protein data from TCPA. Examples of my duplicated samples are below:

TCGA-HZ-A9TJ-01A-11R-A41I-07
TCGA-HZ-A9TJ-06A-11R-A41B-07

TCGA-H6-A45N-01A-11R-A26U-07
TCGA-H6-A45N-11A-12R-A26U-07
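Characters 14-15 of these barcodes are the sample-type code, so these pairs are not technical replicates: 01 is a primary solid tumor, 06 a metastatic sample and 11 solid tissue normal. A base-R sketch that labels them (the lookup covers only the codes appearing in this thread):

```r
# Sample-type codes (characters 14-15 of the barcode) seen in this thread
sample_type_labels <- c("01" = "Primary Solid Tumor",
                        "06" = "Metastatic",
                        "11" = "Solid Tissue Normal")

barcodes <- c("TCGA-HZ-A9TJ-01A-11R-A41I-07",
              "TCGA-HZ-A9TJ-06A-11R-A41B-07",
              "TCGA-H6-A45N-01A-11R-A26U-07",
              "TCGA-H6-A45N-11A-12R-A26U-07")

codes <- substr(barcodes, 14, 15)
types <- data.frame(barcode = barcodes,
                    sample_type = unname(sample_type_labels[codes]))
print(types)
```

Keeping only the sample type you need (for example 01) often removes these apparent duplicates.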

The R code I used to get the data is below:

```r
library(TCGAbiolinks)
library(dplyr)
library(DT)
library(SummarizedExperiment)

query1 <- GDCquery(project = "TCGA-PAAD",
                   data.category = "Transcriptome Profiling",
                   data.type = "Gene Expression Quantification",
                   workflow.type = "HTSeq - Counts")
# alternative: workflow.type = "HTSeq - FPKM-UQ"
# (newer GDC releases replaced the HTSeq workflows with "STAR - Counts")

GDCdownload(query1)  # files must be downloaded before GDCprepare() can read them
df <- GDCprepare(query1, save = TRUE,
                 save.filename = "TCGA-PAAD_dataframe.rda",
                 summarizedExperiment = FALSE)
write.csv(df, file = "count.csv")
```


```r
query <- GDCquery(project = "TCGA-PAAD",
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - Counts")
GDCdownload(query)

# Prepare the expression matrix: gene IDs in the rows, sample barcodes in the columns
# (IlluminaHiSeq_RNASeqV2 / rsem.genes.results describe legacy-archive data;
#  this query returns HTSeq counts)
PAADRnaseqSE <- GDCprepare(query)
PAADMatrix <- assay(PAADRnaseqSE, "HTSeq - Counts")  # or assay(PAADRnaseqSE, "raw_count")
```
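To reduce `PAADMatrix` to one column per patient, a base-R sketch is shown below on a toy matrix with made-up values, so it runs without downloading anything. Keeping only primary tumors (code 01) and taking the first column per patient are illustrative choices, not the only reasonable ones.

```r
# Toy stand-in for PAADMatrix (genes x samples); the values are made up
mat <- matrix(1:8, nrow = 2,
              dimnames = list(c("gene1", "gene2"),
                              c("TCGA-HZ-A9TJ-01A-11R-A41I-07",
                                "TCGA-HZ-A9TJ-06A-11R-A41B-07",
                                "TCGA-H6-A45N-01A-11R-A26U-07",
                                "TCGA-H6-A45N-11A-12R-A26U-07")))

# Keep primary solid tumors only (sample-type code "01")
tumor <- mat[, substr(colnames(mat), 14, 15) == "01", drop = FALSE]

# Keep the first column for each patient (assumed rule; replace as needed)
tumor <- tumor[, !duplicated(substr(colnames(tumor), 1, 12)), drop = FALSE]

# Rename columns to the 12-character patient barcode for joining with protein data
colnames(tumor) <- substr(colnames(tumor), 1, 12)
print(tumor)
```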

```r
# Optional: correlation boxplot and AAIC plot to identify outlier samples
PAADRnaseq_CorOutliers <- TCGAanalyze_Preprocessing(PAADRnaseqSE)

# Quantile filtering of genes
dataFilt <- TCGAanalyze_Filtering(tabDF = PAADRnaseq_CorOutliers,
                                  method = "quantile", qnt.cut = 0.25)

# Select solid tissue normal ("NT") and primary solid tumor ("TP") samples
samplesNT <- TCGAquery_SampleTypes(barcode = colnames(dataFilt), typesample = c("NT"))
samplesTP <- TCGAquery_SampleTypes(barcode = colnames(dataFilt), typesample = c("TP"))
```
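Before the DEA step, it may be worth checking whether any patient contributes more than one column to the same group (replicates) or one column to each group (matched tumor/normal pairs). A base-R sketch, with the thread's barcodes standing in for `samplesTP` and `samplesNT`:

```r
# Stand-ins for samplesTP / samplesNT (barcodes from this thread)
samplesTP <- c("TCGA-HZ-A9TJ-01A-11R-A41I-07", "TCGA-H6-A45N-01A-11R-A26U-07")
samplesNT <- c("TCGA-H6-A45N-11A-12R-A26U-07")

patientTP <- substr(samplesTP, 1, 12)
patientNT <- substr(samplesNT, 1, 12)

# Patients with more than one tumor column would be replicates within a group
dupTP <- unique(patientTP[duplicated(patientTP)])

# Patients contributing both a tumor and a normal column (matched pairs)
paired <- intersect(patientTP, patientNT)

print(dupTP)
print(paired)
```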

```r
# Differential expression analysis (DEA): normal vs. tumor
dataDEGs <- TCGAanalyze_DEA(mat1 = dataFilt[, samplesNT],
                            mat2 = dataFilt[, samplesTP],
                            Cond1type = "Normal", Cond2type = "Tumor",
                            fdr.cut = 0.01, logFC.cut = 1, method = "glmLRT")

# DEG table with expression values in normal and tumor samples
dataDEGsFiltLevel <- TCGAanalyze_LevelTab(dataDEGs, "Tumor", "Normal",
                                          dataFilt[, samplesTP], dataFilt[, samplesNT])
write.csv(dataDEGsFiltLevel, file = "DEGs.csv")
```


Please update your question with the code instead of posting it as an answer. Also, use proper code formatting instead of pasting it as plain text, to ensure readability.

(reply by kristoffer.vittingseerup, 6 weeks ago)