Question: Removing replicate samples from TCGA data
Asked 8 months ago:

Hi everyone, I'm working with TCGA data. I want each sample to be unique, but my data contain replicates and I don't know how to deal with them. I'm also not sure whether taking the median of the replicated samples is appropriate, because for solid tumors the two samples I would take the median of might have completely different spatial heterogeneity.

Tags: rna-seq, tcga, aggregate

Thanks for answering. But because I want to integrate my data with protein data, I have to use part of the TCGA barcode: the third field, which identifies the participant. For example, in the barcode TCGA-02-0001-01C-01D-0182-01, the participant code is 0001, and that is the part I need to extract.
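A minimal base-R sketch of extracting the participant field and flagging participants that appear more than once. It assumes the barcodes are available as a character vector (in practice, the column names of the expression matrix); the example values are the barcodes quoted in this thread. The first three fields together (the first 12 characters, e.g. `TCGA-HZ-A9TJ`) form the participant barcode, which is probably the safer matching key because it also carries the project and tissue source site.

```r
# Example barcodes from this thread (in practice: colnames(expression_matrix))
barcodes <- c("TCGA-HZ-A9TJ-01A-11R-A41I-07", "TCGA-HZ-A9TJ-06A-11R-A41B-07",
              "TCGA-H6-A45N-01A-11R-A26U-07", "TCGA-H6-A45N-11A-12R-A26U-07",
              "TCGA-E7-A97Q-01A-11R-A38B-07")

# Split every barcode on "-" into its fields
fields <- strsplit(barcodes, "-", fixed = TRUE)

# Third field only, e.g. "A9TJ"
participant_code <- vapply(fields, `[`, character(1), 3)

# First three fields, e.g. "TCGA-HZ-A9TJ" (equivalent to substr(barcodes, 1, 12))
participant_barcode <- vapply(fields, function(f) paste(f[1:3], collapse = "-"),
                              character(1))

# Participants represented by more than one aliquot
dup_participants <- unique(participant_barcode[duplicated(participant_barcode)])
print(dup_participants)
```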


Specifically, what data are you working with? Where did you get the data from? Could you post an example of a duplicated sample ID?

Comment by kristoffer.vittingseerup, 8 months ago.
Answer by MatthewP (China), 8 months ago:

Don't take the median. You need to select one of the replicates. You may need the full TCGA barcode, or other information such as whether the sample is FFPE (is_ffpe), to help you choose a single sample.
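The "select one of them" step can be sketched in base R. The helper `pick_one_per_participant()` is hypothetical, and its rule is an arbitrary assumption rather than a TCGA convention: keep only the requested sample type (characters 14-15 of the barcode), then keep the alphabetically first barcode per participant, which means the lowest vial, portion, analyte and plate codes. Metadata such as is_ffpe would be a better basis for the choice. The last barcode in the example vector is made up to show a second vial.

```r
# Hypothetical helper: keep one barcode per participant.
# Assumed rule (not a TCGA standard): restrict to the wanted sample types,
# then keep the alphabetically first barcode for each participant.
pick_one_per_participant <- function(barcodes, keep_types = "01") {
  sample_type <- substr(barcodes, 14, 15)                 # e.g. "01" in "...-01A-..."
  kept <- sort(barcodes[sample_type %in% keep_types],
               method = "radix")                          # C-locale ordering
  participant <- substr(kept, 1, 12)                      # e.g. "TCGA-HZ-A9TJ"
  kept[!duplicated(participant)]                          # first barcode per participant
}

barcodes <- c("TCGA-HZ-A9TJ-01A-11R-A41I-07", "TCGA-HZ-A9TJ-06A-11R-A41B-07",
              "TCGA-H6-A45N-01A-11R-A26U-07", "TCGA-H6-A45N-11A-12R-A26U-07",
              "TCGA-H6-A45N-01B-11R-A26U-07")             # last one is made up
print(pick_one_per_participant(barcodes))
```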

This page explains the TCGA barcode. You also need to download related files, such as _MANIFEST.txt_ and the _metadata file_, where you can find more information about your samples and data.
Example of a full barcode in the metadata file:

  "associated_entities": [
    {
      "entity_id": "90e6e8a1-98b3-4f38-92ef-df460d78d657", 
      "case_id": "ada19f65-5256-4c79-b3b9-7b9da69be437", 
      "entity_submitter_id": "TCGA-E7-A97Q-01A-11R-A38B-07", 
      "entity_type": "aliquot"
    }
  ],
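To show what each part of a full aliquot barcode such as `entity_submitter_id` encodes, here is a small base-R sketch. The function name `decode_tcga_barcode()` is hypothetical; the field layout (Project-TSS-Participant-SampleVial-PortionAnalyte-Plate-Center) follows the TCGA barcode description referred to above.

```r
# Hypothetical helper: split a full TCGA aliquot barcode into named fields
decode_tcga_barcode <- function(barcode) {
  f <- strsplit(barcode, "-", fixed = TRUE)[[1]]
  stopifnot(length(f) == 7)                # full aliquot barcodes only
  list(
    project     = f[1],
    tss         = f[2],                    # tissue source site
    participant = f[3],
    sample_type = substr(f[4], 1, 2),      # e.g. "01"
    vial        = substr(f[4], 3, 3),      # e.g. "A"
    portion     = substr(f[5], 1, 2),      # e.g. "11"
    analyte     = substr(f[5], 3, 3),      # e.g. "R" for RNA
    plate       = f[6],
    center      = f[7]
  )
}

str(decode_tcga_barcode("TCGA-E7-A97Q-01A-11R-A38B-07"))
```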
Reply from the original poster, 8 months ago:

Hi. I got the mRNA data (FPKM) from TCGA using R, and the protein data from TCPA. Here are examples of my duplicated samples:

    TCGA-HZ-A9TJ-01A-11R-A41I-07    TCGA-HZ-A9TJ-06A-11R-A41B-07
    TCGA-H6-A45N-01A-11R-A26U-07    TCGA-H6-A45N-11A-12R-A26U-07
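Both example pairs differ in the sample type field (characters 14-15 of the barcode): `01` versus `06`, and `01` versus `11`. In the TCGA sample type code table, 01 is Primary Solid Tumor, 06 is Metastatic and 11 is Solid Tissue Normal, so these look like different sample types from the same participant rather than replicates of one tumor. A small base-R sketch that labels them; the lookup table covers only a handful of codes and is not exhaustive.

```r
# Partial lookup of TCGA sample type codes (characters 14-15 of the barcode)
sample_type_names <- c("01" = "Primary Solid Tumor",
                       "02" = "Recurrent Solid Tumor",
                       "06" = "Metastatic",
                       "10" = "Blood Derived Normal",
                       "11" = "Solid Tissue Normal")

pairs <- c("TCGA-HZ-A9TJ-01A-11R-A41I-07", "TCGA-HZ-A9TJ-06A-11R-A41B-07",
           "TCGA-H6-A45N-01A-11R-A26U-07", "TCGA-H6-A45N-11A-12R-A26U-07")

type_code <- substr(pairs, 14, 15)
annotated <- data.frame(barcode = pairs,
                        type = unname(sample_type_names[type_code]),
                        stringsAsFactors = FALSE)
print(annotated)
```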

The R code I used to get the data is below:

    library(TCGAbiolinks)
    library(dplyr)
    library(DT)
    library(SummarizedExperiment)

    # 1
    query1 <- GDCquery(project = "TCGA-PAAD",
                       data.category = "Transcriptome Profiling",
                       data.type = "Gene Expression Quantification",
                       workflow.type = "HTSeq - Counts")
    # or: workflow.type = "HTSeq - FPKM-UQ"

    GDCdownload(query1)  # the files must be downloaded before GDCprepare()
    df <- GDCprepare(query1, save = TRUE,
                     save.filename = "TCGA-PAAD_dataframe.rda",
                     summarizedExperiment = FALSE)
    write.csv(df, file = "count.csv")

    # 2
    query <- GDCquery(project = "TCGA-PAAD",
                      data.category = "Transcriptome Profiling",
                      data.type = "Gene Expression Quantification",
                      workflow.type = "HTSeq - Counts")

    # Download a list of barcodes with platform IlluminaHiSeq_RNASeqV2
    GDCdownload(query)

    # Prepare expression matrix with geneID in the rows and samples (barcode) in the columns
    # rsem.genes.results as values
    PAADRnaseqSE <- GDCprepare(query)

    PAADMatrix <- assay(PAADRnaseqSE, "HTSeq - Counts")  # or PAADMatrix <- assay(PAADRnaseqSE, "raw_count")

    # For gene expression, if you need to see a boxplot correlation and AAIC plot to define outliers, you can run
    PAADRnaseq_CorOutliers <- TCGAanalyze_Preprocessing(PAADRnaseqSE)

    # Quantile filter of genes
    dataFilt <- TCGAanalyze_Filtering(tabDF = PAADRnaseq_CorOutliers,
                                      method = "quantile",
                                      qnt.cut = 0.25)

    # Selection of normal samples "NT"
    samplesNT <- TCGAquery_SampleTypes(barcode = colnames(dataFilt), typesample = c("NT"))

    # Selection of tumor samples "TP"
    samplesTP <- TCGAquery_SampleTypes(barcode = colnames(dataFilt), typesample = c("TP"))

    # Differential expression analysis (DEA)
    dataDEGs <- TCGAanalyze_DEA(mat1 = dataFilt[, samplesNT],
                                mat2 = dataFilt[, samplesTP],
                                Cond1type = "Normal",
                                Cond2type = "Tumor",
                                fdr.cut = 0.01,
                                logFC.cut = 1,
                                method = "glmLRT")

    # DEGs table with expression values in normal and tumor samples
    dataDEGsFiltLevel <- TCGAanalyze_LevelTab(dataDEGs, "Tumor", "Normal",
                                              dataFilt[, samplesTP], dataFilt[, samplesNT])
    write.csv(dataDEGsFiltLevel, file = "DEGs.csv")
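Since the goal is to join the expression data with the protein data by participant, one possible last step is sketched below in base R. The function `to_participant_matrix()` is hypothetical, the toy matrix values are made up, and `protein_ids` assumes the TCPA sample IDs have already been reduced to 12-character participant barcodes (`TCGA-XX-0000` is a made-up ID with no match). Within a participant it simply keeps whichever aliquot comes first; substitute your own selection rule there.

```r
# Toy expression matrix: genes x aliquot barcodes (values are made up)
expr <- matrix(1:8, nrow = 2,
               dimnames = list(c("GENE_A", "GENE_B"),
                               c("TCGA-HZ-A9TJ-01A-11R-A41I-07",
                                 "TCGA-HZ-A9TJ-06A-11R-A41B-07",
                                 "TCGA-H6-A45N-01A-11R-A26U-07",
                                 "TCGA-H6-A45N-11A-12R-A26U-07")))

# Assumed: protein sample IDs already reduced to participant barcodes
protein_ids <- c("TCGA-HZ-A9TJ", "TCGA-H6-A45N", "TCGA-XX-0000")

# Hypothetical helper: one column per participant, named by participant barcode
to_participant_matrix <- function(m, keep_type = "01") {
  m <- m[, substr(colnames(m), 14, 15) == keep_type, drop = FALSE]  # primary tumors only
  participant <- substr(colnames(m), 1, 12)
  m <- m[, !duplicated(participant), drop = FALSE]                  # first aliquot per participant
  colnames(m) <- substr(colnames(m), 1, 12)
  m
}

tumor <- to_participant_matrix(expr)
shared <- intersect(colnames(tumor), protein_ids)   # participants present in both data sets
tumor_shared <- tumor[, shared, drop = FALSE]
print(tumor_shared)
```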


Please update your question with the code instead of posting it as an answer. Also, use proper code formatting instead of pasting plain text, so that it is readable.

Comment by kristoffer.vittingseerup, 8 months ago.