Question: How to extract the list of genes from TCGA CNV data
0
27 days ago by
aouichechaimaa100 wrote:

Hi guys, I have CNV data from TCGA, as shown below, and my goal is to extract the list of genes using GISTIC. Could anyone please tell me how to handle this data with GISTIC and how to run GISTIC?

  • TCGA-2A-A8VL-10A-01D-A379-01 1 51598 1500664 226 0.1646
  • TCGA-2A-A8VL-10A-01D-A379-01 1 1617778 1653196 12 -0.4115
  • TCGA-2A-A8VL-10A-01D-A379-01 1 1653256 15362197 7748 0.0056
  • TCGA-2A-A8VL-10A-01D-A379-01 1 15362212 15362449 6 0.7626

https://postimg.cc/image/g4kusd56j/

cnv genes tcga • 391 views
ADD COMMENTlink modified 27 days ago by Kevin Blighe19k • written 27 days ago by aouichechaimaa100
3
27 days ago by
Kevin Blighe19k
University College London Cancer Institute
Kevin Blighe19k wrote:

I'm not sure that you need GISTIC to do this. You just need GenomicRanges and Ensembl's biomaRt:

Step 1

Create a reference dataset of all genes and save it as a GenomicRanges object

library(biomaRt)
library(GenomicRanges)

#Set up a gene annotation template to use (GRCh37 / hg19 archive of Ensembl)
mart <- useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")
genes <- getBM(attributes=c("hgnc_symbol","chromosome_name","start_position","end_position"), mart=mart)
genes <- genes[genes[,1]!="" & genes[,2] %in% c(1:22,"X","Y"),]
xidx <- which(genes[,2]=="X")
yidx <- which(genes[,2]=="Y")
genes[xidx, 2] <- 23
genes[yidx, 2] <- 24
genes[,2] <- sapply(genes[,2],as.integer)
genes <- genes[order(genes[,3]),]
genes <- genes[order(genes[,2]),]
colnames(genes) <- c("GeneSymbol","Chr","Start","End")
genes_GR <- makeGRangesFromDataFrame(genes,keep.extra.columns = TRUE)

Step 2

Store your own data as a GenomicRanges object:

colnames(df) <- c("Barcode", "chr", "start", "end", "extra1", "extra2")
df
                       Barcode chr    start      end extra1  extra2
1 TCGA-2A-A8VL-10A-01D-A379-01   1    51598  1500664    226  0.1646
2 TCGA-2A-A8VL-10A-01D-A379-01   1  1617778  1653196     12 -0.4115
3 TCGA-2A-A8VL-10A-01D-A379-01   1  1653256 15362197   7748  0.0056
4 TCGA-2A-A8VL-10A-01D-A379-01   1 15362212 15362449      6  0.7626

df_GR <- makeGRangesFromDataFrame(df, keep.extra.columns = TRUE)
df_GR
GRanges object with 4 ranges and 3 metadata columns:
      seqnames               ranges strand |                      Barcode
         <Rle>            <IRanges>  <Rle> |                     <factor>
  [1]        1 [   51598,  1500664]      * | TCGA-2A-A8VL-10A-01D-A379-01
  [2]        1 [ 1617778,  1653196]      * | TCGA-2A-A8VL-10A-01D-A379-01
  [3]        1 [ 1653256, 15362197]      * | TCGA-2A-A8VL-10A-01D-A379-01
  [4]        1 [15362212, 15362449]      * | TCGA-2A-A8VL-10A-01D-A379-01
         extra1    extra2
      <integer> <numeric>
  [1]       226    0.1646
  [2]        12   -0.4115
  [3]      7748    0.0056
  [4]         6    0.7626
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

Step 3

Overlap your regions with the reference dataset that you created

hits <- findOverlaps(genes_GR, df_GR, type="within")
df_ann <- cbind(df[subjectHits(hits),],genes[queryHits(hits),])
head(df_ann)
                         Barcode chr start     end extra1 extra2 GeneSymbol Chr
1   TCGA-2A-A8VL-10A-01D-A379-01   1 51598 1500664    226 0.1646     OR4G4P   1
1.1 TCGA-2A-A8VL-10A-01D-A379-01   1 51598 1500664    226 0.1646    OR4G11P   1
1.2 TCGA-2A-A8VL-10A-01D-A379-01   1 51598 1500664    226 0.1646      OR4F5   1
     Start    End
1    52473  54936
1.1  62948  63887
1.2  69091  70008

Step 4

To make filtering easier, we can further manipulate the data so that each row shows the aberrant region (sample barcode with the region's start and end co-ordinates) alongside the gene found within it (gene symbol with its chromosome and start position):

AberrantRegion <- paste0(df_ann[,1],":", df_ann[,3],"-", df_ann[,4])
GeneRegion <- paste0(df_ann[,7],":", df_ann[,8],"-", df_ann[,9])
Final_regions <- cbind(df_ann[,c(6,2,5)], AberrantRegion, GeneRegion)
Final_regions[seq(1, nrow(Final_regions), 20),]
      extra2 chr extra1                                AberrantRegion
1     0.1646   1    226    TCGA-2A-A8VL-10A-01D-A379-01:51598-1500664
1.20  0.1646   1    226    TCGA-2A-A8VL-10A-01D-A379-01:51598-1500664
1.40  0.1646   1    226    TCGA-2A-A8VL-10A-01D-A379-01:51598-1500664
1.60  0.1646   1    226    TCGA-2A-A8VL-10A-01D-A379-01:51598-1500664
3.18  0.0056   1   7748 TCGA-2A-A8VL-10A-01D-A379-01:1653256-15362197
3.38  0.0056   1   7748 TCGA-2A-A8VL-10A-01D-A379-01:1653256-15362197
3.58  0.0056   1   7748 TCGA-2A-A8VL-10A-01D-A379-01:1653256-15362197
3.78  0.0056   1   7748 TCGA-2A-A8VL-10A-01D-A379-01:1653256-15362197
3.98  0.0056   1   7748 TCGA-2A-A8VL-10A-01D-A379-01:1653256-15362197
3.118 0.0056   1   7748 TCGA-2A-A8VL-10A-01D-A379-01:1653256-15362197
3.138 0.0056   1   7748 TCGA-2A-A8VL-10A-01D-A379-01:1653256-15362197
3.158 0.0056   1   7748 TCGA-2A-A8VL-10A-01D-A379-01:1653256-15362197
               GeneRegion
1          OR4G4P:1-52473
1.20    TUBB8P11:1-808672
1.40    FAM132A:1-1177826
1.60    TMEM240:1-1470554
3.18      MMEL1:1-2522078
3.38      AJAP1:1-4714792
3.58     TAS1R1:1-6615241
3.78    RPL7P11:1-8810489
3.98     CLSTN1:1-9789084
3.118    PEX14:1-10532345
3.138   FBXO44:1-11714432
3.158 TNFRSF1B:1-12227060
ADD COMMENTlink modified 27 days ago • written 27 days ago by Kevin Blighe19k

@Kevin Blighe Thank you very much for your reply, but I'm not familiar with the language of your source code, and I'm sure that I have to use GISTIC, as mentioned in many papers that I have already read, in order to extract a more reliable list of genes and their loci.

ADD REPLYlink written 27 days ago by aouichechaimaa100

If anyone is familiar with GISTIC, please give me some guidance.

ADD REPLYlink written 27 days ago by aouichechaimaa100
1

Please explain the exact source from where you downloaded your data. There are various types of copy number data in various states of processing available from different web-sites. The data is usually available as somatic copy number alteration data (SCNA), not copy number variation (CNV) - CNVs are more spoken of in terms of the germline and 'natural' variation in copy number. GISTIC is just one of a few programs that are commonly used to analyse SCNA data for the purposes of 'summarising' the regions by grouping them into large regions more amenable for downstream analysis and interpretation.

Have you taken a look at the documentation to see what exactly you require to execute the program? - ftp://ftp.broadinstitute.org/pub/GISTIC2.0/GISTICDocumentation_standalone.htm

The code that I put above uses base R functions (with the exception of the biomaRt and GenomicRanges packages) and can be used to identify the genes overlapping any type of segment data (chr : startbp : endbp).
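
To picture what the overlap step does, here is a minimal base R sketch on toy data. It is only an illustration, not the GenomicRanges implementation: the helper name genes_within_segments is hypothetical, the gene FAKE1 is made up, and merge() builds every gene-by-segment pair, so it would not scale to a whole genome. It mimics the idea of type="within" by keeping genes that lie entirely inside a segment on the same chromosome.

```r
# Toy segments (first two rows of the question's data) and toy genes
segs <- data.frame(chr = c(1, 1),
                   start = c(51598, 1617778),
                   end = c(1500664, 1653196))
genes <- data.frame(GeneSymbol = c("OR4G4P", "OR4F5", "FAKE1"),  # FAKE1 is invented
                    Chr = c(1, 1, 1),
                    Start = c(52473, 69091, 1500000),
                    End = c(54936, 70008, 1600000),
                    stringsAsFactors = FALSE)

# Hypothetical helper: genes fully contained in a segment on the same chromosome
genes_within_segments <- function(genes, segs) {
  pairs <- merge(genes, segs, by.x = "Chr", by.y = "chr")  # all pairs per chromosome
  inside <- pairs$Start >= pairs$start & pairs$End <= pairs$end
  pairs[inside, c("GeneSymbol", "Chr", "Start", "End", "start", "end")]
}

hits <- genes_within_segments(genes, segs)
print(hits)
```

FAKE1 straddles the end of the first segment, so it is dropped, which is the behaviour you would expect from a "within" overlap.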

Thank you.

ADD REPLYlink written 26 days ago by Kevin Blighe19k
1

To give you an idea, take a look at my most recent publication, where a main part was to analyse the somatic CNA data that is available. I used a mixture of programs to do it, including GISTIC2.0: Racial differences in endometrial cancer molecular portraits in The Cancer Genome Atlas.

Look at the section headed 'Somatic mutation and copy number aberrations' in the methods.

ADD REPLYlink written 26 days ago by Kevin Blighe19k

@Kevin Blighe Thanks a lot for your interest and quick reply.

ADD REPLYlink written 26 days ago by aouichechaimaa100

@Kevin Blighe, I checked your paper and read through the section 'Somatic mutation and copy number aberrations' in the methods.

I got the Level 3 CNV data from this website: http://gdac.broadinstitute.org, and specifically from this link: http://firebrowse.org/?cohort=PRAD&download_dialog=true.

I'm interested in getting the list of significant genes from this data using GISTIC.

As I understand from many papers, GISTIC2.0 can be used to analyse the copy number dataset (Level 3) to identify recurrent regions of copy number alteration and the copy number of genes, and this is my goal too.

However, the data I got has no gene names.

Additionally, I found that the link from which I got my data also has GISTIC results (http://firebrowse.org/?cohort=PRAD&download_dialog=true#), but I really don't know which file I should download or how to deal with them.

Finally, my main goal is to get an m*n matrix in which m represents the list of patients (samples) and n represents the list of genes from this dataset.
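
One way to picture such a samples-by-genes matrix is sketched below in base R. Everything here is toy data with made-up names (S1, S2, G1, G2), and the sketch assumes that, within a sample, at most one segment fully contains a given gene; genes not covered by any segment stay NA.

```r
# Toy segments for two samples (hypothetical values)
segs <- data.frame(Sample = c("S1", "S1", "S2"),
                   chr = c(1, 1, 1),
                   start = c(1, 1001, 1),
                   end = c(1000, 2000, 2000),
                   Segment_Mean = c(0.5, -0.4, 0.1),
                   stringsAsFactors = FALSE)
genes <- data.frame(GeneSymbol = c("G1", "G2"),
                    Chr = c(1, 1),
                    Start = c(100, 1500),
                    End = c(200, 1600),
                    stringsAsFactors = FALSE)

# Rows are samples (m), columns are genes (n)
samples <- unique(segs$Sample)
mat <- matrix(NA_real_, nrow = length(samples), ncol = nrow(genes),
              dimnames = list(samples, genes$GeneSymbol))

for (g in seq_len(nrow(genes))) {
  # Segments that fully contain gene g
  covers <- segs$chr == genes$Chr[g] &
    segs$start <= genes$Start[g] & segs$end >= genes$End[g]
  mat[segs$Sample[covers], g] <- segs$Segment_Mean[covers]
}
print(mat)
```

Each cell holds the segment mean of the segment covering that gene in that sample, so the matrix can later be thresholded into gain and loss calls if needed.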

@Kevin Blighe Please, I hope you can help me do this job, Bro; I really appreciate your help!

ADD REPLYlink modified 26 days ago • written 26 days ago by aouichechaimaa100
1

No problem bro.

Yes, recurrent somatic copy number alterations (recSCNA) are the target (for analysis), and the genes that overlap these. I am almost certain that I also started with the data from Broad Firebrowse, but I will double check when I arrive home.

I will go through the analysis steps one-by-one for you.

ADD REPLYlink written 26 days ago by Kevin Blighe19k
1

Hello, so, here are the steps that I did, for endometrial cancer:

First, download the copy number data from here: http://firebrowse.org/?cohort=UCEC# (access the data by clicking on the green bar for SNP6 CopyNum). Specifically, the file that you should obtain is for hg19 and will have 'scna' and 'minus_germline_cnv' as parts of its filename. In my case, the file was called UCEC.Level_3_segmented_scna_minus_germline_cnv_hg19.seg.txt (the code below reads the equivalent PRAD file).

Then, there are a series of steps that you should do in order to process this data:

1

Separate out the tumour samples from the normals

require(data.table)
CN <- read.table("PRAD.snp__genome_wide_snp_6__broad_mit_edu__Level_3__segmented_scna_minus_germline_cnv_hg19__seg.seg.txt", sep="\t", stringsAsFactors=FALSE, header=TRUE)

#   Tumor types range from 01 - 09
#   Normal types from 10 - 19
#   Control samples from 20 - 29
types <- gsub("TCGA-[A-Z0-9]*-[A-Z0-9]*-", "", (gsub("-[0-9A-Z]*-[0-9A-Z]*-01$", "", CN[,1])))
unique(types)
matched.normal <- c("10A", "10B", "11A", "11B")
tumor <- c("01A", "01B", "06A")

#Divide up the data into tumor and normal
tumor.indices <- which(types %in% tumor)
matched.normal.indices <- which(types %in% matched.normal)
CN.tumor <- CN[tumor.indices,]
CN.matched.normal <- CN[matched.normal.indices,]

#Change IDs to allow for matching to metadata
CN.tumor[,1] <- gsub("-[0-9A-Z]*-[0-9A-Z]*-[0-9A-Z]*-01$", "", CN.tumor[,1])
CN.matched.normal[,1] <- gsub("-[0-9A-Z]*-[0-9A-Z]*-[0-9A-Z]*-01$", "", CN.matched.normal[,1])
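
For anyone unsure what the regular expressions above are pulling out, here is a small sketch that does the same thing by splitting the barcode on dashes. The second barcode is made up for illustration; the ranges for tumour, normal and control codes are the ones given in the comments above.

```r
barcodes <- c("TCGA-2A-A8VL-10A-01D-A379-01",
              "TCGA-2A-A8VL-01A-11D-A376-01")  # second barcode is invented

# The fourth dash-separated field holds the sample type code plus a vial letter
fields <- strsplit(barcodes, "-", fixed = TRUE)
sample_field <- vapply(fields, function(x) x[4], character(1))
type_code <- as.integer(substr(sample_field, 1, 2))

# 01-09 tumour, 10-19 normal, 20-29 control
category <- ifelse(type_code <= 9, "tumor",
                   ifelse(type_code <= 19, "normal", "control"))

# Patient-level ID (first three fields), useful for matching to metadata
patient <- vapply(fields, function(x) paste(x[1:3], collapse = "-"), character(1))
print(data.frame(barcodes, sample_field, category, patient))
```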

--------------------

2

Create a dataset of 'normal' CNV that will be used for further filtering out of normal CNV from our tumour data

require(gaia)

#Retrieve probes meta file from broadinstitute website
gdac.root <- "ftp://ftp.broadinstitute.org/pub/GISTIC2.0/hg19_support/"
markersMatrix <- read.delim(paste0(gdac.root,"genome.info.6.0_hg19.na31_minus_frequent_nan_probes_sorted_2.1.txt"), as.is=TRUE, header=FALSE)
colnames(markersMatrix) <- c("Probe.Name", "Chromosome", "Start")

#Change sex chromosome names
unique(markersMatrix$Chromosome)
markersMatrix[which(markersMatrix$Chromosome=="X"),"Chromosome"] <- 23
markersMatrix[which(markersMatrix$Chromosome=="Y"),"Chromosome"] <- 24
markersMatrix$Chromosome <- sapply(markersMatrix$Chromosome, as.integer)

#Create a marker ID
markerID <- apply(markersMatrix, 1, function(x) paste0(x[2], ":", x[3]))

#Remove duplicates
markersMatrix <- markersMatrix[!duplicated(markerID),]

#Filter markersMatrix for common CNV
markerID <- apply(markersMatrix, 1, function(x) paste0(x[2], ":", x[3]))
commonCNV <- read.delim(paste0(gdac.root,"CNV.hg19.bypos.111213.txt"), as.is=TRUE)
commonCNV[,2] <- sapply(commonCNV[,2], as.integer)
commonCNV[,3] <- sapply(commonCNV[,3], as.integer)
commonID <- apply(commonCNV, 1, function(x) paste0(x[2], ":", x[3]))
table(commonID %in% markerID)
table(markerID %in% commonID)
markersMatrix_fil <- markersMatrix[!markerID %in% commonID,]

#Create the markers object
markers_obj <- load_markers(markersMatrix_fil)
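
The marker filtering can also be written without apply(), as in the sketch below. This is an alternative I would suggest, not what was run for the thread: apply() first converts the data frame to a character matrix, which may pad numbers with spaces, whereas paste0() on the columns works on the values directly. The toy tables and their values are made up.

```r
# Toy marker table with the same three columns as markersMatrix (values are made up)
markers <- data.frame(Probe.Name = c("P1", "P2", "P3", "P4"),
                      Chromosome = c(1L, 1L, 2L, 2L),
                      Start = c(100L, 100L, 2500000L, 300L),
                      stringsAsFactors = FALSE)
# Stands in for the common-CNV list
common <- data.frame(Chromosome = 2L, Start = 300L)

# Vectorised "chr:pos" IDs built straight from the integer columns
marker_id <- paste0(markers$Chromosome, ":", markers$Start)

# Logical indexing is safe even when nothing is duplicated
markers <- markers[!duplicated(marker_id), ]

# Drop markers that sit at a common-CNV position
marker_id <- paste0(markers$Chromosome, ":", markers$Start)
common_id <- paste0(common$Chromosome, ":", common$Start)
markers_fil <- markers[!marker_id %in% common_id, ]
print(markers_fil)
```

P2 is removed as a duplicate of P1 and P4 is removed as a common-CNV position, leaving P1 and P3.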

-------------------------------

3

Determine recurrent aberrations in normal and tumour

#All combined
#Prepare CNV matrix
cnvMatrix <- CN.tumor

#Add label (0 for loss, 1 for gain)
#An absolute segment mean of 0.3 is defined as the cut-off (below -0.3 is loss, above 0.3 is gain)
cnvMatrix <- cbind(cnvMatrix, Label=NA)
cnvMatrix[cnvMatrix$Segment_Mean < -0.3,"Label"] <- 0
cnvMatrix[cnvMatrix$Segment_Mean > 0.3,"Label"] <- 1
cnvMatrix <- cnvMatrix[!is.na(cnvMatrix$Label),]

#Remove segment mean as we now go by the binary classification of gain or loss
cnvMatrix <- cnvMatrix[,-6]
colnames(cnvMatrix) <- c("Sample.Name", "Chromosome", "Start", "End", "Num.of.Markers", "Aberration")

#Substitute Chromosomes "X" and "Y" with "23" and "24"
xidx <- which(cnvMatrix$Chromosome=="X")
yidx <- which(cnvMatrix$Chromosome=="Y")
cnvMatrix[xidx,"Chromosome"] <- 23
cnvMatrix[yidx,"Chromosome"] <- 24
cnvMatrix$Chromosome <- sapply(cnvMatrix$Chromosome, as.integer)

#Run GAIA, which looks for recurrent aberrations in your input file
n <- length(unique(cnvMatrix[,1]))
cnv_obj <- load_cnv(cnvMatrix, markers_obj, n)
results.all <- runGAIA(cnv_obj, markers_obj, output_file_name="Tumor.All.txt", aberrations=-1, chromosomes=-1, num_iterations=10, threshold=0.15)
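
The gain / loss labelling is the part that people most often want to check by hand, so here is the same rule applied to the four segment means from the original question. The sample names S1 and S2 are placeholders.

```r
seg <- data.frame(Sample = c("S1", "S1", "S2", "S2"),
                  Segment_Mean = c(0.1646, -0.4115, 0.0056, 0.7626),
                  stringsAsFactors = FALSE)
cutoff <- 0.3  # same cut-off as in the code above

# 1 = gain, 0 = loss, NA = neither (these rows are dropped afterwards)
seg$Label <- ifelse(seg$Segment_Mean > cutoff, 1,
                    ifelse(seg$Segment_Mean < -cutoff, 0, NA))
kept <- seg[!is.na(seg$Label), ]
print(kept)
```

Only the segments at -0.4115 (loss) and 0.7626 (gain) survive; the two near-zero segments are treated as copy-neutral and removed.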

---------------------------

4

Annotation

[Refer to the code that I posted in my original answer]

Kevin

ADD REPLYlink modified 15 days ago • written 26 days ago by Kevin Blighe19k

@Kevin Blighe Hello Bro, can you help me again? I have installed R and I'm running your code line by line. I got as far as this line:

#Create the markers object
markers_obj <- load_markers(markersMatrix_fil)
Error in load_markers(markersMatrix_fil) :
  could not find function "load_markers"

Please fix this line so that I can continue with the other lines.
ADD REPLYlink modified 19 days ago by genomax47k • written 19 days ago by aouichechaimaa100
2

Hey dude, ensure that the gaia package loaded correctly. load_markers() is part of that package.
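
A quick way to check this is sketched below. Note that require() only returns FALSE with a warning when a package is missing, so the script carries on and fails later at the first function call. The helper name has_function is hypothetical.

```r
# Hypothetical helper: TRUE only if the package is installed and defines the function
has_function <- function(pkg, fn) {
  requireNamespace(pkg, quietly = TRUE) &&
    exists(fn, envir = asNamespace(pkg), inherits = FALSE)
}

if (!has_function("gaia", "load_markers")) {
  message("gaia is not available: install it, then load it before calling load_markers()")
}
```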

ADD REPLYlink written 19 days ago by Kevin Blighe19k

Do you mean the line require(gaia)? I'm following your code line by line and now I can't continue...

ADD REPLYlink written 19 days ago by aouichechaimaa100
1

That's indeed what Kevin means.

ADD REPLYlink written 19 days ago by WouterDeCoster28k
1

Yep, bro, ensure that gaia installed correctly (source("https://bioconductor.org/biocLite.R"); biocLite("gaia"))

load_markers
Error: object 'load_markers' not found

Load package and then try again:

require(gaia)
Loading required package: gaia
Warning message:
package ‘gaia’ was built under R version 3.2.5 


load_markers

function (marker_matrix) 
{
    message("Loading Marker Informations")
    chromosomes <- sort(unique(marker_matrix[, 2]))
    chromosome_marker_list <- list()
    end_position <- FALSE
    if (ncol(marker_matrix) == 4) {
        end_position <- TRUE
    }
    for (i in as.numeric(chromosomes)) {
        chr_ids <- which(marker_matrix[, 2] == i)
        tmp_matrix <- matrix(0, 2, length(chr_ids))
        tmp_matrix[1, ] <- marker_matrix[chr_ids, 3]
        if (end_position) {
            tmp_matrix[2, ] <- marker_matrix[chr_ids, 4]
        }
        else {
            tmp_matrix[2, ] <- tmp_matrix[1, ]
        }
        chromosome_marker_list[[i]] <- tmp_matrix
        names(chromosome_marker_list)[[i]] <- i
        message(".", appendLF = FALSE)
    }
    message("\nDone")
    return(chromosome_marker_list)
}
<environment: namespace:gaia>
ADD REPLYlink modified 19 days ago • written 19 days ago by Kevin Blighe19k

@Kevin Blighe Thanks Bro, I have installed all the packages successfully. Now I have the following 2 issues, at these lines:

markers_obj <- load_markers(markersMatrix_fil)
Loading Marker Informations
......................Error in chromosome_marker_list[[i]] <- tmp_matrix :
  attempt to select more than one element in integerOneIndex
In addition: Warning message:
In load_markers(markersMatrix_fil) : NAs introduced by coercion

colnames(df) <- c("Barcode", "chr", "start", "end", "extra1", "extra2")
Error in `colnames<-`(`*tmp*`, value = c("Barcode", "chr", "start", "end",  :
  attempt to set 'colnames' on an object with less than two dimensions

ADD REPLYlink modified 19 days ago • written 19 days ago by aouichechaimaa100

Hey Chief, let me take a look later.

ADD REPLYlink written 18 days ago by Kevin Blighe19k

@ Kevin Blighe, sure Bro, take your time and thanks a lot for your reply.

ADD REPLYlink written 18 days ago by aouichechaimaa100
1

Just to be sure, you are running these commands first, starting with the data that you download from Broad FireBrowse:

Then, you run the steps that I originally mentioned:

It is very important that the load_markers function does not give any warnings. A successful execution of this function will produce:

markers_obj <- load_markers(markersMatrix_fil)
Loading Marker Informations
........................
Done

Sometimes, available memory is an issue.
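
Judging from the load_markers() source printed earlier in the thread, the function loops over as.numeric() of the chromosome column, so any value that cannot be converted to a number (a leftover "X", for example) would produce exactly the kind of coercion warning reported. A small pre-flight check along these lines may help; check_markers is a hypothetical helper and the tables are toy data.

```r
# Hypothetical pre-flight check for a marker table (columns: probe, chromosome, position)
check_markers <- function(m) {
  problems <- character(0)
  if (!ncol(m) %in% c(3, 4)) problems <- c(problems, "expected 3 or 4 columns")
  chr <- suppressWarnings(as.numeric(m[, 2]))
  if (anyNA(chr)) problems <- c(problems, "chromosome column has non-numeric or NA values")
  if (anyNA(m[, 3])) problems <- c(problems, "position column has NA values")
  problems
}

good <- data.frame(Probe.Name = c("P1", "P2"), Chromosome = c(1L, 23L),
                   Start = c(100L, 200L), stringsAsFactors = FALSE)
bad  <- data.frame(Probe.Name = c("P1", "P2"), Chromosome = c("1", "X"),
                   Start = c(100L, 200L), stringsAsFactors = FALSE)
print(check_markers(good))  # no problems reported
print(check_markers(bad))   # flags the chromosome column
```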

ADD REPLYlink written 18 days ago by Kevin Blighe19k

@Kevin Blighe Yes Bro I'm following your steps one by one and by the correct order.

ADD REPLYlink written 18 days ago by aouichechaimaa100
1

Which OS? How much RAM?

Can you definitely 100% confirm that each of these steps has completed successfully:

require(gaia)

#Retrieve probes meta file from broadinstitute website
gdac.root <- "ftp://ftp.broadinstitute.org/pub/GISTIC2.0/hg19_support/"
markersMatrix <- read.delim(paste0(gdac.root,"genome.info.6.0_hg19.na31_minus_frequent_nan_probes_sorted_2.1.txt"), as.is=TRUE, header=FALSE)
colnames(markersMatrix) <- c("Probe.Name", "Chromosome", "Start")

#Change sex chromosome names
unique(markersMatrix$Chromosome)
markersMatrix[which(markersMatrix$Chromosome=="X"),"Chromosome"] <- 23
markersMatrix[which(markersMatrix$Chromosome=="Y"),"Chromosome"] <- 24
markersMatrix$Chromosome <- sapply(markersMatrix$Chromosome, as.integer)

#Create a marker ID
markerID <- apply(markersMatrix, 1, function(x) paste0(x[2], ":", x[3]))

#Remove duplicates
markersMatrix <- markersMatrix[!duplicated(markerID),]

#Filter markersMatrix for common CNV
markerID <- apply(markersMatrix, 1, function(x) paste0(x[2], ":", x[3]))
commonCNV <- read.delim(paste0(gdac.root,"CNV.hg19.bypos.111213.txt"), as.is=TRUE)
commonCNV[,2] <- sapply(commonCNV[,2], as.integer)
commonCNV[,3] <- sapply(commonCNV[,3], as.integer)
commonID <- apply(commonCNV, 1, function(x) paste0(x[2], ":", x[3]))
table(commonID %in% markerID)
table(markerID %in% commonID)
markersMatrix_fil <- markersMatrix[!markerID %in% commonID,]

#Create the markers object
markers_obj <- load_markers(markersMatrix_fil)
ADD REPLYlink written 18 days ago by Kevin Blighe19k

@Kevin Blighe I run every step carefully, and I can see the output variables in the Global Environment panel in the right corner of R.

Can I share my data with you so that you can test your code on it?

My OS is Windows 10, my RAM is 12 GB, and my data size is 6.87 MB.

ADD REPLYlink modified 18 days ago • written 18 days ago by aouichechaimaa100

@Kevin Blighe Bro, I have fixed the first error. Now these 2 parts are not working:

    cnv_obj <- load_cnv(cnvMatrix, markers_obj, n)
Loading Copy Number Data
Error in matrix(0, length(samples), ncol(markers_list[[chromosomes[i]]])) : 
  non-numeric matrix extent



results.all <- runGAIA(cnv_obj, markers_obj, output_file_name="Tumor.All.txt", aberrations=-1, chromosomes=-1, num_iterations=10, threshold=0.15)

Performing Data Preprocessing
Error in runGAIA(cnv_obj, markers_obj, output_file_name = "Tumor.All.txt",  : 
  object 'cnv_obj' not found

    colnames(df) <- c("Barcode", "chr", "start", "end", "extra1", "extra2")
Error in `colnames<-`(`*tmp*`, value = c("Barcode", "chr", "start", "end",  : 
  attempt to set 'colnames' on an object with less than two dimensions
ADD REPLYlink written 18 days ago by aouichechaimaa100
1

There must be something wrong with your cnvMatrix. In my example, I use the endometrial cancer (UCEC) data. Which cancer's data do you have? Just check the output of each command in Steps 1 and 3 in order to ensure that it works.
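
One check worth trying is sketched here. The 'non-numeric matrix extent' message comes from ncol(markers_list[[...]]), which suggests (this is a guess from the error text, not something confirmed in the thread) that a chromosome present in cnvMatrix has no entry in the markers object. The toy objects below only imitate the shape of markers_obj and cnvMatrix.

```r
# Toy stand-ins: a marker list indexed by chromosome number, and a CNV table
markers_toy <- list(matrix(0, 2, 3), matrix(0, 2, 2))   # chromosomes 1 and 2 only
cnv_toy <- data.frame(Sample.Name = c("S1", "S1", "S2"),
                      Chromosome = c(1L, 2L, 3L),
                      Start = c(10L, 20L, 30L),
                      End = c(15L, 25L, 35L),
                      Num.of.Markers = c(5L, 5L, 5L),
                      Aberration = c(0, 1, 1),
                      stringsAsFactors = FALSE)

# Chromosomes present in the CNV table but without any markers
chr_in_cnv <- sort(unique(cnv_toy$Chromosome))
has_markers <- vapply(chr_in_cnv, function(i) {
  i <= length(markers_toy) && !is.null(markers_toy[[i]])
}, logical(1))
missing_chr <- chr_in_cnv[!has_markers]
print(missing_chr)

# Basic shape checks on the CNV table itself
stopifnot(is.numeric(cnv_toy$Chromosome), all(cnv_toy$Aberration %in% c(0, 1)))
```

If missing_chr is not empty for your real objects, the marker step is the place to look rather than load_cnv() itself.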

If you want to tell me the file that you downloaded, then I can also take a look here.

Peace bro.

ADD REPLYlink written 18 days ago by Kevin Blighe19k

@Kevin Blighe OK, the cancer data that I'm working on is prostate adenocarcinoma (PRAD).

ADD REPLYlink written 18 days ago by aouichechaimaa100

@Kevin Blighe, Hi Bro, I know that I disturb you a lot; I apologise for that, and I really appreciate your help! Did you check your code with my data? I really need to do this job, so please help me out.

ADD REPLYlink written 17 days ago by aouichechaimaa100
1

Hey chief - no problem. I went here: http://firebrowse.org/?cohort=PRAD&download_dialog=true

I then downloaded the file: 'genome_wide_snp_6-FFPE-segmented_scna_minus_germline_cnv_hg19' and then used that as the data input.

I then followed all of my code and it worked fine. The only thing that I notice is that there is no copy number data for normal samples in the PRAD dataset, but this is not important because the germline copy number is already subtracted by Broad Firebrowse.

cnv_obj <- load_cnv(cnvMatrix, markers_obj, n)
Loading Copy Number Data
................................................
Done

Then:

results.all <- runGAIA(cnv_obj, markers_obj, output_file_name="Tumor.All.txt", aberrations=-1, chromosomes=-1, num_iterations=10, threshold=0.15)

Performing Data Preprocessing

Done

Computing Discontinuity Matrix
........................
Done
Computing Probability Distribution
................................................................................................
Done
Assessing the Significance of Observed Data
................................................
Done
Writing Tumor.All.txt.igv.gistic File for Integrative Genomics Viewer (IGV) Tool
................................................
Done
Running Homogeneous peel-off Algorithm With Significance Threshold of 0.15 and Homogenous Threshold of 0.12
................................................
Done

Writing Output File 'Tumor.All.txt' Containing the Significant Regions

File 'Tumor.All.txt' Saved

The problem that you have may be an issue with a new version of one of the packages. Dude, to help, I have put the output files and my R session online for you. You can download them (3 files) here: https://app.box.com/s/992llscqeabyw3rzrjg5stxd6945r73r (you may have to open an account).

The Tumor.All.txt file contains the significant recurrent somatic copy number alterations (recSCNA). You will want to input these back into R and then annotate them using the first steps that I mention in this thread. You will have to rename the columns to have at least "chr", "start", "end"
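
The renaming can be done along these lines. The column names in this toy table are assumptions made for illustration only, so check the header of your own Tumor.All.txt and adjust the mapping before using it.

```r
# Toy table standing in for the significant regions; column names are assumed
rec <- data.frame(Chromosome = c(1, 8),
                  Aberration = c(0, 1),
                  RegionStart = c(1000, 5000),
                  RegionEnd = c(2000, 9000))

# Map the file's names onto the names expected by the annotation code
rename_map <- c(Chromosome = "chr", RegionStart = "start",
                RegionEnd = "end", Aberration = "type")
names(rec)[match(names(rename_map), names(rec))] <- unname(rename_map)
print(rec)
```

After this, the table has "chr", "start" and "end" columns and can go through makeGRangesFromDataFrame() in the same way as df in the original answer.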

ADD REPLYlink written 17 days ago by Kevin Blighe19k

@Kevin Blighe well done Bro!

But I only downloaded the FFPE file, like you, so my question is: why did you use FFPE and not "genome_wide_snp_6-segmented_scna_minus_germline_cnv_hg19"? Can you please answer this last question? I really appreciate your help.

BEST WISHES.

ADD REPLYlink modified 15 days ago • written 15 days ago by aouichechaimaa100
1

Hey bro,

I just chose a file 'at random'. I was only testing. You should use the file more suitable to your experiment.

FFPE tissue is, of course, lower quality.

ADD REPLYlink written 15 days ago by Kevin Blighe19k
1

Wait. Allow me to re-run that using the non-FFPE

ADD REPLYlink modified 15 days ago • written 15 days ago by Kevin Blighe19k

OK Bro i can wait .

ADD REPLYlink written 15 days ago by aouichechaimaa100
1

Here you go, chief (same files but for PRAD.snp__genome_wide_snp_6__broad_mit_edu__Level_3__segmented_scna_minus_germline_cnv_hg19__seg.seg.txt): https://app.box.com/s/992llscqeabyw3rzrjg5stxd6945r73r

Dude, the PRAD dataset is very big, so, the Rdata file is ~140 megabytes

ADD REPLYlink written 15 days ago by Kevin Blighe19k

@Kevin Blighe I'm very thankful and happy for the great help you gave me!

Please can you also do the annotation step for me? I tried it on my computer but it shows me this error:

    df_ann <- cbind(df[subjectHits(hits),],genes[queryHits(hits),])
Error: cannot allocate vector of size 124.9 Mb
ADD REPLYlink written 13 days ago by aouichechaimaa100

I have a size issue actually.
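
A possible workaround for the memory problem, sketched on toy data: the cbind() in the annotation step repeats each region once per gene, which grows quickly. Keeping one row per region and pasting the gene symbols into a single string, one chromosome at a time, needs far less memory. This is a base R illustration with made-up names, not the code that produced the files in this thread.

```r
segs <- data.frame(chr = c(1, 1, 2), start = c(1, 5000, 1), end = c(4000, 9000, 3000))
genes <- data.frame(GeneSymbol = c("G1", "G2", "G3", "G4"),
                    Chr = c(1, 1, 1, 2),
                    Start = c(100, 2000, 6000, 500),
                    End = c(200, 2500, 6500, 800),
                    stringsAsFactors = FALSE)

# One row per region: gene symbols are collapsed into a single string
segs$Genes <- NA_character_
for (chrom in unique(segs$chr)) {              # work through one chromosome at a time
  g <- genes[genes$Chr == chrom, ]
  for (i in which(segs$chr == chrom)) {
    inside <- g$Start >= segs$start[i] & g$End <= segs$end[i]
    segs$Genes[i] <- paste(g$GeneSymbol[inside], collapse = ";")
  }
}
print(segs)
```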

ADD REPLYlink written 13 days ago by aouichechaimaa100
1

Getting it to you now bro - hang on.

ADD REPLYlink written 12 days ago by Kevin Blighe19k
1

Bro, I have uploaded the new Rdata file, and also a list of the final annotated regions. In the 'type' column, the following is true:

  • deletion=0
  • amplification=1
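
If you prefer labels to the 0 / 1 codes, a factor does the job. The vector below is toy data standing in for the 'type' column.

```r
type <- c(0, 1, 1, 0)  # toy values standing in for the 'type' column
type_label <- factor(type, levels = c(0, 1), labels = c("deletion", "amplification"))
print(table(type_label))
```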

https://app.box.com/s/992llscqeabyw3rzrjg5stxd6945r73r

And that is it: recurrent somatic copy number alterations (recSCNA) in the TCGA PRAD dataset. If you want to cite how the data was processed, then just cite my recent publication: https://www.ncbi.nlm.nih.gov/pubmed/29682207

ADD REPLYlink written 12 days ago by Kevin Blighe19k

@Kevin Blighe, thanks a lot Bro. Yes, I will definitely cite your recent publication in my manuscript, and please, if you have any other publications, share them with me via e-mail or link. Your topic is of great interest to me. Go ahead...

ADD REPLYlink written 12 days ago by aouichechaimaa100
1

No problem bro. I have not published too much, but I have worked a lot in private (outside academia). Stay in touch chief.

ADD REPLYlink written 11 days ago by Kevin Blighe19k
1

Yo Bro, please use the formatting bar (especially the code option) to present your post better. It has been done for you this time.

ADD REPLYlink modified 19 days ago • written 19 days ago by WouterDeCoster28k