Question: How to extract the list of genes from TCGA CNV data
0
9 months ago by
Chaimaa150
Chaimaa150 wrote:

Hi guys, I have CNV data from TCGA, as shown below, and my goal is to extract the list of genes using GISTIC. Could anyone please tell me how to handle this data with GISTIC and how to run GISTIC?

  • TCGA-2A-A8VL-10A-01D-A379-01 1 51598 1500664 226 0.1646
  • TCGA-2A-A8VL-10A-01D-A379-01 1 1617778 1653196 12 -0.4115
  • TCGA-2A-A8VL-10A-01D-A379-01 1 1653256 15362197 7748 0.0056
  • TCGA-2A-A8VL-10A-01D-A379-01 1 15362212 15362449 6 0.7626

https://postimg.cc/image/g4kusd56j/

cnv genes tcga • 2.4k views
ADD COMMENTlink modified 9 months ago by Kevin Blighe37k • written 9 months ago by Chaimaa150
9
9 months ago by
Kevin Blighe37k
Republic of Ireland
Kevin Blighe37k wrote:

Update October 14, 2018:

This thread has become a sort of focal point, so I want to make the process to follow absolutely clear. If your goal is to obtain somatic copy number alterations (sCNA) for a group of TCGA patients and/or identify recurrent sCNA in these patients, then follow these steps:

  • Part I - download pre-computed GISTIC 2.0 sCNA data for any TCGA cohort from Broad Institute's Firebrowse server and identify recurrent sCNA regions in these with GAIA
  • Part II - plot recurrent sCNA gains and losses from GAIA
  • Part III - annotate the recurrent sCNA regions (this post, just below)
  • Part IV - generate heatmap of recurrent sCNA regions over your cohort

Partial credits: TCGAbiolinks

----------------------


Part III

A

Create a reference dataset of all genes and save it as a GenomicRanges object

library(biomaRt)
library(GenomicRanges)

# Set up a gene annotation source. TCGA Firebrowse segments are on hg19,
# so use the GRCh37 Ensembl archive. For GRCh38 data, use instead:
#   mart <- useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
mart <- useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")
genes <- getBM(attributes=c("hgnc_symbol","chromosome_name","start_position","end_position"), mart=mart)

# Keep named genes on chromosomes 1-22, X and Y only
genes <- genes[genes[,1]!="" & genes[,2] %in% c(1:22,"X","Y"),]

# Recode X and Y as 23 and 24 so that chromosomes are integers
genes[genes[,2]=="X", 2] <- 23
genes[genes[,2]=="Y", 2] <- 24
genes[,2] <- as.integer(genes[,2])

# Sort by chromosome, then by start position
genes <- genes[order(genes[,2], genes[,3]),]
colnames(genes) <- c("GeneSymbol","Chr","Start","End")
genes_GR <- makeGRangesFromDataFrame(genes,keep.extra.columns = TRUE)

B

Store your own data as a GenomicRanges object:

NB - if following from Part II, the df used here is the RecCNV object, produced in Part II

colnames(df) <- c("Barcode", "chr", "start", "end", "extra1", "extra2")
df
                       Barcode chr    start      end extra1  extra2
1 TCGA-2A-A8VL-10A-01D-A379-01   1    51598  1500664    226  0.1646
2 TCGA-2A-A8VL-10A-01D-A379-01   1  1617778  1653196     12 -0.4115
3 TCGA-2A-A8VL-10A-01D-A379-01   1  1653256 15362197   7748  0.0056

df_GR <- makeGRangesFromDataFrame(df, keep.extra.columns = TRUE)
df_GR
GRanges object with 3 ranges and 3 metadata columns:
      seqnames               ranges strand |                      Barcode
         <Rle>            <IRanges>  <Rle> |                     <factor>
  [1]        1 [   51598,  1500664]      * | TCGA-2A-A8VL-10A-01D-A379-01
  [2]        1 [ 1617778,  1653196]      * | TCGA-2A-A8VL-10A-01D-A379-01
  [3]        1 [ 1653256, 15362197]      * | TCGA-2A-A8VL-10A-01D-A379-01
         extra1    extra2
      <integer> <numeric>
  [1]       226    0.1646
  [2]        12   -0.4115
  [3]      7748    0.0056
  -------

C

Overlap your regions with the reference dataset that you created

hits <- findOverlaps(genes_GR, df_GR, type="within")
df_ann <- cbind(df[subjectHits(hits),],genes[queryHits(hits),])
head(df_ann)
                         Barcode chr start     end extra1 extra2 GeneSymbol Chr
1   TCGA-2A-A8VL-10A-01D-A379-01   1 51598 1500664    226 0.1646     OR4G4P   1
1.1 TCGA-2A-A8VL-10A-01D-A379-01   1 51598 1500664    226 0.1646    OR4G11P   1
1.2 TCGA-2A-A8VL-10A-01D-A379-01   1 51598 1500664    226 0.1646      OR4F5   1
     Start    End
1    52473  54936
1.1  62948  63887
1.2  69091  70008
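
For readers unfamiliar with findOverlaps(): with type="within", a gene is reported only if it lies entirely inside a segment. The logic can be sketched in base R on toy data. This is only an illustration, not how GenomicRanges is implemented; the helper name genes_within and the gene GENE_X are hypothetical.

```r
# Toy segments and genes; the first two genes reuse coordinates shown above,
# GENE_X is a made-up gene that straddles the end of segment 1
segs <- data.frame(chr = c(1, 1),
                   start = c(51598, 1617778),
                   end = c(1500664, 1653196))
genes_toy <- data.frame(GeneSymbol = c("OR4G4P", "OR4F5", "GENE_X"),
                        Chr = c(1, 1, 1),
                        Start = c(52473, 69091, 1500000),
                        End = c(54936, 70008, 1600000),
                        stringsAsFactors = FALSE)

# Hypothetical helper: a gene is a hit only if it lies fully inside a segment
genes_within <- function(genes, segs) {
  hits <- lapply(seq_len(nrow(segs)), function(i) {
    inside <- genes$Chr == segs$chr[i] &
      genes$Start >= segs$start[i] &
      genes$End <= segs$end[i]
    if (!any(inside)) return(NULL)
    data.frame(segment = i, genes[inside, , drop = FALSE], row.names = NULL)
  })
  do.call(rbind, hits)
}

within_hits <- genes_within(genes_toy, segs)
print(within_hits)
```

GENE_X is dropped because it extends past the end of the first segment. If partially overlapping genes should also be reported, findOverlaps() offers type="any" instead.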

D

To make filtering easier, we can further manipulate the data to show the precise co-ordinates of each gene alongside the co-ordinates of the region in which it was found to have CNA / CNV. Columns are selected by name, so adjust the names if your df has different ones (e.g. if df is RecCNV from Part II):

AberrantRegion <- paste0(df_ann$chr, ":", df_ann$start, "-", df_ann$end)
GeneRegion <- paste0(df_ann$Chr, ":", df_ann$Start, "-", df_ann$End)
Final_regions <- cbind(df_ann[,c("Barcode","GeneSymbol","extra1","extra2")], AberrantRegion, GeneRegion)
head(Final_regions, 3)
                         Barcode GeneSymbol extra1 extra2  AberrantRegion    GeneRegion
1   TCGA-2A-A8VL-10A-01D-A379-01     OR4G4P    226 0.1646 1:51598-1500664 1:52473-54936
1.1 TCGA-2A-A8VL-10A-01D-A379-01    OR4G11P    226 0.1646 1:51598-1500664 1:62948-63887
1.2 TCGA-2A-A8VL-10A-01D-A379-01      OR4F5    226 0.1646 1:51598-1500664 1:69091-70008
ADD COMMENTlink modified 5 weeks ago • written 9 months ago by Kevin Blighe37k

@Kevin Blighe Thank you very much for your reply, but I'm not familiar with the language of your source code, and I'm sure that I have to use GISTIC, as mentioned in many papers that I have read, in order to extract a more reliable list of genes and their loci.

ADD REPLYlink written 9 months ago by Chaimaa150

If anyone is familiar with GISTIC, please give me some guidance.

ADD REPLYlink written 9 months ago by Chaimaa150
1

Please explain the exact source from where you downloaded your data. There are various types of copy number data in various states of processing available from different web-sites. The data is usually available as somatic copy number alteration data (SCNA), not copy number variation (CNV) - CNVs are more spoken of in terms of the germline and 'natural' variation in copy number. GISTIC is just one of a few programs that are commonly used to analyse SCNA data for the purposes of 'summarising' the regions by grouping them into large regions more amenable for downstream analysis and interpretation.

Have you taken a look at the documentation to see what exactly you require to execute the program? - ftp://ftp.broadinstitute.org/pub/GISTIC2.0/GISTICDocumentation_standalone.htm

The code that I put above uses base R functions, plus the biomaRt and GenomicRanges packages, and can be used to identify the genes overlapping any type of segment data (chr : startbp : endbp).

Thank you.

ADD REPLYlink written 9 months ago by Kevin Blighe37k
1

To give you an idea, take a look at my most recent publication, where a main part was to analyse the somatic CNA data that is available. I used a mixture of programs to do it, including GISTIC2.0: Racial differences in endometrial cancer molecular portraits in The Cancer Genome Atlas.

Look at the section headed 'Somatic mutation and copy number aberrations' in the methods.

ADD REPLYlink written 9 months ago by Kevin Blighe37k

@Kevin Blighe Thanks a lot for your interest and quick reply.

ADD REPLYlink written 9 months ago by Chaimaa150

@Kevin Blighe, I checked your paper and read through the section 'Somatic mutation and copy number aberrations' in the methods.

I got the Level 3 CNV data from this website: http://gdac.broadinstitute.org and specifically from this link: http://firebrowse.org/?cohort=PRAD&download_dialog=true.

I'm interested in getting the list of significant genes from this data using GISTIC.

As far as I know from many papers, GISTIC2.0 can be used to analyze a copy number dataset (Level 3) to identify recurrent regions of copy number alteration and the copy number of genes, and this is my goal too.

However, the data I got has no gene names.

Additionally, I found that the link from which I got my data also has GISTIC results (http://firebrowse.org/?cohort=PRAD&download_dialog=true#), but I really don't know which file to download or how to deal with it.

Finally, my main goal is to get an m*n matrix in which m represents the patients (samples) and n represents the genes from this dataset.

@Kevin Blighe Please, I hope you can help me out with this job, Bro, and I really appreciate your help!

ADD REPLYlink modified 9 months ago • written 9 months ago by Chaimaa150
1

No problem.

Yes, recurrent somatic copy number alterations (recSCNA) are the target (for analysis), and the genes that overlap these. I am almost certain that I also started with the data from Broad Firebrowse, but I will double check when I arrive home.

I will go through the analysis steps one-by-one for you.

ADD REPLYlink modified 3 months ago • written 9 months ago by Kevin Blighe37k

@Kevin Blighe How can I save these final regions to a file?

ADD REPLYlink written 4 months ago by Chaimaa150
1

Hey, you can do something like:

write.table(Final_regions, "FinalRegions.csv", sep=",", quote=FALSE, row.names=FALSE)
ADD REPLYlink written 4 months ago by Kevin Blighe37k

So, in this case, does the df used in parts A-D contain only tumor copy number data, or normal as well? Thank you very much for your many great posts!

ADD REPLYlink written 5 weeks ago by jschombe0

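Hey, you're welcome. This post (above) is purely about annotating any type of segment data (requires minimum of chr start end). In the context of the related posts, it is recurrent somatic copy number alteration (recurrent sCNA) segments that have had the normal subtracted out. The pipeline is a mess but goes in this order: Part I (download the data and identify recurrent sCNA with GAIA), Part II (plot the gains and losses), Part III (annotate the regions, this post), Part IV (heatmap).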

ADD REPLYlink written 5 weeks ago by Kevin Blighe37k

I have completed the first part of the pipeline and produced tumor.all.igv.gistic. However, this file does not contain data for individual samples; rather, it gives a measure of the significance of the copy number alteration for regions of the genome. To produce a FinalRegions.csv file that matches yours, should I use the cn.tumor dataframe (as shown in Part I) as the dataframe I submit in the code "df_GR <- makeGRangesFromDataFrame(df, keep.extra.columns = TRUE)"? Could you also tell me how you restructured your CNV data from the structure of df_ann or GenomicRegions to a matrix that can be read into the ComplexHeatmap package, which requires barcodes as columns and genomic regions as rows? Thanks very much for your help with this.

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by jschombe0

I see. The df in the code above is the same object as RecCNV from Part II. Do you have that object? I will update the code above to reflect this.

The files that are output by the runGAIA function are the ones that I use in Part IV to produce the heatmaps, but I have not put the code on Biostars (it is complex...).

ADD REPLYlink written 5 weeks ago by Kevin Blighe37k

I understand. I will save RecCNV to file and then run the code producing FinalRegions. Will the data structure of FinalRegions be barcodes as columns and genes as rows? If not, do you recommend using the reshape package to produce such a matrix? Thanks very much for your help with this.

ADD REPLYlink written 5 weeks ago by jschombe0

In my project code, the output is like this:

GeneSymbol  Aberration  q-value             AberrantRegion         GeneRegion
RNU6-904P   Del         0.0280807421052632  2:141915235-142013612  2:141924826-141924926
PRR14       Amp         0.0684173368421053  16:30618633-30986671   16:30662038-30667761
FBRS        Amp         0.0684173368421053  16:30618633-30986671   16:30669752-30682135
RNU6-416P   Amp         0.0684173368421053  16:30618633-30986671   16:30686621-30686723
SRCAP       Amp         0.0684173368421053  16:30618633-30986671   16:30709530-30755602
RNU6-1043P  Amp         0.0684173368421053  16:30618633-30986671   16:30712658-30712756
SNORA30     Amp         0.0684173368421053  16:30618633-30986671   16:30721858-30721986
PHKG2       Amp         0.0684173368421053  16:30618633-30986671   16:30759591-30772490

My project code is different from the above, slightly, because I originally gave the above answer as an independent answer. That was my worry about linking up these steps on Biostars, i.e., they were each independent answers and don't 'harmoniously' link up together. Even a moderately skilled R user should be able to get the exact data that they want, though.
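
For the samples-by-genes matrix asked about earlier in the thread, one possible approach is to pivot a long, per-sample annotation table with base R's tapply(). The sketch below is not the project code; it assumes a table with one row per sample/gene pair, and the column names (Barcode, GeneSymbol, SegMean) and all values are made up for illustration.

```r
# Toy per-sample annotation table (barcodes and values are invented)
ann <- data.frame(
  Barcode = c("TCGA-AA-0001", "TCGA-AA-0001", "TCGA-AA-0002", "TCGA-AA-0002"),
  GeneSymbol = c("PRR14", "SRCAP", "PRR14", "PHKG2"),
  SegMean = c(0.45, 0.45, -0.60, 0.35),
  stringsAsFactors = FALSE
)

# Rows = samples, columns = genes; the mean is taken if a gene is hit
# by more than one segment in the same sample
cn_matrix <- tapply(ann$SegMean, list(ann$Barcode, ann$GeneSymbol), mean)

# Sample/gene pairs without any overlapping segment come out as NA;
# here they are set to 0 (no call), which is a choice, not a rule
cn_matrix[is.na(cn_matrix)] <- 0
print(cn_matrix)
```

For a heatmap layout with regions or genes as rows and barcodes as columns, the result can simply be transposed with t(cn_matrix).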

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by Kevin Blighe37k
2
9 months ago by
Kevin Blighe37k
Republic of Ireland
Kevin Blighe37k wrote:

Hello, so, here are the steps that I did, for endometrial cancer:

First downloaded copy number data from here: http://firebrowse.org/?cohort=UCEC# (access the data by clicking on the green bar for SNP6 CopyNum). Specifically, the file that you should obtain is for hg19 and will have 'scna' and 'minus_germline_cnv' as parts of its filename.

In my case, the file was called UCEC.Level_3_segmented_scna_minus_germline_cnv_hg19.seg.txt

Then, there are a series of steps that you should do in order to process this data:

Part I

A

Separate the tumour samples from the normals

# Base R's read.table() is sufficient here; no extra package is needed
CN <- read.table("UCEC.Level_3_segmented_scna_minus_germline_cnv_hg19.seg.txt", sep="\t", stringsAsFactors=FALSE, header=TRUE)

#   Tumor types range from 01 - 09
#   Normal types from 10 - 19
#   Control samples from 20 - 29
types <- gsub("TCGA-[A-Z0-9]*-[A-Z0-9]*-", "", (gsub("-[0-9A-Z]*-[0-9A-Z]*-01$", "", CN[,1])))
unique(types)

# NB - in the next 2 lines, it is important to account for all types that are listed from unique(types) command
matched.normal <- c("10A", "10B", "11A", "11B")
tumor <- c("01A", "01B", "06A")

#Divide up the data into tumor and normal
tumor.indices <- which(types %in% tumor)
matched.normal.indices <- which(types %in% matched.normal)
CN.tumor <- CN[tumor.indices,]
CN.matched.normal <- CN[matched.normal.indices,]

#Change IDs to allow for matching to metadata
CN.tumor[,1] <- gsub("-[0-9A-Z]*-[0-9A-Z]*-[0-9A-Z]*-01$", "", CN.tumor[,1])
CN.matched.normal[,1] <- gsub("-[0-9A-Z]*-[0-9A-Z]*-[0-9A-Z]*-01$", "", CN.matched.normal[,1])
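
As an alternative view of what the gsub() calls above extract, the barcode can be split on hyphens. This is a minimal sketch: the first barcode is the one from the question, the second is invented for illustration, and the variable names are assumptions.

```r
# Example barcodes in the format used by the Firebrowse seg file
# (the second one is made up for illustration)
barcodes <- c("TCGA-2A-A8VL-10A-01D-A379-01",
              "TCGA-2A-A8VL-01A-11D-A376-01")

# The 4th hyphen-separated field holds the sample type (2 digits) plus vial letter
fields <- strsplit(barcodes, "-", fixed = TRUE)
sample_field <- vapply(fields, function(x) x[4], character(1))
type_code <- as.integer(substr(sample_field, 1, 2))

# 01-09 = tumour, 10-19 = normal, 20-29 = control
sample_class <- ifelse(type_code <= 9, "tumor",
                       ifelse(type_code <= 19, "normal", "control"))

# Patient-level ID (first three fields), for matching to clinical metadata
patient_id <- vapply(fields, function(x) paste(x[1:3], collapse = "-"),
                     character(1))

print(data.frame(barcodes, sample_field, sample_class, patient_id))
```

Classifying by the numeric code rather than by a hand-written list such as c("01A", "01B", "06A") means that an unexpected vial letter cannot be silently left out.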

--------------------

B

Create a markers object

require(gaia)

#Retrieve probes meta file from Broad Institute
gdac.root <- "ftp://ftp.broadinstitute.org/pub/GISTIC2.0/hg19_support/"
markersMatrix <- read.delim(paste0(gdac.root,"genome.info.6.0_hg19.na31_minus_frequent_nan_probes_sorted_2.1.txt"), as.is=TRUE, header=FALSE)
colnames(markersMatrix) <- c("Probe.Name", "Chromosome", "Start")

#Change sex chr names
unique(markersMatrix$Chromosome)
markersMatrix[which(markersMatrix$Chromosome=="X"),"Chromosome"] <- 23
markersMatrix[which(markersMatrix$Chromosome=="Y"),"Chromosome"] <- 24
markersMatrix$Chromosome <- as.integer(markersMatrix$Chromosome)

#Create a marker ID (vectorised paste0 avoids the padding that apply() can add to numbers)
markerID <- paste0(markersMatrix$Chromosome, ":", markersMatrix$Start)

#Remove duplicates (logical indexing is safe even when there are no duplicates)
markersMatrix <- markersMatrix[!duplicated(markerID),]

#Filter markersMatrix for common CNV
markerID <- paste0(markersMatrix$Chromosome, ":", markersMatrix$Start)
commonCNV <- read.delim(paste0(gdac.root,"CNV.hg19.bypos.111213.txt"), as.is=TRUE)
commonCNV[,2] <- as.integer(commonCNV[,2])
commonCNV[,3] <- as.integer(commonCNV[,3])
commonID <- paste0(commonCNV[,2], ":", commonCNV[,3])
table(commonID %in% markerID)
table(markerID %in% commonID)
markersMatrix_fil <- markersMatrix[!markerID %in% commonID,]

#Create the markers object
markers_obj <- load_markers(markersMatrix_fil)
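
To see what the de-duplication and common-CNV filter in this section do, here is a small self-contained sketch on toy data. The probe names, positions and variable names are invented; only the "chr:pos" key idea comes from the code above. Keys are built with vectorised paste0(), because apply() on a data frame with mixed column types can pad numbers with spaces when it converts the rows to character.

```r
# Toy marker table and toy list of common germline CNV positions
markers <- data.frame(Probe.Name = c("p1", "p2", "p3", "p4"),
                      Chromosome = c(1L, 1L, 2L, 2L),
                      Start = c(1000L, 1000L, 500L, 123456L),
                      stringsAsFactors = FALSE)
common <- data.frame(Chromosome = 2L, Start = 500L)

# Build "chr:pos" keys
marker_key <- paste0(markers$Chromosome, ":", markers$Start)

# 1) drop duplicated positions (p2 shares its position with p1)
markers <- markers[!duplicated(marker_key), ]

# 2) drop positions listed as common CNV (p3)
marker_key <- paste0(markers$Chromosome, ":", markers$Start)
common_key <- paste0(common$Chromosome, ":", common$Start)
markers_fil <- markers[!marker_key %in% common_key, ]
print(markers_fil)
```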

-------------------------------

C

# Determine recurrent aberrations in tumour

#Prepare CNV matrix
cnvMatrix <- CN.tumor

#Add label (0 for loss, 1 for gain)
#A segment mean of 0.3 is defined as the cut-off
cnvMatrix <- cbind(cnvMatrix, Label=NA)
cnvMatrix[cnvMatrix$Segment_Mean < -0.3,"Label"] <- 0
cnvMatrix[cnvMatrix$Segment_Mean > 0.3,"Label"] <- 1
cnvMatrix <- cnvMatrix[!is.na(cnvMatrix$Label),]

#Remove segment mean as we now go by the binary classification of gain or loss
cnvMatrix <- cnvMatrix[,-6]
colnames(cnvMatrix) <- c("Sample.Name", "Chromosome", "Start", "End", "Num.of.Markers", "Aberration")

#Substitute Chromosomes "X" and "Y" with "23" and "24"
xidx <- which(cnvMatrix$Chromosome=="X")
yidx <- which(cnvMatrix$Chromosome=="Y")
cnvMatrix[xidx,"Chromosome"] <- 23
cnvMatrix[yidx,"Chromosome"] <- 24
cnvMatrix$Chromosome <- sapply(cnvMatrix$Chromosome, as.integer)

#Run GAIA, which looks for recurrent aberrations in your input file
n <- length(unique(cnvMatrix[,1]))
cnv_obj <- load_cnv(cnvMatrix, markers_obj, n)
results.all <- runGAIA(cnv_obj, markers_obj, output_file_name="Tumor.All.txt", aberrations=-1, chromosomes=-1, num_iterations=10, threshold=0.15)
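
The gain/loss labelling step can be checked in isolation on a few toy segments. This sketch reuses the segment means from the question; the sample names S1 and S2 are placeholders, and the 0.3 cut-off is the one used above.

```r
# Toy segment means (values from the question, sample names are placeholders)
seg <- data.frame(Sample = c("S1", "S1", "S2", "S2"),
                  Segment_Mean = c(0.1646, -0.4115, 0.7626, 0.0056))

cutoff <- 0.3
seg$Label <- NA_integer_
seg$Label[seg$Segment_Mean < -cutoff] <- 0L  # loss
seg$Label[seg$Segment_Mean >  cutoff] <- 1L  # gain

# Segments between -0.3 and 0.3 are treated as copy-neutral and dropped
seg_called <- seg[!is.na(seg$Label), ]
print(seg_called)
```

Only the -0.4115 segment (loss) and the 0.7626 segment (gain) survive; the other two are within the neutral band.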
ADD COMMENTlink modified 5 weeks ago • written 9 months ago by Kevin Blighe37k

@Kevin Blighe Hello Bro, can you help me again? I have installed R and I'm running your code line by line. I got as far as this line:

#Create the markers object
markers_obj <- load_markers(markersMatrix_fil)
Error in load_markers(markersMatrix_fil) :
  could not find function "load_markers"

Please fix this line so that I can continue with the other lines.
ADD REPLYlink modified 9 months ago by genomax62k • written 9 months ago by Chaimaa150
2

Hey dude, ensure that the gaia package loaded correctly. load_markers() is part of that package.

ADD REPLYlink written 9 months ago by Kevin Blighe37k

Do you mean the line require(gaia)? I'm following your code line by line and now I can't continue...

ADD REPLYlink written 9 months ago by Chaimaa150
1

That's indeed what Kevin means.

ADD REPLYlink written 9 months ago by WouterDeCoster36k
1

Yep, ensure that gaia installed correctly (source("https://bioconductor.org/biocLite.R"); biocLite("gaia"))

load_markers
Error: object 'load_markers' not found

Load package and then try again:

require(gaia)
Loading required package: gaia
Warning message:
package ‘gaia’ was built under R version 3.2.5


load_markers

function (marker_matrix) 
{
    message("Loading Marker Informations")
    chromosomes <- sort(unique(marker_matrix[, 2]))
    chromosome_marker_list <- list()
    end_position <- FALSE
    if (ncol(marker_matrix) == 4) {
        end_position <- TRUE
    }
    for (i in as.numeric(chromosomes)) {
        chr_ids <- which(marker_matrix[, 2] == i)
        tmp_matrix <- matrix(0, 2, length(chr_ids))
        tmp_matrix[1, ] <- marker_matrix[chr_ids, 3]
        if (end_position) {
            tmp_matrix[2, ] <- marker_matrix[chr_ids, 4]
        }
        else {
            tmp_matrix[2, ] <- tmp_matrix[1, ]
        }
        chromosome_marker_list[[i]] <- tmp_matrix
        names(chromosome_marker_list)[[i]] <- i
        message(".", appendLF = FALSE)
    }
    message("\nDone")
    return(chromosome_marker_list)
}
<environment: namespace:gaia>
ADD REPLYlink modified 3 months ago • written 9 months ago by Kevin Blighe37k

@Kevin Blighe Thanks Bro, I have installed all the packages successfully. Now I have the following 2 issues at these lines:

markers_obj <- load_markers(markersMatrix_fil)

Loading Marker Informations ......................Error in chromosome_marker_list[[i]] <- tmp_matrix :

attempt to select more than one element in integerOneIndex

In addition: Warning message:

In load_markers(markersMatrix_fil) : NAs introduced by coercion

colnames(df) <- c("Barcode", "chr", "start", "end", "extra1", "extra2")

Error in colnames<-(*tmp*, value = c("Barcode", "chr", "start", "end", :

attempt to set 'colnames' on an object with less than two dimensions

ADD REPLYlink modified 9 months ago • written 9 months ago by Chaimaa150

Hey Chief, let me take a look later.

ADD REPLYlink written 9 months ago by Kevin Blighe37k

@ Kevin Blighe, sure Bro, take your time and thanks a lot for your reply.

ADD REPLYlink written 9 months ago by Chaimaa150
1

Just to be sure, you are running the Part I commands first (sections A to C), starting with the data that you download from Broad FireBrowse.

Then, you run the annotation steps that I originally mentioned (Part III).

It is very important that the load_markers function does not give any warnings. A successful execution of this function will produce:

markers_obj <- load_markers(markersMatrix_fil)
Loading Marker Informations
........................
Done

Sometimes, available memory is an issue.
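
Since the printed source of load_markers() loops over as.numeric(chromosomes), one plausible cause of the "NAs introduced by coercion" warning is a chromosome label that cannot be converted to a number. A quick sanity check before calling load_markers() might look like the sketch below. check_markers is a hypothetical helper and not part of gaia; the toy table and the "MT" label are made up.

```r
# Hypothetical helper (not part of gaia): sanity-check a marker table
# with columns Probe.Name, Chromosome, Start
check_markers <- function(m) {
  chr <- suppressWarnings(as.integer(m[, 2]))
  pos <- suppressWarnings(as.numeric(m[, 3]))
  list(
    n_bad_chr = sum(is.na(chr) | chr < 1 | chr > 24),
    n_bad_pos = sum(is.na(pos)),
    n_duplicated = sum(duplicated(paste0(m[, 2], ":", m[, 3])))
  )
}

# Toy table; the "MT" row stands in for any unexpected chromosome label
toy <- data.frame(Probe.Name = c("p1", "p2", "p3", "p4"),
                  Chromosome = c("1", "23", "MT", "1"),
                  Start = c(100, 200, 300, 100),
                  stringsAsFactors = FALSE)
res <- check_markers(toy)
str(res)
```

All three counts should be zero for the filtered marker table; anything else points to rows worth inspecting before the GAIA steps.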

ADD REPLYlink written 9 months ago by Kevin Blighe37k

@Kevin Blighe Yes Bro I'm following your steps one by one and by the correct order.

ADD REPLYlink written 9 months ago by Chaimaa150
1

Which OS? How much RAM?

Can you definitely 100% confirm that each of these steps has completed successfully:

require(gaia)

#Retrieve probes meta file from broadinstitute website
gdac.root <- "ftp://ftp.broadinstitute.org/pub/GISTIC2.0/hg19_support/"
markersMatrix <- read.delim(paste0(gdac.root,"genome.info.6.0_hg19.na31_minus_frequent_nan_probes_sorted_2.1.txt"), as.is=TRUE, header=FALSE)
colnames(markersMatrix) <- c("Probe.Name", "Chromosome", "Start")

#Change sex chromosome names
unique(markersMatrix$Chromosome)
markersMatrix[which(markersMatrix$Chromosome=="X"),"Chromosome"] <- 23
markersMatrix[which(markersMatrix$Chromosome=="Y"),"Chromosome"] <- 24
markersMatrix$Chromosome <- as.integer(markersMatrix$Chromosome)

#Create a marker ID (vectorised paste0 avoids the padding that apply() can add to numbers)
markerID <- paste0(markersMatrix$Chromosome, ":", markersMatrix$Start)

#Remove duplicates (logical indexing is safe even when there are no duplicates)
markersMatrix <- markersMatrix[!duplicated(markerID),]

#Filter markersMatrix for common CNV
markerID <- paste0(markersMatrix$Chromosome, ":", markersMatrix$Start)
commonCNV <- read.delim(paste0(gdac.root,"CNV.hg19.bypos.111213.txt"), as.is=TRUE)
commonCNV[,2] <- as.integer(commonCNV[,2])
commonCNV[,3] <- as.integer(commonCNV[,3])
commonID <- paste0(commonCNV[,2], ":", commonCNV[,3])
table(commonID %in% markerID)
table(markerID %in% commonID)
markersMatrix_fil <- markersMatrix[!markerID %in% commonID,]

#Create the markers object
markers_obj <- load_markers(markersMatrix_fil)
ADD REPLYlink written 9 months ago by Kevin Blighe37k

@Kevin Blighe I ran every step carefully, and I can see the output variables in R's Global Environment pane (right corner).

Can I share my data with you so that you can test your code on it?

My OS is Windows 10, my RAM is 12 GB, and my data size is 6.87 MB.

ADD REPLYlink modified 9 months ago • written 9 months ago by Chaimaa150

@ Kevin Blighe Bro, i have fixed the first error. Now, these 2 parts are not working

    cnv_obj <- load_cnv(cnvMatrix, markers_obj, n)
Loading Copy Number Data
Error in matrix(0, length(samples), ncol(markers_list[[chromosomes[i]]])) : 
  non-numeric matrix extent



results.all <- runGAIA(cnv_obj, markers_obj, output_file_name="Tumor.All.txt", aberrations=-1, chromosomes=-1, num_iterations=10, threshold=0.15)

Performing Data Preprocessing
Error in runGAIA(cnv_obj, markers_obj, output_file_name = "Tumor.All.txt",  : 
  object 'cnv_obj' not found

    colnames(df) <- c("Barcode", "chr", "start", "end", "extra1", "extra2")
Error in `colnames<-`(`*tmp*`, value = c("Barcode", "chr", "start", "end",  : 
  attempt to set 'colnames' on an object with less than two dimensions
ADD REPLYlink written 9 months ago by Chaimaa150
1

There must be something wrong with your cnvMatrix. In my example, I use the endometrial cancer (UCEC) data. Which cancer's data do you have? Just check the output of each command in Steps 1 and 3 in order to ensure that it works.

If you want to tell me the file that you downloaded, then I can also take a look here.

Peace.

ADD REPLYlink modified 3 months ago • written 9 months ago by Kevin Blighe37k

@Kevin Blighe OK, the cancer data that I'm working on is prostate adenocarcinoma (PRAD).

ADD REPLYlink written 9 months ago by Chaimaa150

@Kevin Blighe, Hi Bro, I know that I disturb you a lot. I apologize for that, and I really appreciate your help! Did you check your code with my data? I really need to get this job done, so please help me out.

ADD REPLYlink written 9 months ago by Chaimaa150
1

Hey chief - no problem. I went here: http://firebrowse.org/?cohort=PRAD&download_dialog=true

I then downloaded the file: 'genome_wide_snp_6-FFPE-segmented_scna_minus_germline_cnv_hg19' and then used that as the data input.

I then followed all of my code and it worked fine. The only thing that I notice is that there is no copy number data for normal samples in the PRAD dataset, but this is not important because the germline copy number is already subtracted by Broad Firebrowse.

cnv_obj <- load_cnv(cnvMatrix, markers_obj, n)
Loading Copy Number Data
................................................
Done

Then:

results.all <- runGAIA(cnv_obj, markers_obj, output_file_name="Tumor.All.txt", aberrations=-1, chromosomes=-1, num_iterations=10, threshold=0.15)

Performing Data Preprocessing

Done

Computing Discontinuity Matrix
........................
Done
Computing Probability Distribution
................................................................................................
Done
Assessing the Significance of Observed Data
................................................
Done
Writing Tumor.All.txt.igv.gistic File for Integrative Genomics Viewer (IGV) Tool
................................................
Done
Running Homogeneous peel-off Algorithm With Significance Threshold of 0.15 and Homogenous Threshold of 0.12
................................................
Done

Writing Output File 'Tumor.All.txt' Containing the Significant Regions

File 'Tumor.All.txt' Saved

The problem that you have may be an issue with a new version of one of the packages. Dude, to help, I have put the output files and my R session online for you. You can download them (3 files) here: https://app.box.com/s/992llscqeabyw3rzrjg5stxd6945r73r (you may have to open an account).

The Tumor.All.txt file contains the significant recurrent somatic copy number alterations (recSCNA). You will want to input these back into R and then annotate them using the first steps that I mention in this thread. You will have to rename the columns to have at least "chr", "start", "end"
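
The renaming step can be sketched on a toy table as follows. The six-column layout (chromosome, aberration kind, start, end, size, q-value) is an assumption about the GAIA output, so check it against the header of your own Tumor.All.txt; the new column names are arbitrary apart from chr, start and end. The coordinates are borrowed from the example output earlier in this thread, and the sizes and rounded q-values are illustrative only.

```r
# Toy stand-in for the table read from Tumor.All.txt (assumed column order:
# chromosome, aberration kind, start, end, size, q-value)
rec <- data.frame(c(16, 2), c(1, 0),
                  c(30618633, 141915235),
                  c(30986671, 142013612),
                  c(368039, 98378),
                  c(0.0684, 0.0281))

# Rename by position so that chr / start / end can be found downstream
colnames(rec) <- c("chr", "type", "start", "end", "size", "qvalue")

# Sort by chromosome, then by start position
rec <- rec[order(rec$chr, rec$start), ]
print(rec)
```

The renamed data frame can then take the place of df in the annotation steps (makeGRangesFromDataFrame() and findOverlaps()).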

ADD REPLYlink written 9 months ago by Kevin Blighe37k

@Kevin Blighe well done Bro!

I downloaded the FFPE file as you did, so my question is: why do you use the FFPE file and not "genome_wide_snp_6-segmented_scna_minus_germline_cnv_hg19"? Could you please answer this last question? I really appreciate your help.

BEST WISHES.

ADD REPLYlink modified 9 months ago • written 9 months ago by Chaimaa150
1

Hey,

I just chose a file 'at random'. I was only testing. You should use the file more suitable to your experiment.

FFPE tissue is, of course, lower quality.

ADD REPLYlink modified 3 months ago • written 9 months ago by Kevin Blighe37k
1

Wait. Allow me to re-run that using the non-FFPE

ADD REPLYlink modified 9 months ago • written 9 months ago by Kevin Blighe37k

OK Bro i can wait .

ADD REPLYlink written 9 months ago by Chaimaa150
1

Here you go, chief (same files but for PRAD.snp__genome_wide_snp_6__broad_mit_edu__Level_3__segmented_scna_minus_germline_cnv_hg19__seg.seg.txt): https://app.box.com/s/992llscqeabyw3rzrjg5stxd6945r73r

Dude, the PRAD dataset is very big, so, the Rdata file is ~140 megabytes

ADD REPLYlink written 9 months ago by Kevin Blighe37k

@Kevin Blighe i'm very thankful and happy for the great help you gave me!

Please could you also do the annotation step for me? I tried on my computer, but it shows this error:

    df_ann <- cbind(df[subjectHits(hits),],genes[queryHits(hits),])
Error: cannot allocate vector of size 124.9 Mb
ADD REPLYlink written 9 months ago by Chaimaa150

I have a size issue actually.

ADD REPLYlink written 9 months ago by Chaimaa150
1

Getting it to you now - hang on.

ADD REPLYlink modified 3 months ago • written 9 months ago by Kevin Blighe37k
1

I have uploaded the new Rdata file, and also a list of the final annotated regions. In the 'type' column, the following is true:

  • deletion=0
  • amplification=1

https://app.box.com/s/992llscqeabyw3rzrjg5stxd6945r73r

And that is it: recurrent somatic copy number alterations (recSCNA) in the TCGA PRAD dataset. If you want to cite how the data was processed, then just cite my recent publication: https://www.ncbi.nlm.nih.gov/pubmed/29682207

ADD REPLYlink modified 3 months ago • written 9 months ago by Kevin Blighe37k

@Kevin Blighe, thanks a lot Bro. Yes, I will definitely cite your recent publication in my manuscript. If you have any other publications, please share them with me via e-mail or link. Your topic is of great interest to me. Go ahead......

ADD REPLYlink written 9 months ago by Chaimaa150
1

No problem. I have not published too much, but I have worked a lot in private (outside academia). Stay in touch chief.

ADD REPLYlink modified 11 weeks ago • written 9 months ago by Kevin Blighe37k

@Kevin Blighe Hi bro,
Finally, I got a server and started installing the packages, but I failed to install the "gaia" package:

**package ‘gaia’ is not available (for R version 3.2.3)**
ADD REPLYlink modified 3 months ago by RamRS20k • written 4 months ago by Chaimaa150
1

Perhaps open a new question, and show that it has a definitive relationship to bioinformatics.

ADD REPLYlink written 4 months ago by Kevin Blighe37k

Ok, Kevin Blighe Bro, but I think this problem has no relation to bioinformatics!

  source("https://www.bioconductor.org/biocLite.R")
    Error in file(filename, "r", encoding = encoding) :
      cannot open the connection
    In addition: Warning message:
    In file(filename, "r", encoding = encoding) :
      unable to connect to 'bioconductor.org' on port 80

The problem may be related to the server.

ADD REPLYlink written 4 months ago by Chaimaa150

Try: source("https://bioconductor.org/biocLite.R") (without www)

If that still does not work, then try: source("http://bioconductor.org/biocLite.R") (http instead of https)

If that does not work, then you can post on Stack Exchange

ADD REPLYlink modified 3 months ago • written 4 months ago by Kevin Blighe37k
1

@Kevin Blighe I tried with http and with https and nothing work!
I will try in Stack Exchange thanks always for your great help Kevin

ADD REPLYlink modified 3 months ago by RamRS20k • written 4 months ago by Chaimaa150
1

Please use the formatting bar (especially the code option) to present your post better. It has been done for you this time.

ADD REPLYlink modified 3 months ago • written 9 months ago by WouterDeCoster36k
@Kevin Blighe Hi Kevin Bro, is there any way to output the Tumor.All.txt file with the samples too, i.e. the significant recurrent somatic copy number alterations (recSCNA) together with their corresponding samples (barcode names) from the original input file? Then I could annotate this file.

Please reply.
ADD REPLYlink modified 3 months ago • written 3 months ago by Chaimaa150
1

Hi, it would be dishonest and malpractice for me to continually help you to that extent, and that would defeat the purpose of this website. You will have to find a way using your local resources, preferably in conjunction with your supervisor.

ADD REPLYlink modified 3 months ago • written 3 months ago by Kevin Blighe37k