Question: How to extract the list of genes from TCGA CNV data
Chaimaa130 wrote:

Hi guys, I have CNV data from TCGA, as shown below, and my goal is to extract the list of genes using GISTIC. Could anyone please tell me how to handle this data and how to apply GISTIC?

  • TCGA-2A-A8VL-10A-01D-A379-01 1 51598 1500664 226 0.1646
  • TCGA-2A-A8VL-10A-01D-A379-01 1 1617778 1653196 12 -0.4115
  • TCGA-2A-A8VL-10A-01D-A379-01 1 1653256 15362197 7748 0.0056
  • TCGA-2A-A8VL-10A-01D-A379-01 1 15362212 15362449 6 0.7626

https://postimg.cc/image/g4kusd56j/

cnv genes tcga • 1.9k views
ADD COMMENTlink modified 7 months ago by Kevin Blighe33k • written 7 months ago by Chaimaa130
Kevin Blighe33k wrote:

Update October 14, 2018:

This thread has become a sort of focal point, so I want to make the process to follow absolutely clear. If your goal is to obtain somatic copy number alterations (sCNA) for a group of TCGA patients and/or identify recurrent sCNA in these patients, then follow these steps:

  • Part I - download pre-computed GISTIC 2.0 sCNA data for any TCGA cohort from Broad Institute's Firebrowse server and identify recurrent sCNA regions in these with GAIA
  • Part II - plot recurrent sCNA gains and losses from GAIA
  • Part III - annotate the recurrent sCNA regions (this post, just below)
  • Part IV - generate heatmap of recurrent sCNA regions over your cohort

Partial credits: TCGAbiolinks


A

Create a reference dataset of all genes and save it as a GenomicRanges object

library(biomaRt)
library(GenomicRanges)

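#Set up a gene annotation template to use
#Current Ensembl release; this object is overwritten by the next line
mart <- useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
#GRCh37 (hg19) archive, matching the hg19 segment files used in this thread
mart <- useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")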
genes <- getBM(attributes=c("hgnc_symbol","chromosome_name","start_position","end_position"), mart=mart)
genes <- genes[genes[,1]!="" & genes[,2] %in% c(1:22,"X","Y"),]
xidx <- which(genes[,2]=="X")
yidx <- which(genes[,2]=="Y")
genes[xidx, 2] <- 23
genes[yidx, 2] <- 24
genes[,2] <- sapply(genes[,2],as.integer)
genes <- genes[order(genes[,3]),]
genes <- genes[order(genes[,2]),]
colnames(genes) <- c("GeneSymbol","Chr","Start","End")
genes_GR <- makeGRangesFromDataFrame(genes,keep.extra.columns = TRUE)
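
The X/Y to 23/24 recoding recurs several times in this thread. A minimal sketch of a reusable helper follows; `recode_chr` is a hypothetical name, not a function from biomaRt, GenomicRanges, or gaia, and the handling of a "chr" prefix is an added assumption.

```r
# Hypothetical helper: map chromosome labels to the integer coding used in this thread
# (1-22 unchanged, X -> 23, Y -> 24); anything else becomes NA.
recode_chr <- function(chr) {
  chr <- sub("^chr", "", as.character(chr))   # tolerate a "chr" prefix
  chr[chr == "X"] <- "23"
  chr[chr == "Y"] <- "24"
  out <- suppressWarnings(as.integer(chr))    # non-numeric labels become NA
  out[!(out %in% 1:24)] <- NA_integer_
  out
}

chr_int <- recode_chr(c("1", "X", "chrY", "22", "MT"))
print(chr_int)
```

Returning NA for unexpected labels makes it easy to spot and drop contigs or mitochondrial entries before calling `as.integer` on a whole column.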

B

Store your own data as a GenomicRanges object:

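#df must already be a data frame holding your segment data (one row per segment);
#otherwise 'df' refers to the base R function stats::df and this line fails
colnames(df) <- c("Barcode", "chr", "start", "end", "extra1", "extra2")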
df
                       Barcode chr    start      end extra1  extra2
1 TCGA-2A-A8VL-10A-01D-A379-01   1    51598  1500664    226  0.1646
2 TCGA-2A-A8VL-10A-01D-A379-01   1  1617778  1653196     12 -0.4115
3 TCGA-2A-A8VL-10A-01D-A379-01   1  1653256 15362197   7748  0.0056

df_GR <- makeGRangesFromDataFrame(df, keep.extra.columns = TRUE)
df_GR
GRanges object with 4 ranges and 3 metadata columns:
      seqnames               ranges strand |                      Barcode
         <Rle>            <IRanges>  <Rle> |                     <factor>
  [1]        1 [   51598,  1500664]      * | TCGA-2A-A8VL-10A-01D-A379-01
  [2]        1 [ 1617778,  1653196]      * | TCGA-2A-A8VL-10A-01D-A379-01
  [3]        1 [ 1653256, 15362197]      * | TCGA-2A-A8VL-10A-01D-A379-01
         extra1    extra2
      <integer> <numeric>
  [1]       226    0.1646
  [2]        12   -0.4115
  [3]      7748    0.0056
  -------

C

Overlap your regions with the reference dataset that you created

hits <- findOverlaps(genes_GR, df_GR, type="within")
df_ann <- cbind(df[subjectHits(hits),],genes[queryHits(hits),])
head(df_ann)
                         Barcode chr start     end extra1 extra2 GeneSymbol Chr
1   TCGA-2A-A8VL-10A-01D-A379-01   1 51598 1500664    226 0.1646     OR4G4P   1
1.1 TCGA-2A-A8VL-10A-01D-A379-01   1 51598 1500664    226 0.1646    OR4G11P   1
1.2 TCGA-2A-A8VL-10A-01D-A379-01   1 51598 1500664    226 0.1646      OR4F5   1
     Start    End
1    52473  54936
1.1  62948  63887
1.2  69091  70008
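
With type="within", a gene is reported only when it lies entirely inside a segment, so genes that straddle a segment boundary are dropped (findOverlaps also accepts type="any" if partial overlaps should count). A base R sketch of the "within" rule on made-up coordinates; `gene_within` is a hypothetical helper for illustration, not the GenomicRanges implementation.

```r
# Toy segments and genes on a single chromosome (coordinates are made up)
segs <- data.frame(start = c(100, 1000), end = c(500, 2000))
toy_genes <- data.frame(symbol = c("G1", "G2", "G3"),
                        start  = c(150, 450, 1200),
                        end    = c(300, 600, 1800),
                        stringsAsFactors = FALSE)

# Hypothetical helper: TRUE where gene i lies completely inside segment j
gene_within <- function(g, s) {
  outer(g$start, s$start, ">=") & outer(g$end, s$end, "<=")
}

inside <- gene_within(toy_genes, segs)
dimnames(inside) <- list(toy_genes$symbol, c("seg1", "seg2"))
print(inside)
```

Here G2 starts inside the first segment but ends beyond it, so it is not counted for either segment.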

D

To make filtering easier, we can further manipulate the data to show the precise co-ordinates of each gene and the region co-ordinates in which it was found to have CNA / CNV:

AberrantRegion <- paste0(df_ann[,1],":", df_ann[,3],"-", df_ann[,4])
GeneRegion <- paste0(df_ann[,7],":", df_ann[,8],"-", df_ann[,9])
Final_regions <- cbind(df_ann[,c(6,2,5)], AberrantRegion, GeneRegion)
Final_regions[seq(1, nrow(Final_regions), 20),]
      extra2 chr extra1                                AberrantRegion
1     0.1646   1    226    TCGA-2A-A8VL-10A-01D-A379-01:51598-1500664
1.20  0.1646   1    226    TCGA-2A-A8VL-10A-01D-A379-01:51598-1500664
1.40  0.1646   1    226    TCGA-2A-A8VL-10A-01D-A379-01:51598-1500664
1.60  0.1646   1    226    TCGA-2A-A8VL-10A-01D-A379-01:51598-1500664
               GeneRegion
1          OR4G4P:1-52473
1.20    TUBB8P11:1-808672
1.40    FAM132A:1-1177826
1.60    TMEM240:1-1470554
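
Note that GeneRegion as built above contains the gene symbol, chromosome and start only. If full chr:start-end strings are wanted for both the segment and the gene, a variant along these lines could be used. This is a sketch on a toy stand-in for df_ann, assuming the same column names as in step C; the space between symbol and coordinates is an arbitrary formatting choice.

```r
# Toy stand-in for df_ann with the same column layout as in step C
df_ann <- data.frame(Barcode = "TCGA-2A-A8VL-10A-01D-A379-01",
                     chr = 1L, start = 51598L, end = 1500664L,
                     extra1 = 226L, extra2 = 0.1646,
                     GeneSymbol = c("OR4G4P", "OR4F5"),
                     Chr = 1L, Start = c(52473L, 69091L), End = c(54936L, 70008L),
                     stringsAsFactors = FALSE)

# Refer to columns by name and include both start and end
AberrantRegion <- paste0(df_ann$chr, ":", df_ann$start, "-", df_ann$end)
GeneRegion <- paste0(df_ann$GeneSymbol, " ", df_ann$Chr, ":", df_ann$Start, "-", df_ann$End)
print(data.frame(Barcode = df_ann$Barcode, AberrantRegion, GeneRegion))
```

Keeping the coordinates as integers avoids large positions being pasted in scientific notation.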
ADD COMMENTlink modified 3 days ago • written 7 months ago by Kevin Blighe33k

@Kevin Blighe Thank you very much for your reply, but I'm not familiar with the language of your source code, and I'm sure that I have to use GISTIC, as mentioned in many papers that I have already read, in order to extract a more reliable list of genes and their loci.

ADD REPLYlink written 7 months ago by Chaimaa130

If anyone is familiar with GISTIC, please give me some guidance.

ADD REPLYlink written 7 months ago by Chaimaa130

Please explain the exact source from where you downloaded your data. There are various types of copy number data in various states of processing available from different web-sites. The data is usually available as somatic copy number alteration data (SCNA), not copy number variation (CNV) - CNVs are more spoken of in terms of the germline and 'natural' variation in copy number. GISTIC is just one of a few programs that are commonly used to analyse SCNA data for the purposes of 'summarising' the regions by grouping them into large regions more amenable for downstream analysis and interpretation.

Have you taken a look at the documentation to see exactly what you require to execute the program? - ftp://ftp.broadinstitute.org/pub/GISTIC2.0/GISTICDocumentation_standalone.htm

The code that I put above uses base R functions, plus the biomaRt and GenomicRanges packages, and can be used to identify the genes overlapping any type of segment data (chr : startbp : endbp).

Thank you.

ADD REPLYlink written 7 months ago by Kevin Blighe33k

To give you an idea, take a look at my most recent publication, where a main part was to analyse the somatic CNA data that is available. I used a mixture of programs to do it, including GISTIC2.0: Racial differences in endometrial cancer molecular portraits in The Cancer Genome Atlas.

Look at the section headed 'Somatic mutation and copy number aberrations' in the methods.

ADD REPLYlink written 7 months ago by Kevin Blighe33k

@Kevin Blighe Thanks a lot for your interest and quick reply.

ADD REPLYlink written 7 months ago by Chaimaa130

@Kevin Blighe, I checked your paper and read through the section 'Somatic mutation and copy number aberrations' in the methods.

I got the Level 3 CNV data from this website: http://gdac.broadinstitute.org, specifically from this link: http://firebrowse.org/?cohort=PRAD&download_dialog=true.

I'm interested in getting the list of significant genes from this data by using GISTIC.

As far as I know from many papers, GISTIC2.0 can be used to analyze a Level 3 copy number dataset to identify recurrent regions of copy number alteration and the copy number of genes, and this is my goal too.

However, the data I got has no gene names.

Additionally, I found that the link where I got my data also has GISTIC results (http://firebrowse.org/?cohort=PRAD&download_dialog=true#), but I really don't know which file to download or how to deal with them.

Finally, my main goal is to get an m*n matrix in which m represents the list of patients (samples) and n represents the list of genes from this dataset.

@Kevin Blighe Please, I hope you can help me out with this job, Bro. I really appreciate your help!

ADD REPLYlink modified 7 months ago • written 7 months ago by Chaimaa130

No problem.

Yes, recurrent somatic copy number alterations (recSCNA) are the target (for analysis), and the genes that overlap these. I am almost certain that I also started with the data from Broad Firebrowse, but I will double check when I arrive home.

I will go through the analysis steps one-by-one for you.

ADD REPLYlink modified 6 weeks ago • written 7 months ago by Kevin Blighe33k

@ Kevin Blighe how to save these final regions into a file?

ADD REPLYlink written 9 weeks ago by Chaimaa130

Hey, you can do something like:

write.table(Final_regions, "FinalRegions.csv", sep=",", quote=FALSE, row.names=FALSE)
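
For the samples-by-genes matrix asked about earlier in the thread, one option is to reshape an annotated table before writing it out. A minimal base R sketch on toy data: the column names follow df_ann above, extra2 is assumed to hold the segment mean, and averaging when a gene is covered by more than one segment in a sample is an arbitrary choice.

```r
# Toy annotated table: one row per (sample, gene, overlapping segment)
ann <- data.frame(Barcode    = c("S1", "S1", "S2", "S2", "S2"),
                  GeneSymbol = c("A", "B", "A", "B", "B"),
                  extra2     = c(0.5, -0.4, 0.1, 0.2, 0.4),
                  stringsAsFactors = FALSE)

# Rows = samples, columns = genes; cells = mean segment value,
# NA when no segment covers that gene in that sample
gene_mat <- tapply(ann$extra2, list(ann$Barcode, ann$GeneSymbol), mean)
print(gene_mat)

# Written to a temporary directory here; choose your own path in practice
write.csv(gene_mat, file.path(tempdir(), "SampleByGene.csv"), quote = FALSE)
```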
ADD REPLYlink written 9 weeks ago by Kevin Blighe33k
Kevin Blighe33k wrote:

Hello, so, here are the steps that I did, for endometrial cancer:

First downloaded copy number data from here: http://firebrowse.org/?cohort=UCEC# (access the data by clicking on the green bar for SNP6 CopyNum). Specifically, the file that you should obtain is for hg19 and will have 'scna' and 'minus_germline_cnv' as parts of its filename.

In my case, the file was called UCEC.Level_3_segmented_scna_minus_germline_cnv_hg19.seg.txt

Then, there are a series of steps that you should do in order to process this data:

A

Separate out the tumour from normals

require(data.table)
CN <- read.table("UCEC.Level_3_segmented_scna_minus_germline_cnv_hg19.seg.txt", sep="\t", stringsAsFactors=FALSE, header=TRUE)

#   Tumor types range from 01 - 09
#   Normal types from 10 - 19
#   Control samples from 20 - 29
types <- gsub("TCGA-[A-Z0-9]*-[A-Z0-9]*-", "", (gsub("-[0-9A-Z]*-[0-9A-Z]*-01$", "", CN[,1])))
unique(types)

# NB - in the next 2 lines, it is important to account for all types that are listed from unique(types) command
matched.normal <- c("10A", "10B", "11A", "11B")
tumor <- c("01A", "01B", "06A")

#Divide up the data into tumor and normal
tumor.indices <- which(types %in% tumor)
matched.normal.indices <- which(types %in% matched.normal)
CN.tumor <- CN[tumor.indices,]
CN.matched.normal <- CN[matched.normal.indices,]

#Change IDs to allow for matching to metadata
CN.tumor[,1] <- gsub("-[0-9A-Z]*-[0-9A-Z]*-[0-9A-Z]*-01$", "", CN.tumor[,1])
CN.matched.normal[,1] <- gsub("-[0-9A-Z]*-[0-9A-Z]*-[0-9A-Z]*-01$", "", CN.matched.normal[,1])
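
As an alternative to listing every observed type by hand, the two-digit sample type code can be compared against the ranges given in the comments above. A sketch on toy barcodes (the second and third barcodes are made up); it assumes the fourth dash-separated field is the sample type code followed by a vial letter.

```r
barcodes <- c("TCGA-2A-A8VL-10A-01D-A379-01",   # from the question
              "TCGA-XX-0001-01A-11D-0000-01",   # made up
              "TCGA-XX-0002-06A-11D-0000-01")   # made up

# Fourth dash-separated field, e.g. "10A" = sample type code + vial
field4 <- vapply(strsplit(barcodes, "-", fixed = TRUE), `[`, character(1), 4)
code <- as.integer(substr(field4, 1, 2))

# Ranges as in the comments above: 01-09 tumour, 10-19 normal, 20-29 control
group <- cut(code, breaks = c(0, 9, 19, 29), labels = c("tumor", "normal", "control"))

# Patient-level ID (first three fields), for matching to metadata
patient <- sub("^(TCGA-[A-Z0-9]+-[A-Z0-9]+)-.*$", "\\1", barcodes)
print(data.frame(patient, field4, group))
```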

--------------------

B

Create a markers object

require(gaia)

#Retrieve probes meta file from Broad Institute
gdac.root <- "ftp://ftp.broadinstitute.org/pub/GISTIC2.0/hg19_support/"
markersMatrix <- read.delim(paste0(gdac.root,"genome.info.6.0_hg19.na31_minus_frequent_nan_probes_sorted_2.1.txt"), as.is=TRUE, header=FALSE)
colnames(markersMatrix) <- c("Probe.Name", "Chromosome", "Start")

#Change sex chr names
unique(markersMatrix$Chromosome)
markersMatrix[which(markersMatrix$Chromosome=="X"),"Chromosome"] <- 23
markersMatrix[which(markersMatrix$Chromosome=="Y"),"Chromosome"] <- 24
markersMatrix$Chromosome <- sapply(markersMatrix$Chromosome, as.integer)

#Create a marker ID
markerID <- apply(markersMatrix, 1, function(x) paste0(x[2], ":", x[3]))

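#Remove duplicates (logical indexing also works when there are no duplicates,
#whereas -which(...) would then return zero rows)
markersMatrix <- markersMatrix[!duplicated(markerID),]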

#Filter markersMatrix for common CNV
markerID <- apply(markersMatrix, 1, function(x) paste0(x[2], ":", x[3]))
commonCNV <- read.delim(paste0(gdac.root,"CNV.hg19.bypos.111213.txt"), as.is=TRUE)
commonCNV[,2] <- sapply(commonCNV[,2], as.integer)
commonCNV[,3] <- sapply(commonCNV[,3], as.integer)
commonID <- apply(commonCNV, 1, function(x) paste0(x[2], ":", x[3]))
table(commonID %in% markerID)
table(markerID %in% commonID)
markersMatrix_fil <- markersMatrix[!markerID %in% commonID,]

#Create the markers object
markers_obj <- load_markers(markersMatrix_fil)
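
Two things to watch for in this kind of filtering: apply() over a data frame first converts it to a character matrix, which can pad numbers with spaces so that keys built from two different tables may not match, and negative indexing with which() returns zero rows when nothing matches. A sketch of the same idea with vectorised paste0() and logical indexing, on toy data (probe names and positions are made up):

```r
markers <- data.frame(Probe.Name = c("P1", "P2", "P3", "P4"),
                      Chromosome = c(1L, 1L, 2L, 23L),
                      Start      = c(100L, 100L, 5000L, 42L),
                      stringsAsFactors = FALSE)
common <- data.frame(Chromosome = 2L, Start = 5000L)   # toy "common CNV" positions

# Build chr:pos keys column-wise; no matrix coercion, so no padding
marker_id <- paste0(markers$Chromosome, ":", markers$Start)
markers   <- markers[!duplicated(marker_id), ]         # safe even with no duplicates

# Rebuild the keys after removing duplicates, then drop common CNV positions
marker_id   <- paste0(markers$Chromosome, ":", markers$Start)
common_id   <- paste0(common$Chromosome, ":", common$Start)
markers_fil <- markers[!(marker_id %in% common_id), ]
print(markers_fil)
```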

-------------------------------

C

# Determine recurrent aberrations in tumour

#Prepare CNV matrix
cnvMatrix <- CN.tumor

#Add label (0 for loss, 1 for gain)
#An absolute segment mean of 0.3 is the cut-off: below -0.3 is a loss, above 0.3 is a gain
cnvMatrix <- cbind(cnvMatrix, Label=NA)
cnvMatrix[cnvMatrix$Segment_Mean < -0.3,"Label"] <- 0
cnvMatrix[cnvMatrix$Segment_Mean > 0.3,"Label"] <- 1
cnvMatrix <- cnvMatrix[!is.na(cnvMatrix$Label),]

#Remove segment mean as we now go by the binary classification of gain or loss
cnvMatrix <- cnvMatrix[,-6]
colnames(cnvMatrix) <- c("Sample.Name", "Chromosome", "Start", "End", "Num.of.Markers", "Aberration")

#Substitute Chromosomes "X" and "Y" with "23" and "24"
xidx <- which(cnvMatrix$Chromosome=="X")
yidx <- which(cnvMatrix$Chromosome=="Y")
cnvMatrix[xidx,"Chromosome"] <- 23
cnvMatrix[yidx,"Chromosome"] <- 24
cnvMatrix$Chromosome <- sapply(cnvMatrix$Chromosome, as.integer)

#Run GAIA, which looks for recurrent aberrations in your input file
n <- length(unique(cnvMatrix[,1]))
cnv_obj <- load_cnv(cnvMatrix, markers_obj, n)
results.all <- runGAIA(cnv_obj, markers_obj, output_file_name="Tumor.All.txt", aberrations=-1, chromosomes=-1, num_iterations=10, threshold=0.15)
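
The gain/loss labelling can be checked on a few values before running it on a whole cohort. A sketch using the four segment means from the question (sample names are made up); it assumes, as is usual for TCGA Level 3 segment files, that the segment mean is a log2 copy number ratio.

```r
seg <- data.frame(Sample = c("S1", "S1", "S2", "S2"),
                  Segment_Mean = c(0.1646, -0.4115, 0.0056, 0.7626),
                  stringsAsFactors = FALSE)

cutoff <- 0.3
# NA = no call, 0 = loss, 1 = gain
seg$Label <- ifelse(seg$Segment_Mean < -cutoff, 0L,
              ifelse(seg$Segment_Mean > cutoff, 1L, NA_integer_))

# Keep only called segments, as in the code above
called <- seg[!is.na(seg$Label), ]
print(called)
print(table(called$Label))
```

Only the second and fourth segments pass the cut-off here; the other two are treated as copy-neutral and dropped.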
ADD COMMENTlink modified 20 days ago • written 7 months ago by Kevin Blighe33k

@ Kevin Blighe Hello Bro, can you help me again? I have installed R and I'm running your code line by line. I got as far as this line:

#Create the markers object
markers_obj <- load_markers(markersMatrix_fil)
Error in load_markers(markersMatrix_fil) : 
  could not find function "load_markers"

Please help me fix this line so that I can continue with the rest.
ADD REPLYlink modified 7 months ago by genomax59k • written 7 months ago by Chaimaa130

Hey dude, ensure that the gaia package loaded correctly. load_markers() is part of that package.

ADD REPLYlink written 7 months ago by Kevin Blighe33k

Do you mean the line require (gaia)? I 'm following your code line by line and now i fail to continue...

ADD REPLYlink written 7 months ago by Chaimaa130

That's indeed what Kevin means.

ADD REPLYlink written 7 months ago by WouterDeCoster35k

Yep, ensure that gaia installed correctly (source("https://bioconductor.org/biocLite.R"); biocLite("gaia"))

load_markers
Error: object 'load_markers' not found

Load package and then try again:

require(gaia)
Loading required package: gaia
Warning message:
package ‘gaia’ was built under R version 3.2.5 


load_markers

function (marker_matrix) 
{
    message("Loading Marker Informations")
    chromosomes <- sort(unique(marker_matrix[, 2]))
    chromosome_marker_list <- list()
    end_position <- FALSE
    if (ncol(marker_matrix) == 4) {
        end_position <- TRUE
    }
    for (i in as.numeric(chromosomes)) {
        chr_ids <- which(marker_matrix[, 2] == i)
        tmp_matrix <- matrix(0, 2, length(chr_ids))
        tmp_matrix[1, ] <- marker_matrix[chr_ids, 3]
        if (end_position) {
            tmp_matrix[2, ] <- marker_matrix[chr_ids, 4]
        }
        else {
            tmp_matrix[2, ] <- tmp_matrix[1, ]
        }
        chromosome_marker_list[[i]] <- tmp_matrix
        names(chromosome_marker_list)[[i]] <- i
        message(".", appendLF = FALSE)
    }
    message("\nDone")
    return(chromosome_marker_list)
}
<environment: namespace:gaia>
ADD REPLYlink modified 6 weeks ago • written 7 months ago by Kevin Blighe33k

Kevin Blighe Thanks Bro, i have installed all the packages successfully. Now, i have the following 2 issues in these corresponding lines

markers_obj <- load_markers(markersMatrix_fil)

Loading Marker Informations ......................Error in chromosome_marker_list[[i]] <- tmp_matrix :

attempt to select more than one element in integerOneIndex

In addition: Warning message:

In load_markers(markersMatrix_fil) : NAs introduced by coercion

colnames(df) <- c("Barcode", "chr", "start", "end", "extra1", "extra2")

Error in colnames<-(*tmp*, value = c("Barcode", "chr", "start", "end", :

attempt to set 'colnames' on an object with less than two dimensions

ADD REPLYlink modified 7 months ago • written 7 months ago by Chaimaa130

Hey Chief, let me take a look later.

ADD REPLYlink written 7 months ago by Kevin Blighe33k

@ Kevin Blighe, sure Bro, take your time and thanks a lot for your reply.

ADD REPLYlink written 7 months ago by Chaimaa130

Just to be sure, you are running these commands first, starting with the data that you download from Broad FireBrowse:

Then, you run the steps that I originally mentioned:

It is very important that the load_markers function does not give any warnings. A successful execution of this function will produce:

markers_obj <- load_markers(markersMatrix_fil)
Loading Marker Informations
........................
Done

Sometimes, available memory is an issue.

ADD REPLYlink written 7 months ago by Kevin Blighe33k

@Kevin Blighe Yes Bro I'm following your steps one by one and by the correct order.

ADD REPLYlink written 7 months ago by Chaimaa130

Which OS? How much RAM?

Can you definitely 100% confirm that each of these steps has completed successfully:

require(gaia)

#Retrieve probes meta file from broadinstitute website
gdac.root <- "ftp://ftp.broadinstitute.org/pub/GISTIC2.0/hg19_support/"
markersMatrix <- read.delim(paste0(gdac.root,"genome.info.6.0_hg19.na31_minus_frequent_nan_probes_sorted_2.1.txt"), as.is=TRUE, header=FALSE)
colnames(markersMatrix) <- c("Probe.Name", "Chromosome", "Start")

#Change sex chromosome names
unique(markersMatrix$Chromosome)
markersMatrix[which(markersMatrix$Chromosome=="X"),"Chromosome"] <- 23
markersMatrix[which(markersMatrix$Chromosome=="Y"),"Chromosome"] <- 24
markersMatrix$Chromosome <- sapply(markersMatrix$Chromosome, as.integer)

#Create a marker ID
markerID <- apply(markersMatrix, 1, function(x) paste0(x[2], ":", x[3]))

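#Remove duplicates (logical indexing also works when there are no duplicates,
#whereas -which(...) would then return zero rows)
markersMatrix <- markersMatrix[!duplicated(markerID),]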

#Filter markersMatrix for common CNV
markerID <- apply(markersMatrix, 1, function(x) paste0(x[2], ":", x[3]))
commonCNV <- read.delim(paste0(gdac.root,"CNV.hg19.bypos.111213.txt"), as.is=TRUE)
commonCNV[,2] <- sapply(commonCNV[,2], as.integer)
commonCNV[,3] <- sapply(commonCNV[,3], as.integer)
commonID <- apply(commonCNV, 1, function(x) paste0(x[2], ":", x[3]))
table(commonID %in% markerID)
table(markerID %in% commonID)
markersMatrix_fil <- markersMatrix[!markerID %in% commonID,]

#Create the markers object
markers_obj <- load_markers(markersMatrix_fil)
ADD REPLYlink written 7 months ago by Kevin Blighe33k

@ Kevin Blighe I do every step and run it carefully, i can see the output variables in the Global environment of R right corner.

Can i share my data with you and you can test your code on it?

My OS is windows10 and RAM is 12GB and my data size is 6.87 MB.

ADD REPLYlink modified 7 months ago • written 7 months ago by Chaimaa130

@ Kevin Blighe Bro, i have fixed the first error. Now, these 2 parts are not working

    cnv_obj <- load_cnv(cnvMatrix, markers_obj, n)
Loading Copy Number Data
Error in matrix(0, length(samples), ncol(markers_list[[chromosomes[i]]])) : 
  non-numeric matrix extent



results.all <- runGAIA(cnv_obj, markers_obj, output_file_name="Tumor.All.txt", aberrations=-1, chromosomes=-1, num_iterations=10, threshold=0.15)

Performing Data Preprocessing
Error in runGAIA(cnv_obj, markers_obj, output_file_name = "Tumor.All.txt",  : 
  object 'cnv_obj' not found

    colnames(df) <- c("Barcode", "chr", "start", "end", "extra1", "extra2")
Error in `colnames<-`(`*tmp*`, value = c("Barcode", "chr", "start", "end",  : 
  attempt to set 'colnames' on an object with less than two dimensions
ADD REPLYlink written 7 months ago by Chaimaa130

There must be something wrong with your cnvMatrix. In my example, I use the endometrial cancer (UCEC) data. Which cancer's data do you have? Just check the output of each command in Steps A and C in order to ensure that it works.

If you want to tell me the file that you downloaded, then I can also take a look here.

Peace.

ADD REPLYlink modified 6 weeks ago • written 7 months ago by Kevin Blighe33k

@ Kevin Blighe ok the cancer Data which i'm working on is the prostate adenocarcinoma (PRAD).

ADD REPLYlink written 7 months ago by Chaimaa130

@ Kevin Blighe, Hi Bro, I know that I disturb you a lot. I apologize for that, and I really appreciate your help! Did you check your code with my data? I really need to do this job, so please help me out.

ADD REPLYlink written 7 months ago by Chaimaa130

Hey chief - no problem. I went here: http://firebrowse.org/?cohort=PRAD&download_dialog=true

I then downloaded the file: 'genome_wide_snp_6-FFPE-segmented_scna_minus_germline_cnv_hg19' and then used that as the data input.

I then followed all of my code and it worked fine. The only thing that I notice is that there is no copy number data for normal samples in the PRAD dataset, but this is not important because the germline copy number is already subtracted by Broad Firebrowse.

cnv_obj <- load_cnv(cnvMatrix, markers_obj, n)
Loading Copy Number Data
................................................
Done

Then:

results.all <- runGAIA(cnv_obj, markers_obj, output_file_name="Tumor.All.txt", aberrations=-1, chromosomes=-1, num_iterations=10, threshold=0.15)

Performing Data Preprocessing

Done

Computing Discontinuity Matrix
........................
Done
Computing Probability Distribution
................................................................................................
Done
Assessing the Significance of Observed Data
................................................
Done
Writing Tumor.All.txt.igv.gistic File for Integrative Genomics Viewer (IGV) Tool
................................................
Done
Running Homogeneous peel-off Algorithm With Significance Threshold of 0.15 and Homogenous Threshold of 0.12
................................................
Done

Writing Output File 'Tumor.All.txt' Containing the Significant Regions

File 'Tumor.All.txt' Saved

The problem that you have may be an issue with a new version of one of the packages. Dude, to help, I have put the output files and my R session online for you. You can download them (3 files) here: https://app.box.com/s/992llscqeabyw3rzrjg5stxd6945r73r (you may have to open an account).

The Tumor.All.txt file contains the significant recurrent somatic copy number alterations (recSCNA). You will want to input these back into R and then annotate them using the first steps that I mention in this thread. You will have to rename the columns to have at least "chr", "start", "end"
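
A sketch of reading the GAIA output back into R and renaming columns for the annotation steps. The header written to the toy file below is an assumption about the layout of Tumor.All.txt, and the rows are made up; check the real header with colnames() before renaming by position.

```r
# Toy tab-delimited file standing in for Tumor.All.txt (header is an assumption)
toy <- file.path(tempdir(), "Tumor.All.txt")
writeLines(c("Chromosome\tAberration Kind\tRegion Start [bp]\tRegion End [bp]\tRegion Size [bp]\tq-value",
             "1\t0\t51598\t1500664\t1449066\t0.01",
             "8\t1\t127000000\t129000000\t2000000\t0.10"), toy)

# check.names = FALSE keeps the original header text intact for inspection
rec <- read.delim(toy, check.names = FALSE, stringsAsFactors = FALSE)
print(colnames(rec))

# Rename by position so that makeGRangesFromDataFrame() can find chr/start/end
colnames(rec) <- c("chr", "type", "start", "end", "size", "qvalue")

# Optional re-filter on the same threshold that was passed to runGAIA
rec <- rec[rec$qvalue <= 0.15, ]
print(rec)
```

Once renamed, the table can take the place of df in the annotation steps at the top of this thread.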

ADD REPLYlink written 7 months ago by Kevin Blighe33k

@Kevin Blighe well done Bro!

I also only downloaded the FFPE file, like you, so my question is: why did you use FFPE and not "genome_wide_snp_6-segmented_scna_minus_germline_cnv_hg19"? Can you please answer this last question? I really appreciate your help.

BEST WISHES.

ADD REPLYlink modified 7 months ago • written 7 months ago by Chaimaa130

Hey,

I just chose a file 'at random'. I was only testing. You should use the file more suitable to your experiment.

FFPE tissue is, of course, lower quality.

ADD REPLYlink modified 6 weeks ago • written 7 months ago by Kevin Blighe33k

Wait. Allow me to re-run that using the non-FFPE

ADD REPLYlink modified 7 months ago • written 7 months ago by Kevin Blighe33k

OK Bro i can wait .

ADD REPLYlink written 7 months ago by Chaimaa130

Here you go, chief (same files but for PRAD.snp__genome_wide_snp_6__broad_mit_edu__Level_3__segmented_scna_minus_germline_cnv_hg19__seg.seg.txt): https://app.box.com/s/992llscqeabyw3rzrjg5stxd6945r73r

Dude, the PRAD dataset is very big, so, the Rdata file is ~140 megabytes

ADD REPLYlink written 7 months ago by Kevin Blighe33k

@Kevin Blighe i'm very thankful and happy for the great help you gave me!

Please, can you also do the annotation step for me? I tried on my computer, but it shows me this error:

    df_ann <- cbind(df[subjectHits(hits),],genes[queryHits(hits),])
Error: cannot allocate vector of size 124.9 Mb
ADD REPLYlink written 7 months ago by Chaimaa130

I have a size issue actually.

ADD REPLYlink written 7 months ago by Chaimaa130

Getting it to you now - hang on.

ADD REPLYlink modified 6 weeks ago • written 7 months ago by Kevin Blighe33k

I have uploaded the new Rdata file, and also a list of the final annotated regions. In the 'type' column, the following is true:

  • deletion=0
  • amplification=1

https://app.box.com/s/992llscqeabyw3rzrjg5stxd6945r73r

And that is it: recurrent somatic copy number alterations (recSCNA) in the TCGA PRAD dataset. If you want to cite how the data was processed, then just cite my recent publication: https://www.ncbi.nlm.nih.gov/pubmed/29682207

ADD REPLYlink modified 6 weeks ago • written 7 months ago by Kevin Blighe33k

@ Kevin Blighe ,Thanks a lot Bro. Yes definitely i will cite your recent publication in my manuscript and Plz if you have any other publications share with me via e-mail or link. Your topic is of great interest for me . go ahead......

ADD REPLYlink written 7 months ago by Chaimaa130

No problem. I have not published too much, but I have worked a lot in private (outside academia). Stay in touch chief.

ADD REPLYlink modified 20 days ago • written 7 months ago by Kevin Blighe33k

@Kevin Blighe Hi bro,
Finally, I got a server and start installing the packages but I failed to install "Gaia" package

**package ‘gaia’ is not available (for R version 3.2.3)**
ADD REPLYlink modified 6 weeks ago by RamRS19k • written 8 weeks ago by Chaimaa130

Perhaps open a new question, and show that it has a definitive relationship to bioinformatics.

ADD REPLYlink written 8 weeks ago by Kevin Blighe33k

Ok, Kevin Blighe Bro, but I think this problem has no relation to bioinformatics!

  source("https://www.bioconductor.org/biocLite.R")
    Error in file(filename, "r", encoding = encoding) :
      cannot open the connection
    In addition: Warning message:
    In file(filename, "r", encoding = encoding) :
      unable to connect to 'bioconductor.org' on port 80

The problem may be related to the server.

ADD REPLYlink written 8 weeks ago by Chaimaa130

Try: source("https://bioconductor.org/biocLite.R") (without www)

If that still does not work, then try: source("http://bioconductor.org/biocLite.R") (plain http instead of https)

If that does not work, then you can post on Stack Exchange

ADD REPLYlink modified 5 weeks ago • written 8 weeks ago by Kevin Blighe33k

@Kevin Blighe I tried with http and with https and nothing work!
I will try in Stack Exchange thanks always for your great help Kevin

ADD REPLYlink modified 6 weeks ago by RamRS19k • written 8 weeks ago by Chaimaa130

Please use the formatting bar (especially the code option) to present your post better. It has been done for you this time.

ADD REPLYlink modified 6 weeks ago • written 7 months ago by WouterDeCoster35k

@Kevin Blighe Hi, Kevin Bro, is there any possibility to output the Tumor.All.txt file with the samples too, meaning the significant recurrent somatic copy number alterations (recSCNA) together with their corresponding samples (barcode names) from the original input file? Then I could annotate this file.

Please reply.

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by Chaimaa130

Hi, it would be dishonest and malpractice for me to continually help you to that extent, and that would defeat the purpose of this website. You will have to find a way using your local resources, preferably in conjunction with your supervisor.

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by Kevin Blighe33k