Question: Pruning CNA Data from TCGA
jrlarsen0 wrote:

Hello,

I downloaded the CNA data from TCGA (GDC), which comes pre-segmented by CBS using the DNAcopy library from Bioconductor (Level 3). I am currently analyzing the data but cannot find a way to eliminate noise in the form of very short segments that do not match the surrounding, much longer segments. For example, I have three consecutive segments on chromosome 2: the first has 122511 probes with segment mean 0.0235, the second has 3 probes with segment mean -1.5194, and the third has 9606 probes with segment mean 0.0224. Short segments like this (few probes), which differ drastically from their much longer neighbors, are all over my TCGA data, and I do not know how to remove them properly after segmentation (since that is how the data comes). I have read up on pruning methods via dynamic programming and square mean, but they seem to take place prior to segmentation. I would appreciate any help you can give; I am lost and don't know what to do next.

Thank you

Edit: Grammar

modified 11 months ago by pbpanigrahi180 • written 11 months ago by jrlarsen0
pbpanigrahi180 wrote:

Gistic2 is a popular tool people use for identifying regions of the genome that are significantly amplified or deleted across a set of samples.

It uses parameters such as -maxseg, -maxspace and -js to control the segments to use.

-maxseg: Maximum number of segments allowed for a sample in the input data. Samples with more segments than this threshold are excluded from the analysis. (DEFAULT=2500)

-js: Smallest number of markers to allow in segments from the segmented data. Segments that contain fewer than this number of markers are joined to the neighboring segment that is closest in copy number. (DEFAULT=4)

-maxspace: Maximum allowed spacing between pseudo-markers, in bases. Pseudo-markers are generated when the markers file input is omitted. (DEFAULT=10,000)

Gistic2 is a widely used tool for array-based CNA identification; cBioPortal uses it for this purpose. If you would rather not run Gistic2 itself, you could probably apply the same idea as the parameters above (especially -js) to your segments directly.
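To make the -js idea concrete, here is a minimal sketch in Python of joining short segments to the neighboring segment that is closest in copy number. It is not Gistic2's own code: the `Segment` fields, the function name `join_short_segments`, and the probe-weighted mean used for the merged segment are all assumptions for illustration, and the coordinates in the example are made up (only the probe counts and segment means come from the question). The same logic should port to an R data frame without much trouble.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Segment:
    # Hypothetical field names, loosely modelled on a Level 3 segment file.
    chrom: str
    start: int
    end: int
    num_probes: int
    seg_mean: float


def _merge(a, b):
    """Merge two adjacent segments; the probe-weighted mean is an assumption."""
    n = a.num_probes + b.num_probes
    mean = (a.seg_mean * a.num_probes + b.seg_mean * b.num_probes) / n
    return Segment(a.chrom, min(a.start, b.start), max(a.end, b.end), n, mean)


def join_short_segments(segments, min_probes=4):
    """Join every segment with fewer than min_probes probes to the adjacent
    segment on the same chromosome whose seg_mean is closest."""
    result = []
    ordered = sorted(segments, key=lambda s: (s.chrom, s.start))
    for _, group in groupby(ordered, key=lambda s: s.chrom):
        segs = list(group)
        while len(segs) > 1:
            # Find the first segment that is still too short, if any.
            idx = next(
                (i for i, s in enumerate(segs) if s.num_probes < min_probes),
                None,
            )
            if idx is None:
                break
            # Candidate neighbours: left and/or right on this chromosome.
            neighbours = [j for j in (idx - 1, idx + 1) if 0 <= j < len(segs)]
            best = min(
                neighbours,
                key=lambda j: abs(segs[j].seg_mean - segs[idx].seg_mean),
            )
            lo, hi = sorted((idx, best))
            segs[lo:hi + 1] = [_merge(segs[lo], segs[hi])]
        result.extend(segs)
    return result


if __name__ == "__main__":
    # Probe counts and means from the question; coordinates are invented.
    chr2 = [
        Segment("2", 1, 1000, 122511, 0.0235),
        Segment("2", 1001, 1100, 3, -1.5194),
        Segment("2", 1101, 2000, 9606, 0.0224),
    ]
    for seg in join_short_segments(chr2, min_probes=4):
        print(seg)
```

With a probe-weighted mean, a 3-probe outlier barely shifts the mean of a neighbor with thousands of probes. If you would rather the outlier not contribute at all, keep the longer neighbor's seg_mean unchanged in `_merge` instead.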

Hope this helps

modified 11 months ago • written 11 months ago by pbpanigrahi180

This is extremely helpful, thank you! This is the first I have heard of GISTIC, and it looks like what I need. I cannot find an R library that runs GISTIC; it seems to run only via the terminal. Do you know of any way to run it directly through R? R is the platform I am running everything through.

Edit: Grammar

modified 11 months ago • written 11 months ago by jrlarsen0