Question: Comparing GISTIC2 G-Scores
0
gravatar for noodle
4 weeks ago by
noodle40
noodle40 wrote:

Hi there,

I am using GISTIC2 to identify recurrent SCNAs (somatic copy number alterations) on a cohort of primary tumour samples matched with metastatic tumour samples.

I ran GISTIC2 separately on the primary tumours and then metastatic tumours. For each recurrent amplification and deletion identified, there is a G-score.The G-score is calculated based on the amplitude of the aberration as well as the frequency of its occurrence across samples.

Now for each amplification and deletion identified in the metastatic tumours, I would like to compare the G-score to the G-score in the primary tumours to identify regions distinct to mets.

Would anyone have any experience in doing this?

I was thinking of utilising ChIP-Seq differential binding methods or I have seen a paper using a gaussian process latent difference model. They provide the math but I am not sure in practise where to start with it.

My first approach was as follows:

The GISTIC2 G score data looks like this:

head(prim.gistic@gis.scores, n=3)

variant  chr  start    end     fdr   G_Score   Avg_amplitude   frequency
Amp      1    1582202  1635250  0    0.183352   0.379261       0.230769
Amp      1    1636274  1638925  0    0.204790   0.374585       0.256410
Amp      1    1650969  1659000  0    0.212745   0.398385       0.256410



head(mets.gistic@gis.scores, n=3)
variant  chr  start     end       fdr   G_Score  Avg_amplitude frequency
Amp       1    1582202  1644747    0     0.075575    0.228630  0.205128
Amp       1    1647900  1650845    0     0.142388    0.358165  0.205128
Amp       1    1666300  1689800    0     0.183941    0.438726  0.205128

As you can see the start, end position for each line is not the same in the primary and mets. Therefore I tried using subsetOverlaps from GRanges to find intervals that overlap, with the intention of comparing mets gscore vs prim gscore. However, only one gscore is in the return GRanges object.

> prim.gr <- makeGRangesFromDataFrame( df = prim.gistic@gis.scores, 
keep.extra.columns = TRUE, 
seqnames.field = "chr",
start.field = "start",
end.field = "end" )

> mets.gr <- makeGRangesFromDataFrame( df = mets.gistic@gis.scores, 
keep.extra.columns = TRUE, 
seqnames.field = "chr",
start.field = "start",
end.field = "end" )

> subsetByOverlapsmets.gr, prim.gr)
GRanges object with 36805 ranges and 5 metadata columns:

seqnames      ranges        strand | variant  fdr   G_Score  Avg_amplitude  frequency
<Rle>  <IRanges>  <Rle> |  <character> <numeric>  <numeric> <numeric> <numeric>
 [1]        1   1582202-1644747  * |   Amp    0     0.075575     0.22863      0.205128
 [2]        1   1647900-1650845  * |   Amp    0     0.142388     0.358165     0.205128
 [3]        1   1666300-1689800  * |   Amp    0     0.183941     0.438726     0.205128
 [4]        1   1696879-1718750  * |   Amp    0     0.155353     0.478574     0.179487
 [5]        1   1720600-1849550  * |   Amp    0     0.185098     0.476421     0.205128

Is there a way using GRanges to retain both gscore columns in the subsetOverlaps output?

If there isn't, my next step was to take the chr, start, end and gscore columns to make a bed file for the primary tumours and mets tumours and then use HOMER to find differential peaks,

http://homer.ucsd.edu/homer/ngs/mergePeaks.html

Any thoughts would be much appreciated!

ADD COMMENTlink modified 28 days ago • written 4 weeks ago by noodle40
0
gravatar for Kevin Blighe
4 weeks ago by
Kevin Blighe63k
Kevin Blighe63k wrote:

However, only one gscore is in the return GRanges object.

Have you simply tried to change the column names in both input files? - that should work.

Kevin

ADD COMMENTlink written 4 weeks ago by Kevin Blighe63k

Thanks Kevin,

I did try that and it just returns the meta columns for the first granges object

library(GenomicRanges)

colnames(prim.gistic@gis.scores ) <- paste0("prim_", colnames(prim.gistic@gis.scores ))

prim.gr <- makeGRangesFromDataFrame(
  df = prim.gistic@gis.scores,
  keep.extra.columns = TRUE,
  seqnames.field = "prim_Chromosome",
  start.field = "prim_Start_Position",
  end.field = "prim_End_Position"
)

colnames(mets.gistic@gis.scores ) <- paste0("mets_", colnames(mets.gistic@gis.scores ))

mets.gr <- makeGRangesFromDataFrame(
  df = mets.gistic@gis.scores,
  keep.extra.columns = TRUE,
  seqnames.field = "mets_Chromosome",
  start.field = "mets_Start_Position",
  end.field = "mets_End_Position"
)

# find mets regions also in prim that overlap
subsetByOverlaps ( mets.gr, prim.gr,type = "equal")
GRanges object with 76 ranges and 5 metadata columns:

seqnames ranges strand | mets_Variant_Classification mets_fdr mets_G_Score mets_Avg_amplitude mets_frequency
  <Rle>     <IRanges>    <Rle> |  <character> <numeric> <numeric>  <numeric>  <numeric>
[1] 1     8022900-8075600     * |   Amp         0     0.266339     0.962888    0.128205
ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by noodle40

Oh, I think that I faced this before a few times... You have to produce the overlaps object, and then 're-overlap' this with the original data-frames in two separate passes. Can you try that?

ADD REPLYlink written 4 weeks ago by Kevin Blighe63k

Okies, i think I have it figured out- thanks Kevin! 2 heads are definitely better than 1

# find mets regions also in prim that overlap
ol.gr  <- subsetByOverlaps( mets.gr, prim.gr, type = "equal")

# index for prim hits in mets
both.ind <- findOverlaps(query = mets.gr, subject = prim.gr, type = "equal")

# grab prim meta cols by index 
tmp.gr  <- prim.gr[subjectHits(both.ind),]

# add meta cols to ol.gr
mcols( ol.gr)[c(6:10)] <- mcols( tmp.gr)

head( ol.gr,n=2)
GRanges object with 2 ranges and 11 metadata columns:
seqnames  ranges strand | mets_Variant_Classification  mets_fdr mets_G_Score mets_Avg_amplitude mets_frequency
     <Rle>           <IRanges>  <Rle> | <character> <numeric>    <numeric>   <numeric>  <numeric>
[1]  1     8022900-8075600      * |  Amp  0     0.266339           0.962888       0.128205
[2]  1 203098275-203134950  * |  Amp  0.738406     0.923121           1.066993       0.871795

  prim_Variant_Classification  prim_fdr prim_G_Score prim_Avg_amplitude prim_frequency    
                  <character> <numeric>    <numeric>          <numeric>      <numeric>         
[1]   Amp         0     0.107352            0.31092       0.205128 
[2]  Amp  0.073643      0.46918           0.695091       0.923077

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by noodle40

In terms of calculating the difference in G scores at overlapping regions, does taking the log2 fold change mets over prim seem like a good strategy?

# calc log2 FC
ol.gr$diff_gscore <- log2( ol.gr$mets_G_Score+1) - log2( ol.gr$prim_G_Score+1)
ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by noodle40
1

Great that you got it working, but I would definitely 'spot check' the end result just to make sure that it is as you expect it to be.

For G score differences, I am not sure. Are these scores not already log2 ratios? Would a subtraction not then be better, in that case?

For recurrent aberrations, I use a different program called GAIA (in R).

ADD REPLYlink written 4 weeks ago by Kevin Blighe63k
1

Yep, apologies, these values are already logged.

   ol.gr$diff_gscore <- ol.gr$mets_G_Score - ol.gr$prim_G_Score

Thanks Kevin!

ADD REPLYlink written 4 weeks ago by noodle40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1555 users visited in the last hour