Hi all,
I found overlaps between a couple of genes and TCGA copy number calls, but I've been unsuccessful in appending metadata (hgnc_symbol) to the subjectHits.
The data look like this. 1. Sample of genes of interest:
> my.cordinates.hg19.gr
GRanges object with 3 ranges and 3 metadata columns:
      seqnames              ranges strand | hgnc_symbol        band   gene_biotype
         <Rle>           <IRanges>  <Rle> | <character> <character>    <character>
  [1]        8 128747680-128753674      * |         MYC      q24.21 protein_coding
  [2]        5     1253262-1295184      * |        TERT      p15.33 protein_coding
  [3]       17     7565097-7590856      * |        TP53       p13.1 protein_coding
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
2. TCGA absolute copy number calls as a GRanges object:
  > Granges.merged.df.tcga.mastercalls.abs.BRCA.ID.ethnicities
GRanges object with 159857 ranges and 21 metadata columns:
           seqnames            ranges strand |             Sample Num_Probes    Length Modal_HSCN_1 Modal_HSCN_2 Modal_Total_CN Subclonal_HSCN_a1 Subclonal_HSCN_a2 Cancer_cell_frac_a1 Ccf_ci95_low_a1
              <Rle>         <IRanges>  <Rle> |        <character>  <numeric> <numeric>    <numeric>    <numeric>      <numeric>         <numeric>         <numeric>           <numeric>       <numeric>
       [1]        1    564621-1510801      * | TCGA-3C-AAAU_WHITE         53    946180            1            2              3                 0                 1                0.25         0.08383
       [2]        1  1688192-16142960      * | TCGA-3C-AAAU_WHITE       4180  14454768            1            2              3                 0                 1                0.06         0.02732
       [3]        1 16165661-63472103      * | TCGA-3C-AAAU_WHITE      13591  47306442            1            2              3                 0                 1                0.04         0.00615
       [4]        1 63472868-72759524      * | TCGA-3C-AAAU_WHITE       3196   9286656            1            2              3                 0                 0                0.03         0.00000
       [5]        1 72811904-85632596      * | TCGA-3C-AAAU_WHITE       4069  12820692            1            2              3                 0                 0                0.04         0.00292
       ...      ...               ...    ... .                ...        ...       ...          ...          ...            ...               ...               ...                 ...             ...
  [159853]       21 28284150-39907857      * | TCGA-Z7-A8R6_WHITE       4229  11623707            1            1              2                 0                 0                0.00         0.00000
  [159854]       21 39908195-48084820      * | TCGA-Z7-A8R6_WHITE       3214   8176625            1            1              2                 0                 0                0.05         0.01384
  [159855]       22 16055207-24329711      * | TCGA-Z7-A8R6_WHITE       1878   8274504            1            1              2                 0                 0                0.05         0.00946
  [159856]       22 24402321-44489022      * | TCGA-Z7-A8R6_WHITE       6479  20086701            1            1              2                 0                 0                0.04         0.00644
  [159857]       22 44493823-51199978      * | TCGA-Z7-A8R6_WHITE       2807   6706155            1            1              2                 0                 0                0.06         0.02714
           Ccf_ci95_high_a1 Cancer_cell_frac_a2 Ccf_ci95_low_a2 Ccf_ci95_high_a2       LOH Homozygous_deletion    solution        Type        race              ethnicity
                  <numeric>           <numeric>       <numeric>        <numeric> <numeric>           <numeric> <character> <character> <character>            <character>
       [1]          0.39979                0.74         0.58132          0.89771         0                   0         new        BRCA       WHITE NOT HISPANIC OR LATINO
       [2]          0.08989                0.85         0.81351          0.87773         0                   0         new        BRCA       WHITE NOT HISPANIC OR LATINO
       [3]          0.06643                0.91         0.87122          0.92977         0                   0         new        BRCA       WHITE NOT HISPANIC OR LATINO
       [4]          0.05842                0.03         0.00000          0.05981         0                   0         new        BRCA       WHITE NOT HISPANIC OR LATINO
       [5]          0.06738                0.06         0.02157          0.08639         0                   0         new        BRCA       WHITE NOT HISPANIC OR LATINO
       ...              ...                 ...             ...              ...       ...                 ...         ...         ...         ...                    ...
  [159853]          0.02834                0.00         0.00000          0.02834         0                   0         new        BRCA       WHITE NOT HISPANIC OR LATINO
  [159854]          0.06983                0.01         0.00000          0.03265         0                   0         new        BRCA       WHITE NOT HISPANIC OR LATINO
  [159855]          0.07308                0.05         0.00946          0.07308         0                   0         new        BRCA       WHITE NOT HISPANIC OR LATINO
  [159856]          0.05948                0.04         0.00644          0.05948         0                   0         new        BRCA       WHITE NOT HISPANIC OR LATINO
  [159857]          0.08524                0.06         0.02714          0.08524         0                   0         new        BRCA       WHITE NOT HISPANIC OR LATINO
  -------
  seqinfo: 23 sequences from an unspecified genome; no seqlengths
>
Find overlaps between the TCGA absolute copy number calls and the ancestry-related genes:
overlaps.tcga.ancestry.genes <- findOverlaps(query=my.cordinates.hg19.gr, subject=Granges.merged.df.tcga.mastercalls.abs.BRCA.ID.ethnicities, type = "within") ##complete overlaps
Create an empty column to hold the gene names, for easy filtering in downstream analysis:
Granges.merged.df.tcga.mastercalls.abs.BRCA.ID.ethnicities$Ancestry_genes <- NA
> str(Granges.merged.df.tcga.mastercalls.abs.BRCA.ID.ethnicities$Ancestry_genes) ##sanity checks
 logi [1:159857] NA NA NA NA NA NA ...
Extract the subject ranges at the subject-hit indexes:
sub_hits.overlaps <- as.data.frame(Granges.merged.df.tcga.mastercalls.abs.BRCA.ID.ethnicities[subjectHits(overlaps.tcga.ancestry.genes)])
str(sub_hits.overlaps$Ancestry_genes) ##sanity checks
Append the gene names to the corresponding overlap indexes in the subject hits:
Granges.merged.df.tcga.mastercalls.abs.BRCA.ID.ethnicities[queryHits(overlaps.tcga.ancestry.genes)]$Ancestry_genes <- sub_hits.overlaps[,grepl("hgnc_symbol",names(sub_hits.overlaps))]
Error in do.call(`[<-`, args) : replacement has length zero
Thanks for your help
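A likely cause of the error: `sub_hits.overlaps` is built from the TCGA segments (the subject), which carry no `hgnc_symbol` column, so the `grepl()` selection returns zero columns and the replacement has length zero. The left-hand side also indexes the subject with `queryHits()` rather than `subjectHits()`. Below is a minimal sketch of one way to do the assignment, using small made-up objects. `genes`, `segs`, and the segment coordinates are hypothetical stand-ins, not the real data; only the three gene ranges are copied from the printout above.

```r
library(GenomicRanges)

# Toy query: the three genes from the post, with their symbols as metadata.
genes <- GRanges(
  seqnames = c("8", "5", "17"),
  ranges = IRanges(start = c(128747680, 1253262, 7565097),
                   end   = c(128753674, 1295184, 7590856)),
  hgnc_symbol = c("MYC", "TERT", "TP53")
)

# Toy subject: hypothetical copy-number segments (coordinates invented).
segs <- GRanges(
  seqnames = c("8", "5", "17", "1"),
  ranges = IRanges(start = c(128000000, 1000000, 7570000, 564621),
                   end   = c(129000000, 1300000, 7580000, 1510801)),
  Sample = c("S1", "S1", "S1", "S1")
)

# type = "within" keeps only genes that lie entirely inside a segment.
ov <- findOverlaps(query = genes, subject = segs, type = "within")

# Start from a character NA so the column type matches the gene symbols.
segs$Ancestry_genes <- NA_character_

# Index the subject with subjectHits() and take the symbols from the query
# with queryHits(); both vectors have one entry per hit, so lengths match.
segs$Ancestry_genes[subjectHits(ov)] <- genes$hgnc_symbol[queryHits(ov)]

segs$Ancestry_genes
```

In this toy example the chromosome 17 segment sits inside TP53 rather than the other way round, so it stays `NA` under `type = "within"`. If a segment contains more than one gene, this assignment keeps only the last symbol for that segment. Collapsing symbols per segment (for example with `tapply()` and `paste()`) would be one option in that case.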
Thanks! This was helpful!