[R] makeGRangesFromDataFrame reports reserved names in the metadata columns of my data frame, but there are none
3.8 years ago
Alewa ▴ 150

Hello Biostars, I am converting a data frame to a GRanges object using GenomicRanges::makeGRangesFromDataFrame, but it keeps failing, presumably because of reserved names in the metadata columns. However, my metadata columns do not contain any of these:

"seqnames", "ranges", "strand", "seqlevels", "seqlengths", "isCircular", "start", "end", "width"

Any suggestions on how to resolve this would be greatly appreciated.

thanks

library(GenomicFeatures)
GenomicRanges::makeGRangesFromDataFrame(df=merged.biomart.hg19.cna,
                         seqnames.field="seqnames",
                         start.field="start",
                         end.field="end",
                         strand.field="strand",
                         keep.extra.columns = T, ignore.strand=TRUE)

Error in validObject(ans) : invalid class “GRanges” object: 
    names of metadata columns cannot be one of "seqnames", "ranges", "strand", "seqlevels", "seqlengths", "isCircular", "start", "end", "width", "element"


#showing colnames
colnames(merged.biomart.hg19.cna)
 [1] "seqnames"       "start"          "end"            "strand"         "Gene.stable.ID" "Biomart_str"    "Gene.name"      "HGNC.symbol"    "Gene.type"     
[10] "copy_number"    "P13_HER2_GU"    "P05_LumA_G2"    "P12_HER2_G3"    "P11_LumA_G2"    "P10_TNBC_G3"    "P09_LumA_G3"    "P14_LumA_G3"    "P15_TNBC_GU"   
[19] "P06_LumB_G3"    "P01_TNBC_G3"    "P04_LumA_G3"    "P02_TNBC_G3"    "P07_LumA_G3"    "P03_TNBC_G2"    "P08_HER2_GU"

#showing traceback()
traceback()
4: stop(msg, ": ", errors, domain = NA)
3: validObject(ans)
2: GRanges(ans_seqnames, ans_ranges, strand = ans_strand, ans_mcols, 
       seqinfo = ans_seqinfo)
1: GenomicRanges::makeGRangesFromDataFrame(df = merged.biomart.hg19.cna, 
       seqnames.field = "seqnames", start.field = "start", end.field = "end", 
       strand.field = "strand", keep.extra.columns = T, ignore.strand = TRUE)


##showing session info
sessionInfo()
R version 4.0.1 (2020-06-06)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.5

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RColorBrewer_1.1-2     reshape2_1.4.4         ggpubr_0.3.0           reshape_0.8.8          forcats_0.5.0          stringr_1.4.0         
 [7] dplyr_1.0.0            purrr_0.3.4            readr_1.3.1            tidyr_1.1.0            tibble_3.0.1           ggplot2_3.3.1         
[13] tidyverse_1.3.0        dendextend_1.13.4      plot.matrix_1.4        pheatmap_1.0.12        data.table_1.12.8      AnnotationHub_2.20.0  
[19] BiocFileCache_1.12.0   dbplyr_1.4.4           GenomicFeatures_1.40.0 AnnotationDbi_1.50.0   Biobase_2.48.0         biomaRt_2.44.0        
[25] GenomicRanges_1.40.0   GenomeInfoDb_1.24.0    IRanges_2.22.2         S4Vectors_0.26.1       BiocGenerics_0.34.0   

loaded via a namespace (and not attached):
 [1] colorspace_1.4-1              ggsignif_0.6.0                ellipsis_0.3.1                rio_0.5.16                    XVector_0.28.0               
 [6] fs_1.4.1                      rstudioapi_0.11               farver_2.0.3                  bit64_0.9-7                   interactiveDisplayBase_1.26.3
[11] fansi_0.4.1                   lubridate_1.7.9               xml2_1.3.2                    jsonlite_1.6.1                Rsamtools_2.4.0              
[16] broom_0.5.6                   shiny_1.4.0.2                 BiocManager_1.30.10           compiler_4.0.1                httr_1.4.1                   
[21] backports_1.1.7               assertthat_0.2.1              Matrix_1.2-18                 fastmap_1.0.1                 cli_2.0.2                    
[26] later_1.1.0.1                 htmltools_0.4.0               prettyunits_1.1.1             tools_4.0.1                   gtable_0.3.0                 
[31] glue_1.4.1                    GenomeInfoDbData_1.2.3        rappdirs_0.3.1                Rcpp_1.0.4.6                  carData_3.0-4                
[36] cellranger_1.1.0              vctrs_0.3.1                   Biostrings_2.56.0             nlme_3.1-148                  rtracklayer_1.48.0           
[41] openxlsx_4.1.5                rvest_0.3.5                   mime_0.9                      lifecycle_0.2.0               rstatix_0.5.0                
[46] XML_3.99-0.3                  zlibbioc_1.34.0               scales_1.1.1                  hms_0.5.3                     promises_1.1.1               
[51] SummarizedExperiment_1.18.1   yaml_2.2.1                    curl_4.3                      memoise_1.1.0                 gridExtra_2.3                
[56] stringi_1.4.6                 RSQLite_2.2.0                 BiocVersion_3.11.1            zip_2.0.4                     BiocParallel_1.22.0          
[61] rlang_0.4.6                   pkgconfig_2.0.3               matrixStats_0.56.0            bitops_1.0-6                  lattice_0.20-41              
[66] GenomicAlignments_1.24.0      bit_1.1-15.2                  tidyselect_1.1.0              plyr_1.8.6                    magrittr_1.5                 
[71] R6_2.4.1                      generics_0.0.2                DelayedArray_0.14.0           DBI_1.1.0                     foreign_0.8-80               
[76] pillar_1.4.4                  haven_2.3.1                   withr_2.2.0                   abind_1.4-5                   RCurl_1.98-1.2               
[81] modelr_0.1.8                  crayon_1.3.4                  car_3.0-8                     utf8_1.1.4                    viridis_0.5.1                
[86] progress_1.2.2                grid_4.0.1                    readxl_1.3.1                  blob_1.2.1                    reprex_0.3.0                 
[91] digest_0.6.25                 xtable_1.8-4                  httpuv_1.5.4                  openssl_1.4.1                 munsell_0.5.0                
[96] viridisLite_0.3.0             askpass_1.1                  

Hey, what is the output of

str(merged.biomart.hg19.cna)
class(merged.biomart.hg19.cna)
typeof(merged.biomart.hg19.cna)

?

class(merged.biomart.hg19.cna)
[1] "data.frame"

typeof(merged.biomart.hg19.cna)
[1] "list"


str(merged.biomart.hg19.cna)
'data.frame':   57812 obs. of  24 variables:
 $ seqnames      : Factor w/ 24 levels "1","10","11",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ start         : int  11869 14363 29554 29554 29554 29554 34554 52473 62948 69091 ...
 $ end           : int  14412 29806 31109 31109 31109 31109 36081 54936 63887 70008 ...
 $ Gene.stable.ID: chr  "ENSG00000223972" "ENSG00000227232" "ENSG00000243485" "ENSG00000243485" ...
 $ Biomart_str   : int  1 -1 1 1 1 1 -1 1 1 1 ...
 $ Gene.name     : chr  "DDX11L1" "WASH7P" "MIR1302-10" "MIR1302-10" ...
 $ HGNC.symbol   : chr  "DDX11L1" "WASH7P" "MIR1302-11" "MIR1302-10" ...
 $ Gene.type     : chr  "pseudogene" "pseudogene" "lincRNA" "lincRNA" ...
 $ copy_number   : int  NA NA NA NA NA NA NA NA NA NA ...
 $ P13_HER2_GU   : int  NA NA NA NA NA NA NA NA NA NA ...
 $ P05_LumA_G2   : int  NA NA NA NA NA NA NA NA NA NA ...
 $ P12_HER2_G3   : int  NA NA NA NA NA NA NA NA NA NA ...
 $ P11_LumA_G2   : int  NA NA NA NA NA NA NA NA NA NA ...
 $ P10_TNBC_G3   : int  NA NA NA NA NA NA NA NA NA NA ...
 $ P09_LumA_G3   : int  NA NA NA NA NA NA NA NA NA NA ...
 $ P14_LumA_G3   : int  NA NA NA NA NA NA NA NA NA NA ...
 $ P15_TNBC_GU   : int  NA NA NA NA NA NA NA NA NA NA ...
 $ P06_LumB_G3   : int  NA NA NA NA NA NA NA NA NA NA ...
 $ P01_TNBC_G3   : int  NA NA NA NA NA NA NA NA NA NA ...
 $ P04_LumA_G3   : int  NA NA NA NA NA NA NA NA NA NA ...
 $ P02_TNBC_G3   : int  NA NA NA NA NA NA NA NA NA NA ...
 $ P07_LumA_G3   : int  NA NA NA NA NA NA NA NA NA NA ...
 $ P03_TNBC_G2   : int  NA NA NA NA NA NA NA NA NA NA ...
 $ P08_HER2_GU   : int  NA NA NA NA NA NA NA NA NA NA ...

Please ignore the many NAs; the rows shown are pseudogenes.

table(merged.biomart.hg19.cna$P01_TNBC_G3)

    1     2     3     4     5     6     7     8    10    13 
 1799 19911  6696 13848  2080  1460  1182   247     7    22

I see - thanks ekwame. The column names should already be okay, so you should not have to specify them.

Could you simply try:

makeGRangesFromDataFrame(df = merged.biomart.hg19.cna,
  keep.extra.columns = TRUE, ignore.strand = TRUE)

...or:

merged.biomart.hg19.cna$seqnames <- as.character(merged.biomart.hg19.cna$seqnames)

makeGRangesFromDataFrame(df = merged.biomart.hg19.cna,
  keep.extra.columns = TRUE, ignore.strand = TRUE)
3.8 years ago
Alewa ▴ 150

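Thanks @Kevin Blighe, it turns out the 'strand' column was the source of the error. I think this is because of the ignore.strand=TRUE flag in GenomicRanges::makeGRangesFromDataFrame: with it set, the 'strand' column is presumably not used as the strand and, with keep.extra.columns=TRUE, ends up among the metadata columns under a reserved name. I had set ignore.strand=TRUE initially because GenomicRanges::makeGRangesFromDataFrame wasn't recognizing the values in the strand column.

The quick fix was either: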

makeGRangesFromDataFrame(df = merged.biomart.hg19.cna,
                         keep.extra.columns = TRUE, ignore.strand = FALSE)

or

merged.biomart.hg19.cna <- merged.biomart.hg19.cna[-4]  # drop column 4, the 'strand' column
makeGRangesFromDataFrame(df = merged.biomart.hg19.cna,
                         keep.extra.columns = TRUE, ignore.strand = TRUE)

thanks S
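
The diagnosis above can be checked on a toy data frame. What follows is only a minimal sketch in base R, not part of GenomicRanges: `clashing_mcols()`, `toy` and `strand_orig` are hypothetical names, the reserved names are copied from the error message in the question, and the assumption (matching what this thread found with GenomicRanges 1.40.0; other versions may behave differently) is that a `strand` column is consumed as the strand only when `ignore.strand = FALSE`. The last part renames the column rather than dropping it, so its values survive as ordinary metadata; that call is only attempted if GenomicRanges is installed.

```r
# Reserved metadata column names, copied from the error message in the question.
reserved <- c("seqnames", "ranges", "strand", "seqlevels", "seqlengths",
              "isCircular", "start", "end", "width")

# Hypothetical helper (not a GenomicRanges function): list the columns that
# would be kept as metadata under a reserved name.
# Assumption: default field names are used, and "strand" is consumed as the
# strand only when ignore.strand = FALSE.
clashing_mcols <- function(df, ignore.strand = FALSE) {
  consumed <- c("seqnames", "start", "end")
  if (!ignore.strand) consumed <- c(consumed, "strand")
  extra <- setdiff(colnames(df), consumed)
  intersect(extra, reserved)
}

# Toy data modelled on the first two rows shown in the thread
# (the strand values are made up for illustration).
toy <- data.frame(
  seqnames  = c("1", "1"),
  start     = c(11869L, 14363L),
  end       = c(14412L, 29806L),
  strand    = c("+", "-"),
  Gene.name = c("DDX11L1", "WASH7P"),
  stringsAsFactors = FALSE
)

clash_ignored <- clashing_mcols(toy, ignore.strand = TRUE)   # "strand"
clash_used    <- clashing_mcols(toy, ignore.strand = FALSE)  # character(0)

# Alternative to dropping the column: rename it so the values are kept as an
# ordinary metadata column.
toy_renamed <- toy
names(toy_renamed)[names(toy_renamed) == "strand"] <- "strand_orig"
clash_renamed <- clashing_mcols(toy_renamed, ignore.strand = TRUE)  # character(0)

# Only attempted when GenomicRanges is available.
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  gr <- GenomicRanges::makeGRangesFromDataFrame(
    toy_renamed, keep.extra.columns = TRUE, ignore.strand = TRUE
  )
  print(gr)
}
```

Running `clashing_mcols()` on the real data frame before the conversion is a cheap way to see which column would trip the validity check, without relying on column positions such as `[-4]`.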
