locateVariants Fails to Give the Correct Gene ID
0
1
Entering edit mode
2.1 years ago

Hello,

I am using R to find out the corresponding gene that covers the position 10736809, chromosome 1:

> library(TxDb.Hsapiens.UCSC.hg38.knownGene)
> library(VariantAnnotation)
> txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
> rng <- GRanges(seqnames='chr1',ranges=IRanges(10736809,10736809))

However, locateVariants() gives a report like this:

> locateVariants(rng, txdb, AllVariants())
'select()' returned many:1 mapping between keys and columns
GRanges object with 9 ranges and 9 metadata columns:
      seqnames    ranges strand | LOCATION  LOCSTART    LOCEND   QUERYID        TXID         CDSID      GENEID       PRECEDEID
         <Rle> <IRanges>  <Rle> | <factor> <integer> <integer> <integer> <character> <IntegerList> <character> <CharacterList>
  [1]     chr1  10736809      + |   intron     60904     60904         1       10330                      <NA>                
  [2]     chr1  10736809      + |   intron     63204     63204         1       10331                      <NA>                
  [3]     chr1  10736809      + |   intron     59598     59598         1       10332                      <NA>                
  [4]     chr1  10736809      + |   intron     59783     59783         1       10333                      <NA>                
  [5]     chr1  10736809      + |   intron     59598     59598         1      217346                     80781                
  [6]     chr1  10736809      + |   intron     60904     60904         1      217347                     80781                
  [7]     chr1  10736809      + |   intron     63204     63204         1      217348                     80781                
  [8]     chr1  10736809      + |   intron     59783     59783         1      217350                     80781                
  [9]     chr1  10736809      + |   intron    208422    208422         1      217350                     80781                

The Gene ID 80781 refers to a gene in chr21, so totally wrong. The target position actually belongs to the intron of gene CASZ1 (Gene ID 54897). However, txdb seems to have their correct information:

> select(txdb, keys = c("54897", "80781"), columns="TXCHROM", keytype="GENEID")
'select()' returned 1:1 mapping between keys and columns
  GENEID TXCHROM
1   54897    chr1
2   80781   chr21 

What could be wrong? The basic information of my system:

> sessionInfo()

R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] VariantAnnotation_1.38.0                 Rsamtools_2.8.0                          Biostrings_2.60.2                       
 [4] XVector_0.32.0                           SummarizedExperiment_1.22.0              MatrixGenerics_1.4.3                    
 [7] matrixStats_0.61.0                       TxDb.Hsapiens.UCSC.hg38.knownGene_3.13.0 GenomicFeatures_1.44.2                  
[10] AnnotationDbi_1.54.1                     Biobase_2.52.0                           GenomicRanges_1.44.0                    
[13] GenomeInfoDb_1.28.4                      IRanges_2.26.0                           S4Vectors_0.30.2                        
[16] BiocGenerics_0.38.0                     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8.3             lattice_0.20-45          prettyunits_1.1.1        png_0.1-7                assertthat_0.2.1        
 [6] digest_0.6.29            utf8_1.2.2               BiocFileCache_2.0.0      R6_2.5.1                 RSQLite_2.2.11          
[11] httr_1.4.2               pillar_1.7.0             zlibbioc_1.38.0          rlang_1.0.2              progress_1.2.2          
[16] curl_4.3.2               rstudioapi_0.13          blob_1.2.2               Matrix_1.4-1             BiocParallel_1.26.2     
[21] stringr_1.4.0            RCurl_1.98-1.6           bit_4.0.4                biomaRt_2.48.3           DelayedArray_0.18.0     
[26] compiler_4.1.2           rtracklayer_1.52.1       pkgconfig_2.0.3          tidyselect_1.1.2         KEGGREST_1.32.0         
[31] tibble_3.1.6             GenomeInfoDbData_1.2.6   XML_3.99-0.9             fansi_1.0.3              crayon_1.5.1            
[36] dplyr_1.0.8              dbplyr_2.1.1             GenomicAlignments_1.28.0 bitops_1.0-7             rappdirs_0.3.3          
[41] grid_4.1.2               lifecycle_1.0.1          DBI_1.1.2                magrittr_2.0.2           cli_3.2.0               
[46] stringi_1.7.6            cachem_1.0.6             xml2_1.3.3               ellipsis_0.3.2           filelock_1.0.2          
[51] generics_0.1.2           vctrs_0.3.8              rjson_0.2.21             restfulr_0.0.13          tools_4.1.2             
[56] bit64_4.0.5              BSgenome_1.60.0          glue_1.6.2               purrr_0.3.4              hms_1.1.1               
[61] fastmap_1.1.0            yaml_2.3.5               BiocManager_1.30.16      memoise_2.0.1            BiocIO_1.2.0 

Thank you in advance!

VariantAnnotation R • 541 views
ADD COMMENT
1
Entering edit mode

I am getting the correct gene, using: R version 4.0.2, TxDb.Hsapiens.UCSC.hg38.knownGene_3.10.0, VariantAnnotation_1.34.0

      seqnames    ranges strand | LOCATION  LOCSTART    LOCEND   QUERYID        TXID         CDSID      GENEID
         <Rle> <IRanges>  <Rle> | <factor> <integer> <integer> <integer> <character> <IntegerList> <character>
  [1]     chr1  10736809      - |   intron     59598     59598         1       11156                     54897
  [2]     chr1  10736809      - |   intron     59598     59598         1       11159                     54897

I suggest updating your R and Bioconductor packages, and try again.

ADD REPLY

Login before adding your answer.

Traffic: 1624 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6