Question

Making customized DB using customProDB (R package)

7.0 years ago

d88020 ▴ 20

I'm trying to build a customized database using customProDB. Following the tutorial, I ran the splice junction analysis and generated the annotation files (splicemax, ids, etc.). However, the OutputNovelJun function throws an error. I have copied and pasted the code and messages below.

> library(customProDB)

Loading required package: IRanges

Loading required package: BiocGenerics

Loading required package: parallel

Attaching package: ‘BiocGenerics’ The following objects are masked from ‘package:parallel’: clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’: IQR, mad, xtabs

The following objects are masked from ‘package:base’: anyDuplicated, append, as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which, which.max, which.min

Loading required package: S4Vectors

Loading required package: stats4

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:base’: colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: AnnotationDbi

Loading required package: Biobase

Welcome to Bioconductor

Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: biomaRt

Gene2RefSeq <- read.csv("C:/Users/Admin/Desktop/CustomizedDB/Gene2RefSeq_Parsed.txt", sep = "\t")
pepfasta <- "C:/Users/Admin/Desktop/CustomizedDB/customProDB/pepfasta.fasta"
CDSfasta <- "C:/Users/Admin/Desktop/CustomizedDB/customProDB/CDSfasta.fasta"
setwd("C:/Users/Admin/Desktop/CustomizedDB/example")
annotation_path <- getwd()
transcript_ids <- as.matrix(Gene2RefSeq[ , 2])
transcript_ids <- transcript_ids[1:1000]
PrepareAnnotationRefseq(genome='hg19', CDSfasta, pepfasta, annotation_path, transcript_ids = transcript_ids, splice_matrix = TRUE)

Build TranscriptDB object (txdb.sqlite) ...

Download the refGene table ... OK

Download the hgFixed.refLink table ... OK

Extract the 'transcripts' data frame ... OK

Extract the 'splicings' data frame ... OK

Download and preprocess the 'chrominfo' data frame ... OK

Prepare the 'metadata' data frame ... OK

Make the TxDb object ... OK

done

Prepare gene/transcript/protein id mapping information (ids.RData) ... done

Prepare exon annotation information (exon_anno.RData) ... done

Prepare protein sequence (proseq.RData) ... done

Prepare protein coding sequence (procodingseq.RData)... done

Prepare exon splice information (splicemax.RData) ... done

There were 16 warnings (use warnings() to see them)

load("splicemax.RData")
load("ids.RData")
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

Loading required package: GenomicFeatures

Loading required package: GenomeInfoDb

Loading required package: GenomicRanges

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
bedfile <- "C:/Users/Admin/Desktop/CorrectedSpliceJunctions_Colon.bed"
jun <- Bed2Range(bedfile,skip=1,covfilter=5)
junction_type <- JunctionType(jun, splicemax, txdb, ids)
outf_junc <- paste(getwd(), '/test_junc.fasta',sep='')
library('BSgenome.Hsapiens.UCSC.hg19')

Loading required package: BSgenome

Loading required package: Biostrings

Loading required package: XVector

Loading required package: rtracklayer

proteinseq <- read.csv2("C:/Users/Admin/Desktop/CustomizedDB/proteinseq.txt", sep = "\t")
OutputNovelJun <- OutputNovelJun(junction_type, Hsapiens, outf_junc, proteinseq)

Error in loadFUN(x, seqname, ranges) :

trying to load regions beyond the boundaries of non-circular sequence "chr17"

> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Korean_Korea.949  LC_CTYPE=Korean_Korea.949    LC_MONETARY=Korean_Korea.949 LC_NUMERIC=C                
[5] LC_TIME=Korean_Korea.949    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg19_1.4.0       BSgenome_1.42.0                        
 [3] rtracklayer_1.34.2                      Biostrings_2.42.1                      
 [5] XVector_0.14.1                          TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [7] GenomicFeatures_1.26.4                  GenomicRanges_1.26.4                   
 [9] GenomeInfoDb_1.10.3                     customProDB_1.14.1                     
[11] biomaRt_2.30.0                          AnnotationDbi_1.36.2                   
[13] Biobase_2.34.0                          IRanges_2.8.2                          
[15] S4Vectors_0.12.2                        BiocGenerics_0.20.0                    

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10               magrittr_1.5               zlibbioc_1.20.0            GenomicAlignments_1.10.1  
 [5] BiocParallel_1.8.2         lattice_0.20-35            plyr_1.8.4                 stringr_1.2.0             
 [9] tools_3.3.3                grid_3.3.3                 SummarizedExperiment_1.4.0 DBI_0.6-1                 
[13] digest_0.6.12              Matrix_1.2-8               bitops_1.0-6               RCurl_1.95-4.8            
[17] memoise_1.1.0              RSQLite_1.1-2              stringi_1.1.5              Rsamtools_1.26.2          
[21] XML_3.98-1.6               VariantAnnotation_1.20.3

R • 2.1k views

updated 6.9 years ago by Matt Chambers ▴ 20 • written 7.0 years ago by d88020 ▴ 20

score 0 · Answer 1 · 2017-05-26

You should load proteinseq from your annotation directory rather than reading it from a text file, i.e.

load("C:/Users/Admin/Desktop/CustomizedDB/example/proseq.RData")

That loads the proteinseq annotation into your environment, and you can then pass it to OutputNovelJun. It's one of the trickier functions to get working, so I wouldn't be surprised if you run into more issues after this one is fixed.
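
As a rough illustration of why load() is the right call here: an .RData file restores objects under the names they were saved with, so nothing needs to be assigned by hand. The sketch below uses only base R. The toy data frame, its column names and values, and the temporary file name are hypothetical stand-ins, not the real customProDB output.

```r
# Toy stand-in for the protein annotation saved in proseq.RData
# (hypothetical columns and values, for illustration only).
proteinseq <- data.frame(
  pro_name = c("NP_TOY_1", "NP_TOY_2"),
  peptide  = c("MKTAYIAK", "MSDNELLQ"),
  stringsAsFactors = FALSE
)

# An .RData file stores objects together with their names.
rdata_file <- file.path(tempdir(), "proseq_demo.RData")
save(proteinseq, file = rdata_file)

# Remove the object, then bring it back with load().
rm(proteinseq)
restored_names <- load(rdata_file)  # load() returns the names it restored

print(restored_names)  # names of the objects that are now available
str(proteinseq)        # the data frame is back under its original name
```

With the real annotation, the equivalent step would presumably be load(file.path(annotation_path, "proseq.RData")), after which proteinseq can be passed to OutputNovelJun in place of the table read with read.csv2.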