Question: Forge a BSgenome data package
3.0 years ago, josev.die wrote:

We work on the legume chickpea, whose genome has recently been released, but there is no BSgenome package for it so far. I thought I'd try to make one.

Following the vignette I have placed some FASTA chromosomes in a folder. I have started with a small example, so I have used just the first two chromosomes. I have two FASTA files named “Ca_chr1.fa” and “Ca_chr2.fa” located in

/Users/Cicer/seqs/

Then I created a seed file ("cicer"), which is also in that folder. The seed file looks like this:

Package: BSgenome.Carietinum.NCBI.ca1
Title: Cicer arietinum (Chickpea) full genome (NCBI version ASM33114v1)
Description: Cicer arietinum (Chickpea) full genome as provided by NCBI (ASM33114v1, Jan. 2013) and stored in Biostrings objects.
Version: 1.0
organism: Cicer arietinum
species: Chickpea
provider: NCBI
provider_version: ASM33114v1
release_date: Jan. 2013
release_name: BGI-Shenzhen ASM33114v1
source_url: https://www.ncbi.nlm.nih.gov/assembly/GCF_000331145.1/#/def
organism_biocview: Cicer_arietinum
BSgenomeObjname: Carietinum
seqnames: paste("Ca_", paste("chr", c(1:2), sep=""), sep="")
seqs_srcdir: /Users/Cicer/seqs/
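
As a quick sanity check before forging, the `seqnames` expression from the seed can be evaluated in a plain R session to see which FASTA files would be looked for. This is only a sketch: it assumes the default `.fa` file suffix (which matches the file names mentioned above), and the `expected_files` and `missing_files` names are illustrative, not part of BSgenome.

```r
# Evaluate the seqnames expression exactly as written in the seed file.
seqnames <- paste("Ca_", paste("chr", c(1:2), sep = ""), sep = "")
print(seqnames)

# A shorter expression that yields the same names.
stopifnot(identical(seqnames, paste0("Ca_chr", 1:2)))

# Assumption: one FASTA file per sequence, named <seqname>.fa, in seqs_srcdir.
seqs_srcdir <- "/Users/Cicer/seqs"
expected_files <- file.path(seqs_srcdir, paste0(seqnames, ".fa"))

# List any expected file that is not present on this machine.
missing_files <- expected_files[!file.exists(expected_files)]
print(missing_files)
```

If `missing_files` is not empty, the file names or `seqs_srcdir` probably need adjusting before running the forge step.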

After running forgeBSgenomeDataPkg("path/to/cicer") I get this error:

Error in makeS4FromList("BSgenomeDataPkgSeed", x) : 
  some names in 'x' are not valid BSgenomeDataPkgSeed slots (species)
In addition: Warning message:
In readLines(infile, n = 25000L) :
  incomplete final line found on './seqs/cicer'

Any suggestions are greatly appreciated. Thanks.

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.4 (El Capitan)

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
[1] BSgenome_1.38.0      rtracklayer_1.30.4   Biostrings_2.38.4    XVector_0.10.0      
[5] GenomicRanges_1.22.4 GenomeInfoDb_1.6.3   IRanges_2.4.8        S4Vectors_0.8.11    
[9] BiocGenerics_0.16.1 

loaded via a namespace (and not attached):
 [1] XML_3.98-1.5               Rsamtools_1.22.0           GenomicAlignments_1.6.3   
 [4] bitops_1.0-6               futile.options_1.0.0       zlibbioc_1.16.0           
 [7] futile.logger_1.4.3        lambda.r_1.1.9             BiocParallel_1.4.3        
[10] tools_3.2.3                Biobase_2.30.0             RCurl_1.95-4.8            
[13] SummarizedExperiment_1.0.2

2.6 years ago, t_pod wrote:

Hi, between "organism" and "provider" you should write "common_name" instead of "species". However, I have no idea about the warning message you get ("incomplete final line"). I have the same issue right now. Please let me know if you fix that problem. Cheers
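
A small base-R sketch of how one might spot such a field before forging. The helper `unknown_seed_fields()` and the `demo_fields` vector are hypothetical: the list below is just the set of fields used in this thread (with "common_name" in place of "species"), not the authoritative list. With BSgenome attached, `methods::slotNames("BSgenomeDataPkgSeed")` should give the accepted names for the installed version, though that is an assumption worth checking locally.

```r
# Return the seed-file field names that are not in `valid_fields`.
unknown_seed_fields <- function(seed_path, valid_fields) {
  seed <- read.dcf(seed_path)  # character matrix, one column per field
  setdiff(colnames(seed), valid_fields)
}

# Hypothetical list for this demo: the fields used in this thread only.
demo_fields <- c("Package", "Title", "Description", "Version",
                 "organism", "common_name", "provider", "provider_version",
                 "release_date", "release_name", "source_url",
                 "organism_biocview", "BSgenomeObjname", "seqnames",
                 "seqs_srcdir")

# Write a shortened copy of the seed from the question to a temporary file.
seed_path <- tempfile("cicer_seed_")
writeLines(c(
  "Package: BSgenome.Carietinum.NCBI.ca1",
  "organism: Cicer arietinum",
  "species: Chickpea",
  "provider: NCBI"
), seed_path)

bad <- unknown_seed_fields(seed_path, demo_fields)
print(bad)  # "species"

# Rename the offending field and check again.
seed_lines <- sub("^species:", "common_name:", readLines(seed_path))
writeLines(seed_lines, seed_path)
fixed <- unknown_seed_fields(seed_path, demo_fields)
print(fixed)  # character(0)
```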

2.6 years ago, josev.die wrote:

Your answer is exactly right. The error occurred because I did not follow the latest version of the vignette exactly. There is no 'species' field in the seed file.

Concerning the incomplete line, I rebuilt the file and it worked. You have likely truncated your file in such a way that the final line is missing the expected line ending. Please take a look at these helpful comments.
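
For anyone who prefers not to rebuild the file by hand, here is a minimal base-R sketch that checks whether a seed file ends with a newline and appends one if it does not. The helper names `ends_with_newline()` and `ensure_final_newline()` are made up for this example, and the demo works on a temporary file rather than the real seed.

```r
# TRUE if the file is empty or its last byte is a line feed.
ends_with_newline <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  length(bytes) == 0L || bytes[length(bytes)] == as.raw(0x0a)
}

# Append a newline only when the final line is incomplete.
ensure_final_newline <- function(path) {
  if (!ends_with_newline(path)) cat("\n", file = path, append = TRUE)
  invisible(path)
}

# Demo: a two-line seed fragment written without a trailing newline.
seed_path <- tempfile("cicer_seed_")
writeBin(charToRaw("Package: BSgenome.Carietinum.NCBI.ca1\nVersion: 1.0"),
         seed_path)

before <- ends_with_newline(seed_path)  # FALSE: last line is incomplete
ensure_final_newline(seed_path)
after <- ends_with_newline(seed_path)   # TRUE: newline appended
```

Many text editors also have a setting to always end files with a newline, which avoids the warning in the first place.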
