Question: Forge a BSgenome data package
7 months ago by
josev.die0 wrote:

We work on the legume chickpea, whose genome was recently released, but it does not have a BSgenome package yet. I thought I'd try to make one.

Following the vignette, I placed some chromosome FASTA files in a folder. I started with a small example, so I used just the first two chromosomes. I have two FASTA files named “Ca_chr1.fa” and “Ca_chr2.fa” located in


Then I created a seed file ("cicer"), which is also placed in that folder. The seed file looks like this:

Package: BSgenome.Carietinum.NCBI.ca1
Title: Cicer arietinum (Chickpea) full genome (NCBI version ASM33114v1)
Description: Cicer arietinum (Chickpea) full genome as provided by NCBI (ASM33114v1, Jan. 2013) and stored in Biostrings objects.
Version: 1.0
organism: Cicer arietinum
species: Chickpea
provider: NCBI
provider_version: ASM33114v1
release_date: Jan. 2013
release_name: BGI-Shenzhen ASM33114v1
organism_biocview: Cicer_arietinum
BSgenomeObjname: Carietinum
seqnames: paste("Ca_", paste("chr", c(1:2), sep=""), sep="")
seqs_srcdir: /Users/Cicer/seqs/

After running forgeBSgenomeDataPkg("path/to/cicer") I get this error:

Error in makeS4FromList("BSgenomeDataPkgSeed", x) : 
  some names in 'x' are not valid BSgenomeDataPkgSeed slots (species)
In addition: Warning message:
In readLines(infile, n = 25000L) :
  incomplete final line found on './seqs/cicer'

Any suggestions are greatly appreciated. Thanks.

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.4 (El Capitan)

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
[1] BSgenome_1.38.0      rtracklayer_1.30.4   Biostrings_2.38.4    XVector_0.10.0      
[5] GenomicRanges_1.22.4 GenomeInfoDb_1.6.3   IRanges_2.4.8        S4Vectors_0.8.11    
[9] BiocGenerics_0.16.1 

loaded via a namespace (and not attached):
 [1] XML_3.98-1.5               Rsamtools_1.22.0           GenomicAlignments_1.6.3   
 [4] bitops_1.0-6               futile.options_1.0.0       zlibbioc_1.16.0           
 [7] futile.logger_1.4.3        lambda.r_1.1.9             BiocParallel_1.4.3        
[10] tools_3.2.3                Biobase_2.30.0             RCurl_1.95-4.8            
[13] SummarizedExperiment_1.0.2
10 weeks ago by
t_pod0 wrote:

Hi, in the seed file the field between "organism" and "provider" should be "common_name", not "species". However, I have no idea about the warning message you get ("incomplete final line"); I have the same issue right now. Please let me know if you fix that problem. Cheers
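
For illustration, the rename could be done from R along these lines. This is only a sketch in base R: it works on a temporary file holding a few fields copied from the seed file above, and the `seed_path` name is just a placeholder, not the real location of the seed file.

```r
# Write a short excerpt of the seed file from the question to a temporary file.
# 'seed_path' is a placeholder; point it at the real seed file in practice.
seed_path <- tempfile()
writeLines(c(
  "Package: BSgenome.Carietinum.NCBI.ca1",
  "organism: Cicer arietinum",
  "species: Chickpea",
  "provider: NCBI"
), seed_path)

# Read the seed file and rename the 'species' field to 'common_name'.
seed_lines <- readLines(seed_path)
seed_lines <- sub("^species:", "common_name:", seed_lines)

# Write the corrected lines back; writeLines() ends every line with a newline.
writeLines(seed_lines, seed_path)

# Re-read the file to confirm that the old field name is gone.
fixed_lines <- readLines(seed_path)
has_species <- any(grepl("^species:", fixed_lines))
has_common_name <- any(grepl("^common_name:", fixed_lines))
```

After this, `has_species` should be FALSE and `has_common_name` should be TRUE, and the file can be passed to forgeBSgenomeDataPkg() again.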

9 weeks ago by
josev.die0 wrote:

Your answer is exactly right. The error occurred because I did not follow the latest version of the vignette exactly. There is no 'species' field in the seed file.

Concerning the incomplete line, I rebuilt the file and it worked. You have likely truncated your file in such a way that the final line is missing the expected line ending. Please take a look at these helpful comments.
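
In case it helps, here is a rough way to check for a missing final line ending and repair it from R. It is a sketch using base R only; `ends_with_newline()` is a helper name I made up for this post, and the temporary file stands in for the real seed file.

```r
# Hypothetical helper: TRUE when the last byte of the file is a newline.
ends_with_newline <- function(path) {
  size <- file.size(path)
  if (is.na(size) || size == 0) return(FALSE)
  bytes <- readBin(path, what = "raw", n = size)
  bytes[length(bytes)] == as.raw(10L)  # 10 is the byte value of "\n"
}

# Stand-in seed file whose final line has no line ending.
seed_path <- tempfile()
cat("Package: BSgenome.Carietinum.NCBI.ca1\nVersion: 1.0", file = seed_path)
before <- ends_with_newline(seed_path)

# Rebuild the file: readLines() still returns the incomplete last line,
# and writeLines() terminates every line, including the last one.
seed_lines <- readLines(seed_path, warn = FALSE)
writeLines(seed_lines, seed_path)
after <- ends_with_newline(seed_path)
```

Here `before` should be FALSE and `after` TRUE. Opening the seed file in a text editor and adding an empty line at the end has the same effect.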

