Question

Error creating a custom BSgenome package for Olea europaea

1

7.5 years ago

oussama.badad ▴ 10

I am analysing a set of MeDIP-seq data using the MeQA pipeline. My pipeline crashes because there is no Bioconductor BSgenome package available for Olea europaea, so I am creating a custom BSgenome package following the Bioconductor manual.

Here is my package seed file:

BSgenome.Oeuropaea.IOGC.v1_seed-file.txt.
Package: BSgenome.Oeuropaea.IOGC.v1.
Title: Full genome sequences for Olea Europaea var. sylvestris (IOGC version 1).
Description: Full genome sequences for Olea Europaea var. sylvestris (olive) as provided by IOGC (v1, 2016) and stored in olea europaea genome browser.
Version: 1.0.
organism: Olea Europaea var.sylvestris.
common_name: Olive.
provider: IOGC.
provider_version: v1.
release_date: 2016.
release_name: OE1.0.
source_url: http://h3abionet.fso.ump.ma/cgi-bin/gb2/gbrowse/olea_europea/.
organism_biocview: Olea_europaea.
BSgenomeObjname: Oeuropaea.
seqs_srcdir:/opt/exp_soft/magrid/MeQA-1.0.0/oussama/Olea-Europaea_BS_genome_package.
seqnames: paste("chr", c(1:23, "Un", paste(c(1:23, "Un"))), sep="").

When I run:

> library(BSgenome)
> forgeBSgenomeDataPkg("path/to/my/seed")

the files are loaded, but then I get the following error:

Loading 'chr23' sequence from FASTA file '/opt/exp_soft/magrid/MeQA-1.0.0/a/Olea-Europaea_BS_genome_package/chr23.fa' ... DONE
Loading 'chrUn' sequence from FASTA file '/opt/exp_soft/magrid/MeQA-1.0.0/a/Olea-Europaea_BS_genome_package/chrUn.fa' ... DONE
Error in XVector:::new_XVectorList_from_list_of_XVector(tmp_class, x) :
  all elements in 'x' must be DNAString objects

Do I need to convert my FASTA files into DNAString objects? If yes, how? Any help is very welcome.

Cheers

Oussama

R alignment • 3.1k views

ADD COMMENT • link updated 4.5 years ago by kashiff007 ★ 1.9k • written 7.5 years ago by oussama.badad ▴ 10

score 1 · Answer 1 · 2016-10-04

1

7.5 years ago

lakhujanivijay 5.8k

Yes, you need to. Here is an example of how to do it:

source("http://bioconductor.org/biocLite.R")
biocLite("Biostrings")
require(Biostrings)
dnastring = DNAString("TTGAAA-CTC-N")

ADD COMMENT • link 7.5 years ago by lakhujanivijay 5.8k

1

Hi

My forgeBSgenomeDataPkg("seed.file") call executed without error, but when running R CMD check BSgenome.Aiptasia.CC7 I get the following error:

* using log directory ‘/home/nawazk/KAUST_Projects/Aiptasia_epigenetic_analysis/results/UMR_identification/Aiptasia_CC7/BSgenome.Aiptasia.CC7.Rcheck’
* using R version 3.6.1 (2019-07-05)
* using platform: x86_64-pc-linux-gnu (64-bit)
* using session charset: UTF-8
* checking for file ‘BSgenome.Aiptasia.CC7/DESCRIPTION’ ... OK
* this is package ‘BSgenome.Aiptasia.CC7’ version ‘1.0.’
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘BSgenome.Aiptasia.CC7’ can be installed ... ERROR
Installation failed.
See ‘/home/nawazk/KAUST_Projects/Aiptasia_epigenetic_analysis/results/UMR_identification/Aiptasia_CC7/BSgenome.Aiptasia.CC7.Rcheck/00install.out’ for details.
* DONE

Status: 1 ERROR
See
  ‘/home/nawazk/KAUST_Projects/Aiptasia_epigenetic_analysis/results/UMR_identification/Aiptasia_CC7/BSgenome.Aiptasia.CC7.Rcheck/00check.log’
for details.

ADD REPLY • link 4.5 years ago by kashiff007 ★ 1.9k
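
One detail in the log above may be worth checking before opening 00install.out: the package is reported as version ‘1.0.’, with a trailing period. According to the Writing R Extensions manual, the Version field of a DESCRIPTION file is a sequence of at least two non-negative integers separated by single '.' or '-' characters, so a trailing period would not be valid. Whether this is what actually broke the installation here is an assumption; 00install.out would confirm it. A minimal base-R sketch of that rule (the helper name is_valid_pkg_version is hypothetical):

```r
# Hypothetical helper: checks a string against the DESCRIPTION 'Version' rule
# (at least two non-negative integers separated by single '.' or '-').
is_valid_pkg_version <- function(v) {
  grepl("^[0-9]+([.-][0-9]+)+$", v)
}

is_valid_pkg_version("1.0.")   # FALSE: trailing period, as in the log above
is_valid_pkg_version("1.0")    # TRUE: same version without the trailing period
```

If the check fails for the Version value in the seed file, removing the trailing period and forging the package again would be the first thing to try.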

0

Hello Vijay Lakhujani, thank you so much for your valuable response.

ADD REPLY • link 7.5 years ago by oussama.badad ▴ 10

0

Please upvote the response if it really helped you. It will help others who might face the same issue.

ADD REPLY • link 7.5 years ago by lakhujanivijay 5.8k

0

Can you please provide more details?

ADD REPLY • link 7.5 years ago by oussama.badad ▴ 10

0

Hi,

I get the same error as oussama badad even after I followed your suggestion of requiring the use of Biostrings... Do you have a further suggestion or idea of what might be wrong?

Thank you in advance!
R

ADD REPLY • link 7.5 years ago by ROka ▴ 10

0

Can you please share what you tried? What I gave was a generic example; please share the exact commands you tried.

ADD REPLY • link 7.5 years ago by lakhujanivijay 5.8k

1

Here is my seed file:

Package: BSgenome.Zmays.EnsemblPlants.AGPv4r32
Title: Zea mays (EnsemblPlants AGPv4 release 32)
Description: Zea mays full genome as provided by EnsemblPlants (AGPv4, release 32)
Version: 4.32
organism: Zea mays
common_name: maize
provider: EnsemblPlants
provider_version: 4.32
release_date: Aug. 2016
release_name: AGPv4
source_url: ftp://ftp.ensemblgenomes.org/pub/release-32/plants/fasta/zea_mays/dna/
organism_biocview: Zea_mays
BSgenomeObjname: Zmays
seqs_srcdir: ~/Documents/ref/AGPv4
seqfiles_prefix: Zea_mays.AGPv4.dna.chromosome.
seqfiles_suffix: .fa
seqnames: paste(c(1:10, paste(c(1:10), sep="")), sep="")

Then, I run:

library(BSgenome)
forgeBSgenomeDataPkg("path/to/my/seed")

...gives an error message of:

...
Loading '9' sequence from FASTA file '~/Documents/ref/AGPv4/Zea_mays.AGPv4.dna.chromosome.9.fa' ... DONE
Loading '10' sequence from FASTA file '~/Documents/ref/AGPv4/Zea_mays.AGPv4.dna.chromosome.10.fa' ... DONE
Error in XVector:::new_XVectorList_from_list_of_XVector(tmp_class, x) : 
  all elements in 'x' must be DNAString objects

So as you suggested, I tried:

source("http://bioconductor.org/biocLite.R")
biocLite("Biostrings")
require(Biostrings)

...which still gives me the same error message:

...
Loading '9' sequence from FASTA file '~/Documents/ref/AGPv4/Zea_mays.AGPv4.dna.chromosome.9.fa' ... DONE
Loading '10' sequence from FASTA file '~/Documents/ref/AGPv4/Zea_mays.AGPv4.dna.chromosome.10.fa' ... DONE
Error in XVector:::new_XVectorList_from_list_of_XVector(tmp_class, x) : 
  all elements in 'x' must be DNAString objects

Thanks again!

ADD REPLY • link 7.5 years ago by ROka ▴ 10

0

Here is a slight update on my situation.

I haven't really solved my problem yet, but I found a workaround (namely: use another computer).
Apparently, if I use newer versions of the packages, I get the error mentioned above. I also realised that it never saves seqlengths or compressed data. The versions were:

R version 3.2.2 (2015-08-14) -- "Fire Safety"
BSgenome_1.38.0
Biostrings_2.38.4

But if I use:

R version 3.0.3 (2014-03-06) -- "Warm Puppy"
BSgenome_1.30.0
Biostrings_2.30.1

Then, it runs without an error message.
However, when I check if I can install the final package with

R CMD check BSgenome.Zmays.EnsemblPlants.AGPv4r32_4.32.tar.gz

Now, I get

* installing *source* package 'BSgenome.Zmays.EnsemblPlants.AGPv4r32' ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
Error : .onLoad failed in loadNamespace() for 'BSgenome.Zmays.EnsemblPlants.AGPv4r32', details:
  call: .normargSeqnames(seqnames)
  error: supplied 'seqnames' cannot contain duplicated sequence names
Error: loading failed
Execution halted

I would appreciate it if someone could point out what I need to change or what I should try.
Thanks!

ADD REPLY • link 7.5 years ago by ROka ▴ 10
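
The "duplicated sequence names" message in the last log points at the seqnames field of the seed file, which is an R expression. A quick way to see what it evaluates to is to run it in a plain R session. The sketch below uses only base R and the expression copied from the maize seed file above:

```r
# The seqnames expression, copied verbatim from the maize seed file.
seqnames <- paste(c(1:10, paste(c(1:10), sep="")), sep="")

# Ten chromosomes are expected, but the expression returns 20 names.
length(seqnames)

# The names that occur more than once.
seqnames[duplicated(seqnames)]
```

The inner paste() call produces "1" to "10" a second time, so every chromosome name is listed twice.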

0

Hello again, I received a reply on Bioconductor, and changing the seqnames line in the seed file to:

seqnames: paste0(1:10)

solved my problem.

ADD REPLY • link 7.5 years ago by ROka ▴ 10
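
For reference, the corrected expression yields exactly one name per chromosome. The same pattern may apply to the olive seed file in the original question, whose seqnames expression also lists every name twice. The paste0("chr", c(1:23, "Un")) line below is an assumption based on the chr*.fa file names shown in that log, not something confirmed in this thread.

```r
# Corrected maize seqnames, as given in the reply above.
maize <- paste0(1:10)

# Original olive expression from the question: every name appears twice.
olive_old <- paste("chr", c(1:23, "Un", paste(c(1:23, "Un"))), sep="")

# Assumed olive analogue of the fix (not confirmed in the thread).
olive_new <- paste0("chr", c(1:23, "Un"))

anyDuplicated(maize)       # 0: no duplicates
anyDuplicated(olive_old)   # non-zero: position of the first repeated name
anyDuplicated(olive_new)   # 0: no duplicates
```

Running anyDuplicated() on the seqnames expression before calling forgeBSgenomeDataPkg() is a cheap way to catch this kind of mistake early.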