Tool: pyGeno 1.2: Python package for Personalized Genomics and Proteomics
Tariq Daouda (IRIC | Institute for Research in Immunology and Cancer) wrote 2.5 years ago:

pyGeno 1.2 is now available: http://pyGeno.iric.ca.

pyGeno is a Python package that allows you to easily combine Reference Genomes and sets of Polymorphisms to create personalized genomes. Personalized genomes let you work directly on the genomes of your subjects and can be translated into Personalized Proteomes.

Multiple sets of polymorphisms can also be combined to leverage their independent benefits, for example:

  • RNA-seq and DNA-seq for the same individual to improve the coverage
  • RNA-seq of an individual + dbSNP for validation
  • Combine the RNA-seq results of several individuals to create a genome containing only the common polymorphisms
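
The combinations listed above amount to set operations on polymorphism calls. The sketch below is not pyGeno's API. It is a plain-Python illustration in which each polymorphism set is a hypothetical dict mapping (chromosome, position) to an alternative allele, and the function names and toy data are invented for this example.

```python
def merge_for_coverage(*snp_sets):
    """Union of several call sets; the earliest set wins when two sets disagree."""
    merged = {}
    for snps in snp_sets:
        for locus, allele in snps.items():
            merged.setdefault(locus, allele)
    return merged


def keep_common(*snp_sets):
    """Keep only the loci where every set reports the same allele."""
    if not snp_sets:
        return {}
    first = snp_sets[0]
    return {
        locus: allele
        for locus, allele in first.items()
        if all(other.get(locus) == allele for other in snp_sets[1:])
    }


# Toy data (invented): (chromosome, position) -> alternative allele
rna_seq = {("1", 100): "A", ("1", 250): "T"}
dna_seq = {("1", 250): "T", ("2", 42): "G"}

combined = merge_for_coverage(rna_seq, dna_seq)  # RNA-seq + DNA-seq for coverage
common = keep_common(rna_seq, dna_seq)           # only the shared polymorphisms
```

Validating RNA-seq calls against dbSNP corresponds to the second function, with the dbSNP set passed as one of the arguments.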

pyGeno is also a personal database that gives you access to all the information provided by Ensembl (for both Reference and Personalized Genomes) without the need to query distant HTTP APIs, allowing for much faster and more reliable genome-wide study pipelines.

It also comes with parsers for several file types and various other useful tools.

ensembl rna-seq snp dbsnp python tool

This sounded like a cool tool, but I was unable to run it at all. Your installation fails on my machine right away:

https://github.com/tariqdaouda/pyGeno/issues/2

Also, I strongly recommend decoupling the data download from the Python code. Python is not all that well suited to downloading massive datasets. Or at least provide alternative sources for the data download, via HTTP, rsync, or BitTorrent.

written 2.5 years ago by Istvan Albert, modified 2.5 years ago

Thank you for bringing that up, the pip version was lagging behind. It is fixed now but I recommend the git version.

I had a look at the issue: the problem was that the folders containing the datawraps were not included in the pip version. The rest of the installation went fine, and you can import datawraps using the importation module.

I would nonetheless recommend that you either update pyGeno to the latest pip version to get the missing datawraps:

pip install --upgrade pyGeno

Or switch to the git version to get the latest bleeding edge updates.

Python is used for downloads to avoid dependencies on third-party software, in order to simplify the installation as much as possible. That is also the reason why pyGeno comes with a set of parsers.

The datawraps shipped with the bootstrap module only contain links to data made available by third parties such as Ensembl and dbSNP. You can also create your own datawraps by downloading the files independently and including them in the tar.gz archive, as explained here:

http://pygeno.iric.ca/importation.html

and here:

https://github.com/tariqdaouda/pyGeno/wiki/How-to-create-a-pyGeno-datawrap-to-import-your-data
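
As a rough illustration of the idea above (files downloaded by other means, bundled with a manifest into a tar.gz), here is a minimal Python 3 sketch that uses only the standard library. The function name build_datawrap is hypothetical, and the flat archive layout with manifest.ini at the top level is an assumption; the two pages linked above remain the reference for what pyGeno actually expects.

```python
import configparser
import os
import tarfile
import tempfile


def build_datawrap(archive_path, manifest_sections, data_files):
    """Write a manifest.ini and bundle it with local data files into a tar.gz.

    manifest_sections: {section name: {key: value}}, all values as strings
    data_files: paths of files that were already downloaded independently
    """
    manifest = configparser.ConfigParser()
    manifest.optionxform = str  # preserve key case (chromosome names such as MT, X, Y)
    for section, options in manifest_sections.items():
        manifest[section] = options

    with tempfile.TemporaryDirectory() as workdir:
        manifest_path = os.path.join(workdir, "manifest.ini")
        with open(manifest_path, "w") as handle:
            manifest.write(handle)
        with tarfile.open(archive_path, "w:gz") as archive:
            # Assumed layout: every file sits at the top level of the archive.
            archive.add(manifest_path, arcname="manifest.ini")
            for path in data_files:
                archive.add(path, arcname=os.path.basename(path))
    return archive_path
```

The section and key names to pass in are the ones a datawrap manifest uses, for example a genome section with species and name keys.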

That being said, pyGeno has been tested many times with both Ensembl and dbSNP, and we never suffered any problem due to the initial downloads. 

Thanks,

written 2.5 years ago by Tariq Daouda, modified 2.5 years ago

Thanks for the fix. I like the concepts behind this package and want to test it out in practice. More feedback to follow.

written 2.5 years ago by Istvan Albert, modified 2.5 years ago

Thank you, your feedback is greatly appreciated.

written 2.5 years ago by Tariq Daouda

haoye.ecust (United States) wrote 10 weeks ago:

Hi Tariq,

I have a question about importing genome data into pyGeno. Because the human reference sequence data had already been downloaded locally on our HPC cluster, I modified the manifest.ini file as shown below. When I import the genome, it reports an error: "sqlite3.OperationalError: disk I/O error". However, there is enough free disk space on the HPC. Could you tell me how to fix this issue?

The platform I use is Python 2.7.13 with pyGeno 1.3.1 on CentOS Linux release 7.3.1611 (Core).

Thank you very much.

Hao

manifest.ini

[package_infos]
description = Human reference genome
maintainer = Tariq Daouda
maintainer_contact = tariq.daouda@umontreal.ca
version = 1

[genome]
species = human
name = GRCh37.75
source = http://useast.ensembl.org/info/data/ftp/index.html

[chromosome_files]
10 = Homo_sapiens.GRCh37.75.dna.chromosome.10.fa.gz
11 = Homo_sapiens.GRCh37.75.dna.chromosome.11.fa.gz
12 = Homo_sapiens.GRCh37.75.dna.chromosome.12.fa.gz
13 = Homo_sapiens.GRCh37.75.dna.chromosome.13.fa.gz
14 = Homo_sapiens.GRCh37.75.dna.chromosome.14.fa.gz
15 = Homo_sapiens.GRCh37.75.dna.chromosome.15.fa.gz
16 = Homo_sapiens.GRCh37.75.dna.chromosome.16.fa.gz
17 = Homo_sapiens.GRCh37.75.dna.chromosome.17.fa.gz
18 = Homo_sapiens.GRCh37.75.dna.chromosome.18.fa.gz
19 = Homo_sapiens.GRCh37.75.dna.chromosome.19.fa.gz
1 = Homo_sapiens.GRCh37.75.dna.chromosome.1.fa.gz
20 = Homo_sapiens.GRCh37.75.dna.chromosome.20.fa.gz
21 = Homo_sapiens.GRCh37.75.dna.chromosome.21.fa.gz
22 = Homo_sapiens.GRCh37.75.dna.chromosome.22.fa.gz
2 = Homo_sapiens.GRCh37.75.dna.chromosome.2.fa.gz
3 = Homo_sapiens.GRCh37.75.dna.chromosome.3.fa.gz
4 = Homo_sapiens.GRCh37.75.dna.chromosome.4.fa.gz
5 = Homo_sapiens.GRCh37.75.dna.chromosome.5.fa.gz
6 = Homo_sapiens.GRCh37.75.dna.chromosome.6.fa.gz
7 = Homo_sapiens.GRCh37.75.dna.chromosome.7.fa.gz
8 = Homo_sapiens.GRCh37.75.dna.chromosome.8.fa.gz
9 = Homo_sapiens.GRCh37.75.dna.chromosome.9.fa.gz
MT = Homo_sapiens.GRCh37.75.dna.chromosome.MT.fa.gz
X = Homo_sapiens.GRCh37.75.dna.chromosome.X.fa.gz
Y = Homo_sapiens.GRCh37.75.dna.chromosome.Y.fa.gz

[gene_set]
gtf = Homo_sapiens.GRCh37.75.gtf.gz

bug

import pyGeno.bootstrap as B
B.importGenome("Human.GRCh37.75/")
Importing genome package: /home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/bootstrap_data/genomes/Human.GRCh37.75/... (This may take a while)
Importing:
description: Human reference genome
maintainer: Tariq Daouda
maintainer_contact: tariq.daouda@umontreal.ca
version: 1
Genome:
species: human
name: GRCh37.75
source: http://useast.ensembl.org/info/data/ftp/index.html
...
Importing gene set infos from /home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/bootstrap_data/genomes/Human.GRCh37.75/Homo_sapiens.GRCh37.75.gtf.gz...
Backuping indexes...
Droping all your indexes, (don't worry i'll restore them later)...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/bootstrap.py", line 105, in importGenome
    PG.importGenome(path, batchSize)
  File "/home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/importation/Genomes.py", line 179, in importGenome
    chros = _importGenomeObjects(gtfFile, chromosomeSet, genome, batchSize, verbose)
  File "/home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/importation/Genomes.py", line 257, in _importGenomeObjects
    Transcript_Raba.flushIndexes()
  File "build/bdist.linux-x86_64/egg/rabaDB/Raba.py", line 547, in flushIndexes
  File "build/bdist.linux-x86_64/egg/rabaDB/rabaSetup.py", line 148, in dropIndexByName
  File "build/bdist.linux-x86_64/egg/rabaDB/rabaSetup.py", line 224, in execute
sqlite3.OperationalError: disk I/O error

haoye.ecust (United States) wrote 10 weeks ago:

When I use the following commands, I get this error: "sqlite3.OperationalError: database or disk is full"

from pyGeno.importation.Genomes import *
importGenome('/home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/bootstrap_data/genomes/Human.GRCh37.75/')
Importing genome package: /home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/bootstrap_data/genomes/Human.GRCh37.75/... (This may take a while)
Importing:
description: Human reference genome
maintainer: Tariq Daouda
\ - Chr\ progress[~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-:>] 100.00% (2828313/2828312) runtime: 21.471min, remaining: -0.000sc, avg: 0.000sc
| progress[~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-:>] 104.00% (26/25) runtime: 0.003sc, remaining: -0.000sc, avg: 0.000sc
saving genome object...
restoring core indexes...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/importation/Genomes.py", line 179, in importGenome
    chros = _importGenomeObjects(gtfFile, chromosomeSet, genome, batchSize, verbose)
  File "/home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/importation/Genomes.py", line 419, in _importGenomeObjects
    Transcript.ensureGlobalIndex('exons')
  File "/home/yeh/program/Python-2.7.13/lib/python2.7/site-packages/pyGeno/pyGenoObjectBases.py", line 223, in ensureGlobalIndex
    cls._wrapped_class.ensureIndex(fields)
  File "build/bdist.linux-x86_64/egg/rabaDB/Raba.py", line 510, in ensureIndex
  File "build/bdist.linux-x86_64/egg/rabaDB/rabaSetup.py", line 138, in createIndex
  File "build/bdist.linux-x86_64/egg/rabaDB/rabaSetup.py", line 224, in execute
sqlite3.OperationalError: database or disk is full

Tariq Daouda (IRIC | Institute for Research in Immunology and Cancer) wrote 10 weeks ago:

Hi Hao,

It was going well until it stopped at the indexing of exons. This is by far the biggest index that is automatically created. Unfortunately, I can't tell you what caused the error since I don't have admin access to your computer. I can, however, give you some tips.

You need at least 2GB of free space to store one human reference genome. This is without counting the temporary space that sqlite takes while running.

You can find how to redirect/increase the temporary space used by sqlite here: https://stackoverflow.com/questions/23249843/sqlite3-vacuum-database-or-disk-is-full
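
The two tips above (check the free space, then redirect SQLite's temporary files) could be sketched as follows. This is an illustration, not part of pyGeno: the helper names are hypothetical, and the approach relies on the SQLITE_TMPDIR environment variable, which SQLite consults on Unix-like systems when choosing where to write temporary files. The code runs on both Python 2.7 and Python 3, on Unix only.

```python
import os


def free_gigabytes(path):
    """Free space available to the current user on the filesystem holding path."""
    stats = os.statvfs(path)  # Unix only
    return stats.f_bavail * stats.f_frsize / float(1024 ** 3)


def use_sqlite_tmpdir(directory):
    """Point SQLite's temporary files at directory via SQLITE_TMPDIR.

    Call this before sqlite3 (or pyGeno) is imported: some SQLite builds
    read the variable only once, when the library initializes.
    """
    if not os.path.isdir(directory):
        os.makedirs(directory)
    os.environ["SQLITE_TMPDIR"] = directory
    return directory
```

A typical use would be to call use_sqlite_tmpdir on a scratch directory with plenty of room, confirm with free_gigabytes that both that directory and the home directory have well over 2 GB free, and only then import pyGeno and start the importation. Exporting SQLITE_TMPDIR in the shell before launching Python has the same effect.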

Another possibility is that pyGeno's database has been somehow corrupted. If that is the case you can erase the .pyGeno folder in your home directory and start a new importation.
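
For the reset described above, a cautious variant is to move the folder aside instead of deleting it outright. The helper below is a hypothetical sketch using only the standard library; the only detail taken from this thread is that the data lives in a .pyGeno folder in the home directory.

```python
import os
import shutil
import time


def reset_pygeno_store(store_dir, keep_backup=True):
    """Remove the local pyGeno folder, or move it aside, so the next import starts clean.

    Returns the backup path when a backup was made, otherwise None.
    """
    if not os.path.isdir(store_dir):
        return None  # nothing to reset
    if keep_backup:
        backup = "%s.bak-%d" % (store_dir.rstrip(os.sep), int(time.time()))
        shutil.move(store_dir, backup)
        return backup
    shutil.rmtree(store_dir)
    return None
```

It would be called as reset_pygeno_store(os.path.expanduser("~/.pyGeno")). Keeping a backup does not free any disk space, so when the disk is nearly full, passing keep_backup=False is the more useful choice.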

Best,

haoye.ecust (United States) wrote 10 weeks ago:

Thank you very much, Tariq. The problem was fixed once I redirected the temporary file folder.

Hao
