Question: Create Dataset from reads -- IndexError: list index out of range
9 months ago, yusuf.sgs wrote:

Hi,

I'm currently trying to create a dataset from FASTA reads for Kover using the following command:

kover dataset create from-reads --genomic-data where_read.tsv --phenotype-description "Rifampicin resistance" --phenotype-metadata pheno_read.tsv --output test --progress

where:

  • where_read.tsv --> a TSV file containing the FASTA file name (ending in .fa) and the path to the folder in which the file is stored
  • pheno_read.tsv --> a TSV file containing the FASTA file name (ending in .fa) followed by either "sensitive" or "resistant".

This seems to trigger an IndexError in from_reads. The output of the command is:

Traceback (most recent call last):
  File "/home/yusuf/Documents/Bioinformatics/kover/bin/kover", line 1192, in <module>
    CommandLineInterface()
  File "/home/yusuf/Documents/Bioinformatics/kover/bin/kover", line 1150, in __init__
    getattr(self, args.command)()
  File "/home/yusuf/Documents/Bioinformatics/kover/bin/kover", line 1170, in dataset
    getattr(dataset_tool, args.command)()
  File "/home/yusuf/Documents/Bioinformatics/kover/bin/kover", line 250, in create
    getattr(creation_tool, args.datasource)()
  File "/home/yusuf/Documents/Bioinformatics/kover/bin/kover", line 224, in from_reads
    progress=args.progress)
  File "/home/yusuf/.local/lib/python2.7/site-packages/kover/dataset/create.py", line 488, in from_reads
    list_reads_dsk_output.append(join(temp_dir, basename(splitext(files[-1])[0]) + ".h5"))
IndexError: list index out of range

Any thoughts on what's causing it?

Thanks,

Yusuf

written 9 months ago by yusuf.sgs • modified 9 months ago by Alexandre Drouin

9 months ago, Alexandre Drouin (Montreal, Canada) wrote:

Hi there!

The "where_read.tsv" file should have the format GENOMEID{TAB}DIRECTORY_CONTAINING_READ_FILES. The first column is a unique identifier for each genome (not the name of the read file). The second column is the path to a directory that contains all read files for that specific genome. Similarly, the first column of the metadata file should be the genome identifier rather than the FASTA file name: there should be a one-to-one correspondence between the genome IDs in the metadata file and those in the "where_read.tsv" file.

Also, if you have a single fasta file per genome, can you try using kover dataset create from-contigs?

The doc for the data formats is here: http://aldro61.github.io/kover/doc_input_formats.html.
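
Before rerunning the command, the two TSV files can be sanity-checked with a short script. The following is only a minimal sketch and not part of Kover: the function names `read_pairs` and `check_inputs` are made up for illustration, and it assumes both files are plain two-column, tab-separated files with no header line. It reports genome IDs that appear in only one of the files, and read directories that are missing or empty. An empty directory is one plausible way to end up with the empty file list that `files[-1]` fails on in the traceback above.

```python
import os


def read_pairs(tsv_path):
    """Read a two-column, tab-separated file into a list of (col1, col2) tuples."""
    pairs = []
    with open(tsv_path) as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # ignore blank lines
            fields = line.split("\t")
            if len(fields) != 2:
                raise ValueError(
                    "%s, line %d: expected 2 tab-separated fields, found %d"
                    % (tsv_path, line_number, len(fields))
                )
            pairs.append((fields[0].strip(), fields[1].strip()))
    return pairs


def check_inputs(genomic_data_tsv, phenotype_tsv):
    """Return a list of problem descriptions; an empty list means none were found."""
    genomic = read_pairs(genomic_data_tsv)
    phenotype = read_pairs(phenotype_tsv)
    genomic_ids = set(genome_id for genome_id, _ in genomic)
    phenotype_ids = set(genome_id for genome_id, _ in phenotype)

    problems = []
    # The genome IDs in the two files should match one-to-one.
    only_genomic = sorted(genomic_ids - phenotype_ids)
    if only_genomic:
        problems.append("IDs missing from the phenotype file: " + ", ".join(only_genomic))
    only_phenotype = sorted(phenotype_ids - genomic_ids)
    if only_phenotype:
        problems.append("IDs missing from the genomic data file: " + ", ".join(only_phenotype))

    # For from-reads, the second column should be a directory holding the read files.
    for genome_id, directory in genomic:
        if not os.path.isdir(directory):
            problems.append("%s: %s is not a directory" % (genome_id, directory))
        elif not os.listdir(directory):
            problems.append("%s: %s contains no files" % (genome_id, directory))
    return problems
```

With the file names from the question, this would be called as `check_inputs("where_read.tsv", "pheno_read.tsv")`; any returned message points at a line worth fixing before creating the dataset.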

Let me know if this solves your issue!

Cheers, Alex

written 9 months ago by Alexandre Drouin

Hi Alex,

Thanks for your previous post - I've managed to produce a "test" dataset by using "contigs" instead of "reads" and correcting the where_read file! However, I've now run into a different set of problems. Creating the dataset produces an HDF5 output error (apologies for the image; if I included the text of the error message, I'd go over the Biostars comment word limit): error https://imgur.com/a/oFHKxWE

The test file can return information such as genome-count and genome-ids (using kover dataset info). However, when asked for a kmer-count, it produces the following error:

  kover dataset info --dataset test --genome-count --kmer-count
    Genome count: 5

    K-mer count:
    Traceback (most recent call last):
      File "/home/yusuf/Documents/Bioinformatics/kover/bin/kover", line 1192, in <module>
        CommandLineInterface()
      File "/home/yusuf/Documents/Bioinformatics/kover/bin/kover", line 1150, in __init__
        getattr(self, args.command)()
      File "/home/yusuf/Documents/Bioinformatics/kover/bin/kover", line 1170, in dataset
        getattr(dataset_tool, args.command)()
      File "/home/yusuf/Documents/Bioinformatics/kover/bin/kover", line 321, in info
        print "K-mer count:", dataset.kmer_count
      File "/home/yusuf/.local/lib/python2.7/site-packages/kover/dataset/ds.py", line 79, in kmer_count
        return dataset["kmer_sequences"].shape[0]
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "/home/yusuf/.local/lib/python2.7/site-packages/h5py/_hl/group.py", line 264, in __getitem__
        oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "h5py/h5o.pyx", line 190, in h5py.h5o.open
    KeyError: "Unable to open object (object 'kmer_sequences' doesn't exist)"

Any ideas on how to fix it? I really do appreciate you taking the time out to help :)

Thanks,

Yusuf

modified 9 months ago • written 9 months ago by yusuf.sgs

Hi, I'm having the same problem. Did you find a solution?

written 9 months ago by sharmatina18905940

Hi Yusuf, sorry for the delayed response.

It looks like the dataset file was not created correctly or is corrupted. I've seen this happen before when DSK (the tool used for k-mer counting) was not compiled properly during the installation process.

  1. Can you show me the output of h5ls test? This will list the HDF5 datasets contained in the file. If there is no kmer_sequences dataset in that HDF5 file, then something went wrong in the dataset creation.

  2. If that dataset is not there, can you try reinstalling kover and sending me the install log (displayed in the terminal during installation)? Also, if you're familiar with Docker, you can try using our prebuilt image that contains a working installation of kover (https://hub.docker.com/r/aldro61/kover).

  3. Another way to figure out what's going on is to download the example dataset provided here and try running the kover dataset info commands. If it works on that dataset and not on yours, then there is an issue with your installation of Kover.
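
The check in step 1 can also be done from Python. This is a hedged sketch rather than anything Kover provides: `missing_datasets` and `list_datasets` are hypothetical helper names, and the list of expected dataset names is an assumption pieced together from the h5ls output and tracebacks posted in this thread, not from the Kover documentation. The h5py calls used (`h5py.File` opened read-only, and `keys()` for the top-level names) are standard.

```python
# Dataset names seen in this thread (h5ls output plus the KeyError message).
# Treat this list as an assumption; Kover may expect more or different entries.
EXPECTED_DATASETS = (
    "genome_identifiers",
    "kmer_by_matrix_column",
    "kmer_matrix",
    "kmer_sequences",
    "phenotype",
    "phenotype_tags",
)


def missing_datasets(present_names, expected=EXPECTED_DATASETS):
    """Return the expected dataset names that are not in present_names."""
    present = set(present_names)
    return [name for name in expected if name not in present]


def list_datasets(kover_file):
    """Return the sorted top-level names stored in an HDF5 file."""
    import h5py  # third-party; imported here so missing_datasets works without it

    with h5py.File(kover_file, "r") as handle:
        return sorted(handle.keys())


# Example (requires h5py and an existing dataset file named "test"):
# print(missing_datasets(list_datasets("test")))
```

If the result includes kmer_sequences, the file is incomplete in the same way the KeyError above suggests, and the problem lies in dataset creation rather than in the info command.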

Keep me posted!

Cheers, Alex

modified 8 months ago • written 8 months ago by Alexandre Drouin

Thank you for your response. My problem is solved now. There were two genomes for which the generated file contained no data. As soon as I removed those blank files, the program ran well.
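
For anyone who wants to look for blank genome files before creating a dataset, here is a rough sketch. It rests on assumptions that should be checked against the input formats documentation: that for from-contigs the genomic data TSV has the form GENOMEID{TAB}PATH_TO_FASTA_FILE, and that a "blank" file is one with no sequence lines at all. The names `fasta_has_sequence` and `find_blank_genomes` are made up for this example.

```python
import os


def fasta_has_sequence(fasta_path):
    """Return True if the file has at least one non-blank line that is not a header."""
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith(">"):
                return True
    return False


def find_blank_genomes(genomic_data_tsv):
    """Return (genome_id, path) pairs whose FASTA file is missing or has no sequence."""
    blank = []
    with open(genomic_data_tsv) as handle:
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) != 2:
                continue  # this sketch skips blank or malformed lines
            genome_id, fasta_path = fields
            if not os.path.isfile(fasta_path) or not fasta_has_sequence(fasta_path):
                blank.append((genome_id, fasta_path))
    return blank
```

Genomes returned by `find_blank_genomes` would then be removed from both the genomic data file and the phenotype metadata file, so that the genome IDs in the two files still match.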

written 8 months ago by sharmatina18905940

@yusuf.sgs, did this resolve your issue? You can also reach out by email if you need further assistance.

Cheers, Alex

written 8 months ago by Alexandre Drouin

Hi Alex,

Sorry for the late reply - I've been bogged down with exams. I've decided to use the Docker version of Kover instead, though this is still presenting problems. I still get the same HDF5 errors when I create my dataset and run dataset info with kmer-count on it. Running h5ls test gives me this output:

genome_identifiers       Dataset {3}
kmer_by_matrix_column    Dataset {986}
kmer_matrix              Dataset {1, 986}
phenotype                Dataset {3}
phenotype_tags           Dataset {2}

kmer_sequences is absent from the output. The kover dataset info commands seem to work on the example dataset you provided, though, and since I'm using the Docker installation, it can't be an issue with the installation. I've shared a Google Drive folder containing the exact files I'm using to create the dataset. It also contains a .txt file called 'kover command', which gives the exact command I'm running in the terminal: https://drive.google.com/open?id=1-Dc1SkaSq_YypS1F8JEjlvsa_hTFlreE

Thanks again for helping out!

Yusuf

written 8 months ago by yusuf.sgs

Hi, I have created a dataset using contigs, but I'm having the same problem as mentioned above. Can you please tell us where we went wrong?

kover dataset info --dataset example.kover --kmers
Kmer sequences (fasta):
Traceback (most recent call last):
  File "/home/tina/kover/kover/bin/kover", line 1192, in <module>
    CommandLineInterface()
  File "/home/tina/kover/kover/bin/kover", line 1150, in __init__
    getattr(self, args.command)()
  File "/home/tina/kover/kover/bin/kover", line 1170, in dataset
    getattr(dataset_tool, args.command)()
  File "/home/tina/kover/kover/bin/kover", line 313, in info
    for i, k in enumerate(dataset.kmer_sequences):
  File "/home/tina/.local/lib/python2.7/site-packages/kover/dataset/ds.py", line 94, in kmer_sequences
    return dataset["kmer_sequences"]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/tina/.local/lib/python2.7/site-packages/h5py/_hl/group.py", line 264, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'kmer_sequences' doesn't exist)"

written 9 months ago by sharmatina18905940