Question: Pysam cannot find index
1
gravatar for eric.kern13
4.3 years ago by
eric.kern13150
United States
eric.kern13150 wrote:

Hi Biostars,
This is my first question on the site, so thanks for all the help I've gotten from lurking, and thanks for the help I'm about to get. I'll try to share my research and follow all the etiquette. Can anyone help me with the following situation?

I'm working with the python packages Pysam and Pysamstats. I am using BAM and FASTA files from the 1000genomes project. I would like to provide a 50 base-pair interval on chromosome 22 in HG18 coordinates and in turn receive the average depth of coverage from that interval. My attempts so far have opened the files first, then fed them into pysamstats.stat_coverage_binned, like this.

    fasta_file = pysam.Fastafile(<fasta_ftp_path>)
    bamfile = pysam.AlignmentFile(<alignment_ftp_path>)
    pysamstats.stat_coverage_binned(bamfile, fasta_file, <chrom>, <start_pos>, <end_pos>). 

From the first line, I encounter this:

    ValueError: could not locate index file

The same error appears on the third line if I try to feed in the ftp addresses rather than file objects. My understanding is that these packages look for index files by appending ".bai" or ".fai" to the filename I feed them, which should work (you can look at the files available further down my post). This is supported when I RTFS at line 142 of this file, which has my error message. I can't quite read cython, but given "filename.fa" I think it says to look for "filename.fa.fai". Here's TFS:

    https://github.com/pysam-developers/pysam/blob/master/pysam/cfaidx.pyx 

FASTA-associated files:

    ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/GRCh38_reference_genome

BAM-associated files:

    ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/other_exome_alignments/HG00096/exome_alignment/

A minimal example to generate the error (sorry about the formatting; can't figure out how this will render):

    import os
    import pysam
    import pysamstats
    alignment_ftp_path = "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/" \
                         "other_exome_alignments/HG00096/exome_alignment/"
    alignment_filename = "HG00096.mapped.illumina.mosaik.GBR.exome.20111114.bam"
    alignment_ftp_path = os.path.join(alignment_ftp_path, alignment_filename)
    fasta_ftp_path  = "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/" \
"GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa"

    fasta_file = pysam.Fastafile(fasta_ftp_path)
    bamfile2 = pysam.AlignmentFile(filename = alignment_ftp_path)
    ex_bin_read_depth = pysamstats.stat_coverage_binned(bamfile2, fasta_ftp_path,\
                                                        chrom="22",              \
                                                        start=14481425,          \
                                                        end=14481475)

 

 

pysam samtools • 2.8k views
ADD COMMENTlink written 4.3 years ago by eric.kern13150
1

I think the problem stems from the fact that you're trying to access FTP links like local files. Try downloading the data first.

If space is an issue, you could try linking the FTP data to a named pipe, though that's a little more tricky.

ADD REPLYlink modified 4.3 years ago • written 4.3 years ago by Dan D6.9k

That worked; thanks. Is there an "accept" button like on SE?

I notice that os.path.isfile(<address>) is false even for correct ftp addresses.

ADD REPLYlink modified 4.3 years ago • written 4.3 years ago by eric.kern13150

os.path.isfile returns True for files that exist on your current filesystem. It does not work for remote URLs.

ADD REPLYlink written 4.3 years ago by Kamil2.0k

There is an accept button on Biostars, but only for actual answers. I was guessing at the problem so I posted a comment instead. :)

ADD REPLYlink written 4.3 years ago by Dan D6.9k

I think you found a bug in pysam. I reported it here. You'll probably have to download the FASTA file for now, as Dan D says. The remote BAM file should work, though.

ADD REPLYlink written 4.3 years ago by Kamil2.0k

Indeed, the remote BAM works. Thanks.

ADD REPLYlink written 4.3 years ago by eric.kern13150
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1306 users visited in the last hour