Question

BWA_MEM error "cannot find file.fa.pac" but input is file.fa

4.5 years ago

jdk48542 • 0

I have four fastq files in total (two bulks, each with forward and reverse reads) containing Illumina reads, which I am trying to align to a reference file. The reference file has the extension ".fa", but when the script fails, the error shows that the aligner is looking for, and failing to find, the same file name with ".pac" appended.

Looking around on this site, I can tell that .pac files are index files that BWA creates and uses internally. So why would it lose track of, or fail to create, this file? What should I check to make sure my files are appropriate for this function call? I have already run them through FastQC, and read quality looks fine.

This is being run on a remote server via SSH, so read/write permission issues are possible, but I would expect the error log to report them if that were the case, and it does not. A colleague has performed alignment using the same reference genome in the same directory, with fewer permissions than mine, and my script is closely modeled on his, so I can only assume there is some issue with my inputs.

Below is the function call with variable names slightly anonymized:

time bwa mem -t 8 /work/lab/reference_genomes/ref_v2.fa /scratch/user/qtlSeq/eReadsForward.fastq.gz /scratch/user/qtlSeq/eReadsReverse.fastq.gz > eAligned.sam

And the error:

[bns_restore_core] fail to open file '/work/lab/reference_genomes/ref_v2.fa.pac' : No such file or directory

General advice about how to interpret this error would be appreciated.

alignment • 2.9k views

updated 2.2 years ago by Karly • 0 • written 4.5 years ago by jdk48542 • 0

Did you create indexes for ref_v2.fa for use with BWA using bwa index? Can you show us output of ls -lh /work/lab/reference_genomes/ref_v2.fa*?

written 4.5 years ago by GenoMax 141k
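Along the lines of the ls check suggested above, a quick way to confirm the index is in place is to test for each of the five files that bwa index writes next to the reference prefix (.amb, .ann, .bwt, .pac, .sa). This is a minimal bash sketch; the function name check_bwa_index is hypothetical, and it only checks that the files exist and are readable, not that they are complete or match the FASTA.

```shell
#!/usr/bin/env bash
# Sketch: check that every file produced by `bwa index` exists and is
# readable for a given reference prefix, before running `bwa mem`.
check_bwa_index() {
  local prefix="$1"   # e.g. /work/lab/reference_genomes/ref_v2.fa
  local missing=0
  local ext
  for ext in amb ann bwt pac sa; do
    if [[ ! -r "${prefix}.${ext}" ]]; then
      echo "missing or unreadable: ${prefix}.${ext}" >&2
      missing=1
    fi
  done
  if [[ "$missing" -eq 0 ]]; then
    echo "all BWA index files present for ${prefix}"
  fi
  return "$missing"
}

# Example: stop before aligning if any index file is absent.
# check_bwa_index /work/lab/reference_genomes/ref_v2.fa && bwa mem -t 8 /work/lab/reference_genomes/ref_v2.fa R1.fastq.gz R2.fastq.gz > eAligned.sam
```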

Ahhhh, I see now that my colleague's pipeline puts the cart before the horse. He created the indexes for the reference genome in advance and refers to those files wherever they are needed in the pipeline, but he placed the index-creation step downstream of their first use, on the assumption that the files would be retained for future users. They must have been removed by a careless mv command or something similar. I will regenerate the index files and see if that resolves the issue.

RESULT:

Function call:

bwa index /work/lab/reference_genomes/ref_v2.fa

Error message:

[bwa_index] Pack FASTA... [bns_fasta2bntseq] fail to open file '/work/lab/reference_genomes/ref_v2.fa.pac' : Permission denied

This makes me suspect even more that I am just having permission issues with the host server.

FYI, /work/lab/reference_genomes contains the following files (notably, none with the extension .fa.pac):

ref_v2blastdb.nhr  ref_v2.fa.bwt  ref_v2.fa.nsq  ref_v2.fa.sa
ref_v2blastdb.nin  ref_v2.fa.fai  ref_v2.fa.phr  ref_v2.nhr
ref_v2blastdb.nsq  ref_v2.fa.nhr  ref_v2.fa.pin  ref_v2.nin
ref_v2.dict        ref_v2.fa.nin  ref_v2.fa.pog  ref_v2.nsq
ref_v2.fa          ref_v2.fa.nog  ref_v2.fa.psd
ref_v2.fa.amb      ref_v2.fa.nsd  ref_v2.fa.psi
ref_v2.fa.ann      ref_v2.fa.nsi  ref_v2.fa.psq

The quick solution looks like getting my colleague to regenerate the .pac file for me, and the long-term solution is to talk to the server admins and make sure I have adequate permissions in /work/lab/reference_genomes. Thanks very much for the quick responses. Additional feedback is still welcome.

updated 4.5 years ago by GenoMax 141k • written 4.5 years ago by jdk48542 • 0
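The "Permission denied" from bwa index means the current user cannot create files in /work/lab/reference_genomes: by default bwa index writes its output files next to the FASTA, using the FASTA name as the prefix. A small bash sketch for checking this up front follows (check_writable_dir is a hypothetical helper name). Creating a file requires both write and search (execute) permission on the directory, and running the same check inside a scheduler job also shows whether the path is visible from the compute node at all.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the current user can create files in a directory
# before running a tool (such as `bwa index`) that writes there.
check_writable_dir() {
  local dir="$1"
  if [[ ! -d "$dir" ]]; then
    # Either the path is wrong or it is not mounted on this host.
    echo "not found on ${HOSTNAME:-unknown host}: $dir" >&2
    return 1
  fi
  # Creating a file needs write (w) and search (x) permission on the directory.
  if [[ ! -w "$dir" || ! -x "$dir" ]]; then
    echo "user $(id -un) cannot create files in: $dir" >&2
    ls -ld "$dir" >&2
    return 1
  fi
  echo "ok: $(id -un) can create files in $dir"
}

# Example:
# check_writable_dir /work/lab/reference_genomes
```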

It looks like someone mixed indexes for BLAST+ and BWA in the same directory. Not a great idea. It also looks like your account does not have write permissions to /work/lab/reference_genomes/?

I would suggest that you create the indexes again in a directory where you have write permissions.

written 4.5 years ago by GenoMax 141k
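One way to follow this advice without copying the FASTA is the -p option of bwa index, which sets the prefix (and therefore the directory) of the index files; bwa mem is then given that prefix instead of the FASTA path. A minimal bash sketch; the helper name build_private_index and the $HOME/bwa_index location in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build a private BWA index in a directory you own, leaving the
# shared FASTA where it is. `bwa index -p PREFIX` writes PREFIX.amb,
# PREFIX.ann, PREFIX.bwt, PREFIX.pac and PREFIX.sa.
build_private_index() {
  local fasta="$1"      # shared reference FASTA (only needs to be readable)
  local index_dir="$2"  # directory you can write to
  local prefix
  prefix="${index_dir}/$(basename "$fasta")"
  mkdir -p "$index_dir" || return 1
  # Send any bwa output to stderr so stdout carries only the prefix.
  bwa index -p "$prefix" "$fasta" >&2 || return 1
  echo "$prefix"
}

# Example (the index directory is hypothetical):
# prefix=$(build_private_index /work/lab/reference_genomes/ref_v2.fa "$HOME/bwa_index")
# bwa mem -t 8 "$prefix" /scratch/user/qtlSeq/eReadsForward.fastq.gz /scratch/user/qtlSeq/eReadsReverse.fastq.gz > eAligned.sam
```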

I've recently struggled a lot with indexing a customized human genome (with certain regions masked), and I ran into a similar issue:

[bwa_index] Construct SA from BWT and Occ... [bwt_restore_bwt] fail to open file '/paedyl01/disk1/yangyxt/simulation_data/ucsc.hg19.ucsc.hg19.sub.NCF1.masked.fasta.bwt' : No such file or directory

However, ls -lh confirms that the .bwt file exists.

Lack of memory is unlikely to be the cause here, since I have 120 GB allocated to this shell (by PBS Pro) and only one bwa index job is running. For reference, the command I ran is bwa index -a bwtsw <in.fasta>

Furthermore, /usr/bin/time reports memory usage, and the peak RAM usage (maximum resident set size) was only about 4596492 KB (about 4.4 GB):

6292.08user 57.20system 1:47:00elapsed 98%CPU (0avgtext+0avgdata 4596492maxresident)k
0inputs+13786480outputs (0major+83721376minor)pagefaults 0swaps

Does anyone know what might be causing this error?

written 3.1 years ago by u3005579 • 0

Do you have read/write permissions to the following location?

/paedyl01/disk1/yangyxt/simulation_data/

Is that directory available to the worker node where this job runs? (You are using PBS Pro, so the job is dispatched by a scheduler and may not run on the machine you are logged into.)

written 3.1 years ago by GenoMax 141k

This ended up being my issue. I was using Docker, and the user inside the container did not have permission to write to my mounted directory. Thanks.

written 2.2 years ago by Karly • 0
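For the Docker case, one common fix is to run the container with the host user's UID and GID via docker run --user, so the process inside the container gets the same permissions on the bind-mounted directory as you have on the host. A minimal sketch; the helper name and the image argument are hypothetical, and it assumes the image has bwa on its PATH and that your host user can already write to the mounted directory.

```shell
#!/usr/bin/env bash
# Sketch: run `bwa index` in a container as the host user, so the index
# files in the bind-mounted directory are created with your UID/GID.
bwa_index_in_docker() {
  local host_dir="$1"    # host directory containing the FASTA (must be writable by you)
  local fasta_name="$2"  # FASTA file name inside host_dir
  local image="$3"       # hypothetical: any image that provides bwa on PATH
  docker run --rm \
    --user "$(id -u):$(id -g)" \
    -v "${host_dir}:/data" \
    -w /data \
    "$image" \
    bwa index "$fasta_name"
}

# Example (hypothetical directory and image name):
# bwa_index_in_docker "$PWD/reference" ref_v2.fa my-bwa-image
```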