2.4 years ago
melissachua90
I want to use BWA to align my paired-end dataset to a reference genome.
First, I indexed the reference genome:
bwa index -p refseq -a is refseq.fa
Next, I use bwa-mem:
bwa mem -aHMP -t 20 refseq.fa corrected_data.tar.gz
I also tried:
cd corrected_data/
for f in `ls -1 *_1.fq.gz | sed 's/_1.fq.gz//'`;
do bwa mem -aHMP -t 20 refseq.fa $f\_1.fq.gz $f\_2.fq.gz;
done
Traceback:
[E::bwa_idx_load_from_disk] fail to locate the index files
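I think your problem is your prefix, -p refseq. The index files were written under that prefix, so bwa mem has to be given the prefix rather than the FASTA file name. I imagine you should use
bwa mem -aHMP -t 20 refseq corrected_data.tar.gz
instead of refseq.fa.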
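A quick way to confirm which name the index lives under is to look for the files that bwa index writes (.amb, .ann, .bwt, .pac, .sa). The helper below is only a sketch; the function name bwa_index_exists is made up for illustration and is not part of BWA.

```shell
# Hypothetical helper (not part of BWA): succeed only if all five
# index files written by "bwa index" exist for the given prefix.
bwa_index_exists() {
    bie_prefix=$1
    for bie_ext in amb ann bwt pac sa; do
        # Fail as soon as one expected index file is missing.
        [ -f "${bie_prefix}.${bie_ext}" ] || return 1
    done
    return 0
}

# Example calls, commented out so that sourcing the block has no side effects:
# bwa_index_exists refseq    && echo "index found under prefix 'refseq'"
# bwa_index_exists refseq.fa || echo "no index under 'refseq.fa'"
```

If the check passes for refseq but not for refseq.fa, that would match the "fail to locate the index files" error above.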
Tarballs with bwa-mem, where did you get that from? I doubt that works, or is this some edge case documentation I missed all these years?
Just tried that, and it even works, but each individual file in the tarball is treated as a single-end technical replicate, so for paired-end data one would need two tarballs?! My advice would be to use the standard syntax of feeding the individual fastq files.
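For the standard syntax, a loop over the individual files could look roughly like the sketch below. It only prints the commands (a dry run) so they can be inspected first. The function name print_paired_mem_commands is hypothetical, the _1.fq.gz / _2.fq.gz naming and the refseq prefix are taken from the question, and writing one .sam file per sample is an assumption. Only -t 20 is shown; add the other bwa mem options you need.

```shell
# Hypothetical helper: print one "bwa mem" command per read pair (dry run).
# $1 = directory holding *_1.fq.gz / *_2.fq.gz files, $2 = index prefix.
print_paired_mem_commands() {
    pm_dir=$1
    pm_index=$2
    for pm_r1 in "$pm_dir"/*_1.fq.gz; do
        [ -e "$pm_r1" ] || continue          # the glob matched nothing
        pm_sample=${pm_r1%_1.fq.gz}          # strip the mate-1 suffix
        pm_r2=${pm_sample}_2.fq.gz
        if [ ! -e "$pm_r2" ]; then
            echo "skipping $pm_r1: no mate file $pm_r2" >&2
            continue
        fi
        # bwa mem writes SAM to stdout, so redirect it to a per-sample file.
        echo "bwa mem -t 20 $pm_index $pm_r1 $pm_r2 > ${pm_sample}.sam"
    done
}

# Example call, commented out so that sourcing the block has no side effects:
# print_paired_mem_commands corrected_data refseq
```

Once the printed commands look right, they can be run by piping the output to sh.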
You should use a makefile or, better, a workflow manager like snakemake or nextflow.
Thank you for the suggestion! I'm new to nextflow (and bash scripts), but this is my attempt.
Call it:
There are plenty of bugs and I'm still trying to correct them. But if you could take a look at the script, that would be excellent!