I am trying to analyze data for RRBS (reduced representation bisulfite sequencing) and want to use BWA-METH for alignment. I also ran Bismark, but bismark output only shows mapping efficiency of 33.8% while BWA-METH shows 99.8% mapping efficiency (paired-end). So, I converted .sam to .bam with samtools and tried to run MehtylDackel. However, when I tried to run MethylDackel on the HPC server I got the following error.
[E::idx_find_and_load] Could not retrieve index file for 'sample.bam' Couldn't load the index for sample.bam, will attempt to build it. [E::hts_idx_push] Unsorted positions on sequence #25: 19733692 followed by 19733689 [E::sam_index] Read 'VH00301:6:AAAHKFLHV:1:1102:43283:1057' with ref_name='chr19', ref_length=58617616, flags=147, pos=19733689 cannot be indexed Couldn't build the index for sample.bam! File corrupted?
The script I submitted was
#!/bin/bash #SBATCH --partition=batch # Partition name #SBATCH --job-name=MethyDackel # Job Name #SBATCH --ntasks=1 # Run a single task #SBATCH --time=7-00:00:00 # Time limit day-hrs:min:sec #SBATCH --mem=64G # Memory per node (4GB); by default using M as unit #SBATCH --cpus-per-task=4 # Number of CPU cores per task #SBATCH --output=%x.%j.out # Standard output log #SBATCH --error=%x.%j.err # Standard error log #SBATCH --export=NONE # Do not export any user's expliciti environment varialbes to compute node cd $SLURM_SUBMIT_DIR # Change directory to job submission director module load MethylDackel/0.6.0-foss-2019b MethylDackel extract --keepDupes /DNAfa/hg38.fa /workDir/sample.bam
I tried to ran samtools to build index for the bam file and got
[E::hts_idx_push] Unsorted positions on sequence #25: 19733692 followed by 19733689 [E::sam_index] Read 'VH00301:6:AAAHKFLHV:1:1102:43283:1057' with ref_name='chr19', ref_length=58617616, flags=147, pos=19733689 cannot be indexed samtools index: failed to create index for "sample.bam": No such file or directory
At this point, I don't even know what I am doing wrong. Did the mapping fail? Should I put my trimmed fa file and/or reference genome in the same workdir (folder) as .sam/.bam file?