Samtools indexing problem during bwa mem alignment
0
0
Entering edit mode
9 months ago
GlouGlou • 0

Hello,

I'm trying to align some resequensing data to a reference genome. I made a bash loop faur this. I used bwa mem and in the same script I want to build the samtool index and other stuff. Here is my script :

#!/usr/bin/env bash
#SBATCH --time=150:00:00
#SBATCH --mem=150000

#SBATCH --output=alignment.log
#SBATCH --error=alignment.err

ASSEMBLY="/mnt/data/scaffold.fasta"

for SAMPLE in T1 T2 T3 T4 T5 T6;
do
mkdir -p ${SAMPLE} bwa mem -R "@RG\tID:${SAMPLE}\tPU:x\tSM:${SAMPLE}\tPL:Illumina\tLB:x" -t 18${ASSEMBLY}  \
<(zcat ${READ_DIR}/${SAMPLE}/${SAMPLE}.final_1.fastq.gz) \ <(zcat${READ_DIR}/${SAMPLE}/${SAMPLE}.final_2.fastq.gz) | \
samtools fixmate -@ 2 -m - - | \
samtools sort -@ 4 -m 12G |  \
samtools markdup -@ 2 - ${SAMPLE}/${SAMPLE}.mkdup.bam

samtools index ${SAMPLE}/${SAMPLE}.mkdup.bam
mosdepth -t 4 ${SAMPLE}/${SAMPLE}.mkdup ${SAMPLE}/${SAMPLE}.mkdup.bam
done

But it seems to be a problem with the indexing. Here is the bottom of my error log :

[E::main_mem] fail to open file  '.
/var/lib/slurm-llnl/slurmd/job417492/slurm_script: line 18: /dev/fd/62: Permission denied
samtools index: "T1/T1.mkdup.bam" is in a format that cannot be usefully indexed


EDIT: sorry the space between 18 and assembly was here but disapear when I copy paste here. And thanks for fixing the post.

Alignment bwa samtools • 924 views
1
Entering edit mode
   <(zcat ${READ_DIR}/${SAMPLE}/${SAMPLE}.final_1.fastq.gz)  it's useless, bwa mem can read gz files. ADD REPLY 0 Entering edit mode for SAMPLE in 1 2 3 4 5 6;  you should use a workflow manager like snakemake or nextflow... ADD REPLY 0 Entering edit mode -t 18${ASSEMBLY}


space missng between 18 and ASSEMBLY

0
Entering edit mode
ASSEMBLY=".../scaffold.fasta"


as far as I know, three dots '...' doesn't mean anything in linux.

0
Entering edit mode

I would use absolute paths to define these variables. This relative definition is error-prone. In general, make a small subset of the data, say the first 400000 lines of the fastq file pairs and test that locally, and once it works go big on the cluster.

0
Entering edit mode

[E::main_mem] fail to open file '.

This means there is a hickup in parsing the files. Remove the zcat's, try to simplify the script, test with a subset first.

0
Entering edit mode

I tried but nothing change, same error and I don't know the cause.

0
Entering edit mode

set -o pipefail
set -x
set -e
set -u


and show us the output of stdout and stderr

0
Entering edit mode
• set -e
• set -u
• ASSEMBLY=/mnt/data/scaffold.fasta
• for SAMPLE in T1 T2 T3 T4 T5 T6
• mkdir -p T1
• bwa mem -R '@RG\tID:T1\tPU:x\tSM:T1\tPL:Illumina\tLB:x' -t 18 /mnt/data/scaffold.fasta /dev/fd/63 ' ' ++ zcat /mnt/data/reads/filtered/T1/T1.final_1.fastq.gz [M::bwa_idx_load_from_disk] read 0 ALT contigs [E::main_mem] fail to open file  '.

Thank you

0
Entering edit mode

Without zcat :

set -e
set -u
ASSEMBLY=/mnt/data/scaffold.fasta
for SAMPLE in T1 T2 T3 T4 T5 T6
mkdir -p T1
bwa mem -R '@RG\tID:1_19\tPU:x\tSM:T1\tPL:Illumina\tLB:x' -t 18 /mnt/data/scaffold.fasta samtools fixmate -@ 2 -m - -
samtools sort -@ 4 -m 12G
samtools markdup -@ 2 - T1/T1.mkdup.bam
mem: invalid option -- '@'
`
0
Entering edit mode

That misses do and done and the pipe operator.

0
Entering edit mode

If these pipe-like constructs are above your current bash-fu then simply make it serial, nothing wrong with that. Save each output to a file and read it in the next operation, after all what counts is that you get the job done. Files that are produced on the way can be deleted if they take up space.

0
Entering edit mode

I'm sorry I'm quite new in bioinf can you elaborate how to do that ? And what do you mean by "If these pipe-like constructs are above your current bash-fu" ?

Thank you