Question

Samtools indexing problem during bwa mem alignment

0

Entering edit mode

22 months ago

GlouGlou • 0

Hello,

I'm trying to align some resequensing data to a reference genome. I made a bash loop faur this. I used bwa mem and in the same script I want to build the samtool index and other stuff. Here is my script :

#!/usr/bin/env bash
#SBATCH --time=150:00:00
#SBATCH --cpus-per-task=26
#SBATCH --mem=150000

#SBATCH --output=alignment.log
#SBATCH --error=alignment.err

ASSEMBLY="/mnt/data/scaffold.fasta"

READ_DIR=/mnt/data/reads/filtered

for SAMPLE in T1 T2 T3 T4 T5 T6;
    do
    mkdir -p ${SAMPLE}
    bwa mem -R "@RG\tID:${SAMPLE}\tPU:x\tSM:${SAMPLE}\tPL:Illumina\tLB:x" -t 18 ${ASSEMBLY}  \
             <(zcat ${READ_DIR}/${SAMPLE}/${SAMPLE}.final_1.fastq.gz) \         
             <(zcat ${READ_DIR}/${SAMPLE}/${SAMPLE}.final_2.fastq.gz) | \
             samtools fixmate -@ 2 -m - - | \
             samtools sort -@ 4 -m 12G |  \
             samtools markdup -@ 2 - ${SAMPLE}/${SAMPLE}.mkdup.bam 

    samtools index ${SAMPLE}/${SAMPLE}.mkdup.bam
        mosdepth -t 4 ${SAMPLE}/${SAMPLE}.mkdup ${SAMPLE}/${SAMPLE}.mkdup.bam
    done


But it seems to be a problem with the indexing. Here is the bottom of my error log :

[M::bwa_idx_load_from_disk] read 0 ALT contigs
[E::main_mem] fail to open file ` '.
/var/lib/slurm-llnl/slurmd/job417492/slurm_script: line 18: /dev/fd/62: Permission denied
[bam_mating_core] ERROR: Couldn't read header
samtools sort: failed to read header from "-"
[markdup] error reading header
samtools index: "T1/T1.mkdup.bam" is in a format that cannot be usefully indexed

Thank you for your help.

EDIT: sorry the space between 18 and assembly was here but disapear when I copy paste here. And thanks for fixing the post.

Alignment bwa samtools • 1.8k views

ADD COMMENT • link 22 months ago by GlouGlou • 0

1

Entering edit mode

   <(zcat ${READ_DIR}/${SAMPLE}/${SAMPLE}.final_1.fastq.gz)

it's useless, bwa mem can read gz files.

ADD REPLY • link 22 months ago by Pierre Lindenbaum 161k

0

Entering edit mode

for SAMPLE in 1 2 3 4 5 6;

you should use a workflow manager like snakemake or nextflow...

ADD REPLY • link 22 months ago by Pierre Lindenbaum 161k

0

Entering edit mode

-t 18${ASSEMBLY}

space missng between 18 and ASSEMBLY

ADD REPLY • link 22 months ago by Pierre Lindenbaum 161k

0

Entering edit mode

ASSEMBLY=".../scaffold.fasta"

as far as I know, three dots '...' doesn't mean anything in linux.

ADD REPLY • link 22 months ago by Pierre Lindenbaum 161k

0

Entering edit mode

I would use absolute paths to define these variables. This relative definition is error-prone. In general, make a small subset of the data, say the first 400000 lines of the fastq file pairs and test that locally, and once it works go big on the cluster.

ADD REPLY • link 22 months ago by ATpoint 82k

0

Entering edit mode

[E::main_mem] fail to open file '.`

This means there is a hickup in parsing the files. Remove the zcat's, try to simplify the script, test with a subset first.

ADD REPLY • link 22 months ago by ATpoint 82k

0

Entering edit mode

I tried but nothing change, same error and I don't know the cause.

ADD REPLY • link 22 months ago by GlouGlou • 0

0

Entering edit mode

before the line ASSEMBLY=, add

set -o pipefail
set -x
set -e
set -u

and show us the output of stdout and stderr

ADD REPLY • link 22 months ago by Pierre Lindenbaum 161k

0

Entering edit mode

set -e
set -u
ASSEMBLY=/mnt/data/scaffold.fasta
READ_DIR=/mnt/data/reads/filtered
for SAMPLE in T1 T2 T3 T4 T5 T6
mkdir -p T1
bwa mem -R '@RG\tID:T1\tPU:x\tSM:T1\tPL:Illumina\tLB:x' -t 18 /mnt/data/scaffold.fasta /dev/fd/63 ' ' ++ zcat /mnt/data/reads/filtered/T1/T1.final_1.fastq.gz [M::bwa_idx_load_from_disk] read 0 ALT contigs [E::main_mem] fail to open file ` '.

Thank you

ADD REPLY • link 22 months ago by GlouGlou • 0

0

Entering edit mode

Without zcat :

set -e
set -u
ASSEMBLY=/mnt/data/scaffold.fasta
READ_DIR=/mnt/data/reads/filtered
for SAMPLE in T1 T2 T3 T4 T5 T6
   mkdir -p T1
   bwa mem -R '@RG\tID:1_19\tPU:x\tSM:T1\tPL:Illumina\tLB:x' -t 18 /mnt/data/scaffold.fasta samtools fixmate -@ 2 -m - -
samtools sort -@ 4 -m 12G
samtools markdup -@ 2 - T1/T1.mkdup.bam
mem: invalid option -- '@'
samtools sort: failed to read header from "-"
[markdup] error reading header

ADD REPLY • link 22 months ago by GlouGlou • 0

0

Entering edit mode

That misses do and done and the pipe operator.

ADD REPLY • link 22 months ago by ATpoint 82k

0

Entering edit mode

If these pipe-like constructs are above your current bash-fu then simply make it serial, nothing wrong with that. Save each output to a file and read it in the next operation, after all what counts is that you get the job done. Files that are produced on the way can be deleted if they take up space.

ADD REPLY • link 22 months ago by ATpoint 82k

0

Entering edit mode

I'm sorry I'm quite new in bioinf can you elaborate how to do that ? And what do you mean by "If these pipe-like constructs are above your current bash-fu" ?

Thank you

ADD REPLY • link 22 months ago by GlouGlou • 0