BWA MEM using snakemake
0
1
Entering edit mode
4.2 years ago
User000 ▴ 690

Hello, I am using snakemake to aling my reads to genomes using BWA MEM

workdir: "/path/to/BAMs/"
(GENOMES,)=glob_wildcards("/path/to/folder/{genome}.fasta")
(SAMPLES,)=glob_wildcards("/path/to/folder/{sample}_R1.fastq.gz")

rule all:
    input: 
        expand("{sample}.{genome}.bam", genome=GENOMES, sample=SAMPLES)

rule bwa_index:
    input:
        database="/path/to/folder/{genome}.fasta"
    output:
        done =touch("{genome}")
    shell:
        """/Tools/bwa-0.7.12/bwa index -p {wildcards.genome} {input.database}"""

rule bwa_mem:
    input:
        bwa_index_done = "{genome}",
        fastq1="/path/to/folder/{sample}_R1.fastq.gz",
        fastq2="/path/to/folder/{sample}_R2.fastq.gz"
    output:
        bam = "{sample}.{genome}.bam"
    threads: 2
    shell:
        """/Tools/bwa-0.7.12/bwa mem -t {threads} /path/to/folder/{wildcards.genome} {input.fastq1} {input.fastq2} | /Tools/samtools-1.3/samtools sort -@ {threads} -o {output.bam}"""

I get the error:

- [W::sam_read1] parse error at line 1610193
[bam_sort_core] truncated file. Aborting.
- Error in job bwa_mem while creating output file

Any ideas why is that? Thanks in advance

snakemake bwa • 3.1k views
ADD COMMENT
0
Entering edit mode

It is a parsing error from samtools sort that is unrelated to snakemake itself. In my experience that happens when your job runs out of memory. How many threads were used for bwa and samtools and what is the total RAM of the node/workstation/machine?

ADD REPLY
0
Entering edit mode

I'm using the cluster of 160 cpu total, with threads=2 and -j = 20. I am also running some other programs using samtools simultaneously. Do you think I should decrease the n.of threads? I am not very good at understanding the RAM. Does this information make sense?

             total        used        free      shared  buff/cache   available
Mem:         257770      195528       24790           9       37451       60814
Swap:          4315         383        3932
ADD REPLY
0
Entering edit mode

Are you limiting threads from a particular job so they stay on a physical node?

ADD REPLY
0
Entering edit mode

actually no, so is it better if I delete threads and use just -j option? I'm not so practical in these terms. If you could lighten me would be helpful :)

ADD REPLY
1
Entering edit mode

I would suggest limiting worker threads for a particular job so they remain on same physical node. Are you using a job scheduler on the cluster?

ADD REPLY
0
Entering edit mode

Unfortunately no, I submit just like

snakemake -s Snakefilename -j 20

There is no qsub installed on the cluster and I do not know how to use the others. I do not know if it is possible to so using snakemake?

ADD REPLY
0
Entering edit mode

as soon as I run other scripts using samtools in parallel it starts giving this problem.

ADD REPLY
1
Entering edit mode

It is probably too much. bwa uses I think somewhat 4GB for the index alone (human/mouse), times 20 jobs is already 80GB. Samtools sort by default uses 768Mb per threads times two (threads) times 20 is approx. 30Gb, total ~ 110Gb. Then bwa needs additional RAM to keep reads in memory and perform its operations, it can be RAM-hungry at times especially if it hits reads aligning to multiple regions. Also you say that other programs are running, so I would reserve this node for alignment only, and see how it goes, maybe limit jobs to 10 or so. Difficult to tell, but it seems you are right now simply out of memory.

ADD REPLY
0
Entering edit mode

thanks ATpoint, very clear explanation. So if I am not mistaken my available memory is 60GB? And the sum of all my 3 programs I am running in parallel should not exceed this number, right?

ADD REPLY
1
Entering edit mode

You have a total of 250Gb available on the node. Still without seeing the run myself I do not know how it is distributed. If you cancel all jobs and then do free -hm again the used column (so the memory right now in use) should almost be zero. As I said difficult to tell from distance, so I would simply see if it goes smoothly if you reserve the node for alignment only. You can also use top to see how the percentage of memory each job consumes.

ADD REPLY
0
Entering edit mode

thread is actually the same as node? I do not understand if I should set threads = 10, nodes =10 or -j = 10...I apologize if my questions result banal and thank you a lot! EDIT: So I will try to use threads = 20 and -j (cores) = 10...I hope it is ok..

ADD REPLY
1
Entering edit mode

Node is equivalent to a physical server (box). Some boxes may have N individual blades but let us not worry about that.
Each node has CPU (s) - One or more depending on number of sockets. Generally 2 but can be 4
Each CPU has multiple cores (modern CPU's have at least 2 and up)
Each core can run multi-threaded i.e. run 2 threads.

ADD REPLY

Login before adding your answer.

Traffic: 3332 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6