User000 · 4.2 years ago
Hello, I am using snakemake to align my reads to genomes using BWA MEM:
workdir: "/path/to/BAMs/"

(GENOMES,) = glob_wildcards("/path/to/folder/{genome}.fasta")
(SAMPLES,) = glob_wildcards("/path/to/folder/{sample}_R1.fastq.gz")

rule all:
    input:
        expand("{sample}.{genome}.bam", genome=GENOMES, sample=SAMPLES)

rule bwa_index:
    input:
        database = "/path/to/folder/{genome}.fasta"
    output:
        done = touch("{genome}")
    shell:
        """/Tools/bwa-0.7.12/bwa index -p {wildcards.genome} {input.database}"""

rule bwa_mem:
    input:
        bwa_index_done = "{genome}",
        fastq1 = "/path/to/folder/{sample}_R1.fastq.gz",
        fastq2 = "/path/to/folder/{sample}_R2.fastq.gz"
    output:
        bam = "{sample}.{genome}.bam"
    threads: 2
    shell:
        """/Tools/bwa-0.7.12/bwa mem -t {threads} /path/to/folder/{wildcards.genome} {input.fastq1} {input.fastq2} | /Tools/samtools-1.3/samtools sort -@ {threads} -o {output.bam}"""
I get the error:

    [W::sam_read1] parse error at line 1610193
    [bam_sort_core] truncated file. Aborting.
    Error in job bwa_mem while creating output file

Any idea why that is? Thanks in advance.
It is a parsing error from samtools sort that is unrelated to snakemake itself. In my experience this happens when your job runs out of memory. How many threads were used for bwa and samtools, and what is the total RAM of the node/workstation/machine?

I'm using a cluster with 160 CPUs in total, with threads = 2 and -j = 20. I am also running some other programs that use samtools at the same time. Do you think I should decrease the number of threads? I am not very good at understanding RAM. Does this information make sense?
Are you limiting threads from a particular job so they stay on a physical node?
Actually no. So is it better if I delete threads and just use the -j option? I'm not very practiced with these terms; if you could enlighten me it would be helpful :)
I would suggest limiting worker threads for a particular job so they remain on the same physical node. Are you using a job scheduler on the cluster?
Unfortunately no, I submit just like that. There is no qsub installed on the cluster and I do not know how to use the others. I do not know if it is possible to do so using snakemake?
As soon as I run other scripts using samtools in parallel, it starts giving this problem.
It is probably too much.
bwa uses, I think, somewhere around 4 GB for the index alone (human/mouse); times 20 jobs, that is already 80 GB. samtools sort by default uses 768 MB per thread, times two threads, times 20 jobs is approx. 30 GB, for a total of ~110 GB. On top of that, bwa needs additional RAM to keep reads in memory and perform its operations; it can be RAM-hungry at times, especially when it hits reads aligning to multiple regions. You also say that other programs are running, so I would reserve this node for alignment only and see how it goes, and maybe limit jobs to 10 or so. Difficult to tell, but it seems you are simply out of memory right now.

Thanks ATpoint, very clear explanation. So if I am not mistaken my available memory is 60 GB? And the sum of all 3 programs I am running in parallel should not exceed this number, right?
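The back-of-envelope RAM estimate from the answer above can be written out as a short script. All figures (4 GB per BWA index, 768 MB per samtools sort thread, 20 jobs, 2 threads) are the assumptions quoted in this thread, not measured values:

```python
# Rough RAM estimate for 20 concurrent bwa mem | samtools sort jobs.
# The per-job figures below are assumptions from the discussion, not measurements.
GB = 1024  # MB per GB

bwa_index_mb = 4 * GB        # ~4 GB BWA index per job (human/mouse genome)
sort_mb_per_thread = 768     # samtools sort default per-thread buffer (-m)
threads = 2                  # threads per job (-@ / snakemake threads:)
jobs = 20                    # concurrent jobs (snakemake -j 20)

bwa_total_mb = bwa_index_mb * jobs
sort_total_mb = sort_mb_per_thread * threads * jobs
total_gb = (bwa_total_mb + sort_total_mb) / GB

print(f"bwa: {bwa_total_mb / GB:.0f} GB, "
      f"sort: {sort_total_mb / GB:.0f} GB, "
      f"total: {total_gb:.0f} GB")
# prints: bwa: 80 GB, sort: 30 GB, total: 110 GB
```

This ignores the extra working memory bwa needs per read batch, so the real footprint is higher still.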
You have a total of 250 GB available on the node. Still, without seeing the run myself I do not know how it is distributed. If you cancel all jobs and then run

    free -hm

again, the used column (the memory currently in use) should be almost zero. As I said, it is difficult to tell from a distance, so I would simply see whether it goes smoothly if you reserve the node for alignment only. You can also use top to see what percentage of memory each job consumes.

Is a thread actually the same as a node? I do not understand whether I should set threads = 10, nodes = 10 or -j = 10... I apologize if my questions are banal, and thank you a lot! EDIT: So I will try to use threads = 20 and -j (cores) = 10... I hope that is ok.
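On the threads vs. -j confusion: threads: is per job, while -j caps the total number of cores Snakemake uses at once, so with threads: 2 and -j 20 at most 10 bwa_mem jobs run concurrently. A hedged sketch of also limiting memory via Snakemake's resources mechanism (the 8000 MB per-job figure is an assumed estimate, not a measured value):

```
# In the Snakefile: declare the expected RAM of one bwa_mem job.
rule bwa_mem:
    # (input/output/shell as in the original rule above)
    threads: 2
    resources:
        mem_mb = 8000   # assumed per-job peak, adjust after watching top
```

Then invoking Snakemake as, for example, `snakemake -j 20 --resources mem_mb=60000` makes it schedule only as many jobs as fit into both the core budget (20) and the memory budget (60 GB) at once.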
A node is equivalent to a physical server (box). Some boxes may have N individual blades, but let us not worry about that.

Each node has one or more CPUs, depending on the number of sockets (generally 2, but can be 4).
Each CPU has multiple cores (modern CPUs have at least 2, and up).
Each core can run multi-threaded, i.e. run 2 threads.
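So "160 CPUs" on this cluster node almost certainly means 160 logical CPUs, i.e. sockets x cores per socket x threads per core. You can check what a box exposes with a minimal sketch using only the Python standard library:

```python
import os

# Logical CPUs visible to the OS = sockets x cores/socket x threads/core.
# On the 160-CPU node discussed in this thread, this would report 160.
logical = os.cpu_count()
print(f"logical CPUs visible to the OS: {logical}")
```

The -j value you give Snakemake should stay at or below this number (minus whatever the other samtools jobs are using).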