I am completely new to bioinformatics and was using samtools and bwa to complete an assignment. I was using the GRCh38 reference and two exome sequence files (each about 16 GB after extracting), and expected to run samtools after bwa to get a VCF file to submit. I set everything up correctly with the help of my instructor. However, once bwa mem started running it never finished, even after running for up to 24 hours. I ended up not being able to submit anything for that assignment. I read posts about allocating memory, but I don't really know how to do that. My computer is a Mac (M1 chip). Is this speed normal? What can I do to improve performance in case I need to use it again in the future?
Naively, I would say that this is normal. A laptop is not meant for heavy-duty work such as aligning larger datasets. At some point it heats up and the CPU throttles down. Processing time scales with file size. Make a subset of 1 million reads from the FASTQ file and test your commands on it. From there you can roughly extrapolate how long the full run will take.
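A minimal sketch of how such a subset could be made, assuming standard 4-line FASTQ records. The helper name subset_fastq and all file names are made up for illustration; taking the first reads of a file is quick but not a random sample.

```shell
# Hypothetical helper: print the first N reads of a FASTQ file.
# Assumes 4 lines per read; handles plain or gzip-compressed input.
subset_fastq() {
    input="$1"
    n_reads="$2"
    n_lines=$(( n_reads * 4 ))
    case "$input" in
        *.gz) gzip -cd "$input" | head -n "$n_lines" ;;
        *)    head -n "$n_lines" "$input" ;;
    esac
}

# Example usage (placeholder file names, paired-end data assumed):
#   subset_fastq sample_R1.fastq 1000000 > test_R1.fastq
#   subset_fastq sample_R2.fastq 1000000 > test_R2.fastq
#   bwa mem GRCh38.fa test_R1.fastq test_R2.fastq > test.sam
```

Use the same read count for both files of a pair so the mates stay in sync.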
You might run a test. Extract 1 million lines (250k reads, at 4 lines per read) from your exome sequence file and run BWA on that to see how long it takes. Then double the input to see how much slower it goes. While the relationship may ultimately not be linear, you will at least be able to (1) prove that BWA can run on your machine, (2) get a sense of the minimum time the full run might take, and (3) examine the resources a run requires. (You can put time in front of your command to get the system to tell you how long it took.)
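As a rough sketch of the extrapolation, assuming runtime grows linearly with the number of reads (which, as noted, may not hold exactly): the helpers count_reads and estimate_seconds are hypothetical names, and the numbers in the usage comments are placeholders, not measurements.

```shell
# Hypothetical helper: number of reads in an uncompressed FASTQ file
# (line count divided by 4).
count_reads() {
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Hypothetical helper: linear extrapolation of runtime in whole seconds.
#   $1 = seconds the test run took
#   $2 = reads in the test subset
#   $3 = reads in the full file
estimate_seconds() {
    echo $(( $1 * $3 / $2 ))
}

# Example usage (placeholder file names and timings):
#   time bwa mem -t 1 GRCh38.fa test_R1.fastq test_R2.fastq > test.sam
#   estimate_seconds 60 250000 "$(count_reads sample_R1.fastq)"
```

Treat the result as a lower bound, since a laptop that heats up and throttles will slow down over a long run.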
How much memory do you have on your laptop? If it is the standard 8 GB then you should not be using more than one thread (-t 1) for this analysis. If you are using the -t option with more than one thread, this may be leading to contention and actually slowing things down.
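If you are not sure how much memory the machine has, something along these lines should report it. This is only a sketch: total_mem_gb is a made-up helper name, it assumes sysctl hw.memsize on macOS and /proc/meminfo on Linux, and the file names in the usage comments are placeholders.

```shell
# Hypothetical helper: print total physical memory in whole GB (rounded down).
total_mem_gb() {
    if [ "$(uname)" = "Darwin" ]; then
        # macOS reports memory size in bytes
        bytes=$(sysctl -n hw.memsize)
        echo $(( bytes / 1024 / 1024 / 1024 ))
    else
        # Linux reports MemTotal in kB
        kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
        echo $(( kb / 1024 / 1024 ))
    fi
}

# Example usage:
#   total_mem_gb
#   # single-threaded run, as suggested above for an 8 GB machine:
#   bwa mem -t 1 GRCh38.fa test_R1.fastq test_R2.fastq > test.sam
#   # on macOS, /usr/bin/time -l also prints the peak memory of the run:
#   /usr/bin/time -l bwa mem -t 1 GRCh38.fa test_R1.fastq test_R2.fastq > test.sam
```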