I'm running MUSKET on my dataset trimmed_data.tar.gz on an HPC cluster, and I have tried 1000, 2000, and 4000 threads. I have not obtained any results yet because the runs go on for a very long time without finishing.
That means you can't have the -o switch after the file name. Or, if you will, trimmed_data.tar.gz must come last. Also, I'm not sure why you are saving the output with a .fa extension. Presumably this is a fastq file?
Separately, it makes absolutely no sense to run this with 1000 threads (and about 4 times less sense to run it with 4000). There is no way that your read/write operations can keep up with what thousands of threads can do. I suggest you run this with 10-20 threads and a k-mer size of 21-25. In fact, the program's web page says that MAX_KMER_SIZE is 28, so you are way out of the ballpark with 90.
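To make those two limits concrete, here is a minimal shell sketch that caps a requested thread count and rejects a k-mer size above the limit. The function and variable names are made up for illustration, the cap of 16 is just one value inside the suggested 10-20 range, and the limit of 28 is the MAX_KMER_SIZE figure quoted above; a build compiled with a different limit would need a different number.

```shell
# Hypothetical helper values; adjust to your own cluster and Musket build.
max_kmer=28        # MAX_KMER_SIZE as stated on the program's web page
max_threads=16     # one value inside the suggested 10-20 range

# Print the smaller of the requested thread count and the cap.
cap_threads() {
    if [ "$1" -gt "$max_threads" ]; then
        echo "$max_threads"
    else
        echo "$1"
    fi
}

# Succeed only if the k-mer size does not exceed the limit.
kmer_ok() {
    [ "$1" -le "$max_kmer" ]
}

threads=$(cap_threads 1000)
echo "threads to request: $threads"

if kmer_ok 90; then
    echo "k=90 is accepted"
else
    echo "k=90 exceeds MAX_KMER_SIZE=$max_kmer; pick something in the 21-25 range"
fi
```

Nothing here calls Musket; it only checks the numbers before you submit a job.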
Thanks for the tip! Should I loop over all the samples individually, or can I run Musket on the entire dataset trimmed_data.tar.gz?
There is no need to loop over samples if everything is in one file. You can go with the command you already used but with adjusted k-mer and threads options.
Would this command make sense for paired-end reads?
The file name must come at the end, which means -inorder must move in front of it. Also, -omulti is for multiple outputs when you have multiple inputs, but you have a single input file, which means that -o is more appropriate. Something like:
This is not that difficult to figure out if you read the explanations on musket's web site I referenced above.
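For a single input file, the ordering described in this thread (all options first, input file last, -o rather than -omulti) can be sketched as below. This is a dry run that only prints the command. The file names are hypothetical and assume the reads have been extracted from the archive into a FASTQ file. The k-mer count after the k-mer size is an assumed placeholder: as far as I recall, -k takes both the k-mer size and an estimate of the number of distinct k-mers, and -p sets the thread count, but check the usage text that musket prints before relying on either.

```shell
# Hypothetical file names; this assumes the reads were extracted to FASTQ first.
input="trimmed_data.fastq"
output="corrected_data.fastq"

kmer=21               # inside the suggested 21-25 range
kmer_count=536870912  # assumed placeholder estimate of distinct k-mers
threads=16            # inside the suggested 10-20 range

# Options first, input file last.
cmd="musket -k $kmer $kmer_count -p $threads -inorder -o $output $input"

# Dry run: print the command for inspection instead of executing it.
echo "$cmd"
```

Once the printed line looks right, paste it into your job script and run it on one small sample before submitting the whole dataset.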