Reference-based alignment using MUSKET
2.1 years ago

I'm running MUSKET on my dataset trimmed_data.tar.gz on an HPC, trying 1000, 2000, and 4000 threads. I haven't been able to get any results because the software keeps running for a very long time.

./../musket-1.1/musket -k 90 600000000 -p 1000 -zlib 9 -inorder trimmed_data.tar.gz -o dataset.fq

When I reduce the number of threads to 100, I get a segmentation fault.

2.1 years ago
Mensur Dlakic ★ 27k

This is the expected form of the command:

 musket [options] file

That means you can't put the -o switch after the file name; in other words, trimmed_data.tar.gz must come last. I'm also not sure why you are saving the file with a .fa extension; presumably this is a FASTQ file?
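A minimal bash sketch of that ordering rule: the hypothetical `build_musket_cmd` function below (made up for illustration, not part of Musket) collects the options first, appends the input file(s) last, and prints the resulting command line instead of running it. The option values in the usage example are the ones suggested in this thread, not Musket defaults.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a musket command line in the
# "musket [options] file" order, i.e. every option before the input file(s).
build_musket_cmd() {
  local out="$1"; shift      # single output file name, passed to -o
  local threads="$1"; shift  # number of threads, passed to -p
  local k="$1"; shift        # k-mer size, first argument of -k
  local nkmers="$1"; shift   # estimated number of distinct k-mers, second argument of -k
  # Options first ...
  local -a cmd=(musket -k "$k" "$nkmers" -p "$threads" -inorder -o "$out")
  # ... input file(s) last.
  cmd+=("$@")
  # Dry run: print the command joined by spaces rather than executing it.
  printf '%s\n' "${cmd[*]}"
}

build_musket_cmd corrected.fq 20 21 536870912 trimmed.fq
```

To actually launch the job you would run the array itself (`"${cmd[@]}"`) rather than the printed string, so that file names containing spaces stay intact.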

Separately, it makes absolutely no sense to run this with 1000 threads (and about four times less sense to run it with 4000). There is no way your read/write operations can keep up with thousands of threads. I suggest running this with 10-20 threads and a k-mer size of 21-25. In fact, the program's web page says that MAX_KMER_SIZE is 28, so you are way out of the ballpark with 90.
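Those limits can be checked before a job is submitted. The sketch below is a hypothetical bash pre-flight check (`check_musket_params` is not part of Musket); it assumes a default build where MAX_KMER_SIZE is 28 and treats 20 threads as a soft upper bound, following the advice above. An oversized k-mer is rejected, while a high thread count only produces a warning.

```bash
#!/usr/bin/env bash
# Assumption: default Musket build, where MAX_KMER_SIZE is 28.
MAX_KMER_SIZE=28

# Hypothetical pre-flight check: fail on an oversized k-mer,
# warn (but still succeed) on an excessive thread count.
check_musket_params() {
  local k="$1" threads="$2"
  if (( k > MAX_KMER_SIZE )); then
    echo "error: k=$k exceeds MAX_KMER_SIZE=$MAX_KMER_SIZE" >&2
    return 1
  fi
  if (( threads > 20 )); then
    # More threads than this mostly waits on disk I/O.
    echo "warning: $threads threads is far more than the suggested 10-20" >&2
  fi
  return 0
}

check_musket_params 21 20 && echo "parameters look reasonable"
```

With the values from the original question (`check_musket_params 90 1000`), the function stops at the k-mer check and returns 1.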

http://musket.sourceforge.net/homepage.htm


Thanks for the tip! Should I loop over the samples individually, or can I run Musket on the entire dataset in trimmed_data.tar.gz?


There is no need to loop over samples if everything is in one file. You can go with the command you already used, but with adjusted k-mer and thread options.


Would this command make sense for paired-end reads?

./../musket-1.1/musket -k 21 536870912 -p 20 -zlib 9 -omulti corrected trimmed_data.tar.gz -inorder

The file name must come last, which means -inorder has to move before it. Also, -omulti is for multiple outputs when you have multiple inputs, but you have a single input file, so -o is more appropriate. Something like:

musket -k 21 536870912 -p 20 -zlib 9 -inorder -o corrected.gz trimmed_data.tar.gz

This is not that difficult to figure out if you read the explanations on Musket's website I linked above.
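The -o versus -omulti choice depends only on how many input files are passed. A hypothetical bash helper, `output_opts` (illustrative only, not part of Musket), sketches that rule: it prints the output option that belongs before the file name(s), using -o for a single input and -omulti for several.

```bash
#!/usr/bin/env bash
# Hypothetical helper: choose Musket's output option from the number of inputs.
# Usage: output_opts <output-name> <input-file>...
output_opts() {
  local name="$1"; shift
  if (( $# == 0 )); then
    echo "error: no input files given" >&2
    return 1
  elif (( $# == 1 )); then
    printf '%s\n' "-o $name"       # one input  -> one output file
  else
    printf '%s\n' "-omulti $name"  # several inputs -> one output per input
  fi
}

output_opts corrected.gz trimmed.fq
```

The printed option is meant to be placed among the other options, before the input file(s), never after them.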
