Question: Is Casava Multithreaded?
Misha wrote:

I built a copy of CASAVA from source.

I am using CASAVA to perform a BCL to FASTQ conversion on a single lane (my favorite lane 4).

I am launching the program with "make -j 32" because I want 32 threads, one for each of my server's fancy CPU cores.

When I look at my CPU utilization, I see that the demultiplex command uses only one CPU thread and my total CPU utilization is a paltry 3%.

  1. Can somebody tell me if this step in CASAVA is parallel?

  2. From the user guide:

NOTE the -j <n> command line option is supported to indicate up to <n> processes in parallel. However, for Bcl conversion the maximum level of parallelization is 8.

Does this mean each lane can only have 1 thread?

  3. What in CASAVA is parallel? My understanding is that CASAVA spends most of its time in BCL to FASTQ conversion, or are there other costly operations that CASAVA can perform?

==Answers to Questions==

  1. I have a 16 core AMD cpu.
  2. To check utilization I am using top and "ps -m", I see one demultiplexing process and ~3% CPU utilization (1/32?)
  3. I am using the latest version of CASAVA 1.8.2

==Unaligned Folder==

-rw-------  1 sanger criemp 24845 2012-06-06 11:26 myOutput.out
-rw-------  1 sanger criemp   301 2012-06-06 11:18 node_name
drwxr-xr-x  5 sanger criemp  4096 2012-06-06 11:15 Basecall_Stats_C0806ACXX
drwx------  3 sanger criemp  4096 2012-06-06 11:15 Temp
-rw-r--r--  1 sanger criemp   773 2012-06-06 11:10 SampleSheet.mk
-rw-r--r--  1 sanger criemp 12528 2012-06-06 11:09 DemultiplexedBustardConfig.xml
-rw-r--r--  1 sanger criemp  4858 2012-06-06 11:08 Makefile
-rw-r--r--  1 sanger criemp  1897 2012-06-06 11:08 DemultiplexConfig.xml
drwxr-xr-x 10 sanger criemp  4096 2012-06-06 11:08 Project_C0806ACXX
-rw-r--r--  1 sanger criemp 25906 2012-06-06 11:08 support.txt
-rw-r--r--  1 sanger criemp   380 2012-06-05 16:04 CASAVA.sh

The configure command is:

configureBclToFastq.pl --input-dir XXX/Basecalls --output-dir XXX/Unaligned --force --ignore-missing-bcl --ignore-missing-stats --use-bases-mask=y51

==Thread Usage is==

Cpu(s):  3.1%us,  0.1%sy,  0.0%ni, 96.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:     32307M total,    10456M used,    21851M free,        0M buffers

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                                        
27279 sanger   20   0 91672  43m 2892 R  100  0.1  18:44.37 demultiplexBcls
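
The two numbers above are consistent with each other. By default, top reports a process's %CPU relative to a single CPU, while the Cpu(s) summary line is averaged over all logical CPUs. A minimal sketch of that arithmetic, assuming the machine exposes 32 logical CPUs (the poster's own "1/32?" guess, not a confirmed figure); the function name is hypothetical:

```python
def overall_cpu_percent(busy_processes, logical_cpus, per_process_percent=100.0):
    """Approximate the machine-wide user CPU share shown on top's Cpu(s) line.

    busy_processes: number of processes each using per_process_percent of one CPU.
    logical_cpus:   number of logical CPUs the average is taken over (assumed here).
    """
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return busy_processes * per_process_percent / logical_cpus


# One demultiplexBcls process at 100% on an assumed 32 logical CPUs.
print(round(overall_cpu_percent(1, 32), 1))  # 3.1
```

So a single fully busy process would account for the 3.1%us reading on its own, which fits the observation that only one demultiplexBcls process is running.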
modified 7.5 years ago by Dan D • written 7.5 years ago by Misha

Do you have an 8-core CPU? I think running more threads than the number of CPU cores will not increase the speed. 32 threads may also saturate your I/O.

written 7.5 years ago by jingtao09110
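
The point about not exceeding the core count can be sketched as a small helper for choosing a -j value. This is only an illustration: the function name and parameters are hypothetical, and the cap of 8 is the figure from the user guide note quoted in the question, which refers to Bcl conversion. Whether it also applies to the demultiplexing step is not stated in this thread.

```python
import os


def useful_jobs(requested, logical_cpus=None, documented_cap=None):
    """Pick a -j value no larger than the available CPUs or a documented cap."""
    if requested < 1:
        raise ValueError("requested must be at least 1")
    if logical_cpus is None:
        # Fall back to what the operating system reports for this machine.
        logical_cpus = os.cpu_count() or 1
    limit = min(requested, logical_cpus)
    if documented_cap is not None:
        limit = min(limit, documented_cap)
    return limit


print(useful_jobs(32, logical_cpus=32, documented_cap=8))  # 8
print(useful_jobs(32, logical_cpus=16))                    # 16
```

Asking for more jobs than this is harmless for make itself, but the extra jobs would not be expected to add throughput.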

A couple of questions:

- Are you running CASAVA 1.8.2?
- How are you checking utilization: are you using top, dstat, or something else?

written 7.5 years ago by Dan D

Please see responses to questions

written 7.5 years ago by Misha

What happens when you run make -j 8?

written 7.5 years ago by Dan D

The demultiplexing step (the first one) is not parallel. I see only 3% cpu utilization.

written 7.5 years ago by Misha

Are you running CASAVA on the same machine that has the data, or is CASAVA accessing the basecall data over a network?

written 7.5 years ago by Dan D

CASAVA is on the same machine as the BCL files. Also, I am using Linux.

written 7.5 years ago by Misha

Can you paste or describe the contents of the "unaligned" folder in the flowcell data directory?

written 7.5 years ago by Dan D

Thank you, please see the edits above.

written 7.5 years ago by Misha

OK, two things: first, you're not specifying the "--use-bases-mask" parameter correctly. There should be two hyphens and no equals sign. The second thing is that you haven't specified a sample sheet location. Try correcting both of those and running it again. I'll stand by.

written 7.5 years ago by Dan D

I appear to have missed a hyphen, but the equals sign should work: --use-bases-mask=y50 results in an error, as expected. Anyway, I do not have a sample sheet, and I edited SampleSheet.mk by commenting out the lines corresponding to the unused lanes.

written 7.5 years ago by Misha

OK, I think the lack of a sample sheet might be a problem. There's some information in there that CASAVA needs, especially an index, in order to do its thing. You can make a one-liner sample sheet for lane 4. If you're willing, give it a shot, and let me know how it goes. It would be helpful to paste the sample sheet info here if you decide to go that route.

written 7.5 years ago by Dan D

FCID,Lane,SampleID,SampleRef,Index,Description,Control,Recipe,Operator
C0806ACXX,4,H-X,human,,Cypress,Y,51,CB,mTest

... Fixed lane

modified 7.5 years ago • written 7.5 years ago by Misha

OK, I'm a little concerned that you omitted an index, and that you specified lane 5 instead of your favorite lane 4, but what happened when you ran CASAVA and pointed it to that sample sheet?

written 7.5 years ago by Dan D

It runs, but only on a single CPU :-( There are also some errors about ImageMagick.

written 7.5 years ago by Misha

Well, that sounds like some progress, albeit incremental. Can you post the full text of the error? And did you specify the correct index?

written 7.5 years ago by Dan D

Thanks for all your help, the real trouble here is that the program runs and makes some fastq files but it doesn't run in parallel.

written 7.5 years ago by Misha

On your latest CASAVA run, how many threads did you specify (-j argument)?

written 7.5 years ago by Dan D

I tried -j 32 and -j 8.

written 7.5 years ago by Misha

OK, and with both you saw that CASAVA was only running one process (i.e., not in parallel)?

written 7.4 years ago by Dan D

Exactly! It makes me rather sad :-(

written 7.4 years ago by Misha

Have you successfully run other processes in parallel using make -j?

written 7.4 years ago by Dan D

Yes, would you be able to show what top or process monitor says on your system?

written 7.4 years ago by Misha

OK, we recently moved our HiSeqs, so we're running a simple PhiX validation. My sample sheet just has one line for each lane, like so:

BC0RT0ACXX,1,PhiX1,,,Control,Y,SR100index,CRB,Control
BC0RT0ACXX,2,PhiX2,,,Control,Y,SR100index,CRB,Control
BC0RT0ACXX,3,PhiX3,,,Control,Y,SR100index,CRB,Control
BC0RT0ACXX,4,PhiX4,,,Control,Y,SR100index,CRB,Control
BC0RT0ACXX,5,PhiX5,,,Control,Y,SR100index,CRB,Control
BC0RT0ACXX,6,PhiX6,,,Control,Y,SR100index,CRB,Control
BC0RT0ACXX,7,PhiX7,,,Control,Y,SR100index,CRB,Control
BC0RT0ACXX,8,PhiX8,,,Control,Y,SR100index,CRB,Control

When I run the commands:

[PATHTOCASAVA]/configureBclToFastq.pl --input-dir [PATHTOFLOWCELLBASEFOLDER] --output-dir [PATHTOFLOWCELLBASEFOLDER]/unaligned --sample-sheet [PATHTOSAMPLESHEET] --fastq-cluster-count 0 --mismatches 1 --with-failed-reads
nohup make -j 16

I see, in top, after about 20 seconds:

46198 root     20   0 35220  11m 1880 R 100.0  0.0   1:51.69 demultiplexBcls
46170 root     20   0 35252  11m 1880 R 100.0  0.0   2:08.96 demultiplexBcls
46192 root     20   0 35256  11m 1880 R 100.0  0.0   1:57.27 demultiplexBcls
46159 root     20   0 34996  11m 1880 R  99.6  0.0   2:07.40 demultiplexBcls
46148 root     20   0 34660  11m 1880 R  98.7  0.0   2:05.87 demultiplexBcls
46179 root     20   0 35252  12m 1880 R  98.7  0.0   2:11.79 demultiplexBcls
46195 root     20   0 35328  11m 1880 R  98.0  0.0   1:56.12 demultiplexBcls
46238 root     20   0 35220  11m 1880 R  94.0  0.0   1:51.12 demultiplexBcls
46176 root     20   0 35252  11m 1880 D  55.9  0.0   2:03.77 demultiplexBcls
46183 root     20   0 35328  11m 1880 D  55.2  0.0   1:58.87 demultiplexBcls
46141 root     20   0 34660  11m 1880 D  52.9  0.0   2:06.30 demultiplexBcls
46144 root     20   0 34988  11m 1880 R  27.6  0.0   2:11.31 demultiplexBcls
46166 root     20   0 34988  11m 1880 R  16.1  0.0   2:08.96 demultiplexBcls
46175 root     20   0 35172  11m 1880 D   8.9  0.0   2:14.39 demultiplexBcls

written 7.4 years ago by Dan D
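
To compare a listing like the one above with a single-process run, it helps to tally the rows rather than eyeball them. A minimal sketch, assuming top's default column layout (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND); the function name is hypothetical and the rows are copied from the comment above:

```python
from collections import Counter

# Rows copied from the top output in the comment above.
TOP_ROWS = """\
46198 root 20 0 35220 11m 1880 R 100.0 0.0 1:51.69 demultiplexBcls
46170 root 20 0 35252 11m 1880 R 100.0 0.0 2:08.96 demultiplexBcls
46192 root 20 0 35256 11m 1880 R 100.0 0.0 1:57.27 demultiplexBcls
46159 root 20 0 34996 11m 1880 R 99.6 0.0 2:07.40 demultiplexBcls
46148 root 20 0 34660 11m 1880 R 98.7 0.0 2:05.87 demultiplexBcls
46179 root 20 0 35252 12m 1880 R 98.7 0.0 2:11.79 demultiplexBcls
46195 root 20 0 35328 11m 1880 R 98.0 0.0 1:56.12 demultiplexBcls
46238 root 20 0 35220 11m 1880 R 94.0 0.0 1:51.12 demultiplexBcls
46176 root 20 0 35252 11m 1880 D 55.9 0.0 2:03.77 demultiplexBcls
46183 root 20 0 35328 11m 1880 D 55.2 0.0 1:58.87 demultiplexBcls
46141 root 20 0 34660 11m 1880 D 52.9 0.0 2:06.30 demultiplexBcls
46144 root 20 0 34988 11m 1880 R 27.6 0.0 2:11.31 demultiplexBcls
46166 root 20 0 34988 11m 1880 R 16.1 0.0 2:08.96 demultiplexBcls
46175 root 20 0 35172 11m 1880 D 8.9 0.0 2:14.39 demultiplexBcls
"""


def summarize(rows, command="demultiplexBcls"):
    """Count matching processes per state and add up their %CPU values."""
    states = Counter()
    total_cpu = 0.0
    for line in rows.splitlines():
        fields = line.split()
        # Skip anything that is not a 12-column row for the command of interest.
        if len(fields) < 12 or fields[11] != command:
            continue
        states[fields[7]] += 1          # S column: R = running, D = uninterruptible sleep
        total_cpu += float(fields[8])   # %CPU column, relative to one CPU
    return states, total_cpu


states, total_cpu = summarize(TOP_ROWS)
print(sum(states.values()), dict(states), round(total_cpu, 1))
```

For this listing the tally is 14 demultiplexBcls processes, 10 in state R and 4 in state D, with a combined %CPU of roughly ten CPUs' worth. State D is uninterruptible sleep, which usually means the process is waiting on disk I/O.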

On my system, top will show one process for each thread. I will post an example tomorrow, as I'm going to start a CASAVA analysis.

written 7.4 years ago by Dan D
Dan D wrote:

To answer your question: yes, CASAVA runs in parallel in the BCL-to-FASTQ conversion, demultiplexing, and alignment steps. This is independent of lanes. It's not immediately apparent where your run is going wrong, but it's definitely not performing as it typically should. Hopefully we can narrow it down; see the comments.

written 7.5 years ago by Dan D