# Serial and Parallel Processing Monaco supports three different processing methods for running Monte Carlo simulations: serial (single-threaded), multiprocessing, and Dask distributed computing. Different methods can be faster depending on your use case. ## Serial Processing (Single-threaded) Serial processing runs all simulation cases sequentially in a single thread using a simple for loop. This is the most straightforward execution mode with no parallelization overhead, and is the best place to start for ease of debugging. ### Configuration The `singlethreaded=True` parameter forces serial execution regardless of other parallel settings. ```python sim = mc.Sim( name='MySimulation', ndraws=1000, fcns=my_functions, singlethreaded=True, # Enable serial processing # Other parameters are ignored when singlethreaded=True ) ``` ## Multiprocessing Multiprocessing uses Python's built-in `multiprocessing` module with `ProcessPoolExecutor` to distribute cases across multiple CPU cores on a single machine. It provides good performance improvements for CPU-bound simulations and automatically balances the workload across available cores. Multiprocessing is built into Python's standard library, requiring no additional dependencies. ### Configuration Setting `singlethreaded=False` enables parallel processing, while `usedask=False` ensures multiprocessing is used instead of Dask. ```python sim = mc.Sim( name='MySimulation', ndraws=1000, fcns=my_functions, singlethreaded=False, # Enable parallel processing usedask=False, # Use multiprocessing instead of Dask ncores=4, # Optional: specify number of cores multiprocessing_method='spawn', # Optional: set start method ('fork', 'spawn', or 'forkserver') ) ``` ### Start Methods The multiprocessing start method affects how worker processes are created and can have significant speed implications: * `'fork'`: The default method on Linux. It provides fast startup with shared memory from the parent process, which can potentially lead to issues with unsafe memory management. * `'spawn'`: The default method on Windows and macOS, and will become the default for Linux in Python 3.14. It has a slower startup and requires pickling data but creates completely separate, memory-safe processes. * `'forkserver'`: Offers a hybrid approach that preserves most of fork's speed while avoiding many (but not all) of its thread-safety pitfalls. ## Dask Distributed Computing Dask provides distributed computing capabilities that can scale from a single machine to a cluster of machines. For local workflows, it is generally slower and offers no benefits over multiprocessing. However it does allow scaling to distributed compute clusters. Note that Dask is not a required dependency. It can be installed with `pip install dask[distributed]`. When using Dask locally, there is a nice web dashboard available at [http://localhost:8787](http://localhost:8787) to watch the progress of your simulation. ### Configuration Setting `singlethreaded=False` enables parallel processing, and `usedask=True` selects Dask instead of multiprocessing. The `ncores` parameter overrides the `n_workers` setting in `daskkwargs` if specified. The `daskkwargs` parameter accepts a dictionary of keyword arguments that are passed directly to the Dask Client constructor. See [the Dask docs](https://distributed.dask.org/en/stable/api.html#client) for a description of the valid kwargs. ```python sim = mc.Sim( name='MySimulation', ndraws=1000, fcns=my_functions, singlethreaded=False, # Enable parallel processing usedask=True, # Use Dask ncores=4, # Optional: overrides n_workers in daskkwargs daskkwargs={ # Optional: Dask client configuration 'n_workers': 4, 'threads_per_worker': 1, 'memory_limit': '2GB', 'dashboard_address': ':8787' } ) ``` ### Distributed Cloud Compute Clusters Using cloud compute with Dask can be accomplished by overwriting a sim's `client` and `cluster` attributes. The below setup builds on [the Coiled example from the dask documentation](https://docs.dask.org/en/latest/deploying.html), and will require you to first set up an account with Coiled, connected to AWS or some other cloud compute provider. Note that network bandwidth might be the bottleneck for computations that don't live on your local machine. Ensure that the `keepsiminput=False` and `keepsimrawoutput=False` flags are set to reduce the data volume. ```python import coiled cluster = coiled.Cluster( n_workers=8, worker_memory="8 GiB", wait_for_workers=True, ) client = cluster.get_client() sim.client = client sim.cluster = cluster ``` ### Connecting to Existing Clusters Note that default port for the Dask client scheduler (8786) is different than the port for the dashboard (8787). You can get your existing scheduler address with `client.scheduler.address`. ```python from dask.distributed import Client existing_scheduler_address = 'tcp://127.0.0.1:8786' client = Client(existing_scheduler_address) cluster = client.cluster sim.client = client sim.cluster = cluster ``` ## Choosing your Processing Approach Start your development process with a single-threaded setup for ease of debugging. Once things are working, try multiproccessing. Whether this is faster than single-threaded depends on the particulars of your simulation. If your sim has a lot of input and output data, then pickling these objects and transfering that memory to and from worker processes may mean that multiprocessing is actually slower. However if your sim is computation-heavy, then multiprocessing's use of multiple cores can give a significant performance benefit. It's worth experimenting with your specific setup. Try single-threaded, and multiprocessing with both the `'fork'` and `'spawn'` start methods to see what is fastest. If you need to scale your simulation beyond your local machine, then move to Dask. A full guide on setting up distributed computation is beyond the scope of these docs, but resources are widely available to get you started.