Serial and Parallel Processing

Monaco supports three different processing methods for running Monte Carlo simulations: serial (single-threaded), multiprocessing, and Dask distributed computing. Different methods can be faster depending on your use case.

Serial Processing (Single-threaded)

Serial processing runs all simulation cases sequentially in a single thread using a simple for loop. This is the most straightforward execution mode with no parallelization overhead, and is the best place to start for ease of debugging.

Configuration

The singlethreaded=True parameter forces serial execution regardless of other parallel settings.

sim = mc.Sim(
    name='MySimulation',
    ndraws=1000,
    fcns=my_functions,
    singlethreaded=True,  # Enable serial processing
    # Other parameters are ignored when singlethreaded=True
)

Multiprocessing

Multiprocessing uses Python’s built-in multiprocessing module with ProcessPoolExecutor to distribute cases across multiple CPU cores on a single machine. It provides good performance improvements for CPU-bound simulations and automatically balances the workload across available cores.

Multiprocessing is built into Python’s standard library, requiring no additional dependencies.

Configuration

Setting singlethreaded=False enables parallel processing, while usedask=False ensures multiprocessing is used instead of Dask.

sim = mc.Sim(
    name='MySimulation',
    ndraws=1000,
    fcns=my_functions,
    singlethreaded=False,            # Enable parallel processing
    usedask=False,                   # Use multiprocessing instead of Dask
    ncores=4,                        # Optional: specify number of cores
    multiprocessing_method='spawn',  # Optional: set start method ('fork', 'spawn', or 'forkserver')
)

Start Methods

The multiprocessing start method affects how worker processes are created and can have significant speed implications:

'fork': The default method on Linux. It provides fast startup with shared memory from the parent process, which can potentially lead to issues with unsafe memory management.
'spawn': The default method on Windows and macOS, and will become the default for Linux in Python 3.14. It has a slower startup and requires pickling data but creates completely separate, memory-safe processes.
'forkserver': Offers a hybrid approach that preserves most of fork’s speed while avoiding many (but not all) of its thread-safety pitfalls.

Dask Distributed Computing

Dask provides distributed computing capabilities that can scale from a single machine to a cluster of machines. For local workflows, it is generally slower and offers no benefits over multiprocessing. However it does allow scaling to distributed compute clusters.

Note that Dask is not a required dependency. It can be installed with pip install dask[distributed].

When using Dask locally, there is a nice web dashboard available at http://localhost:8787 to watch the progress of your simulation.

Configuration

Setting singlethreaded=False enables parallel processing, and usedask=True selects Dask instead of multiprocessing. The ncores parameter overrides the n_workers setting in daskkwargs if specified. The daskkwargs parameter accepts a dictionary of keyword arguments that are passed directly to the Dask Client constructor. See the Dask docs for a description of the valid kwargs.

sim = mc.Sim(
    name='MySimulation',
    ndraws=1000,
    fcns=my_functions,
    singlethreaded=False,  # Enable parallel processing
    usedask=True,          # Use Dask
    ncores=4,              # Optional: overrides n_workers in daskkwargs
    daskkwargs={           # Optional: Dask client configuration
        'n_workers': 4,
        'threads_per_worker': 1,
        'memory_limit': '2GB',
        'dashboard_address': ':8787'
    }
)

Distributed Cloud Compute Clusters

Using cloud compute with Dask can be accomplished by overwriting a sim’s client and cluster attributes. The below setup builds on the Coiled example from the dask documentation, and will require you to first set up an account with Coiled, connected to AWS or some other cloud compute provider.

Note that network bandwidth might be the bottleneck for computations that don’t live on your local machine. Ensure that the keepsiminput=False and keepsimrawoutput=False flags are set to reduce the data volume.

import coiled
cluster = coiled.Cluster(
    n_workers=8,
    worker_memory="8 GiB",
    wait_for_workers=True,
)
client = cluster.get_client()

sim.client = client
sim.cluster = cluster

Connecting to Existing Clusters

Note that default port for the Dask client scheduler (8786) is different than the port for the dashboard (8787). You can get your existing scheduler address with client.scheduler.address.

from dask.distributed import Client
existing_scheduler_address = 'tcp://127.0.0.1:8786'
client = Client(existing_scheduler_address)
cluster = client.cluster

sim.client = client
sim.cluster = cluster

Choosing your Processing Approach

Start your development process with a single-threaded setup for ease of debugging.

Once things are working, try multiproccessing. Whether this is faster than single-threaded depends on the particulars of your simulation. If your sim has a lot of input and output data, then pickling these objects and transfering that memory to and from worker processes may mean that multiprocessing is actually slower. However if your sim is computation-heavy, then multiprocessing’s use of multiple cores can give a significant performance benefit.

It’s worth experimenting with your specific setup. Try single-threaded, and multiprocessing with both the 'fork' and 'spawn' start methods to see what is fastest.

If you need to scale your simulation beyond your local machine, then move to Dask. A full guide on setting up distributed computation is beyond the scope of these docs, but resources are widely available to get you started.