**Authors:**

(1) Omer Rathore, Department of Physics, Durham University;

(2) Alastair Basden, Department of Physics, Durham University;

(3) Nicholas Chancellor, Department of Physics, Durham University and School of Computing, Newcastle University;

(4) Halim Kusumaatmaja, Department of Physics, Durham University and Institute for Multiscale Thermofluids, School of Engineering, The University of Edinburgh.

## Table of Links

**Methods**

**Results**

**Conclusions and Outlook, Acknowledgements, and References**

## ABSTRACT

With the advent of exascale computing, effective load balancing in massively parallel software applications is critically important for leveraging the full potential of high-performance computing systems. Load balancing is the distribution of computational work between available processors. Here, we investigate the application of quantum annealing to load balance two paradigmatic algorithms in high performance computing. Namely, adaptive mesh refinement and smoothed particle hydrodynamics are chosen as representative grid and off-grid target applications. While the methodology for obtaining real simulation data to partition is application specific, the proposed balancing protocol itself remains completely general. In a grid based context, quantum annealing is found to outperform classical methods such as the round robin protocol but lacks a decisive advantage over more advanced methods such as steepest descent or simulated annealing despite remaining competitive. The primary obstacle to scalability is found to be limited coupling on current quantum annealing hardware. However, for the more complex particle formulation, approached as a multi-objective optimization, quantum annealing solutions are demonstrably Pareto dominant over state-of-the-art classical methods across both objectives. This signals a noteworthy advancement in solution quality which can have a large impact on effective CPU usage.

## Introduction

During the initial development of scientific computation, simulations greatly benefited from improvements to clock speeds of individual processors[1]. However, since the early 2000s[2] hardware limitations have resulted in dwindling improvements to sequential programming applications. The result has been a paradigm shift towards concurrency in software, aiming to exploit the multi-core architectures which form the bedrock of modern-day high-performance computing (HPC). However, the effectiveness of these applications is heavily dependent on the equitable distribution of computational workload across available resources, a concept known as load balancing.

Effective load balancing entails both an equitable division of work and a partitioning strategy that minimises the need to communicate data between processors. Motivation for the former is evident considering how all processors need to wait for the slowest one to finish before proceeding to the next step, so even a single lagging processor can result in idling and resource wastage at a much larger scale. In addition to this, communication bandwidth between processors is usually not as efficient as within the processor itself[3, 4] and ideally should be minimised. The importance of load balancing clearly extends beyond any singular discipline. However, for the purposes of this paper, emphasis is placed on applications in the realm of computational fluid dynamics. In particular, this paper explores the viability of quantum annealing as a solution strategy, as demonstrated using representative grid and off-grid applications.

In the context of grid based applications, it is usually not the governing equations themselves that define the load balancing problem but the chosen method of domain decomposition (DD). DD is the art of splitting a computational domain into smaller sub-domains, allowing each smaller problem to be solved individually before being recombined into the global solution. Historically, since its original proposal[5], DD in the realm of fluid simulations has been used to accommodate the inherent challenges of complex geometries[6, 7] or higher dimensionality[8]. Extensive research in the last 40 years has led to rapid development, facilitating application to multi-physics problems[9], moving meshes[10] and a wider pool of applications[11]. More importantly in the context of modern HPC, DD is an indispensable tool for transforming the global simulation domain into sub-components that can be assigned to individual processors.

A popular choice of DD for large scale simulations in the realms of fluids[12], stress analysis[13] and biological flows[14] among others is adaptive mesh refinement (AMR). AMR operates on the principle that different areas of the computational domain are best solved with different levels of spatial resolution. This can prevent wastage of computational resources in regions with little activity while maintaining precision in zones of interest. The method is particularly useful for applications with large variation of spatial scales or when high resolution data is required in specific regions. This paper considers block structured implementations as a representative class and does not make further distinctions with alternative AMR varieties, although an interested reader is referred to the relevant literature[15, 16].

For off-grid applications, the representative method here is smoothed particle hydrodynamics (SPH), a mesh free simulation method for continuum mechanics. Since its original conception[17], its application has spread to encompass a wide variety of fields including but not limited to solid mechanics, engineering, astrophysics and the food industry[18, 19]. Fundamentally, SPH is an interpolation technique that utilizes a collection of unordered points, or particles, to ascertain the value of a function at a specific point. This process is facilitated through the use of a kernel function, which effectively integrates the influence of adjacent particles to compute the value of a function at the desired point.
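The interpolation described above takes the standard SPH form f(x) ≈ Σ_j (m_j/ρ_j) f_j W(x − x_j, h), where m_j, ρ_j and f_j are the mass, density and field value carried by particle j, and W is the smoothing kernel with length scale h. A minimal one-dimensional sketch of this sum, assuming a cubic spline kernel (a common choice, though the paper does not prescribe one) and illustrative function names, might look like:

```python
import numpy as np

def cubic_spline_kernel(r, h):
    """Standard 1D cubic spline kernel W(r, h) with compact support |r| < 2h."""
    q = np.abs(r) / h
    sigma = 2.0 / (3.0 * h)  # 1D normalisation so the kernel integrates to 1
    w = np.where(q < 1.0, 1.0 - 1.5 * q**2 + 0.75 * q**3,
                 np.where(q < 2.0, 0.25 * (2.0 - q)**3, 0.0))
    return sigma * w

def sph_interpolate(x, positions, values, masses, densities, h):
    """Estimate a field at point x from unordered particles via the SPH sum."""
    w = cubic_spline_kernel(x - positions, h)
    return np.sum(masses / densities * values * w)
```

For example, on a uniform particle distribution with consistent masses and densities, interpolating a constant field recovers that constant to within quadrature error, since the kernel weights approximately form a partition of unity.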

When applied in the context of computational fluids, this forms a Lagrangian particle method for solving the hydrodynamic equations, an equivalent counterpart to the Eulerian equations solved by grid based methods. Moreover, when compared with the latter, SPH is inherently more adept at handling free surface flows and complex boundaries and can accommodate large deformations without being hindered by grid distortion. However, these advantages are usually accompanied by caveats when it comes to stability and convergence, as well as limited accuracy[20].

This paper delves into the potential of quantum computing to address the challenges of load balancing, with a particular focus on the quantum annealing (QA) approach[21] for each of the two cases described above. QA is particularly suited for finding the ground state of an Ising problem, which is essentially analogous to finding the optimal solution to many binary combinatorial optimisation problems of interest. This includes the travelling salesman and knapsack problems, as well as more niche applications in computational chemistry or graph theory[22–24]. In light of its range of applicability, the field has attracted considerable interest in recent years, a trend that has only been further fueled by the increasing accessibility of quantum annealers, programmable quantum computers designed for quantum annealing. While the largest gate-based quantum computer (IBM) only has 433 quantum bits (i.e. qubits)[25], the largest annealer (D-Wave) has over 12000 qubits[26].
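For reference, the Ising problem in question seeks the spin configuration minimising a Hamiltonian of the standard form, with local fields $h_i$ and pairwise couplings $J_{ij}$:

```latex
H(\mathbf{s}) = \sum_{i} h_i s_i + \sum_{i<j} J_{ij} s_i s_j,
\qquad s_i \in \{-1, +1\}.
```

Binary combinatorial problems map onto this form through the change of variables $x_i = (1 + s_i)/2$, which connects the Ising formulation to the equivalent quadratic unconstrained binary optimisation (QUBO) picture.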

There is an expectation in the community that QA might outperform classical algorithms for some applications. This has led to many empirical studies into the efficacy of QA for a wide range of problems[27–31], often with mixed results that lack a decisive consensus. To date, the greater promise may lie in using QA for approximate optimisation, based on a recent demonstration of a scaling advantage over the best classical algorithm, albeit for a rather artificial optimisation problem[32].

As such, it seems the actual choice of problem to solve strongly influences any inferences made about the potential (or lack thereof) of quantum annealing. While there are many works in the existing literature that apply QA to a particular problem, the body of literature tackling the more fundamental question of whether QA should be applied at all is very sparse. A recent attempt[33] to create a rigorous methodology for evaluating the viability of a potential use case for quantum computing aims to shed some light on the latter. However, it should be noted that the discussion remains very open ended, despite being of paramount importance, particularly in the context of near term quantum hardware.

So why does this paper consider load balancing as the target application for QA? Firstly, it should be evident that load balancing is a topic of relevance for almost all computational scientists regardless of discipline. This is particularly true as the exascale era of classical computing approaches and programmers strive towards utilising more and more compute power. It is also still very much an area of active research[34–37], highlighting that there remains room for improvement. Furthermore, as will be seen shortly, the problem lends itself to a very natural conversion into an Ising formulation suitable for quantum annealers. Although the complexity of the solution energy landscape is not always apparent a priori, the solution space is at the very least very large and grows drastically with problem size.
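To make this conversion concrete, consider the simplest variant, a sketch rather than the paper's actual formulation: assigning work units with costs w_i to one of two processors. Encoding the assignment as spins s_i ∈ {−1, +1}, the squared load imbalance (Σ_i w_i s_i)² expands to a constant plus Σ_{i≠j} w_i w_j s_i s_j, i.e. an Ising problem with couplings J_ij = w_i w_j (the classic number-partitioning mapping). The helper names below are illustrative; a tiny instance is solved by brute force here, where an annealer would take over at scale:

```python
import itertools
import numpy as np

def ising_couplings(weights):
    """J_ij = w_i * w_j encodes the squared two-processor load imbalance."""
    w = np.asarray(weights, dtype=float)
    J = np.outer(w, w)
    np.fill_diagonal(J, 0.0)  # s_i^2 = 1 terms are a constant offset
    return J

def ground_state(weights):
    """Brute-force the minimum-imbalance spin assignment for a small instance."""
    w = np.asarray(weights, dtype=float)
    best, best_e = None, np.inf
    for spins in itertools.product([-1, 1], repeat=len(w)):
        s = np.array(spins)
        e = (w @ s) ** 2  # constant + sum_{i != j} J_ij s_i s_j
        if e < best_e:
            best, best_e = s, e
    return best, best_e
```

For example, costs [4, 3, 2, 5] admit the perfectly balanced split {4, 3} versus {2, 5}, so the ground-state energy (squared imbalance) is zero. Multi-processor assignment and communication-aware objectives add further variables and coupling terms, but the same Ising structure carries through.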

Perhaps the most compelling rationale for selecting load balancing as an application for QA emerges when considering the broader HPC landscape. Classical computing, benefiting from several decades of development, significantly outpaces the emergent field of quantum computing in terms of maturity and scale when it comes to hardware. Despite the immense potential of quantum technology, it is unlikely to match the sheer scope of classical systems in the near future. Additionally, there is a growing consensus that classical computing will always retain some relevance, never being completely supplanted by its quantum counterpart. In this context, envisioning quantum computers as complementary accelerators, rather than standalone solutions, is a strategic and viable approach. This is particularly true for load balancing, where recalculations are not constant but occur at regular or semi-regular intervals, so quantum annealers can effectively augment the process. The proposal here is to integrate quantum annealers with classical HPC systems. The primary computational tasks, whether in fluid dynamics, solid-state mechanics, biological flows or some other algorithm, continue to run on classical HPC infrastructure. Meanwhile, the load balancing component is offloaded to an attached quantum processor operating in parallel. This concept mirrors the synergy between GPUs and CPUs[38], and aligns well with the growing narrative of heterogeneous HPC architectures. Such an approach not only capitalizes on the strengths of quantum annealers but is also in tune with the trend[39–41] towards diversifying and optimizing computational resources in HPC environments.

The structure of the remainder of this paper is outlined as follows: firstly, an introductory review of quantum annealing is presented, alongside pertinent details from the classical methodologies under consideration. This encompasses an in-depth discussion on the acquisition of real-world classical data to be subsequently load balanced using QA. Finally, the outcomes derived from both grid-based (AMR) and off-grid (SPH) simulations, executed on actual quantum annealing hardware, are explored.

This paper is available on arxiv under CC BY 4.0 DEED license.