Reentrancy Vulnerability Scope Expanded

Overview

Late last year the ChainSecurity team reported a vulnerability via Immunefi involving read-only reentrancy that we thought might render getRate() and related functions manipulable, and thus unsafe to use in many contexts.

Upon further investigation, we realized it actually applied to a broader class of pools (vs. only pools with certain tokens), but it still seemed limited to potential interactions with partner integrations. We didn’t see a way to exploit it directly, so we thought it was sufficient to document the potential dangers for integrators.

After the holiday break, looking at the issue with fresh eyes, we realized there was in fact a possible exploit on the pools themselves, albeit a very obscure and inefficient one. Balancer Labs therefore took immediate measures to reduce the chance of an exploit.

Context

It turns out there were four classes of pools:

A) Pools and pool types that were unaffected.

B) Pools that have unsafe external rate functions, but are not vulnerable on their own.

C) Pools where unsafe view function calls (e.g., getRate) could lead to a DoS attack. This could be mitigated through governance actions, mainly by placing the pools into Recovery Mode, which turns off protocol fees and sends the contracts down code paths that are not vulnerable; in effect, they become type B pools.

The Emergency SubDAO multisig acted to place these pools in Recovery Mode. It also disabled the factories that had this feature, so that no new vulnerable pools could be created.

D) Pools where the unsafe rates could lead to loss of funds: mainly, nested Composable Stable pools. These pools could not be mitigated, and could not be paused either, since they were already outside the factory’s pause window.

Balancer’s Twitter disclosed the list of affected pools that had liquidity, and asked LPs to withdraw immediately. Also, governance acted to kill the associated gauges.

Affected Contracts

Stable Pool
Phantom Stable Pool
Linear Pool (V3)
Composable Stable Pool (V1, V2)
Weighted Pool (V2)
Managed Pool

Vulnerability Brief

The Balancer Vault is non-reentrant, meaning you cannot initiate a Vault interaction if you are already in the context of another: i.e., you cannot swap inside a join, or exit a pool and transfer internal balance in the same interaction.

Of course, that doesn’t mean you can’t call the Vault at all during an operation. Read-only reentrancy occurs when a contract calls a view (read-only) function on the Vault while a Vault operation is still in progress. This is generally perfectly safe, since by definition view functions cannot modify contract state, and you can’t get up to much mischief without modifying state.
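To make the distinction concrete, here is a minimal Python sketch of a single reentrancy lock. The ToyVault class and its method names are hypothetical stand-ins, not the Vault's actual interface; the only point is that a state-changing entry point checks the lock, while a view function does not.

```python
class ReentrancyError(Exception):
    """Raised when a guarded function is entered while another is running."""


class ToyVault:
    """Hypothetical stand-in for a Vault with a single reentrancy lock."""

    def __init__(self):
        self._entered = False
        self.balance = 100

    def join(self, amount, callback=None):
        # State-changing entry point: protected by the lock.
        if self._entered:
            raise ReentrancyError("reentrant call")
        self._entered = True
        try:
            if callback is not None:
                callback(self)  # external call made mid-operation
            self.balance += amount  # accounting is updated afterwards
        finally:
            self._entered = False

    def get_balance(self):
        # View function: no lock check, so it can be called mid-operation.
        return self.balance


observed = {}


def callback(vault):
    # Reading state during the operation is allowed...
    observed["view"] = vault.get_balance()
    # ...but starting a second state-changing operation is not.
    try:
        vault.join(1)
    except ReentrancyError:
        observed["nested_join_blocked"] = True


vault = ToyVault()
vault.join(50, callback)
```

In this toy model the callback reads the balance as it stood before the join was accounted for, which is harmless on its own but becomes a problem once another contract acts on that reading.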

Pool contracts are also non-reentrant for the most part, and likewise internally safe. However, there is no way to automatically enforce a reentrancy scope that encompasses both the Vault and pools: especially when pools are nested inside each other.

It is possible, then, for read-only reentrancy to result in pools receiving stale data during joins and exits, and then using it to update pool state, in such a way as to cause incorrect behavior on subsequent operations.

In a nutshell, the vulnerability arises from the fact that the total supply (which is updated in the pool code) and the balances (updated in the Vault) can get out of sync in certain circumstances. Calling the Vault from an outside contract - which succeeds because it’s outside the Vault reentrancy context - can mislead the pools into updating their own state in ways that can be exploited. For instance, such calls can cause exorbitant protocol fees to be paid, or maliciously update caches to manipulate rates, making it possible to extract value from the pool via incorrect pricing: i.e., tricking pools into making bad trades.

In _onJoinOrExit, note that the _callPoolBalanceChange function (which processes token transfers) is called before the set balance functions, which update the pool accounting. _callPoolBalanceChange calls _processJoinPoolTransfers, which calls _handleRemainingEth unconditionally. This effectively makes every token a callback token, since any unused ETH sent in a join is returned to the caller. If this caller is a malicious attacker contract, it can define a fallback function that calls into the pool and updates state: for instance, the token rate cache. Since (in the case of affected pools) this happens after the BPT minting but before the token balances are updated in the Vault, the invariant and rate calculations, and anything else that relies on balances or supply (especially the relationship between them), will be wrong and can be manipulated.

Here’s how that can work when joining a Weighted Pool (the simplest example):
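As a rough numeric sketch of such a join: the Python model below keeps the BPT supply in the pool and the balances in the "Vault", and lets a callback read the rate between the two updates. The class, the pool sizes, the 50/50 weights, and the rate formula (invariant times number of tokens, divided by supply) are illustrative assumptions, not Balancer's contract code.

```python
import math


def weighted_invariant(balances, weights):
    """Weighted-product invariant: product of balance_i ** weight_i."""
    return math.prod(b ** w for b, w in zip(balances, weights))


class ToyWeightedPool:
    """Hypothetical model: supply lives in the pool, balances in the Vault."""

    def __init__(self, balances, weights, supply):
        self.vault_balances = list(balances)  # updated last, by the "Vault"
        self.weights = list(weights)
        self.total_supply = supply            # updated first, by the "pool"

    def get_rate(self):
        # Assumed rate formula: invariant * number of tokens / BPT supply.
        inv = weighted_invariant(self.vault_balances, self.weights)
        return inv * len(self.vault_balances) / self.total_supply

    def join_proportional(self, fraction, on_eth_refund=None):
        # 1. The pool mints BPT, so the total supply grows immediately.
        self.total_supply *= 1 + fraction
        # 2. Token transfers run; unused ETH goes back to the caller,
        #    which hands control to the caller's fallback function.
        if on_eth_refund is not None:
            on_eth_refund(self)
        # 3. Only now are the balances updated.
        self.vault_balances = [b * (1 + fraction) for b in self.vault_balances]


pool = ToyWeightedPool([1000.0, 1000.0], [0.5, 0.5], 2000.0)
rates = {"before": pool.get_rate()}
pool.join_proportional(0.10, lambda p: rates.update(during=p.get_rate()))
rates["after"] = pool.get_rate()
```

In this toy model the rate is 1.0 before and after the join, but a reading taken inside the callback comes out at roughly 0.91, because the supply has already grown by 10% while the balances have not.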

Weaponizing the Bug

As mentioned before, we grouped pools into four different types. Type A pools are unaffected (these are typically pools that don’t support regular joins or exits to begin with), and Type B pools are subject to their own view functions being affected (notably getRate(), which can be caused to return bogus values) - but their own operation remains secure.

Type C pools use a protocol fee percentage cache, which can be updated by anyone. Before they update their local copy of the protocol fees, however, they attempt to pay any fees due using the old protocol fee percentage, which involves looking at the growth of the invariant and BPT supply ratio over time. Since this can be manipulated, these pools can be tricked into paying excess protocol fees by forcing a reentrant protocol fee cache update during an exploited call to joinPool. This has been mitigated by placing these pools into Recovery Mode, which overrides all protocol fee percentages and sets them to zero, ignoring the cache and therefore preventing any excess payout of fees.
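The arithmetic behind this can be sketched very roughly as follows. The function, the checkpoint value, and the fee percentage are all hypothetical simplifications (the real fee logic is considerably more involved); the sketch only assumes that the fee owed scales with the apparent growth of the invariant-to-supply ratio since the last checkpoint.

```python
def protocol_fee_due(checkpoint_ratio, observed_ratio, fee_pct):
    """Fee owed on the apparent growth of invariant per BPT since the checkpoint."""
    growth = max(observed_ratio - checkpoint_ratio, 0.0)
    return growth * fee_pct


CHECKPOINT = 1.0  # ratio recorded at the last fee payment (assumed value)

# Consistent snapshot: the ratio has not grown, so nothing is owed.
honest = protocol_fee_due(CHECKPOINT, 1.0, fee_pct=0.5)

# Out-of-sync snapshot: the ratio appears to have grown, so fees are paid
# on growth that never happened.
manipulated = protocol_fee_due(CHECKPOINT, 1.1, fee_pct=0.5)

# Recovery Mode forces the fee percentage to zero, so even a manipulated
# reading results in no payout.
recovery_mode = protocol_fee_due(CHECKPOINT, 1.1, fee_pct=0.0)
```

Under these assumptions the manipulated reading produces a nonzero fee where none is due, and zeroing the percentage removes the payout regardless of what ratio is observed.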

Type D pools are Stable pools that contain the BPT of another Type B pool as one of their tokens. Stable pools feature a rate cache for their underlying tokens, and as such it is possible to force a Type D pool to update its rate cache for a vulnerable Type B pool while manipulating its rate, resulting in the Stable pool mispricing the asset. Since Stable pools are fairly insensitive to fluctuations in their balances, profiting from this attack is nontrivial and requires significant access to capital to compensate for the cost of the attack. The only mitigation that exists here is to increase the amplification factor, which will increase the economic cost of the attack (by requiring more capital and gas).
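A minimal sketch of why a cached rate is the weak point, assuming a hypothetical ToyRateCache that anyone can refresh and a nested pool whose reported rate is temporarily wrong (the names and numbers are illustrative only):

```python
class ToyRateCache:
    """Hypothetical rate cache for one nested token (e.g. another pool's BPT)."""

    def __init__(self, rate_provider):
        self._provider = rate_provider
        self.cached_rate = rate_provider()

    def update(self):
        # Permissionless refresh: stores whatever the provider reports right now.
        self.cached_rate = self._provider()

    def value_of(self, bpt_amount):
        # Pricing uses the cached rate, not a fresh reading.
        return bpt_amount * self.cached_rate


nested_pool = {"rate": 1.0}                       # stand-in for a Type B pool
cache = ToyRateCache(lambda: nested_pool["rate"])

nested_pool["rate"] = 0.9   # bogus reading during the out-of-sync window
cache.update()              # attacker forces a refresh at that moment
nested_pool["rate"] = 1.0   # the nested pool's rate is correct again

mispriced_value = cache.value_of(100.0)
```

The point of the sketch is that the bad value outlives the window in which it was readable: in this model the outer pool keeps valuing 100 BPT at about 90 until the cache is refreshed again.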

Ongoing Mitigation Efforts

We considered doing a white hat hack to recover and distribute the remaining funds, but the response to our call to withdraw liquidity was very strong and effective. Around 85% of the liquidity had been withdrawn in the first 24 hours, and it is now nearly all gone. The highest-liquidity DOLA pool went from $8m to around $130k (now down to $4k, as expected after removing incentives). At the time of writing, the total TVL of all affected pools together is < $50k.

We disclosed the issue privately to a long list of partners that were potentially affected, and have been working with them all along to ensure that any vulnerabilities were identified and patched in advance of this announcement. To the best of our knowledge and belief, these have all been addressed.

We are continuing efforts to contact LPs, directly and through partners, to reduce the liquidity even further. We also expect that ending incentives (see the proposal above to kill the gauges) will take out most of the stragglers.

Given the complexity and capital inefficiency of the exploit, we believe it is not profitable at these liquidity levels. The risk has decreased to the point where a white hat effort might actually be more costly than a hack, especially considering the dev time and resources we would need to devote to do it safely.

Conclusion and Future Plans

We’ve done all we can to contain and mitigate this vulnerability, and quite successfully. No funds have been lost, and the chances of an exploit decrease with each passing day.

The downside of such an aggressive response is that we are now in a state where our revenue stream from protocol fees is near zero, and creation of new Weighted, Stable, and Managed Pools has been halted.

Luckily, the changes required to render the pools safe are very minor, and are already in review. (We are also enlisting prominent partners in the review effort, to ensure that we have not missed anything in our analysis.)

A permanent fix would of course require a Vault migration (i.e., rearranging the order of calls so that balances are in sync before any external calls). At least for mainnet, this will have to wait for V3, as it would be too disruptive to migrate to a new Vault at this time, especially when there are far simpler (and equally effective) fixes that can be made at the pool level.

At the pool level, we simply need to ensure that no state-altering functions on pools can be called externally when inside the Vault’s reentrancy context: i.e., a public call that updates a protocol fee or rate cache can be called externally on its own, but cannot be called during a Vault operation (swap/join/exit). The simplest way to do this is to wrap the call with a function that cheaply enters the Vault context, which would revert if we were already in that context.
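The idea can be sketched in Python as follows. Everything here is a hypothetical model: cheap_guarded_noop stands in for some inexpensive Vault function protected by the reentrancy lock, and refresh_cache for a pool's public cache update. It is not the actual fix as deployed.

```python
class ReentrancyError(Exception):
    """Stands in for a revert caused by the Vault's reentrancy lock."""


class ToyVault:
    def __init__(self):
        self._entered = False

    def _enter(self):
        if self._entered:
            raise ReentrancyError("already inside a Vault operation")
        self._entered = True

    def _exit(self):
        self._entered = False

    def cheap_guarded_noop(self):
        # Does nothing except take and release the lock.
        self._enter()
        self._exit()

    def join(self, callback):
        self._enter()
        try:
            callback()  # external call made mid-operation
        finally:
            self._exit()


class ToyPool:
    def __init__(self, vault):
        self.vault = vault
        self.cache_updates = 0

    def refresh_cache(self):
        # Fails if we are already inside a Vault operation.
        self.vault.cheap_guarded_noop()
        self.cache_updates += 1


vault = ToyVault()
pool = ToyPool(vault)

pool.refresh_cache()  # called on its own: succeeds

try:
    vault.join(pool.refresh_cache)  # called during a join: rejected
    blocked_during_join = False
except ReentrancyError:
    blocked_during_join = True
```

In this model the standalone call goes through, while the same call made from inside a join raises and takes the whole operation down with it, which is the behavior described above.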

To restore normal operations, new versions of the affected pools will be deployed, hopefully within the next couple of weeks, and liquidity providers will be able to migrate their funds at that time. (Any incentives will of course be moved to the new pools.) We will endeavor to make this transition as smooth and painless as possible, and most users will be able to migrate directly through the Balancer UI.

We will provide timely updates on this thread, and on Twitter and Discord, if there are major new developments.

Vulnerable Pools

Mainnet
Polygon
Arbitrum
Optimism
Fantom