A mismatch in how quickly blocks and blobs propagated resulted in missed slots on the Ethereum network.
The CEO of bloXroute Labs attributed the problem to the way Lighthouse handles blocks that arrive ahead of their blobs.
According to a Lighthouse developer, the problem stemmed from an untested interaction between the BDN and Lighthouse’s HTTP API.
The Ethereum network saw a large spike in missed slots earlier this week, mostly for blocks relayed by bloXroute relays. Analysis showed that although the bloXroute relays successfully published both blocks and blobs, the blocks propagated quickly via the bloXroute Distributed Network (BDN), while the blobs propagated more slowly across peer-to-peer (p2p) channels. This disparity exposed a particular consensus layer (CL) client behavior, which led the client to reject blocks and caused slots to be missed.
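The timing disparity described above can be pictured with a toy model. The sketch below is illustrative only: the names `Arrival`, `available_at`, and `slot_outcome` are hypothetical, the arrival times are made up, and the 4-second cutoff is a simplifying assumption standing in for the point in a 12-second slot at which validators attest. It does not model how any real client decides to reject a block.

```python
from dataclasses import dataclass

# Assumption for this sketch: a block is only useful to attesters if the block
# and all of its blobs are present by this many seconds into the slot.
ATTESTATION_DEADLINE_S = 4.0


@dataclass
class Arrival:
    block_at: float        # seconds into the slot when the block arrives
    blobs_at: list[float]  # arrival time of each blob the block refers to


def available_at(arrival: Arrival) -> float:
    """Earliest time at which the block and every one of its blobs are present."""
    return max([arrival.block_at, *arrival.blobs_at])


def slot_outcome(arrival: Arrival, deadline: float = ATTESTATION_DEADLINE_S) -> str:
    """Classify the slot in this toy model as 'attested' or 'missed'."""
    return "attested" if available_at(arrival) <= deadline else "missed"


# Hypothetical numbers: the block arrives early over a fast path,
# but its blobs trail behind over p2p gossip.
fast_block_slow_blobs = Arrival(block_at=0.3, blobs_at=[4.6, 4.8])
block_and_blobs_together = Arrival(block_at=1.5, blobs_at=[1.6, 1.7])
```

In this model the block's early arrival does not help: what matters is the arrival of the slowest piece, so speeding up only the block leaves the outcome governed by the blobs.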
bloXroute Explains the Ethereum Missed Slots
The CEO of bloXroute Labs, Uri Klarman, explained the events surrounding the Ethereum (ETH) missed slots in a lengthy GitHub discussion.
According to Klarman, nodes running the current Lighthouse version expect to receive the blobs from the same peer that supplied the block. A recent BDN release accelerated block propagation by sending blocks without their blobs and relied on the p2p network to distribute the blobs as needed. As a result, consensus nodes connected to the BDN ended up ignoring blocks they had first received from it. This change unintentionally caused a notable increase in missed slots.
Klarman clarified that the BDN depends heavily on Lighthouse, since the bulk of bloXroute’s beacon nodes run it. Because bloXroute relays are closely integrated with the BDN, early post-release findings indicated that they were the relays most affected by the BDN’s fast block propagation.
The team conducted several tests, concentrating on Lighthouse’s behavior when it received blocks via the BDN. They progressively shifted relays away from relying on the BDN for block publishing and eventually disabled BDN propagation of blocks containing blobs.
During this time, bloXroute relays continued to send blocks containing blobs to validators and to publish them to the BDN and the beacon node network. However, because the beacon nodes had already received the block from the BDN, these publish requests returned an HTTP 202 response.
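The behavior reported here, a publish request answered with 202 because the node already knows the block, can be illustrated with a minimal sketch. Everything in it is hypothetical: `ToyBeaconNode` and its methods are invented for illustration and are not Lighthouse’s or any other client’s API, and the status-code logic simply mirrors the account above rather than the real publishing endpoint. What happens to the blobs attached to such a request is deliberately left unmodelled.

```python
class ToyBeaconNode:
    """Hypothetical stand-in for a beacon node; not a real client API."""

    def __init__(self) -> None:
        # Roots of blocks this node has already seen, from any source.
        self.known_block_roots: set[str] = set()

    def receive_from_network(self, block_root: str) -> None:
        """Record a block that arrived over the network (for example via a fast relay path)."""
        self.known_block_roots.add(block_root)

    def publish_via_http(self, block_root: str, blobs: list[str]) -> int:
        """Return the status code this toy model gives for a publish request.

        Assumption mirroring the account in the text: a block that is already
        known yields 202, a new block yields 200. The blobs argument is accepted
        but not processed, because the source does not say how it was handled.
        """
        if block_root in self.known_block_roots:
            return 202
        self.known_block_roots.add(block_root)
        return 200


node = ToyBeaconNode()
node.receive_from_network("0xabc")  # block arrives first, without blobs
status_known = node.publish_via_http("0xabc", ["blob0", "blob1"])
status_new = node.publish_via_http("0xdef", [])
```

Here `status_known` is 202 and `status_new` is 200, which matches the pattern described: once the block has arrived by another route, a later publish request carrying the same block no longer looks like a fresh publication to the node.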
Lighthouse Developer Disputes bloXroute’s Account
Michael Sproul, a Lighthouse developer, has taken issue with Klarman’s explanation of the missed slots, saying it misrepresents the problem as a Lighthouse p2p bug. According to Sproul, the actual cause was an untested interaction between bloXroute’s centralized “block distribution network” (BDN) and Lighthouse’s HTTP API.
This post-mortem misrepresents the issue as a Lighthouse p2p bug, when in fact it was caused by an untested interaction between Bloxroute’s centralised “block distribution network” (BDN) and Lighthouse’s HTTP API
Here’s an account from my perspective https://t.co/T2i9dbI2zQ
— Michael Sproul (@sproulM_) March 29, 2024
According to Sproul, bloXroute was obstructive during the incident and would not provide logs to back up its claims. He contends that bloXroute published a hasty post-mortem before providing the required information.
Sproul claims that the problem started when bloXroute published blocks without blobs over the BDN to the peer-to-peer network and then tried to POST the missing blobs to Lighthouse as part of an HTTP request. The Lighthouse and Prysm HTTP APIs expected blocks to be delivered over the peer-to-peer network together with their complete blobs. According to Sproul, this assumption no longer held once a “block distribution network” was in place that bypassed the standard procedure for publishing blocks.
Sproul suggests immediate fixes, such as switching off the BDN for blocks that contain blobs, and recommends long-term solutions, such as rebuilding the PBS ecosystem to prevent similar failures. He also advocates making the BDN obsolete because of its centralized architecture and the threat it poses to Ethereum’s decentralization.