The Lightning Network is Vulnerable to Attack
In this article I will explain why the Lightning Network is vulnerable to attack, and thus will likely not become a useful technology.
[Edit: October 25, 2019] This article describes basically the same vulnerability: https://chainbulletin.com/researchers-describe-bitcoin-attack-that-could-halt-lightning-payments/
My main thesis is simple: the Lightning Network is vulnerable because, with only a relatively small investment, a malicious actor can seriously degrade the network to the point that many users will find its failure rate intolerable. Users will then leave the network, which will further reduce its functionality, leading to a cascading cycle of fewer users and declining functionality.
Without a fundamental redesign the network cannot be defended against such malicious actors.
Because my argument is a broad conceptual one, I won’t dwell on fine details, most of which can only be estimated anyway.
The Basic Design of Lightning
The Lightning Network is a payment system. It consists of users and channels. The purpose of the system is to allow any user to transfer bitcoin to any other user quickly (under a few seconds), and cheaply (less than a few tenths of one U.S. cent).
Any two users in the Lightning system may establish a channel. A channel requires one or both users to keep a balance of bitcoin in what is essentially an escrow account. Establishing a channel takes around twenty minutes and requires one user to initially commit some amount of bitcoin to the channel.
Bitcoin can be transferred across channels. The amount that can be transferred in a given direction is the capacity in that direction between the users.
For instance, imagine a channel between Alice and Bob that was established by Alice with 1 bitcoin. The channel has a capacity of 1 bitcoin in the direction Alice → Bob. It therefore allows the transfer from Alice to Bob of any amount less than or equal to 1 bitcoin.
Transfers along such channels take place quickly because they are limited only by the communication delay between Alice and Bob’s computers.
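The Alice and Bob example can be sketched in a few lines of Python. This is only a toy model of directional capacity, not how any Lightning implementation stores channel state; the `Channel` class and its field names are made up for illustration, and amounts are in satoshis (1 bitcoin = 100,000,000 satoshis) to avoid rounding.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """Toy channel (hypothetical model): one spendable balance per direction."""
    a: str
    b: str
    a_to_b: int  # what `a` can still send to `b`, in satoshis
    b_to_a: int  # what `b` can still send to `a`, in satoshis

    def capacity(self, sender: str) -> int:
        """Capacity in the direction that starts at `sender`."""
        if sender == self.a:
            return self.a_to_b
        if sender == self.b:
            return self.b_to_a
        raise ValueError(f"{sender} is not a party to this channel")

    def transfer(self, sender: str, amount: int) -> bool:
        """Move `amount` from `sender` to the other party if capacity allows."""
        if amount <= 0 or amount > self.capacity(sender):
            return False
        if sender == self.a:
            self.a_to_b -= amount
            self.b_to_a += amount
        else:
            self.b_to_a -= amount
            self.a_to_b += amount
        return True


# Alice opens the channel with 1 bitcoin; Bob starts with nothing.
channel = Channel("Alice", "Bob", a_to_b=100_000_000, b_to_a=0)
print(channel.transfer("Alice", 40_000_000))  # within Alice's capacity
print(channel.transfer("Alice", 70_000_000))  # exceeds what Alice has left
```

Note that in this model a transfer from Alice to Bob also creates capacity in the direction Bob → Alice, since Bob can now send that amount back.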
Sending Payments Using Intermediaries
The innovation of Lightning is to allow the transfer of payments between users who are not connected by a channel. Such transfers are accomplished by using intermediaries. For instance, Alice can transfer bitcoin to Charles: she first transfers the payment to Bob who then transfers it to Charles.
However, the transfer described above is possible only if there are channels with sufficient capacity from Alice to Bob and from Bob to Charles.
As currently implemented, a single transfer can traverse a path of up to 20 users. So a payment that a user sends to himself can pass through up to 18 other users.
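The capacity condition on a multi-step path can be written as a short check. The sketch below is an illustration only: the `can_route` function and the dictionary of directional capacities are hypothetical, and the 20-user limit is simply the figure quoted above.

```python
MAX_PATH_USERS = 20  # path limit quoted in the text above


def can_route(path, capacity, amount):
    """Return True if every consecutive pair of users on `path` has a channel
    with at least `amount` of capacity in the direction of travel.

    `capacity` maps (sender, receiver) to spendable satoshis in that direction.
    """
    if len(path) < 2 or len(path) > MAX_PATH_USERS:
        return False
    return all(
        capacity.get((src, dst), 0) >= amount
        for src, dst in zip(path, path[1:])
    )


capacity = {
    ("Alice", "Bob"): 100_000_000,
    ("Bob", "Charles"): 30_000_000,
}
print(can_route(["Alice", "Bob", "Charles"], capacity, 25_000_000))
print(can_route(["Alice", "Bob", "Charles"], capacity, 50_000_000))
```

The second call fails because the weakest step on the path, Bob → Charles, cannot carry the amount, even though Alice → Bob can.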
The Nub of the Problem with Lightning
Imagine that Alice wants to transfer 1 bitcoin to Zach along a path of intermediaries, for example, Alice → Bob → Carol → … → Zach. To initiate the transfer Alice makes a commitment to Bob requiring her to put aside 1 bitcoin from the channel capacity. That 1 bitcoin eventually is either 1) claimed by Bob if the transfer succeeds, or 2) refunded to Alice if the transfer is rejected by the network or Zach does not receive it in a period of time called the timeout. As the transfer moves from user to user along the path, each user makes a similar commitment to the next user in the path requiring him to put aside 1 bitcoin from his channel capacity.
If the timeout were on the order of seconds, there would be no problem. However, the timeout can range from an hour to several days, depending on the length of the path and the options set in each channel.
When Alice initiates her transfer to Zach, there are three possibilities that can occur within the next few seconds: 1) Zach successfully receives the transfer, 2) the network rejects the transfer, because, for instance, there is insufficient capacity, or 3) the network does not reject the transfer but Zach does not receive the transfer.
The last possibility is the nub of the problem: Alice’s transfer is not guaranteed to either succeed or fail quickly. The transfer can go into an unresolved state which can last until the timeout expires. The unresolved state can last for hours to days.
During that time the channel capacities along each step of the path (Alice to Bob, Bob to Carol, etc.) are reduced by 1 bitcoin until the channel timeout expires on that step of the path. (The timeouts for the intermediary steps are less than for Alice.)
Also, Alice will not know whether the transfer will eventually succeed or time out. While the transfer is unresolved, Alice has two undesirable choices: 1) resend the payment on Lightning or use another payment method, and trust that Zach will refund one of the payments if the first payment succeeds after all, or 2) do nothing and wait until the payment succeeds or times out.
Channel capacity is the resource that the network offers. Successful transfers require capacity on the network, so reducing capacity reduces the ability to successfully complete transfers and increases the probability of transfer failures.
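To make the effect of an unresolved transfer concrete, here is a minimal sketch of the three outcomes described above. It is a toy model under stated assumptions, not the real protocol: the `Network` class and its method names are invented for illustration, and a "commitment" is reduced to subtracting the amount from each step's capacity until the transfer either succeeds or times out.

```python
class Network:
    """Toy model: capacity[(src, dst)] is what src can still send to dst (satoshis)."""

    def __init__(self, capacity):
        self.capacity = dict(capacity)
        self.unresolved = {}  # transfer id -> (hops, amount)
        self._next_id = 0

    def start(self, path, amount):
        """Set aside `amount` on every step; return an id, or None if rejected."""
        hops = list(zip(path, path[1:]))
        if any(self.capacity.get(hop, 0) < amount for hop in hops):
            return None  # outcome 2: the network rejects the transfer quickly
        for hop in hops:
            self.capacity[hop] -= amount
        self._next_id += 1
        self.unresolved[self._next_id] = (hops, amount)
        return self._next_id

    def succeed(self, transfer_id):
        """Outcome 1: the receiver claims the payment, so funds move forward."""
        hops, amount = self.unresolved.pop(transfer_id)
        for src, dst in hops:
            self.capacity[(dst, src)] = self.capacity.get((dst, src), 0) + amount

    def time_out(self, transfer_id):
        """Outcome 3, eventually: the amount is refunded on every step."""
        hops, amount = self.unresolved.pop(transfer_id)
        for hop in hops:
            self.capacity[hop] += amount


one_btc = 100_000_000
net = Network({
    ("Alice", "Bob"): one_btc,
    ("Bob", "Carol"): one_btc,
    ("Carol", "Zach"): one_btc,
})
stuck = net.start(["Alice", "Bob", "Carol", "Zach"], one_btc)
# While the first transfer is unresolved, even a tiny unrelated payment fails.
print(net.start(["Bob", "Carol"], 1))
net.time_out(stuck)
# After the timeout the capacity is back and the same payment is accepted.
print(net.start(["Bob", "Carol"], 1) is not None)
```

In this model nothing is lost when a transfer times out; the damage is that every step of the path is unusable for the set-aside amount in the meantime.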
Hypothetically then a malicious actor who runs several copies of the Lightning software on different computers, that is, who has several user identities on the network, can degrade the system in two ways:
- Reducing channel capacity: The attacker can route a payment to herself along a maximum-length path of 20 users. She can then prevent the transfer from completing, so that it goes into the unresolved state. Every intermediary channel on the path will have the amount of the transfer stuck and unusable. She can therefore reduce aggregate capacity by about 20 bitcoin for each bitcoin in the transfer.
- Causing transfer failures for other users: By announcing channels with low fees, the attacker can attract other users’ transfers across her channels and then put those transfers into the unresolved state, which is, from the sender’s viewpoint, a transfer failure.
In order to be widely adopted a payment system must be reliable.
How reliable a system must be can only be estimated. For some users, if a payment fails once in 1000 attempts, the user will abandon the system. For other users, a rate of one failure in 50 attempts might be tolerable. I will assume that at a 5% failure rate most users will abandon the system.
Assume that a malicious actor can reduce aggregate capacity by 20 bitcoin for every 1 bitcoin that he is willing to lock up for a period of time. Note that the attacker will eventually recover all of his bitcoin. He is only making his bitcoin unavailable for the duration of the attack. His only costs are the bitcoin transaction fees for establishing channels and the very small fees that Lightning users can charge for forwarding transfers. The cost of the attack is small but the impact may be very large. If the attacker controls just 5% of capacity, then at 20 to 1 he can likely reduce network capacity enough to cause an unacceptably high rate of transfer failures.
Or assume that the attacker establishes enough user identities and channel capacity so that only 5% of transfers cross his channels. Then he can put all of those transfers into the unresolved state and cause an unacceptably high rate of transfer failures.
So such an attacker can disrupt the network in two ways and can likely cause total costs to the other users of the network far beyond the costs to himself.
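The arithmetic behind both scenarios is simple enough to write down. The sketch below is a rough upper bound under the assumptions stated in the text (a 20-to-1 ratio, and an attacker who stalls every transfer he carries); the function names are hypothetical, and a real network would not let every attack path tie up distinct capacity.

```python
from fractions import Fraction


def locked_share(attacker_share, amplification=20):
    """Upper bound on the share of network capacity made unusable.

    attacker_share: fraction of total capacity the attacker commits.
    amplification: capacity tied up per unit committed (the rough 20 used above).
    """
    return min(Fraction(1), Fraction(attacker_share) * amplification)


def failure_rate(share_crossing_attacker, share_stalled=1):
    """Share of all transfers that fail when the attacker stalls
    `share_stalled` of the transfers that cross his channels."""
    return Fraction(share_crossing_attacker) * Fraction(share_stalled)


print(locked_share(Fraction(5, 100)))  # attacker commits 5% of capacity
print(failure_rate(Fraction(5, 100)))  # 5% of transfers cross his channels
```

Under these assumptions, committing 5% of capacity is enough, in the best case for the attacker, to tie up all of it, and carrying 5% of transfers is enough to reach the 5% failure rate assumed above.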
As users who will not tolerate a given failure rate leave the network, their channel capacity will also leave the network thereby causing a higher failure rate, causing more users and capacity to leave the network, and so on.
The cost to the attacker is almost trivial: the bitcoin transaction fees to establish channels plus the very low fees on the Lightning Network plus the time-cost of the bitcoin in his channels. If he is passively holding bitcoin in a wallet anyway then the time-cost is zero. Yet the impact on the network is potentially massive.
In summary, it will be far too easy for an attacker or a group of attackers to cause the whole Lightning Network to experience at least a 5% payment failure rate. Users will then leave the network, leading to a catastrophic cycle of increasing failure rates and fewer users.
I’ve purposely left out most of the technical details of the Lightning Network and how the described attacks would be carried out. My aim is to illustrate the basic idea. The design features of Lightning that make the attack possible are these:
- The sender plans the path for the transfer before sending the transfer so it is easy for the attacker to create a long path affecting many other users.
- If a user controls several user identities, she can probe the network by transferring payments between her identities and possibly identify especially vulnerable channels.
- The network is designed to accommodate users going offline unexpectedly or experiencing communication delays and so is not designed to require that transfers either complete or fail in seconds.
- Because the network requires that users monitor and interact with the Bitcoin blockchain, it must accommodate fairly long periods of time on the order of the blockchain block time of ten minutes. With all of the buffer times, delays and grace periods built into the network, over a long path the time that a transfer can be unresolved can easily stretch to days.
- The privacy goals of the network, which keep intermediaries unaware of the other users on a path (except the previous and next steps), make it difficult or impossible to identify malicious actors in the network. So any attempt to blacklist malicious users would also cause many non-malicious users to be blacklisted.
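To see how block-based timeouts add up over a long path, consider the following back-of-the-envelope sketch. The per-step delay of 40 blocks is an illustrative assumption, not a value taken from any particular implementation or from this article, and the function name is hypothetical; only the ten-minute block time comes from the text.

```python
from datetime import timedelta

BLOCK_TIME = timedelta(minutes=10)  # average Bitcoin block time cited above


def worst_case_unresolved(steps, delta_blocks_per_step, final_delta_blocks=0):
    """Longest time the sender's funds can stay unresolved, assuming each
    step on the path adds `delta_blocks_per_step` blocks to the timeout."""
    blocks = steps * delta_blocks_per_step + final_delta_blocks
    return blocks * BLOCK_TIME


# Illustrative assumption: 20 steps, each adding 40 blocks of safety margin.
print(worst_case_unresolved(20, 40))
# A short path with the same per-step margin resolves much sooner.
print(worst_case_unresolved(3, 40))
```

With these assumed numbers the long path can stay unresolved for more than five days, while the three-step path is limited to 20 hours, which is why the attacker prefers the longest path available.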
The simplest mitigation strategy would be to enforce some minimum fee for acting as an intermediary in a transfer thereby making the attacker’s strategy too expensive. But that would defeat the purpose of Lightning.
There has been some discussion of tracking payment failures and avoiding users involved in failed payments.
But those ideas are problematic because:
- They increase the complexity of the network.
- If a user were to keep track of all nodes across which her transfers were always successful and transfer only across those “well-behaved” nodes, she would be limiting the effective capacity of the network for her transfers.
- If there were a way to identify malicious actors, the attacker could easily switch identities (public keys and IP addresses).
- If a reputation-type system were designed to spread information among users (using the so-called “gossip” protocol) about well-behaved and badly-behaved users on the network, it would open a new avenue of attack: malicious actors could spoof the reputation system, which might make the problem even worse.
If your credit card went into an unresolved state even occasionally, causing you significant frustration each time, you would probably start carrying cash instead.
The Lightning Network is fragile. Fragile things don’t last.