16bd283b3a Reapply "test: p2p: check that connecting to ourself leads to disconnect" (Sebastian Falbesoner)
0dbcd4c148 net: prevent sending messages in `NetEventsInterface::InitializeNode` (Sebastian Falbesoner)
66673f1c13 net: fix race condition in self-connect detection (Sebastian Falbesoner)
Pull request description:
This PR fixes a recently discovered race condition in the self-connect detection (see #30362 and #30368).
Initiating an outbound network connection currently involves the following steps after the socket connection is established (see [`CConnman::OpenNetworkConnection`](bd5d1688b4/src/net.cpp (L2923-L2930)) method):
1. set up node state
2. queue VERSION message (both steps 1 and 2 happen in [`InitializeNode`](bd5d1688b4/src/net_processing.cpp (L1662-L1683)))
3. add new node to vector `m_nodes`
If we connect to ourself, it can happen that the sent VERSION message (step 2) is received and processed locally *before* the node object is added to the connection manager's `m_nodes` vector (step 3). In this case, the self-connect remains undiscovered, as the detection doesn't find the outbound peer in `m_nodes` yet (see `CConnman::CheckIncomingNonce`).
Fix this by swapping the order of 2. and 3., by taking the `PushNodeVersion` call out of `InitializeNode` and doing that in the `SendMessages` method instead, which is only called for `CNode` instances in `m_nodes`.
The temporarily reverted test introduced in #30362 is readded. Fixes #30368.
Thanks go to vasild, mzumsande and dergoegge for suggestions on how to fix this (see https://github.com/bitcoin/bitcoin/issues/30368#issuecomment-2200625017 ff. and https://github.com/bitcoin/bitcoin/pull/30394#discussion_r1668290789).
ACKs for top commit:
naiyoma:
tested ACK [16bd283b3a), built and tested locally, test passes successfully.
mzumsande:
ACK 16bd283b3a
tdb3:
ACK 16bd283b3a
glozow:
ACK 16bd283b3a
dergoegge:
ACK 16bd283b3a
Tree-SHA512: 5b8aced6cda8deb38d4cd3fe4980b8af505d37ffa0925afaa734c5d81efe9d490dc48a42e1d0d45dd2961c0e1172a3d5b6582ae9a2d642f2592a17fbdc184445
5b7f70ba26 test: loadtxoutset in divergent chain with less work (Alfonso Roman Zubeldia)
d35efe1efc p2p: Start downloading historical blocks from common ancestor (Martin Zumsande)
Pull request description:
This PR adds a test to cover the scenario of loading an assumeutxo snapshot when the current chain tip is not an ancestor of the snapshot block but has less work.
During the review process, a bug was discovered where blocks between the last common ancestor and the background tip were not being requested if the background tip was not an ancestor of the snapshot block. mzumsande suggested a fix (65343ec49a6b73c4197dfc38e1c2f433b0a3838a) to start downloading historical blocks from the last common ancestor to address this issue. This fix has been incorporated into the PR with a slight modification.
Related to https://github.com/bitcoin/bitcoin/issues/28648
ACKs for top commit:
fjahr:
tACK 5b7f70ba26
achow101:
ACK 5b7f70ba26
mzumsande:
Code Review ACK 5b7f70ba26
Tree-SHA512: f8957349686a6a1292165ea9e0fd8c912d21466072632a10f8ef9d852a5f430bc6b2a531e6884a4dbf2e3adb28b3d512b25919e78f5804a67320ef54c3b1aaf6
6ecda04fef random: drop ad-hoc Shuffle in favor of std::shuffle (Pieter Wuille)
da28a26aae bench random: benchmark more functions, and add InsecureRandomContext (Pieter Wuille)
0a9bbc64c1 random bench refactor: move to new bench/random.cpp (Pieter Wuille)
Pull request description:
This adds benchmarks for various operations on `FastRandomContext` and `InsecureRandomContext`, and then removes the ad-hoc `Shuffle` functions, now that it appears that standard library `std::shuffle` has comparable performance. The other reason for keeping `Shuffle`, namely the fact that libstdc++ used self-move (which debug mode panics on) has been fixed as well (see https://github.com/bitcoin/bitcoin/pull/29625#discussion_r1658344049).
ACKs for top commit:
achow101:
ACK 6ecda04fef
hodlinator:
ACK 6ecda04fef
dergoegge:
Code review ACK 6ecda04fef
Tree-SHA512: 2560b7312410581ff2b9bd0716e0f1558d910b5eadb9544785c972384985ac0f11f72d6b2797cfe2e7eb71fa57c30cffd98cc009cb4ee87a18b1524694211417
Now that the queueing of the VERSION messages has been moved out of
`InitializeNode`, there is no need to pass a mutable `CNode` reference any
more. With a const reference, trying to send messages in this method would
lead to a compile-time error, e.g.:
----------------------------------------------------------------------------------------------------------------------------------
...
net_processing.cpp: In member function ‘virtual void {anonymous}::PeerManagerImpl::InitializeNode(const CNode&, ServiceFlags)’:
net_processing.cpp:1683:21: error: binding reference of type ‘CNode&’ to ‘const CNode’ discards qualifiers
1683 | PushNodeVersion(node, *peer);
...
----------------------------------------------------------------------------------------------------------------------------------
Initiating an outbound network connection currently involves the
following steps after the socket connection is established (see
`CConnman::OpenNetworkConnection` method):
1. set up node state
2. queue VERSION message
3. add new node to vector `m_nodes`
If we connect to ourself, it can happen that the sent VERSION message
(step 2) is received and processed locally *before* the node object
is added to the connection manager's `m_nodes` vector (step 3). In this
case, the self-connect remains undiscovered, as the detection doesn't
find the outbound peer in `m_nodes` yet (see `CConnman::CheckIncomingNonce`).
Fix this by swapping the order of 2. and 3., by taking the `PushNodeVersion`
call out of `InitializeNode` and doing that in the `SendMessages` method
instead, which is only called for `CNode` instances in `m_nodes`.
Thanks go to vasild, mzumsande, dergoegge and sipa for suggestions on
how to fix this.
There are only a few call sites of these throughout the codebase, so
move the functionality into FastRandomContext, and rewrite all call sites.
This requires the callers to explicit construct FastRandomContext objects,
which do add to the verbosity, but also make potentially apparent locations
where the code can be improved by reusing a FastRandomContext object (see
further commit).
55eea003af test: Make blockencodings_tests deterministic (AngusP)
4c99301220 test: Add ReceiveWithExtraTransactions Compact Block receive test. (AngusP)
4621e7cc8f test: refactor: Rename extra_txn to const empty_extra_txn as it is empty in all test cases (AngusP)
Pull request description:
This test uses the `extra_txn` (`vExtraTxnForCompact`) vector of optional orphan/conflicted/etc. transactions to provide transactions to a PartiallyDownloadedBlock that are not otherwise present in the mempool, and check that they are used.
This also covers a former nullptr deref bug that was fixed in #29752 (bf031a517c) where the `extra_txn` vec/circular-buffer was null-initialized and not yet filled when dereferenced in `PartiallyDownloadedBlock::InitData`.
ACKs for top commit:
marcofleon:
Code review ACK 55eea003af. I ran the `blockencodings` unit test and no issues with the new test case.
dergoegge:
Code review ACK 55eea003af
glozow:
ACK 55eea003af
Tree-SHA512: d7909c212bb069e1f6184b26390a5000dcc5f2b18e49b86cceccb9f1ec4f874dd43bc9bc92abd4207c71dd78112ba58400042c230c42e93afe55ba51b943262c
Otherwise, if the background tip is not an ancestor of the snapshot, blocks in between that ancestor up to the height of the background tip will never be requested.
Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
Co-authored-by: Alfonso Roman Zubeldia <19962151+alfonsoromanz@users.noreply.github.com>
6eecba475e net_processing: make MaybePunishNodeFor{Block,Tx} return void (Pieter Wuille)
ae60d485da net_processing: remove Misbehavior score and increments (Pieter Wuille)
6457c31197 net_processing: make all Misbehaving increments = 100 (Pieter Wuille)
5120ab1478 net_processing: drop 8 headers threshold for incoming BIP130 (Pieter Wuille)
944c54290d net_processing: drop Misbehavior for unconnecting headers (Pieter Wuille)
9f66ac7cf1 net_processing: do not treat non-connecting headers as response (Pieter Wuille)
Pull request description:
So far, discouragement of peers triggers when their misbehavior score exceeds 100 points. Most types of misbehavior increment the score by 100, triggering immediate discouragement, but some types do not. This PR makes all increments equal to either 100 (meaning any misbehavior will immediately cause disconnection and discouragement) or 0 (making the behavior effectively unconditionally allowed), and then removes the logic for score accumulation.
This simplifies the code a bit, but also makes protocol expectations clearer: if a peer misbehaves, they get disconnected. There is no good reason why certain types of protocol violations should be permitted 4 times (howmuch=20) or 9 times (howmuch=10), while many others are never allowed. Furthermore, the distinction between these looks arbitrary.
The specific types of misbehavior that are changed to 100 are:
* Sending us a `block` which does not connect to our header tree (which necessarily must have been unsollicited). [used to be score 10]
* Sending us a `headers` with a non-continuous headers sequence. [used to be score 20]
* Sending us more than 1000 addresses in a single `addr` or `addrv2` message [used to be score 20]
* Sending us more than 50000 invs in a single `inv` message [used to be score 20]
* Sending us more than 2000 headers in a single `headers` message [used to be score 20]
The specific types of misbehavior that are changed to 0 are:
* Sending us 10 (*) separate BIP130 headers announcements that do not connect to our block tree [used to be score 20]
* Sending us more than 8 headers in a single `headers` message (which thus does not get treated as a BIP130 announcement) that does not connect to our block tree. [used to be score 10]
I believe that none of these behaviors are unavoidable, except for the one marked (*) which can in theory happen still due to interaction between BIP130 and variations in system clocks (the max 2 hour in the future rule). This one has been removed entirely. In order to remove the impact of the bug it was designed to deal with, without relying on misbehavior, a separate improvement is included that makes `getheaders`-tracking more accurate.
In another unrelated improvement, this also gets rid of the 8 header limit heuristic to determine whether an incoming non-connecting `headers` is a potential BIP130 announcement, as this rule is no longer needed to prevent spurious Misbehavior. Instead, any non-connecting `headers` is now treated as a potential announcement.
ACKs for top commit:
sr-gi:
ACK [6eecba4](6eecba475e)
achow101:
ACK 6eecba475e
mzumsande:
Code Review ACK 6eecba475e
glozow:
light code review / concept ACK 6eecba475e
Tree-SHA512: e11e8a652c4ec048d8961086110a3594feefbb821e13f45c14ef81016377be0db44b5311751ef635d6e026def1960aff33f644e78ece11cfb54f2b7daa96f946
refactor: CBlockHeaderAndShortTxIDs constructor now always takes an explicit nonce.
test: Make blockencodings_tests deterministic using fixed seed providing deterministic
CBlockHeaderAndShortTxID nonces and dummy transaction IDs.
Fixes very rare flaky test failures, where the ShortIDs of test transactions collide, leading to
`READ_STATUS_FAILED` from PartiallyDownloadedBlock::InitData and/or `IsTxAvailable` giving `false`
when the transaction should actually be available.
* Use a new `FastRandomContext` with a fixed seed in each test, to ensure 'random' uint256s
used as fake prevouts are deterministic, so in-turn test txids and short IDs are deterministic
and don't collide causing very rare but flaky test failures.
* Add new test-only/internal initializer for `CBlockHeaderAndShortTxIDs` that takes a specified
nonce to further ensure determinism and avoid rare but undesireable short ID collisions.
In a test context this nonce is set to a fixed known-good value. Normally it is random, as
previously.
Flaky test failures can be reproduced with:
```patch
diff --git a/src/blockencodings.cpp b/src/blockencodings.cpp
index 695e8d806a..64d635a97a 100644
--- a/src/blockencodings.cpp
+++ b/src/blockencodings.cpp
@@ -44,7 +44,8 @@ void CBlockHeaderAndShortTxIDs::FillShortTxIDSelector() const {
uint64_t CBlockHeaderAndShortTxIDs::GetShortID(const Wtxid& wtxid) const {
static_assert(SHORTTXIDS_LENGTH == 6, "shorttxids calculation assumes 6-byte shorttxids");
- return SipHashUint256(shorttxidk0, shorttxidk1, wtxid) & 0xffffffffffffL;
+ // return SipHashUint256(shorttxidk0, shorttxidk1, wtxid) & 0xffffffffffffL;
+ return SipHashUint256(shorttxidk0, shorttxidk1, wtxid) & 0x0f;
}
```
to increase the likelihood of a short ID collision; and running
```shell
set -e;
n=0;
while (( n++ < 5000 )); do
src/test/test_bitcoin --run_test=blockencodings_tests;
done
```
This removes the need to actually track misbehavior score (see further commit), because any
Misbehaving node will immediately hit the discouragement threshold.
With the Misbehavior score gone for non-connecting headers (see previous
commit), there is no need to only treat headers messages with up to 8
headers as potential BIP130 announcements. BIP130 does not specify such
a limit; it was purely a heuristic.
This misbehavior was originally intended to prevent bandwidth wastage due to
actually observed very broken (but likely non-malicious) nodes that respond
to GETHEADERS with a response unrelated to the request, triggering a request
cycle.
This has however largely been addressed by the previous commit, which causes
non-connecting HEADERS that are received while a GETHEADERS has not been
responded to, to be ignored, as long as they do not time out (2 minutes).
With that, the specific misbehavior is largely irrelevant (for inbound peers,
it is now harmless; for outbound peers, the eviction logic will eventually
kick them out if they're not keeping up with headers at all).
Since https://github.com/bitcoin/bitcoin/pull/25454 we keep track of the last
GETHEADERS request that was sent and wasn't responded to. So far, every incoming
HEADERS message is treated as a response to whatever GETHEADERS was last sent,
regardless of its contents.
This commit makes this tracking more accurate, by only treating HEADERS messages
which (1) are empty, (2) connect to our existing block header tree, or (3) are a
continuation of a low-work headers sync as responses that clear the "outstanding
GETHEADERS" state (m_last_getheaders_timestamp).
That means that HEADERS messages which do not satisfy any of the above criteria
will be ignored, not triggering a GETHEADERS, and potentially (for now, but see
later commit) increase misbehavior score.
0fb17bf61a [log] updates in TxOrphanage (glozow)
b16da7eda7 [functional test] attackers sending mutated orphans (glozow)
6675f6428d [unit test] TxOrphanage handling of same-txid-different-witness txns (glozow)
8923edfc1f [p2p] allow entries with the same txid in TxOrphanage (glozow)
c31f148166 [refactor] TxOrphanage::EraseTx by wtxid (glozow)
efcc593017 [refactor] TxOrphanage::HaveTx only by wtxid (glozow)
7e475b9648 [p2p] don't query orphanage by txid (glozow)
Pull request description:
Part of #27463 in the "make orphan handling more robust" section.
Currently the main map in `TxOrphanage` is indexed by txid; we do not allow 2 transactions with the same txid into TxOrphanage. This means that if we receive a transaction and want to store it in orphanage, we'll fail to do so if a same-txid-different-witness version of the tx already exists in the orphanage. The existing orphanage entry can stay until it expires 20 minutes later, or until we find that it is invalid.
This means an attacker can try to block/delay us accepting an orphan transaction by sending a mutated version of the child ahead of time. See included test.
Prior to #28970, we don't rely on the orphanage for anything and it would be relatively difficult to guess what transaction will go to a node's orphanage. After the parent(s) are accepted, if anybody sends us the correct transaction, we'll end up accepting it. However, this is a bit more painful for 1p1c: it's easier for an attacker to tell when a tx is going to hit a node's orphanage, and we need to store the correct orphan + receive the parent before we'll consider the package. If we start out with a bad orphan, we can't evict it until we receive the parent + try the 1p1c, and then we'll need to download the real child, put it in orphanage, download the parent again, and then retry 1p1c.
ACKs for top commit:
AngusP:
ACK 0fb17bf61a
itornaza:
trACK 0fb17bf61a
instagibbs:
ACK 0fb17bf61a
theStack:
ACK 0fb17bf61a
sr-gi:
crACK [0fb17bf](0fb17bf61a)
stickies-v:
ACK 0fb17bf61a
Tree-SHA512: edcbac7287c628bc27036920c2d4e4f63ec65087fbac1de9319c4f541515d669fc4e5fdc30c8b9a248b720da42b89153d388e91c7bf5caf4bc5b3b931ded1f59
cc67d33fda refactor: Simply include CTxMemPool::Options in CTxMemPool directly rather than duplicating definition (Luke Dashjr)
Pull request description:
Instead of duplicating mempool options two places, just include the Options struct directly on the CTxMemPool
ACKs for top commit:
achow101:
ACK cc67d33fda
kristapsk:
cr utACK cc67d33fda
jonatack:
ACK cc67d33fda
Tree-SHA512: 9deb5ea6f85eeb1c7e04536cded65303b0ec459936a97e4f257aff2c50b0984a4ddbf69a4651f48455b9c80200a1fd24e9c74926874fdd9be436bbbe406251ce
Index by wtxid instead of txid to allow entries with the same txid but
different witnesses in orphanage. This prevents an attacker from
blocking a transaction from entering the orphanage by sending a mutated
version of it.
d53d848347 test: adds outbound eviction tests for non outbound-full-relay peers (Sergi Delgado Segura)
a8d9a0edc7 test: adds outbound eviction functional tests, updates comment in ConsiderEviction (Sergi Delgado Segura)
Pull request description:
## Motivation
While checking the outbound eviction code I realized a case was not considered within the comments, which in turn made me realize we had no functional tests for the outbound eviction case (when I went to check/add the test case).
This PR updates the aforementioned comment and adds functional tests to cover the outbound eviction logic, in addition to the existing unit tests found at `src/test/denialofservice_tests.cpp`.
ACKs for top commit:
davidgumberg:
reACK d53d848347
tdb3:
Re ACK for d53d848347
achow101:
ACK d53d848347
cbergqvist:
ACK d53d848347
Tree-SHA512: 633b84bb1229fe21e2f650c1beada33ca7f190b64eafd64df2266516d21175e5d652e019ff7114f00cb8bd19f5817dc19e65adf75767a88e24dc0842ce40c63e
This also changes behavior if ReadBlockFromDisk or
ReadRawBlockFromDisk fails. Previously, the node would crash
due to an assert. This has been replaced with logging the error,
disconnecting the peer, and returning early.
ffc674595c Replace remaining "520" magic numbers with MAX_SCRIPT_ELEMENT_SIZE (Jon Atack)
Pull request description:
Noticed these while reviewing BIPs yesterday.
It would be clearer and more future-proof to refer to their constant name.
ACKs for top commit:
instagibbs:
ACK ffc674595c
sipa:
ACK ffc674595c
achow101:
ACK ffc674595c
glozow:
ACK ffc674595c, agree it's clearer for these comments to refer to the greppable name of the limit rather than the number
Tree-SHA512: 462afc1c64543877ac58cb3acdb01d42c6d08abfb362802f29f3482d75401a2a8adadbc2facd222a9a9fefcaab6854865ea400f50ad60bec17831d29f7798afe
c6be144c4b Remove timedata (stickies-v)
92e72b5d0d [net processing] Move IgnoresIncomingTxs to PeerManagerInfo (dergoegge)
7d9c3ec622 [net processing] Introduce PeerManagerInfo (dergoegge)
ee178dfcc1 Add TimeOffsets helper class (stickies-v)
55361a15d1 [net processing] Use std::chrono for type-safe time offsets (stickies-v)
038fd979ef [net processing] Move nTimeOffset to net_processing (dergoegge)
Pull request description:
[An earlier approach](1d226ae1f9/) in #28956 involved simplifying and refactoring the network-adjusted time calculation logic, but this was eventually [left out](https://github.com/bitcoin/bitcoin/pull/28956#issuecomment-1904214370) of the PR to make it easier for reviewers to focus on consensus logic changes.
Since network-adjusted time is now only used for warning/informational purposes, cleaning up the logic (building on @dergoegge's approach in #28956) should be quite straightforward and uncontroversial. The main changes are:
- Previously, we would only calculate the time offset from the first 199 outbound peers that we connected to. This limitation is now removed, and we have a proper rolling calculation. I've reduced the set to 50 outbound peers, which seems plenty.
- Previously, we would automatically use the network-adjusted time if the difference was < 70 mins, and warn the user if the difference was larger than that. Since there is no longer any automated time adjustment, I've changed the warning threshold to ~~20~~ 10 minutes (which is an arbitrary number).
- Previously, a warning would only be raised once, and then never again until node restart. This behaviour is now updated to 1) warn to log for every new outbound peer for as long as we appear out of sync, 2) have the RPC warning toggled on/off whenever we go in/out of sync, and 3) have the GUI warn whenever we are out of sync (again), but limited to 1 messagebox per 60 minutes
- no more globals
- remove the `-maxtimeadjustment` startup arg
Closes #4521
ACKs for top commit:
sr-gi:
Re-ACK [c6be144](c6be144c4b)
achow101:
reACK c6be144c4b
dergoegge:
utACK c6be144c4b
Tree-SHA512: 1063d639542e882186cdcea67d225ad1f97847f44253621a8c4b36c4d777e8f5cb0efe86bc279f01e819d33056ae4364c3300cc7400c087fb16c3f39b3e16b96
It should never be a nullopt when the transaction result is valid -
Assume() this is the case. However, as a belt-and-suspenders just in
case it is nullopt, use an empty list.
This helper class is an alternative to CMedianFilter, but without a
lot of the special logic and exceptions that we needed while it was
still used for consensus.