dd065dae9f refactor: Make mapBlocksUnknownParent local, and rename it (Hennadii Stepanov)
Pull request description:
This PR is a second attempt at #19594. This PR has two motivations:
- Improve code hygiene by eliminating a global variable, `mapBlocksUnknownParent`
- Fix fuzz test OOM when running too long ([see #19594 comment](https://github.com/bitcoin/bitcoin/pull/19594#issuecomment-958801638))
A minor added advantage is to release `mapBlocksUnknownParent` memory when the reindexing phase is done. The current situation is somewhat similar to a memory leak because this map exists unused for the remaining lifetime of the process. It's true that this map should be empty of data elements after use, but its internal metadata (indexing structures, etc.) can have non-trivial size because there can be many thousands of simultaneous elements in this map.
This PR helps our efforts to reduce the use of global variables. This variable isn't just global, it's hidden inside a function (it looks like a local variable but has the `static` attribute).
This global variable exists because the `-reindex` processing code calls `LoadExternalBlockFile()` multiple times (once for each block file), but that function must preserve some state between calls (the `mapBlocksUnknownParent` map). This PR fixes this by allocating this map as a local variable in the caller's scope and passing it in on each call. When reindexing completes, the map goes out of scope and is deallocated.
I tested this manually by reindexing on mainnet and signet. Also, the existing `feature_reindex.py` functional test passes.
ACKs for top commit:
mzumsande:
re-ACK dd065dae9f
theStack:
re-ACK dd065dae9f
shaavan:
reACK dd065dae9f
Tree-SHA512: 9cd20e44d2fa1096dd405bc107bc065ea8f904f5b3f63080341b08d8cf57b790df565f58815c2f331377d044d5306708b4bf6bdfc5ef8d0ed85d8e97d744732c
Also:
1. Have CChainState::LoadMempool and ::ThreadImport take in paths and
pass it through untouched to LoadMempool.
2. Make LoadMempool exit early if the load_path is empty.
3. Adjust the call to ::ThreadImport in ::AppInitMain to correctly pass
in an empty path if mempool persistence is disabled.
This fixes a blk file size calculation made during reindex that results in increased blk file malformity.
The fix is to avoid double counting the size of the serialization header during reindex.
This adds a unit test to reproduce the bug before the fix and to ensure that it does not recur.
These changes include a log message change also so as to not be as alarming. This is a common and recoverable
data corruption. These messages can now be filtered by the debug log reindex category.
In previous commits in this patchset, we've made sure that every
Unload/UnloadBlockIndex member function resets its own members, and does
not reach out to globals.
This means that their corresponding classes' default destructors can now
replace them, and do an even more thorough job without the need to be
updated for every new member variable.
Therefore, we can remove them, and also remove UnloadBlockIndex since
that's not used anymore.
Unfortunately, chainstatemanager_loadblockindex relies on
CChainState::UnloadBlockIndex, so that needs to stay for now.
This change also introduces an aditional buffer of 10 blocks (PRUNE_LOCK_BUFFER) that will not be pruned before the best block.
Co-authored-by: Luke Dashjr <luke-jr+git@utopios.org>
-----
Code Reviewer Notes
Call graph of relevant functions:
UnloadBlockIndex() <-- Moved from
calls ChainstateManager::Unload()
which calls BlockManager::Unload() <-- Moved to
So calling UnloadBlockIndex() would still run this moved code. The code
will also now run when ~BlockManager gets called, which makes sense.
Now BlockManager::LoadBlockIndex() will ACTUALLY only load BlockMan
members.
[META] In a later commit, pindexBestHeader will be moved to ChainMan as
a member
-----
Code Reviewer Notes
Call graph of relevant functions:
ChainstateManager::LoadBlockIndex() <-- Moved to
calls BlockManager::LoadBlockIndexDB()
which calls BlockManager::LoadBlockIndex() <-- Moved from
There is only one call to each of inner functions, meaning that no
behavior is changing.
This commit effectively splits the "load block index itself" logic from
"derive Chainstate variables from loaded block index" logic.
This means that BlockManager::LoadBlockIndex{,DB} will only load what's
relevant to the BlockManager.
I strongly recommend reviewing with the following git-diff flags:
--color-moved=dimmed_zebra --color-moved-ws=allow-indentation-change
Instead of having CBlockIndex's live on the heap, which requires manual
memory management, have them be owned by m_block_index. This means that
they will live and die with BlockManager.
A change to BlockManager::LookupBlockIndex:
- Previously, it was a const member function returning a non-const CBlockIndex*
- Now, there's are const and non-const versions of
BlockManager::LookupBlockIndex returning a CBlockIndex with the same
const-ness as the member function:
(e.g. const CBlockIndex* LookupBlockIndex(...) const)
See next commit for some weirdness that this eliminates.
The range based for-loops are modernize (using auto + destructuring) in
a future commit.
fa5d2e678c Remove unused char serialize (MarcoFalke)
fa24493d63 Use spans of std::byte in serialize (MarcoFalke)
fa65bbf217 span: Add BytePtr helper (MarcoFalke)
Pull request description:
This changes the serialize code (`.read()` and `.write()` functions) to take a `Span` instead of a pointer and size. This is a breaking change for the serialize interface, so at no additional cost we can also switch to `std::byte` (instead of using `char`).
The benefits of using `Span`:
* Less verbose and less fragile code when passing an already existing `Span`(-like) object to or from serialization
The benefits of using `std::byte`:
* `std::byte` can't accidentally be mistaken for an integer
The goal here is to only change serialize to use spans of `std::byte`. If needed, `AsBytes`, `MakeUCharSpan`, ... can be used (temporarily) to pass spans of the right type.
Other changes that are included here:
* [#22167](https://github.com/bitcoin/bitcoin/pull/22167) (refactor: Remove char serialize by MarcoFalke)
* [#21906](https://github.com/bitcoin/bitcoin/pull/21906) (Preserve const in cast on CTransactionSignatureSerializer by promag)
ACKs for top commit:
laanwj:
Concept and code review ACK fa5d2e678c
sipa:
re-utACK fa5d2e678c
Tree-SHA512: 08ee9eced5fb777cedae593b11e33660bed9a3e1711a7451a87b835089a96c99ce0632918bb4666a4e859c4d020f88fb50f2dd734216b0c3d1a9a704967ece6f
Mutex cs_main is already held by the caller of WriteUndoDataForBlock().
This change is needed to require CBlockIndex::GetUndoPos() to hold
cs_main and CBlockIndex::nStatus to be guarded by cs_main in the
following commits without adding 2 unnecessary cs_main locks to
WriteUndoDataForBlock().
e5b6aef612 Move CBlockFileInfo::ToString method where class is declared (Russell Yanofsky)
f7086fd8ff Add src/wallet/* code to wallet:: namespace (Russell Yanofsky)
90fc8b089d Add src/node/* code to node:: namespace (Russell Yanofsky)
Pull request description:
There are no code changes, this is just adding `namespace` and `using` declarations and `node::` or `wallet::` qualifiers in some places.
Motivations for this change are:
- To make it easier to see when node and wallet code is being accessed places where it shouldn't be. For example if GUI code is accessing node and wallet internals or if wallet and node code are referencing each other.
- To make source code organization clearer ([#15732](https://github.com/bitcoin/bitcoin/issues/15732)), being able to know that `wallet::` code is in `src/wallet/`, `node::` code is in `src/node/`, `init::` code is in `src/init/`, `util::` code is in `src/util/`, etc.
Reviewing with `git log -p -n1 -U0 --word-diff-regex=.` can be helpful to verify this is only updating declarations, not changing code.
ACKs for top commit:
achow101:
ACK e5b6aef612
MarcoFalke:
Concept ACK e5b6aef612🍨
Tree-SHA512: 3797745c90246794e2d55a2ee6e8b0ad5c811e4e03a242d3fdfeb68032f8787f0d48ed4097f6b7730f540220c0af99ef423cd9dbe7f76b2ec12e769a757a2c8d
The new helper function, BlockManager::WriteBlockIndexDB(),
has a thread safety lock annotation in its declaration but is
missing the corresponding run-time lock assertion in its definition.
Per doc/developer-notes.md: "Combine annotations in function
declarations with run-time asserts in function definitions."
fa68a6c2fc scripted-diff: Rename touched member variables (MarcoFalke)
facd3df21f Make blockstorage globals private members of BlockManager (MarcoFalke)
faa8c2d7d7 doc: Clarify nPruneAfterHeight for signet (MarcoFalke)
fad381b2f8 test: Load genesis block to allow flush (MarcoFalke)
fab262174b Move blockstorage-related unload to BlockManager::Unload (MarcoFalke)
fa467f3913 move-only: Create WriteBlockIndexDB helper (MarcoFalke)
fa88cfd3f9 Move functions to BlockManager (MarcoFalke)
Pull request description:
Globals aren't too nice because they hide dependencies, also they make testing harder.
Fix that by removing some.
ACKs for top commit:
Sjors:
ACK fa68a6c2fc
ryanofsky:
Code review ACK fa68a6c2fc. Nice changes!
Tree-SHA512: 6abc5929a5e43a05e238276721d46a64a44f23dca18c2caa9775437a32351d6815d88b88757254686421531d0df13861bbd3a202e13a3192798d87a96abef65d
This is a refactor and safe to do because:
* UnloadBlockIndex calls ChainstateManager::Unload, which calls
BlockManager::Unload
* Only unit tests call Unload directly