Instead of constructing a new class every time a file operation is done,
construct them once for each of the undo and block file when a new
BlockManager is created.
In future, this might make it easier to introduce an abstract block
store.
fa14e1d9d5 log: Fix __func__ in LogError in blockstorage module (MarcoFalke)
fad59a2f0f log: LogError with FlatFilePos in UndoReadFromDisk (MarcoFalke)
aaaa3323f3 refactor: Mark IsBlockPruned const (MarcoFalke)
Pull request description:
These errors should never happen in normal operation. If they do,
knowing the `FlatFilePos` may be useful to determine if data corruption
happened. Also, handle the error `pos.IsNull()` as part of `OpenUndoFile`,
because it may as well have happened due to data corruption.
This mirrors the `LogError` behavior from `ReadBlockFromDisk`.
Also, two other fixup commits in this module.
ACKs for top commit:
kevkevinpal:
ACK [fa14e1d](fa14e1d9d5)
tdb3:
cr and light test ACK fa14e1d9d5
ryanofsky:
Code review ACK fa14e1d9d5. This should make logging clearer and more consistent
Tree-SHA512: abb492a919b4796698d1de0a7874c8eae355422b992aa80dcd6b59c2de1ee0d2949f62b3cf649cd62892976fee640358f7522867ed9d48a595d6f8f4e619df50
These errors should never happen. However, when they do happen, it is
useful to log the correct error location (function name).
For example, this fixes an incorrect "ConnectBlock()" in
"WriteUndoDataForBlock".
These errors should never happen in normal operation. If they do,
knowing the FlatFilePos may be useful to determine if data corruption
happened. Also, handle the error pos.IsNull() as part of OpenUndoFile,
because it may as well have happened due to data corruption.
This mirrors the LogError behavior from ReadBlockFromDisk.
8789dc8f31 doc: Add note to getblockfrompeer on missing undo data (Fabian Jahr)
4a1975008b rpc: Make pruneheight also reflect undo data presence (Fabian Jahr)
96b4facc91 refactor, blockstorage: Generalize GetFirstStoredBlock (Fabian Jahr)
Pull request description:
The function `GetFirstStoredBlock()` helps us find the first block for which we have data. So far this function only looked for a block with `BLOCK_HAVE_DATA`. However, this doesn't mean that we also have the undo data of that block, and undo data might be required for what a user would like to do with those blocks. One example of how this might happen is if some blocks were fetched using the `getblockfrompeer` RPC. Blocks fetched from a peer will have data but no undo data.
The first commit here allows `GetFirstStoredBlock()` to check for undo data as well by passing a parameter. This alone is useful for #29553 and I would use it there.
In the second commit I am applying the undo check to the RPCs that report `pruneheight` to the user. I find this much more intuitive because I think the user expects to be able to do all operations on blocks up until the `pruneheight` but that is not the case if undo data is missing. I personally ran into this once before and now again when testing for assumeutxo when I had used `getblockfrompeer`. The following commit adds test coverage for this change of behavior.
The last commit adds a note in the docs of `getblockfrompeer` that undo data will not be available.
ACKs for top commit:
achow101:
ACK 8789dc8f31
furszy:
Code review ACK 8789dc8f31.
stickies-v:
ACK 8789dc8f31
Tree-SHA512: 90ae8bdd07a496ade579aa25240609c61c9ed173ad38d30533f6c631fe674e5a41727478ade69ca4b71a571ad94c9da4b33ebba6b5d8821109313c2de3bdfb3d
GetFirstStoredBlock is generalized to check for any data status with a
status mask that needs to be passed as a parameter. To reflect this the
function is also renamed to GetFirstBlock.
Co-authored-by: stickies-v <stickies-v@protonmail.com>
This is a just a mechanical change, renaming and inverting the meaning
of the indexing variable.
"m_blockfiles_indexed" is a more straightforward name for this variable
because this variable just indicates whether or not
<datadir>/blocks/blk?????.dat files have been indexed in the
<datadir>/blocks/index LevelDB database. The name "m_reindexing" was
more confusing, it could be true even if -reindex was not specified, and
false when it was specified. Also, the previous name unnecessarily
required thinking about the whole reindexing process just to understand
simple checks in validation code about whether blocks were indexed.
The motivation for this change is to follow up on previous commits,
moving away from having multiple variables called "reindex" internally,
and instead naming variables individually after what they do and
represent.
b47bd95920 kernel: De-globalize fReindex (TheCharlatan)
Pull request description:
fReindex is one of the last remaining globals exposed by the kernel library, so move it into the blockstorage class to reduce the amount of global mutable state and make the kernel library a bit less awkward to use.
---
This pull request is part of the [libbitcoinkernel project](https://github.com/bitcoin/bitcoin/issues/27587).
ACKs for top commit:
achow101:
ACK b47bd95920
ryanofsky:
Code review ACK b47bd95920. I rereviewed the whole PR, but the only change since last review was reverting the bugfix https://github.com/bitcoin/bitcoin/pull/29817#discussion_r1578327024 and make the change a pure refactoring.
mzumsande:
Code Review ACK b47bd95920
stickies-v:
ACK b47bd95920
Tree-SHA512: f7399d01f93bc0c0c7428fe95d19b9d29b4ed00a4f1deabca78fb0c4fecb434ec971e890feecb105938b5247c926850b1b7b4a4a9caa333a061e40777d0c8463
e41667b720 blockstorage: Don't move cursor backwards in UpdateBlockInfo (Ryan Ofsky)
17103637c6 blockstorage: Rename FindBlockPos and have it return a FlatFilePos (Martin Zumsande)
d9e477c4dc validation, blockstorage: Separate code paths for reindex and saving new blocks (Martin Zumsande)
064859bbad blockstorage: split up FindBlockPos function (Martin Zumsande)
fdae638e83 doc: Improve doc for functions involved in saving blocks to disk (Martin Zumsande)
0d114e3cb2 blockstorage: Add Assume for fKnown / snapshot chainstate (Martin Zumsande)
Pull request description:
`SaveBlockToDisk` / `FindBlockPos` are used for two purposes, depending on whether they are called during reindexing (`dbp` set, `fKnown = true`) or in the "normal" case when adding new blocks (`dbp == nullptr`, `fKnown = false`).
The actual tasks are quite different
- In normal mode, preparations for saving a new block are made, which is then saved: find the correct position on disk (maybe skipping to a new blk file), check for available disk space, update the blockfile info db, save the block.
- during reindex, most of this is not necessary (the block is already on disk after all), only the blockfile info needs to rebuilt because reindex wiped the leveldb it's saved in.
Using one function with many conditional statements for this leads to code that is hard to read / understand and bug-prone:
- many code paths in `FindBlockPos` are conditional on `fKnown` or `!fKnown`
- It's not really clear what actually needs to be done during reindex (we don't need to "save a block to disk" or "find a block pos" as the function names suggest)
- logic that should be applied to only one of the two modes is sometimes applied to both (see first commit, or #27039)
#24858 and #27039 were recent bugs directly related to the differences between reindexing and normal mode, and in both cases the simple fix took a long time to be reviewed and merged.
This PR proposes to clean this code up by splitting out the reindex logic into a separate function (`UpdateBlockInfo`) which will be called directly from validation. As a result, `SaveBlockToDisk` and `FindBlockPos` only need to cover the non-reindex logic.
ACKs for top commit:
paplorinc:
ACK e41667b720
TheCharlatan:
Re-ACK e41667b720
ryanofsky:
Code review ACK e41667b720. Just improvements to comments since last review.
Tree-SHA512: a14ff9a0facf6b1e3c1cd724a2d19a79a25d4b48de64398fdd172671532a472bc10a20cbb64ac3a3e55814dcc877d0597a3e1699cabc4f9d9a86b439b6eaba20
fReindex is one of the last remaining globals exposed by the kernel
library, so move it into the blockstorage class to reduce the amount of
global mutable state and make the kernel library a bit less awkward to
use.
Previously, it was possible to move the cursor back to an older file
during reindex if blocks are enocuntered out of order during reindex.
This would mean that MaxBlockfileNum() would be incorrect, and
a wrong DB_LAST_BLOCK could be written to disk.
This improves the logic by only ever moving the cursor forward (if possible)
but not backwards.
Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
The new name reflects that it is no longer called with existing blocks
for which the position is already known.
Returning a FlatFilePos directly simplifies the interface.
By calling SaveBlockToDisk only when we actually want to save a new
block to disk. In the reindex case, we now call UpdateBlockInfo
directly from validation.
This commit doesn't change behavior.
FindBlockPos does different things depending on whether the block is known
or not, as shown by the fact that much of the existing code is conditional on fKnown set or not.
If the block position is known (during reindex) the function only updates the block info
statistics. It doesn't actually find a block position in this case.
This commit removes fKnown and splits up these two code paths by introducing a separate function
for the reindex case when the block position is known.
It doesn't change behavior.
The current usage of ScheduleBatchPriority is not transparent. Once the
thread scheduling is changed, it remains unchanged for the remainder of
the thread's lifetime. So move the call from `ImportBlocks` to the init
code where it is clearer that its effect lasts for the entire lifetime
of the thread.
Users of the kernel library might not expect `ImportBlocks` to have an
influence on the thread it is called in. Particularly since it is only a
compile time option and cannot be controlled at runtime. With this patch
users of the kernel library can now choose their own scheduling policy.
fKnown is true during reindex (and only then), which deletes
any existing snapshot chainstate. As a result, this function can never
be called wth fKnown set and a snapshot chainstate.
Add an Assume for this, and make the code initializing a blockfile cursor
for the snapshot conditional on !fKnown.
This is a preparation for splitting the reindex logic out of
FindBlockPos in the following commits.
The extra `bilingual_str` argument of the fatal error notifications and
`node::AbortNode()` is often unused and when used usually contains the
same string as the message argument. It also seems to be confusing,
since it is not consistently used for errors requiring user action. For
example some assumeutxo fatal errors require the user to do something,
but are not translated.
So simplify the fatal error and abort node interfaces by only passing a
translated string. This slightly changes the fatal errors displayed to
the user.
Also de-duplicate the abort error log since it is repeated in noui.cpp.
dfcef536d0 blockstorage: do not flush block to disk if it is already there (Matthew Zipkin)
Pull request description:
Closes https://github.com/bitcoin/bitcoin/issues/2039
When reindexing from flat-file block storage there is no need to write anything back to disk, since the block data is already there. This PR skips flushing to disk those blocks that already have a known position in the datastore. Skipping this means that users can write-protect the `blk` files on disk which may be useful for security or even safely sharing that data between multiple bitcoind instances.
`FindBlockPos()` may also flush the undo data file, but again this is skipped if the corresponding block position is known, like during the initial stage of a reindex when block data is being indexed. Once the block index is complete the validation mechanism will call `ConnectBlock()` which will save undo data at that time.
The call stack looks like this:
```
init()
ThreadImport() <-- process fReindex flag
LoadExternalBlockFile()
AcceptBlock()
SaveBlockToDisk()
FindBlockPos()
FlushBlockFile() <-- unnecessary if block is already on disk
```
A larger refactor of this part of the code was started by mzumsande here: https://github.com/mzumsande/bitcoin/tree/202207_refactor_findblockpos including this fix, reviewers can let me know if the changes should be combined.
ACKs for top commit:
sipa:
utACK dfcef536d0
mzumsande:
re-ACK dfcef536d0
achow101:
ACK dfcef536d0
furszy:
Rebase diff ACK dfcef53.
Tree-SHA512: 385c5ac1288b325135398d0ddd3ab788fa98cc0ca19bd2474c74039f2ce70d5088c1d1c9d4dd10aefcbd4c757767ec5805d07ba8cee9289a66f96e6f9eaa5279
ReadRawBlockFromDisk assumes a non-null pos that has an nPos >= 8.
This simple check makes the function safer to call in the future,
so callers don't need to worry about causing UB if the pos is null.
This fixes the log output when -logsourcelocations is used.
Also, instead of 'ERROR:', the log will now say '[error]', like other
errors logged with LogError.
-BEGIN VERIFY SCRIPT-
sed -i --regexp-extended 's! error\("([^"]+)"! LogError("\1\\n"!g' $( git grep -l ' error(' ./src/ )
-END VERIFY SCRIPT-
This is needed for the next commit.
-BEGIN VERIFY SCRIPT-
# Separate sed invocations to replace one-line, and two-line error(...) calls
sed -i --regexp-extended 's!( +)return (error\(.*\);)!\1\2\n\1return false;!g' $( git grep -l 'return error(' )
sed -i --null-data --regexp-extended 's!( +)return (error\([^\n]*\n[^\n]*\);)!\1\2\n\1return false;!g' $( git grep -l 'return error(' )
-END VERIFY SCRIPT-
d298ff8b62 During IBD, prune as much as possible until we get close to where we will eventually keep blocks (Luke Dashjr)
Pull request description:
This should reduce pruning flushes even more, speeding up IBD with pruning on systems that have a sufficient dbcache.
Assumes 1 MB per block between tip and best header chain. Simply adds this to the buffer pruning is trying to leave available, which results in pruning almost everything up until we get close to where we need to be keeping blocks.
ACKs for top commit:
andrewtoth:
ACK d298ff8b62
fjahr:
utACK d298ff8b62
achow101:
ACK d298ff8b62
Tree-SHA512: 2a482376bfb177e2ba7c2f0bb0b58b02efdb38b34755a18d1fc3e869df5959c85b6f1009e1386fa8b89c4f90d520383e36bd3e21dec221042315134efb1a455b
d27b9a2248 test: fix feature_init.py file perturbation (Martin Zumsande)
ad66ca1e47 init: abort loading of blockindex in case of missing height. (Martin Zumsande)
Pull request description:
When the block index database is non-contiguous due to file corruption (i.e. it contains indexes of height `x-1` and `x+1`, but not `x`), bitcoind can currently crash with an assert in `BuildSkip()` / `GetAncestor()` during `BlockManager::LoadBlockIndex()`:
```
bitcoind: chain.cpp:112: const CBlockIndex* CBlockIndex::GetAncestor(int) const: Assertion `pindexWalk->pprev' failed.
```
This PR changes it such that we instead return an `InitError` to the user.
I stumbled upon this because I noticed that the file perturbation in `feature_init.py` wasn't working as intended, which is fixed in the second commit:
* Opening the file twice in one `with` statement would lead to `tf_read` being empty, so the test wouldn't perturb anything but replace the file with a new one. Fixed by first opening for read, then for write.
* We need to restore the previous state after perturbations, so that only the current perturbation is active and not a mix of the current and previous ones.
* I also added `checkblocks=200` to the startup parameters so that corruption in earlier blocks of `blk00000.dat` is detected during init verification and not ignored.
After fixing `feature_init.py` like that I'd run into the `assert` mentioned above (so running the testfix from the second commit without the first one is a way to reproduce it).
ACKs for top commit:
achow101:
ACK d27b9a2248
furszy:
Code ACK d27b9a224
fjahr:
Code review ACK d27b9a2248
Tree-SHA512: 2e54da6030c5813c86bd58f816401e090bb43c5b834764a5e3c0e55dbfe09e423f88042cab823db3742088204b274d4ad2abf58a3832a4b18328b11a30bf7094
When using an assumedvalid (snapshot) chainstate along with a background
chainstate, we are syncing two very different regions of the chain
simultaneously. If we use the same blockfile space for both of these
syncs, wildly different height blocks will be stored alongside one
another, making pruning ineffective.
This change implements a separate blockfile cursor for the assumedvalid
chainstate when one is in use.
Use the expected AssumeutxoData in order to bootstrap nChainTx values
for assumedvalid blockindex entries in the snapshot chainstate. This
is necessary because nChainTx is normally built up from nTx values,
which are populated using blockdata which the snapshot chainstate
does not yet have.
Introduces ChainstateManager::GetPruneRange().
The prune budget is split evenly between the number of chainstates,
however the prune budget may be exceeded if the resulting shares are
beneath `MIN_DISK_SPACE_FOR_BLOCK_FILES`.
d8041d4e04 blockstorage: Return on fatal undo file flush error (TheCharlatan)
f0207e0030 blockstorage: Return on fatal block file flush error (TheCharlatan)
5671c15f45 blockstorage: Mark FindBlockPos as nodiscard (TheCharlatan)
Pull request description:
The goal of this PR is to establish that fatal blockstorage flush errors should be treated as errors at their call site.
Prior to this patch `FlushBlockFile` may have failed without returning in `Chainstate::FlushStateToDisk`, leading to a potential write from `WriteBlockIndexDB` that may refer to a block that is not fully flushed to disk yet. By returning if either `FlushUndoFile` or `FlushBlockFile` fail, we ensure that no further write operations take place that may lead to an inconsistent database when crashing. Add `[[nodiscard]]` annotations to them such that they are not ignored in future.
Functions that call either `FlushUndoFile` or `FlushBlockFile`, need to handle these extra abort cases properly. Since `Chainstate::FlushStateToDisk` already produces an abort error in case of `WriteBlockIndexDB` failing, no extra logic for functions calling `Chainstate::FlushStateToDisk` is required.
Besides `Chainstate::FlushStateToDisk`, `FlushBlockFile` is also called by `FindBlockPos`, while `FlushUndoFile` is only called by `FlushBlockFile` and `WriteUndoDataForBlock`. For both these cases, the flush error is not further bubbled up. Instead, the error is logged and a comment is provided why bubbling up an error would be less desirable in these cases.
---
This pull request is part of a larger effort towards improving the shutdown / abort / fatal error handling in validation code. It is a first step towards implementing proper fatal error return type enforcement similar as proposed by theuni in this pull request [comment](https://github.com/bitcoin/bitcoin/pull/27711#issuecomment-1563561502). For ease of review of these critical changes, a first step would be checking that `AbortNode` leads to early and error-conveying returns at its call site. Further work for enforcing returns when `AbortNode` is called is done in https://github.com/bitcoin/bitcoin/pull/27862.
ACKs for top commit:
stickies-v:
re-ACK d8041d4
ryanofsky:
Code review ACK d8041d4e04
Tree-SHA512: 47ade9b873b15e567c8f60ca538d5a0daf32163e1031be3212a3a45eb492b866664b225f2787c9e40f3e0c089140157d8fd1039abc00c7bdfeec1b52ecd7e219
This was only explicitly used in the tests, where it can be replaced by
wrapping the original raw file pointer into a CAutoFile on creation and
then calling CAutoFile::fclose().
Also, it was used in LoadExternalBlockFile(), where it can also be
replaced by the (implicit call to the) CAutoFile destructor after
wrapping the original raw file pointer in a CAutoFile.
d506765199 [refactor] Remove compat.h from kernel headers (TheCharlatan)
36193af47c [refactor] Remove netaddress.h from kernel headers (TheCharlatan)
2b08c55f01 [refactor] Add CChainParams member to CConnman (TheCharlatan)
f0d1d8b35c [refactor] Add missing includes for next commit (TheCharlatan)
534b314a74 kernel: Move MessageStartChars to its own file (TheCharlatan)
9be330b654 [refactor] Define MessageStartChars as std::array (TheCharlatan)
37e2b01113 [refactor] Allow std::array<std::byte, N> in serialize.h (MarcoFalke)
Pull request description:
This removes the non-consensus critical `protocol.h` and `netaddress.h` headers from the kernel headers. With this patch, they are no longer required to include in order to use the libbitcoinkernel library. This also allows for the removal of the `compat.h` header from the kernel headers.
As an added future benefit it also reduces the number of of kernel headers that include the platform specific `bitcoin-config.h`.
For those interested, the currently required kernel headers can be inspected visually with the [sourcetrail](https://github.com/CoatiSoftware/Sourcetrail) tool by looking at the required includes of `bitcoin-chainstate.cpp`.
---
This is part of the [libbitcoinkernel project](https://github.com/bitcoin/bitcoin/issues/27587), namely its stage 1 step 3: Decouple most non-consensus headers from libbitcoinkernel.
ACKs for top commit:
stickies-v:
re-ACK d506765
hebasto:
ACK d506765199.
ajtowns:
utACK d506765199
MarcoFalke:
lgtm ACK d506765199🍛
Tree-SHA512: 6f90ea510a302c2927e84d16900e89997c39b8ff3ce9d4effeb8a134bd29cc52bd9e81e51aaa11f7496bad00025b78a58b88c5a9e0bb3f4ebbe9a76309215fb7
The protocol.h file contains many non-consensus related definitions and
should thus not be part of the libbitcoinkernel. This commit makes
protocol.h no longer a required include for users of the
libbitcoinkernel.
This commit is part of the libbitcoinkernel project, namely its stage 1
step 3: Decouple most non-consensus headers from libbitcoinkernel.
Co-Authored-By: Cory Fields <cory-nospam-@coryfields.com>
fae405556d scripted-diff: Rename CBlockTreeDB -> BlockTreeDB (MarcoFalke)
faf63039cc Fixup style of moved code (MarcoFalke)
fa65111b99 move-only: Move CBlockTreeDB to node/blockstorage (MarcoFalke)
fa8685597e index: Drop legacy -txindex check (MarcoFalke)
fa69148a0a scripted-diff: Use blocks_path where possible (MarcoFalke)
Pull request description:
The only reason for the check was to print a warning about an increase in storage use. Now that 22.x is EOL and everyone should have migrated (or decided to not care about storage use), remove the check.
Also, a move-only commit is included. (Rebased from https://github.com/bitcoin/bitcoin/pull/22242)
ACKs for top commit:
TheCharlatan:
ACK fae405556d, though I lack historical context to really judge the second commit fa8685597e.
stickies-v:
ACK fae405556d
Tree-SHA512: 9da8f48767ae52d8e8e21c09a40c949cc0838794f1856cc5f58a91acd3f00a3bca818c8082242b3fdc9ca5badb09059570bb3870850d3807b75a8e23b5222da1
By returning an error code if either `FlushUndoFile` or `FlushBlockFile`
fail, the caller now has to explicitly handle block undo file flushing
errors. Before this change such errors were non-explicitly ignored
without a clear rationale.
Besides the call to `FlushUndoFile` in `FlushBlockFile`, ignore its
return code at its call site in `WriteUndoDataForBlock`. There, a failed
flush of the undo data should not be indicative of a failed write.
Add [[nodiscard]] annotations to `FlushUndoFile` such that its return
value is not just ignored in the future.