1eac96a503 Compare FromUserHex result against other hex validators and parsers (Lőrinc)
19947863e1 Use BOOST_CHECK_EQUAL for optional, arith_uint256, uint256, uint160 (Lőrinc)
743ac30e34 Add std::optional support to Boost's equality check (Lőrinc)
Pull request description:
Enhanced `FromUserHex` coverage by:
* Added `std::optional` support to `BOOST_CHECK_EQUAL`, allowing direct comparisons of `std::optional<T>` with other `T` expected values.
* Increased fuzz testing for hex parsing to validate against other hex validators and parsers.
----
* Use BOOST_CHECK_EQUAL for https://github.com/bitcoin/bitcoin/pull/30569#discussion_r1706637780 arith_uint256, uint256, uint160
Example error before:
> unknown location:0: fatal error: in "validation_chainstatemanager_tests/chainstatemanager_args": std::bad_optional_access: bad_optional_access
test/validation_chainstatemanager_tests.cpp:781: last checkpoint
after:
> test/validation_chainstatemanager_tests.cpp:801: error: in "validation_chainstatemanager_tests/chainstatemanager_args": check set_opts({"-assumevalid=0"}).assumed_valid_block == uint256::ZERO has failed [std::nullopt != 0000000000000000000000000000000000000000000000000000000000000000]
ACKs for top commit:
stickies-v:
re-ACK 1eac96a503
ryanofsky:
Code review ACK 1eac96a503. Only changes since last review were auto type and fuzz test tweaks.
hodlinator:
ACK 1eac96a503
Tree-SHA512: f1d2c65f0ee4e97830700be5b330189207b11ed0c89a8cebf0f97d43308402a6b3732e10130c79a0c044f7d2eeabfb5359990825aadf02c4ec19428dcd982b00
Example error before:
> unknown location:0: fatal error: in "validation_chainstatemanager_tests/chainstatemanager_args": std::bad_optional_access: bad_optional_access
test/validation_chainstatemanager_tests.cpp:781: last checkpoint
after:
> test/validation_chainstatemanager_tests.cpp:801: error: in "validation_chainstatemanager_tests/chainstatemanager_args": check set_opts({"-assumevalid=0"}).assumed_valid_block == uint256::ZERO has failed [std::nullopt != 0000000000000000000000000000000000000000000000000000000000000000]
Also added extra minimum_chainwork test to make it symmetric with assumevalid
Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
move/formatting-only change.
These tests do not cover uint256, so move them to the appropriate
test suite. Additionally, apply clang-format suggestions.
uint160S is a test-only function, and testing input that
is not allowed in uint160::FromHex() is superfluous.
Tests that can't use uint160::FromHex() because they use input
with non-hex digit characters are
a) modified by dropping the non-hex digit characters if that
provides useful test coverage.
b) dropped if the test without non-hex digit characters does
not provide useful test coverage, e.g. because it is now
duplicated.
uint256S was previously deprecated for being unsafe. All non-test
usage has already been removed in earlier commits.
1. Tests now use uint256::FromHex() or other constructors wherever
possible without further modification.
2. Tests that can't use uint256::FromHex() because they use input
with non-hex digit characters are
a) modified by dropping the non-hex digit characters if that
provides useful test coverage.
b) dropped if the test without non-hex digit characters does
not provide useful test coverage, e.g. because it is now
duplicated.
Additionally, use BOOST_CHECK_EQUAL where relevant on touched lines
to make error messages more readable.
Tests that are solely testing constructing from a hex string
are dropped, others are modified to use a uint256 constructor
or the arith_uint256 uint64_t constructor.
Since an arith_uint256 can not be constructed from a string
directly, we need to ensure that test coverage on
UintToArith256(uint256::FromHex()) is not reduced.
uint256::FromHex() already has good test coverage, but
the test coverage on UintToArith256() and ArithToUint256()
is increased in this commit by upgrading the `conversion`
test case.
Moreover, since `uint256.h` does not have any dependencies
on `arith_uint256.h`, the conversion tests are moved to
`arith_uint256_tests.cpp` so the dependency can be cleaned
up entirely in a future commit.
18d65d2772 test: use uint256::FromUserHex for RANDOM_CTX_SEED (stickies-v)
6819e5a329 node: use uint256::FromUserHex for -assumevalid parsing (stickies-v)
2e58fdb544 util: remove unused IsHexNumber (stickies-v)
8a44d7d3c1 node: use uint256::FromUserHex for -minimumchainwork parsing (stickies-v)
70e2c87737 refactor: add uint256::FromUserHex helper (stickies-v)
85b7cbfcbe test: unittest chainstatemanager_args (stickies-v)
Pull request description:
Since fad2991ba0, `uint256S` has been [deprecated](fad2991ba0 (diff-800776e2dda39116e889839f69409571a5d397de048a141da7e4003bc099e3e2R138)) because it is less robust than the `base_blob::FromHex()` introduced in [the same PR](https://github.com/bitcoin/bitcoin/pull/30482). Specifically, it tries to recover from length-mismatches, recover from untrimmed whitespace, 0x-prefix and garbage at the end, instead of simply requiring exactly 64 hex-only characters. _(see also #30532)_
This PR carves out the few `uint256S` callsites that may potentially prove a bit more controversial to change because they deal with user input and backwards incompatible behaviour change.
The main behaviour change introduced in this PR is:
- `-minimumchainwork` will raise an error when input is longer than 64 hex digits
- `-assumevalid` will raise an error when input contains invalid hex characters, or when it is longer than 64 hex digits
- test: the optional RANDOM_CTX_SEED env var will now cause tests to abort when it contains invalid hex characters, or when it is longer than 64 hex digits
After this PR, the remaining work to remove `uint256S` completely is almost entirely mechanical and/or test related. I will open that PR once #30560 is merged because it builds on that.
ACKs for top commit:
hodlinator:
re-ACK 18d65d2772
l0rinc:
ACK 18d65d2772
achow101:
ACK 18d65d2772
ryanofsky:
Code review ACK 18d65d2772. Very nice change that cleans up the API, adds checking for invalid values, makes parsing of values more consistent, and adds test coverage.
Tree-SHA512: ec118ea3d56e1dfbc4c79acdbfc797f65c4d2107b0ca9577c848b4ab9b7cb8d05db9f3c7fe8441a09291aca41bf671572431f4eddc59be8fb53abbea76bbbf86
FromUserHex will be used in future commits to construct
uint256 instances from user hex input without being
unnecessarily restrictive on formatting by allowing
0x-prefixed input that is shorter than 64 characters.
Complements uint256::FromHex() nicely in that it naturally does all error checking at compile time and so doesn't need to return an std::optional.
Will be used in the following 2 commits to replace many calls to uint256S(). uint256S() calls taking C-string literals are littered throughout the codebase and executed at runtime to perform parsing unless a given optimizer was surprisingly efficient. While this may not be a hot spot, it's better hygiene in C++20 to store the parsed data blob directly in the binary, without any parsing at runtime.
73e3fa10b4 doc + test: Correct uint256 hex string endianness (Hodlinator)
Pull request description:
This PR is a follow-up to #30436.
Only changes test-code and modifies/adds comments.
Byte order of hex string representation was wrongfully documented as little-endian, but are in fact closer to "big-endian" (endianness is a memory-order concept rather than a numeric concept). `[arith_]uint256` both store their data in arrays with little-endian byte order (`arith_uint256` has host byte order within each `uint32_t` element).
**uint256_tests.cpp** - Avoid using variable from the left side of the condition in the right side. Credits to @maflcko: https://github.com/bitcoin/bitcoin/pull/30436#discussion_r1688273553
**setup_common.cpp** - Skip needless ArithToUint256-conversion. Credits to @stickies-v: https://github.com/bitcoin/bitcoin/pull/30436#discussion_r1688621638
---
<details>
<summary>
## Logical reasoning for endianness
</summary>
1. Comparing an `arith_uint256` (`base_uint<256>`) to a `uint64_t` compares the beginning of the array, and verifies the remaining elements are zero.
```C++
template <unsigned int BITS>
bool base_uint<BITS>::EqualTo(uint64_t b) const
{
for (int i = WIDTH - 1; i >= 2; i--) {
if (pn[i])
return false;
}
if (pn[1] != (b >> 32))
return false;
if (pn[0] != (b & 0xfffffffful))
return false;
return true;
}
```
...that is consistent with little endian ordering of the array.
2. They have the same endianness (but `arith_*` has host-ordering of each `uint32_t` element):
```C++
arith_uint256 UintToArith256(const uint256 &a)
{
arith_uint256 b;
for(int x=0; x<b.WIDTH; ++x)
b.pn[x] = ReadLE32(a.begin() + x*4);
return b;
}
```
### String conversions
The reversal of order which happens when converting hex-strings <=> uint256 means strings are actually closer to big-endian, see the end of `base_blob<BITS>::SetHexDeprecated`:
```C++
unsigned char* p1 = m_data.data();
unsigned char* pend = p1 + WIDTH;
while (digits > 0 && p1 < pend) {
*p1 = ::HexDigit(trimmed[--digits]);
if (digits > 0) {
*p1 |= ((unsigned char)::HexDigit(trimmed[--digits]) << 4);
p1++;
}
}
```
Same reversal here:
```C++
template <unsigned int BITS>
std::string base_blob<BITS>::GetHex() const
{
uint8_t m_data_rev[WIDTH];
for (int i = 0; i < WIDTH; ++i) {
m_data_rev[i] = m_data[WIDTH - 1 - i];
}
return HexStr(m_data_rev);
}
```
It now makes sense to me that `SetHexDeprecated`, upon receiving a shorter hex string that requires zero-padding, would pad as if the missing hex chars where towards the end of the little-endian byte array, as they are the most significant bytes. "Big-endian" string representation is also consistent with the case where `SetHexDeprecated` receives too many hex digits and discards the leftmost ones, as a form of integer narrowing takes place.
### How I got it wrong in #30436
Previously I used the less than (`<`) comparison to prove endianness, but for `uint256` it uses `memcmp` and thereby gives priority to the *lower* bytes at the beginning of the array.
```C++
constexpr int Compare(const base_blob& other) const { return std::memcmp(m_data.data(), other.m_data.data(), WIDTH); }
```
`arith_uint256` is different in that it begins by comparing the bytes from the end, as it is using little endian representation, where the bytes toward the end are more significant.
```C++
template <unsigned int BITS>
int base_uint<BITS>::CompareTo(const base_uint<BITS>& b) const
{
for (int i = WIDTH - 1; i >= 0; i--) {
if (pn[i] < b.pn[i])
return -1;
if (pn[i] > b.pn[i])
return 1;
}
return 0;
}
```
(The commit documents that `base_blob::Compare()` is doing lexicographic ordering unlike the `arith_*`-variant which is doing numeric ordering).
</details>
ACKs for top commit:
paplorinc:
ACK 73e3fa10b4
ryanofsky:
Code review ACK 73e3fa10b4
Tree-SHA512: 121630c37ab01aa7f7097f10322ab37da3cbc0696a6bbdbf2bbd6db180dc5938c7ed91003aaa2df7cf4a4106f973f5118ba541b5e077cf3588aa641bbd528f4e
Follow-up to #30436.
uint256 string representation was wrongfully documented as little-endian due to them being reversed by GetHex() etc, and base_blob::Compare() giving most significance to the beginning of the internal array. They are closer to "big-endian", but this commit tries to be even more precise than that.
uint256_tests.cpp - Avoid using variable from the left side of the condition in the right side.
setup_common.cpp - Skip needless ArithToUint256-conversion.
This is a safe replacement of the previous SetHex, which now returns an
optional to indicate success or failure.
The code is similar to the ParseHashStr helper, which will be removed in
a later commit.
SetHex is fragile, because it accepts any non-hex input or any length of
input, without error feedback. This can lead to issues when the input is
truncated or otherwise corrupted.
Document the problem by renaming the method.
In the future, the fragile method should be removed from the public
interface.
-BEGIN VERIFY SCRIPT-
sed -i 's/SetHex/SetHexDeprecated/g' $( git grep -l SetHex ./src )
-END VERIFY SCRIPT-
fa63f16018 test: Add uint256 string parse tests (MarcoFalke)
facf629ce8 refactor: Remove unused and fragile string interface from arith_uint256 (MarcoFalke)
Pull request description:
The string interface (`base_uint(const std::string&)`, as well as `base_uint::SetHex`) is problematic for many reasons:
* It is unused (except in test-only code).
* It is redundant with the `uint256` string interface: `std::string -> uint256 -> UintToArith256`.
* It is brittle, because it inherits the brittle `uint256` string interface, which is brittle due to the use of `c_str()` (embedded null will be treated as end-of string), etc ...
Instead of fixing the interface, remove it since it is unused and redundant with `UintToArith256`.
ACKs for top commit:
ajtowns:
ACK fa63f16018
TheCharlatan:
ACK fa63f16018
Tree-SHA512: a95d5b938ffd0473361336bbf6be093d01265a626c50be1345ce2c5e582c0f3f73eb11af5fd1884019f59d7ba27e670ecffdb41d2c624ffb9aa63bd52b780e62
Replace the memset with C++11 value/aggregate initialisation of
the m_data array, which still ensures the unspecified values end
up as zero-initialised.
This then allows changing UINT256_ONE() from dynamically allocating an
object, to a simpler referencing a static allocation.
Remove the nType and nVersion as parameters to all serialization methods
and functions. There is only one place where it's read and has an impact
(in CAddress), and even there it does not impact any of the recursively
invoked serializers.
Instead, the few places that need nType or nVersion are changed to read
it directly from the stream object, through GetType() and GetVersion()
methods which are added to all stream classes.
Given that in default GetSerializeSize implementations created by
ADD_SERIALIZE_METHODS we're already using CSizeComputer(), get rid
of the specialized GetSerializeSize methods everywhere, and just use
CSizeComputer. This removes a lot of code which isn't actually used
anywhere.
For CCompactSize and CVarInt this actually removes a more efficient
size computing algorithm, which is brought back in a later commit.
Make sure that chainparams and logging is properly initialized. Doing
this for every test may be overkill, but this initialization is so
simple that that does not matter.
This should fix the travis issues.