73e3fa10b4 doc + test: Correct uint256 hex string endianness (Hodlinator)
Pull request description:
This PR is a follow-up to #30436.
Only changes test-code and modifies/adds comments.
Byte order of the hex string representation was wrongly documented as little-endian, but is in fact closer to "big-endian" (endianness is a memory-order concept rather than a numeric one). `[arith_]uint256` both store their data in arrays with little-endian byte order (`arith_uint256` additionally uses host byte order within each `uint32_t` element).
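A minimal illustration of what that means in practice (a sketch, not code from this PR; it uses the `SetHexDeprecated`/`GetHex` methods discussed below):
```C++
// Sketch: the *last* hex pair of the string lands at m_data[0], so the string
// reads most-significant-first while the underlying array is little-endian.
uint256 v;
v.SetHexDeprecated("1122334455667788990011223344556677889900112233445566778899001234");
assert(*v.begin() == 0x34);                 // least significant byte is first in memory
assert(v.GetHex().substr(60, 4) == "1234"); // ...but last in the string representation
```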
**uint256_tests.cpp** - Avoid using a variable from the left side of the condition on the right side. Credits to @maflcko: https://github.com/bitcoin/bitcoin/pull/30436#discussion_r1688273553
**setup_common.cpp** - Skip needless ArithToUint256-conversion. Credits to @stickies-v: https://github.com/bitcoin/bitcoin/pull/30436#discussion_r1688621638
---
<details>
<summary>
## Logical reasoning for endianness
</summary>
1. Comparing an `arith_uint256` (`base_uint<256>`) to a `uint64_t` compares the beginning of the array, and verifies the remaining elements are zero.
```C++
template <unsigned int BITS>
bool base_uint<BITS>::EqualTo(uint64_t b) const
{
    for (int i = WIDTH - 1; i >= 2; i--) {
        if (pn[i])
            return false;
    }
    if (pn[1] != (b >> 32))
        return false;
    if (pn[0] != (b & 0xfffffffful))
        return false;
    return true;
}
```
...that is consistent with little endian ordering of the array.
2. They have the same endianness (but `arith_*` uses host ordering within each `uint32_t` element):
```C++
arith_uint256 UintToArith256(const uint256 &a)
{
    arith_uint256 b;
    for(int x=0; x<b.WIDTH; ++x)
        b.pn[x] = ReadLE32(a.begin() + x*4);
    return b;
}
```
### String conversions
The reversal of order that happens when converting hex strings <=> uint256 means the strings are actually closer to big-endian; see the end of `base_blob<BITS>::SetHexDeprecated`:
```C++
unsigned char* p1 = m_data.data();
unsigned char* pend = p1 + WIDTH;
while (digits > 0 && p1 < pend) {
    *p1 = ::HexDigit(trimmed[--digits]);
    if (digits > 0) {
        *p1 |= ((unsigned char)::HexDigit(trimmed[--digits]) << 4);
        p1++;
    }
}
```
Same reversal here:
```C++
template <unsigned int BITS>
std::string base_blob<BITS>::GetHex() const
{
    uint8_t m_data_rev[WIDTH];
    for (int i = 0; i < WIDTH; ++i) {
        m_data_rev[i] = m_data[WIDTH - 1 - i];
    }
    return HexStr(m_data_rev);
}
```
It now makes sense to me that `SetHexDeprecated`, upon receiving a shorter hex string that requires zero-padding, would pad as if the missing hex chars were towards the end of the little-endian byte array, as they are the most significant bytes. "Big-endian" string representation is also consistent with the case where `SetHexDeprecated` receives too many hex digits and discards the leftmost ones, as a form of integer narrowing takes place.
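Sketch of both behaviours (illustrative, not test code from this PR):
```C++
// Zero-padding: missing digits behave as the most significant bytes (end of the array).
uint256 pad;
pad.SetHexDeprecated("12");
assert(pad.GetHex() == std::string(62, '0') + "12");

// Narrowing: with too many digits, the leftmost (most significant) ones are discarded.
uint256 narrow;
narrow.SetHexDeprecated("ff" + std::string(62, '0') + "12"); // 66 hex digits
assert(narrow.GetHex() == std::string(62, '0') + "12");
```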
### How I got it wrong in #30436
Previously I used the less-than (`<`) comparison to prove endianness, but for `uint256` that operator uses `memcmp` and thereby gives priority to the *less significant* bytes at the beginning of the array.
```C++
constexpr int Compare(const base_blob& other) const { return std::memcmp(m_data.data(), other.m_data.data(), WIDTH); }
```
`arith_uint256` is different in that it begins by comparing the bytes from the end, as it is using little endian representation, where the bytes toward the end are more significant.
```C++
template <unsigned int BITS>
int base_uint<BITS>::CompareTo(const base_uint<BITS>& b) const
{
    for (int i = WIDTH - 1; i >= 0; i--) {
        if (pn[i] < b.pn[i])
            return -1;
        if (pn[i] > b.pn[i])
            return 1;
    }
    return 0;
}
```
(The commit documents that `base_blob::Compare()` performs lexicographic ordering, unlike the `arith_*` variant, which performs numeric ordering.)
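To make the difference concrete (sketch, not code from the commit):
```C++
uint256 small, big;
small.SetHexDeprecated("01");                        // numerically 1, but m_data[0] == 0x01
big.SetHexDeprecated("ff" + std::string(62, '0'));   // numerically huge, but m_data[0] == 0x00
assert(UintToArith256(small) < UintToArith256(big)); // numeric ordering
assert(big < small);                                 // lexicographic: memcmp compares m_data[0] first
```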
</details>
ACKs for top commit:
paplorinc:
ACK 73e3fa10b4
ryanofsky:
Code review ACK 73e3fa10b4
Tree-SHA512: 121630c37ab01aa7f7097f10322ab37da3cbc0696a6bbdbf2bbd6db180dc5938c7ed91003aaa2df7cf4a4106f973f5118ba541b5e077cf3588aa641bbd528f4e
Follow-up to #30436.
The uint256 string representation was wrongly documented as little-endian, due to the strings being reversed by GetHex() etc., and due to base_blob::Compare() giving most significance to the beginning of the internal array. The strings are closer to "big-endian", but this commit tries to be even more precise than that.
uint256_tests.cpp - Avoid using a variable from the left side of the condition on the right side.
setup_common.cpp - Skip needless ArithToUint256-conversion.
This is a safe replacement for the previous SetHex; the new function returns an
optional to indicate success or failure.
The code is similar to the ParseHashStr helper, which will be removed in
a later commit.
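The calling pattern this enables is roughly the following; a standalone sketch of the idea, not the actual helper added by this commit:
```C++
// Standalone sketch of the pattern (names illustrative): reject bad input via
// std::optional instead of silently accepting it.
#include <array>
#include <cstdint>
#include <optional>
#include <string_view>

std::optional<std::array<uint8_t, 32>> ParseHex256Sketch(std::string_view hex)
{
    const auto nibble = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    };
    if (hex.size() != 64) return std::nullopt; // wrong length is an error, not padding
    std::array<uint8_t, 32> out{};
    for (size_t i = 0; i < out.size(); ++i) {
        const int hi = nibble(hex[2 * i]);
        const int lo = nibble(hex[2 * i + 1]);
        if (hi < 0 || lo < 0) return std::nullopt; // non-hex character is an error
        out[i] = static_cast<uint8_t>(hi << 4 | lo);
    }
    return out;
}
```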
SetHex is fragile, because it accepts any non-hex input or any length of
input, without error feedback. This can lead to issues when the input is
truncated or otherwise corrupted.
Document the problem by renaming the method.
In the future, the fragile method should be removed from the public
interface.
-BEGIN VERIFY SCRIPT-
sed -i 's/SetHex/SetHexDeprecated/g' $( git grep -l SetHex ./src )
-END VERIFY SCRIPT-
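To illustrate the failure mode with the renamed method (sketch, not part of the scripted diff):
```C++
uint256 h;
h.SetHexDeprecated("deadbeef");      // truncated 64-char hash: silently accepted, zero-padded
h.SetHexDeprecated("no hex here");   // stops at the first non-hex character: result is all-zero
// Neither call reports an error to the caller.
```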
fa63f16018 test: Add uint256 string parse tests (MarcoFalke)
facf629ce8 refactor: Remove unused and fragile string interface from arith_uint256 (MarcoFalke)
Pull request description:
The string interface (`base_uint(const std::string&)`, as well as `base_uint::SetHex`) is problematic for many reasons:
* It is unused (except in test-only code).
* It is redundant with the `uint256` string interface: `std::string -> uint256 -> UintToArith256`.
* It is brittle, because it inherits the `uint256` string interface, which is itself brittle due to the use of `c_str()` (an embedded null is treated as end-of-string), etc.
Instead of fixing the interface, remove it since it is unused and redundant with `UintToArith256`.
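What remains is the `uint256` path, e.g. (sketch, using the existing `uint256S` and `UintToArith256` helpers):
```C++
// Hex string -> uint256 -> arith_uint256, replacing the removed string constructor.
const arith_uint256 target{UintToArith256(uint256S("00000000ffff0000000000000000000000000000000000000000000000000000"))};
```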
ACKs for top commit:
ajtowns:
ACK fa63f16018
TheCharlatan:
ACK fa63f16018
Tree-SHA512: a95d5b938ffd0473361336bbf6be093d01265a626c50be1345ce2c5e582c0f3f73eb11af5fd1884019f59d7ba27e670ecffdb41d2c624ffb9aa63bd52b780e62
Replace the memset with C++11 value/aggregate initialisation of
the m_data array, which still ensures the unspecified values end
up as zero-initialised.
This then allows changing UINT256_ONE() from dynamically allocating an
object to simply referencing a static allocation.
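A simplified sketch of both changes (member and helper names assumed, not the exact diff):
```C++
// Value/aggregate initialisation zeroes every element without a memset in the
// constructor body.
class blob_sketch
{
protected:
    uint8_t m_data[32]{}; // {} => all bytes zero-initialised
};

// UINT256_ONE can then return a reference to a function-local static instead of
// a dynamically allocated object.
const uint256& UINT256_ONE()
{
    static const uint256 one{uint256S("0000000000000000000000000000000000000000000000000000000000000001")};
    return one;
}
```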
Remove the nType and nVersion as parameters to all serialization methods
and functions. There is only one place where it's read and has an impact
(in CAddress), and even there it does not impact any of the recursively
invoked serializers.
Instead, the few places that need nType or nVersion are changed to read
it directly from the stream object, through GetType() and GetVersion()
methods which are added to all stream classes.
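Sketch of the before/after shape of a serializer under this change (illustrative only):
```C++
// Before: void Serialize(Stream& s, int nType, int nVersion) const;
// After: the stream carries type/version, and interested serializers query it.
template <typename Stream>
void Serialize(Stream& s) const
{
    if (s.GetType() & SER_DISK) {
        // disk-only fields, with the version available via s.GetVersion()
    }
    // ... remaining members are serialized unconditionally ...
}
```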
Given that in default GetSerializeSize implementations created by
ADD_SERIALIZE_METHODS we're already using CSizeComputer(), get rid
of the specialized GetSerializeSize methods everywhere, and just use
CSizeComputer. This removes a lot of code which isn't actually used
anywhere.
For CCompactSize and CVarInt this actually removes a more efficient
size computing algorithm, which is brought back in a later commit.
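The resulting pattern for size computation looks roughly like this (sketch, helper name illustrative):
```C++
// Serialize into a counting stream instead of a hand-written GetSerializeSize().
template <typename T>
size_t SizeViaComputer(const T& obj, int nType, int nVersion)
{
    CSizeComputer sc(nType, nVersion); // counts bytes, stores nothing
    sc << obj;
    return sc.size();
}
```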
Make sure that chainparams and logging are properly initialized. Doing
this for every test may be overkill, but this initialization is so
simple that it does not matter.
This should fix the travis issues.
- Add license headers to source files (years based on commit dates)
in `src/test` as well as `qa`
- Add `README.md` to `src/test/data` specifying MIT license
Fixes #3848
Unit tests for uint256.h. The file uint160_tests.cpp is no longer
needed. The ad-hoc tests which were in uint256.h are also no longer
needed. The new tests achieve 100% coverage.
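A sketch of the style of the new tests (Boost unit test framework, using the SetHex/GetHex interface of that time):
```C++
#include <uint256.h>

#include <boost/test/unit_test.hpp>

#include <string>

BOOST_AUTO_TEST_SUITE(uint256_tests_sketch)

BOOST_AUTO_TEST_CASE(hex_roundtrip)
{
    const std::string hex{"8000000000000000000000000000000000000000000000000000000000000001"};
    uint256 v;
    v.SetHex(hex);
    BOOST_CHECK_EQUAL(v.GetHex(), hex);
}

BOOST_AUTO_TEST_SUITE_END()
```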
Use misc. methods of avoiding unnecessary header includes.
Replace int typedefs with int##_t from stdint.h.
Replace PRI64[xdu] with PRI[xdu]64 from inttypes.h.
Normalize QT_VERSION ifs where possible.
Resolve some indirect dependencies as direct ones.
Remove extern declarations from .cpp files.
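For the PRI64 item above, the change amounts to using the standard macros, e.g.:
```C++
#include <cinttypes>
#include <cstdio>

int main()
{
    int64_t amount = 5000000000; // illustrative value
    // old, non-standard macro spelling:  printf("%"PRI64d"\n", amount);
    printf("%" PRId64 "\n", amount);     // standard PRId64 from inttypes.h
    return 0;
}
```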
The class template base_uint had its own private lookup table, costing
256 bytes per instantiation; removing it saves that space.
The result is not spectacular, as bitcoin-qt has only shrunk by about
1 KB, but it is still a valid improvement.
Also, I have replaced a for loop with a memset() call.
Made CBigNum::SetHex() use the new HexDigit() function.
Signed-off-by: Olivier Langlois <olivier@olivierlanglois.net>
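For reference, the shared helper is used like this (sketch; the header providing HexDigit has moved over the years):
```C++
// HexDigit() maps an ASCII character to its value, or -1 for non-hex input,
// which removes the need for a per-class lookup table in SetHex-style parsers.
signed char hi = HexDigit('b');      // 11
signed char lo = HexDigit('c');      // 12
unsigned char byte = (hi << 4) | lo; // 0xbc
signed char bad = HexDigit('g');     // -1: caller must handle invalid characters
```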