fe92c15f0c script/sign: avoid duplicated signature verification after signing (Sebastian Falbesoner)
080089567c bench: add benchmark for `SignTransaction` (Sebastian Falbesoner)
Pull request description:
This PR is a small performance improvement on the `SignTransaction` function, which is used mostly by the wallet (obviously) and the `signrawtransactionwithkey` RPC. The lower-level function `ProduceSignature` already calls `VerifyScript` internally as last step in order to check whether the signature data is complete:
daa56f7f66/src/script/sign.cpp (L568-L570)
If and only if that is the case, the `complete` field of the `SignatureData` is set to `true` accordingly and there is no need then to verify the script after again, as we already know that it would succeed.
This leads to a rough ~20% speed-up for `SignTransaction` for single-input ECDSA or Taproot transactions, according to the newly introduced `SignTransaction{ECDSA,Taproot}` benchmarks:
```
$ ./src/bench/bench_bitcoin --filter=SignTransaction.*
```
without commit 18185f4f578b8795fdaa75926630a691e9c8d0d4:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 185,597.79 | 5,388.00 | 1.6% | 0.22 | `SignTransactionECDSA`
| 141,323.95 | 7,075.94 | 2.1% | 0.17 | `SignTransactionSchnorr`
with commit 18185f4f578b8795fdaa75926630a691e9c8d0d4:
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 149,757.86 | 6,677.45 | 1.4% | 0.18 | `SignTransactionECDSA`
| 108,284.40 | 9,234.94 | 2.0% | 0.13 | `SignTransactionSchnorr`
Note that there are already signing benchmarks in the secp256k1 library, but `SignTransaction` does much more than just the cryptographical parts, i.e.:
* calculate the unsigned tx's `PrecomputedTransactionData` if necessary
* apply Solver on the prevout scriptPubKey, fetch the relevant keys from the signing provider
* perform the actual signing operation (for ECDSA signatures, that could be more than once due to low-R grinding)
* verify if the signatures are correct by calling `VerifyScript` (more than once currently, which is fixed by this PR)
so it probably makes sense to also have benchmarks from that higher-level application perspective.
ACKs for top commit:
achow101:
ACK fe92c15f0c
furszy:
utACK fe92c15f0c
glozow:
light review ACK fe92c15f0c
Tree-SHA512: b7225ff9e8a640ca5222dea5b2a463a0f9b9de704e4330b5b9a7bce2d63a1f4620575c474a8186f4708d7d9534eab55d000393d99db79c0cfc046b35f0a4a778
Remove the variant of ChaCha20Poly1305 AEAD that was previously added in
anticipation of BIP324 using it. BIP324 was updated to instead use rekeying
wrappers around otherwise unmodified versions of the ChaCha20 stream cipher
and the ChaCha20Poly1305 AEAD as specified in RFC8439.
A memory resource similar to std::pmr::unsynchronized_pool_resource, but
optimized for node-based containers.
Co-Authored-By: Pieter Wuille <pieter@wuille.net>
db929893ef Faster -reindex by initially deserializing only headers (Larry Ruane)
c72de9990a util: add CBufferedFile::SkipTo() to move ahead in the stream (Larry Ruane)
48a68908ba Add LoadExternalBlockFile() benchmark (Larry Ruane)
Pull request description:
### Background
During the first part of reindexing, `LoadExternalBlockFile()` sequentially reads raw blocks from the `blocks/blk00nnn.dat` files (rather than receiving them from peers, as with initial block download) and eventually adds all of them to the block index. When an individual block is initially read, it can't be immediately added unless all its ancestors have been added, which is rare (only about 8% of the time), because the blocks are not sorted by height. When the block can't be immediately added to the block index, its disk location is saved in a map so it can be added later. When its parent is later added to the block index, `LoadExternalBlockFile()` reads and deserializes the block from disk a second time and adds it to the block index. Most blocks (92%) get deserialized twice.
### This PR
During the initial read, it's rarely useful to deserialize the entire block; only the header is needed to determine if the block can be added to the block index immediately. This change to `LoadExternalBlockFile()` initially deserializes only a block's header, then deserializes the entire block only if it can be added immediately. This reduces reindex time on mainnet by 7 hours on a Raspberry Pi, which translates to around a 25% reduction in the first part of reindexing (adding blocks to the index), and about a 6% reduction in overall reindex time.
Summary: The performance gain is the result of deserializing each block only once, except its header which is deserialized twice, but the header is only 80 bytes.
ACKs for top commit:
andrewtoth:
ACK db929893ef
achow101:
ACK db929893ef
aureleoules:
ACK db929893ef - minor changes and new benchmark since last review
theStack:
re-ACK db929893ef
stickies-v:
re-ACK db929893e
Tree-SHA512: 5a5377192c11edb5b662e18f511c9beb8f250bc88aeadf2f404c92c3232a7617bade50477ebf16c0602b9bd3b68306d3ee7615de58acfd8cae664d28bb7b0136
Goal 1:
Benchmark the transaction creation process for pre-selected-inputs only.
Setting `m_allow_other_inputs=false` to disallow the wallet to include coins automatically.
Goal 2:
Benchmark the transaction creation process for pre-selected-inputs and coin selection.
-----------------------
Benchmark Setup:
1) Generates a 5k blockchain, loading the wallet with 5k transactions with two outputs each.
2) Fetch 4 random UTXO from the wallet's available coins and pre-select them as inputs inside CoinControl.
Benchmark (Goal 1):
Call `CreateTransaction` providing the coin control, who has set `m_allow_other_inputs=false` and
the manually selected coins.
Benchmark (Goal 2):
Call `CreateTransaction` providing the coin control, who has set `m_allow_other_inputs=true` and
the manually selected coins.
035fa1f07a build: Remove LIBTOOL_APP_LDFLAGS for bitcoin-chainstate (Cory Fields)
3f0595095d docs: Add libbitcoinkernel_la_SOURCES explanation (Carl Dong)
94ad45deb2 ci: Build libbitcoinkernel (Carl Dong)
26b2e7ffb3 build: Extract the libbitcoinkernel library (Carl Dong)
1df44dd20c b-cs: Define G_TRANSLATION_FUN in bitcoinkernel.cpp (Carl Dong)
83a0bb7cc9 build: Separate lib_LTLIBRARIES initialization (Carl Dong)
c1e16cb31f build: Create .la library for bitcoincrypto (Carl Dong)
8bdfe057c7 build: Create .la library for leveldb (Carl Dong)
05d1525b6d build: Create .la library for crc32c (Carl Dong)
64caf94479 build: Remove vestigial LIBLEVELDB_SSE42 (Carl Dong)
1392e8e2d8 build: Don't add unrelated libs to LIBTEST_* (Carl Dong)
Pull request description:
Part of: #24303
This PR introduces a `libbitcoinkernel` static library linking in the minimal list of files necessary to use our consensus engine as-is. `bitcoin-chainstate` introduced in #24304 now will link against `libbitcoinkernel`.
Most of the changes are related to the build system.
Please read the commit messages for more details.
ACKs for top commit:
theuni:
This may be my favorite PR ever. It's a privilege to ACK 035fa1f07a.
Tree-SHA512: b755edc3471c7c1098847e9b16ab182a6abb7582563d9da516de376a770ac7543c6fdb24238ddd4d3d2d458f905a0c0614b8667aab182aa7e6b80c1cca7090bc
- LIBLEVELDB_SSE42_INT was defined, but never referenced anywhere
- LIBLEVELDB_SSE42 is referenced, but never defined anywhere
Apparently leveldb used to have platform-specific crc32 code before it
got split off into a separate lib.
This was used to, in effect, manually emulate --start-group/--end-group.
However, we can just order the libraries correctly and avoid specifying
libraries multiple times on the link line.
Note: lld (not ld.bfd) knows how to resolve out-of-order references and
doesn't seem to need the reodering
fafe06c379 bench: Sort bench_bench_bitcoin_SOURCES (MarcoFalke)
fa31dc9b71 bench: Add logging benchmark (MarcoFalke)
Pull request description:
Might make finding performance bottlenecks or regressions (https://github.com/bitcoin/bitcoin/pull/17218) easier.
For example, fuzzing relies on disabled logging to be as fast as possible.
ACKs for top commit:
dergoegge:
ACK fafe06c379
Tree-SHA512: dd858f3234a4dfb00bd7dec4398eb076370a4b9746aa24eecee7da86f6882398a2d086e5ab0b7c9f7321abcb135e7ffc54cc78e60d18b90379c6dba6d613b3f7
Note that with this change we are no-longer including PTHREAD_* flags
when building libbitcoinconsensus.
Also note that we are including PTHREAD_LIBS in AM_PTHREAD_FLAGS
This replaces the current benchmarking framework with nanobench [1], an
MIT licensed single-header benchmarking library, of which I am the
autor. This has in my opinion several advantages, especially on Linux:
* fast: Running all benchmarks takes ~6 seconds instead of 4m13s on
an Intel i7-8700 CPU @ 3.20GHz.
* accurate: I ran e.g. the benchmark for SipHash_32b 10 times and
calculate standard deviation / mean = coefficient of variation:
* 0.57% CV for old benchmarking framework
* 0.20% CV for nanobench
So the benchmark results with nanobench seem to vary less than with
the old framework.
* It automatically determines runtime based on clock precision, no need
to specify number of evaluations.
* measure instructions, cycles, branches, instructions per cycle,
branch misses (only Linux, when performance counters are available)
* output in markdown table format.
* Warn about unstable environment (frequency scaling, turbo, ...)
* For better profiling, it is possible to set the environment variable
NANOBENCH_ENDLESS to force endless running of a particular benchmark
without the need to recompile. This makes it to e.g. run "perf top"
and look at hotspots.
Here is an example copy & pasted from the terminal output:
| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 2.52 | 396,529,415.94 | 0.6% | 25.42 | 8.02 | 3.169 | 0.06 | 0.0% | 0.03 | `bench/crypto_hash.cpp RIPEMD160`
| 1.87 | 535,161,444.83 | 0.3% | 21.36 | 5.95 | 3.589 | 0.06 | 0.0% | 0.02 | `bench/crypto_hash.cpp SHA1`
| 3.22 | 310,344,174.79 | 1.1% | 36.80 | 10.22 | 3.601 | 0.09 | 0.0% | 0.04 | `bench/crypto_hash.cpp SHA256`
| 2.01 | 496,375,796.23 | 0.0% | 18.72 | 6.43 | 2.911 | 0.01 | 1.0% | 0.00 | `bench/crypto_hash.cpp SHA256D64_1024`
| 7.23 | 138,263,519.35 | 0.1% | 82.66 | 23.11 | 3.577 | 1.63 | 0.1% | 0.00 | `bench/crypto_hash.cpp SHA256_32b`
| 3.04 | 328,780,166.40 | 0.3% | 35.82 | 9.69 | 3.696 | 0.03 | 0.0% | 0.03 | `bench/crypto_hash.cpp SHA512`
[1] https://github.com/martinus/nanobench
* Adds support for asymptotes
This adds support to calculate asymptotic complexity of a benchmark.
This is similar to #17375, but currently only one asymptote is
supported, and I have added support in the benchmark `ComplexMemPool`
as an example.
Usage is e.g. like this:
```
./bench_bitcoin -filter=ComplexMemPool -asymptote=25,50,100,200,400,600,800
```
This runs the benchmark `ComplexMemPool` several times but with
different complexityN settings. The benchmark can extract that number
and use it accordingly. Here, it's used for `childTxs`. The output is
this:
| complexityN | ns/op | op/s | err% | ins/op | cyc/op | IPC | total | benchmark
|------------:|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|----------:|:----------
| 25 | 1,064,241.00 | 939.64 | 1.4% | 3,960,279.00 | 2,829,708.00 | 1.400 | 0.01 | `ComplexMemPool`
| 50 | 1,579,530.00 | 633.10 | 1.0% | 6,231,810.00 | 4,412,674.00 | 1.412 | 0.02 | `ComplexMemPool`
| 100 | 4,022,774.00 | 248.58 | 0.6% | 16,544,406.00 | 11,889,535.00 | 1.392 | 0.04 | `ComplexMemPool`
| 200 | 15,390,986.00 | 64.97 | 0.2% | 63,904,254.00 | 47,731,705.00 | 1.339 | 0.17 | `ComplexMemPool`
| 400 | 69,394,711.00 | 14.41 | 0.1% | 272,602,461.00 | 219,014,691.00 | 1.245 | 0.76 | `ComplexMemPool`
| 600 | 168,977,165.00 | 5.92 | 0.1% | 639,108,082.00 | 535,316,887.00 | 1.194 | 1.86 | `ComplexMemPool`
| 800 | 310,109,077.00 | 3.22 | 0.1% |1,149,134,246.00 | 984,620,812.00 | 1.167 | 3.41 | `ComplexMemPool`
| coefficient | err% | complexity
|--------------:|-------:|------------
| 4.78486e-07 | 4.5% | O(n^2)
| 6.38557e-10 | 21.7% | O(n^3)
| 3.42338e-05 | 38.0% | O(n log n)
| 0.000313914 | 46.9% | O(n)
| 0.0129823 | 114.4% | O(log n)
| 0.0815055 | 133.8% | O(1)
The best fitting curve is O(n^2), so the algorithm seems to scale
quadratic with `childTxs` in the range 25 to 800.