mirror of https://github.com/bitcoin/bitcoin.git synced 2025-02-20 12:12:41 -05:00
Ava Chow 43e71f7498
Merge bitcoin/bitcoin#27432: contrib: add tool to convert compact-serialized UTXO set to SQLite database
4080b66cbe test: add test for utxo-to-sqlite conversion script (Sebastian Falbesoner)
ec99ed7380 contrib: add tool to convert compact-serialized UTXO set to SQLite database (Sebastian Falbesoner)

Pull request description:

  ## Problem description

  There is demand from users to get the UTXO set in the form of a SQLite database (#24628). Bitcoin Core currently only supports dumping the UTXO set in a binary _compact-serialized_ format, which was crafted specifically for AssumeUTXO snapshots (see PR #16899), with the primary goal of being as compact as possible. Previous PRs tried to extend the `dumptxoutset` RPC with new formats, either in human-readable form (e.g. #18689, #24202), or most recently, directly as a SQLite database (#24952). Neither approach is optimal: due to the huge size of the ever-growing UTXO set, with already more than 80 million entries on mainnet, human-readable formats are practically useless, and very likely one of the first steps would be to put them in some form of database anyway. Directly adding SQLite3 dumping support, on the other hand, introduces an additional dependency to the non-wallet part of bitcoind and the risk of increased maintenance burden (see e.g. https://github.com/bitcoin/bitcoin/pull/24952#issuecomment-1163551060, https://github.com/bitcoin/bitcoin/issues/24628#issuecomment-1108469715).

  ## Proposed solution

  This PR follows the "external tooling" route by adding a simple Python script for achieving the same goal in a two-step process (first create compact-serialized UTXO set via `dumptxoutset`, then convert it to SQLite via the new script). Executive summary:
  - single file, no extra dependencies (sqlite3 is included in Python's standard library [1])
  - ~150 LOC, mostly deserialization/decompression routines ported from the Core codebase and (probably the most difficult part) a little elliptic curve / finite field math to decompress pubkeys (essentially solving the secp256k1 curve equation y^2 = x^3 + 7 for y given x, respecting the proper polarity as indicated by the compression tag)
  - creates a database with only one table `utxos` with the following schema:
    ```(txid TEXT, vout INT, value INT, coinbase INT, height INT, scriptpubkey TEXT)```
  - the resulting file has roughly 2x the size of the compact-serialized UTXO set (this is mostly due to encoding txids and scriptpubkeys as hex-strings rather than bytes)

  [1] note that there are some rare cases of operating systems like FreeBSD though, where the sqlite3 module has to be installed explicitly (see #26819)
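
The pubkey decompression step mentioned above can be sketched as follows. This is a minimal illustration, not the code from the script: the function name `decompress_pubkey` and its error handling are assumptions made for this example. It relies on the secp256k1 field prime being congruent to 3 mod 4, so a square root can be computed with a single modular exponentiation, and it uses the low bit of the prefix byte (0x02 for even y, 0x03 for odd y) to pick between the two roots.

```python
# secp256k1 field prime; the curve equation is y^2 = x^3 + 7 (mod P).
P = 2**256 - 2**32 - 977


def decompress_pubkey(compressed: bytes) -> bytes:
    """Turn a 33-byte compressed pubkey into its 65-byte uncompressed form.

    Hypothetical helper for illustration; the name and the error handling
    are not taken from the script.
    """
    if len(compressed) != 33 or compressed[0] not in (0x02, 0x03):
        raise ValueError("expected 33 bytes with a 0x02 or 0x03 prefix")
    x = int.from_bytes(compressed[1:], "big")
    if x >= P:
        raise ValueError("x coordinate is not a valid field element")
    rhs = (pow(x, 3, P) + 7) % P
    # Because P % 4 == 3, rhs^((P + 1) / 4) is a square root of rhs
    # whenever rhs is a quadratic residue.
    y = pow(rhs, (P + 1) // 4, P)
    if pow(y, 2, P) != rhs:
        raise ValueError("x coordinate is not on the curve")
    # The low bit of the prefix gives the parity of y; flip to the
    # other root if the computed one has the wrong parity.
    if (y & 1) != (compressed[0] & 1):
        y = P - y
    return b"\x04" + x.to_bytes(32, "big") + y.to_bytes(32, "big")


# Usage: decompress the secp256k1 generator point from its x coordinate.
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
uncompressed = decompress_pubkey(b"\x02" + GX.to_bytes(32, "big"))
print(uncompressed.hex())
```

The parity check is what "respecting the proper polarity" amounts to: both y and P - y satisfy the curve equation, and exactly one of them is even.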

  A functional test is also added that creates UTXO set entries with various output script types (standard and also non-standard, e.g. large scripts) and verifies that the UTXO sets of both formats match by comparing corresponding MuHashes. One MuHash is supplied by the bitcoind instance via `gettxoutsetinfo muhash`, the other is calculated in the test by reading back the created SQLite database entries and hashing them with the test framework's `MuHash3072` module.

  ## Manual test instructions
  I'd suggest doing manual tests by comparing MuHashes as well. For that, I wrote a Go tool some time ago that calculates the MuHash of a SQLite database in the created format (I've tried to write a similar tool in Python, but it's painfully slow).
  ```
  $ [run bitcoind instance with -coinstatsindex]
  $ ./src/bitcoin-cli dumptxoutset ~/utxos.dat
  $ ./src/bitcoin-cli gettxoutsetinfo muhash <block height returned in previous call>
  (outputs MuHash calculated from node)

  $ ./contrib/utxo-tools/utxo_to_sqlite.py ~/utxos.dat ~/utxos.sqlite
  $ git clone https://github.com/theStack/utxo_dump_tools
  $ cd utxo_dump_tools/calc_utxo_hash
  $ go run calc_utxo_hash.go ~/utxos.sqlite
  (outputs MuHash calculated from the SQLite UTXO set)

  => verify that both MuHashes are equal
  ```
  For a demonstration of what can be done with the resulting database, see https://github.com/bitcoin/bitcoin/pull/24952#pullrequestreview-956290477 for some example queries. Thanks go to LarryRuane who gave me the idea of rewriting this script in Python and adding it to `contrib`.
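
As a rough idea of how the resulting database could be queried from Python, here is a small sketch using only the standard library's `sqlite3` module. The table name and columns follow the schema given above; the helper names, the in-memory database, and the three sample rows are made up for illustration, and `value` is assumed to be an amount in satoshis. With a real conversion result, one would connect to the generated file (e.g. `~/utxos.sqlite`) instead of `:memory:`.

```python
import sqlite3


def summarize_utxos(con):
    """Return (number of UTXOs, total value, number of coinbase outputs)."""
    return con.execute(
        "SELECT COUNT(*), COALESCE(SUM(value), 0), COALESCE(SUM(coinbase), 0) "
        "FROM utxos"
    ).fetchone()


def utxos_per_height(con):
    """Return a list of (height, number of UTXOs created at that height)."""
    return con.execute(
        "SELECT height, COUNT(*) FROM utxos GROUP BY height ORDER BY height"
    ).fetchall()


# Build a tiny stand-in database with the same schema (sample rows are fake).
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE utxos(txid TEXT, vout INT, value INT, "
    "coinbase INT, height INT, scriptpubkey TEXT)"
)
sample_rows = [
    ("aa" * 32, 0, 5_000_000_000, 1, 100, "51"),
    ("bb" * 32, 1, 150_000, 0, 200, "0014" + "11" * 20),
    ("cc" * 32, 0, 2_500, 0, 200, "5120" + "22" * 32),
]
con.executemany("INSERT INTO utxos VALUES (?, ?, ?, ?, ?, ?)", sample_rows)

print(summarize_utxos(con))
print(utxos_per_height(con))
```

On a mainnet-sized database, queries that filter or group by `height` or `scriptpubkey` would likely benefit from creating an index on those columns first.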

ACKs for top commit:
  ajtowns:
    ACK 4080b66cbe - light review
  achow101:
    ACK 4080b66cbe
  romanz:
    tACK 4080b66cbe on signet (using [calc_utxo_hash](8981aa3e85/calc_utxo_hash/calc_utxo_hash.go)):
  tdb3:
    ACK 4080b66cbe

Tree-SHA512: be8aa0369a28c8421a3ccdf1402e106563dd07c082269707311ca584d1c4c8c7b97d48c4fcd344696a36e7ab8cdb64a1d0ef9a192a15cff6d470baf21e46ee7b
2025-02-14 15:22:10 -08:00
asmap Compare ASMaps with respect to specific addresses 2024-06-27 16:35:15 +02:00
completions testnet: Introduce Testnet4 2024-08-06 01:38:10 +02:00
debian doc: upgrade license to 2025. 2025-01-06 12:23:11 +00:00
devtools build: move rpc/external_signer to node library 2025-02-14 14:38:41 +01:00
guix guix: remove test-security/symbol-check scripts 2025-02-10 11:12:33 +01:00
init security: restrict abis in bitcoind.service 2023-08-24 16:54:47 -04:00
linearize contrib: support reading XORed blocks in linearize-data.py script 2024-08-07 23:53:39 +02:00
macdeploy Merge bitcoin/bitcoin#30287: macOS: rewrite some docs & swap mmacosx-version-min for mmacos-version-min 2024-06-18 10:55:46 +01:00
message-capture test: use built-in collection types for type hints (Python 3.9 / PEP 585) 2023-10-25 01:10:21 +02:00
qos scripted-diff: Bump copyright headers 2021-12-30 19:36:57 +02:00
seeds contrib: Update asmap link in seeds readme 2024-09-26 15:54:01 +02:00
shell guix: Add source-able bash prelude and utils 2021-04-05 11:00:21 -04:00
signet doc: update signet documentation related to build directories 2024-09-28 20:53:21 +02:00
testgen contrib: make gen_key_io_test_vectors deterministic 2022-04-06 17:02:50 +02:00
tracing tracing: log_p2p_connections.bt example 2025-02-04 10:25:36 +01:00
utxo-tools contrib: add tool to convert compact-serialized UTXO set to SQLite database 2024-12-28 02:38:57 +01:00
verify-binaries contrib: Fixup verify-binaries OS platform parsing 2024-06-25 11:32:56 -05:00
verify-commits add ryanofsky to trusted-keys 2023-05-08 23:30:56 -04:00
windeploy windeploy: Renew certificate 2024-05-21 23:19:51 -04:00
zmq scripted-diff: Bump copyright headers 2021-12-30 19:36:57 +02:00
filter-lcov.py scripted-diff: Bump copyright headers 2020-12-31 09:45:41 +01:00
README.md contrib: add tool to convert compact-serialized UTXO set to SQLite database 2024-12-28 02:38:57 +01:00
valgrind.supp doc: Prepend 'build/' to binary paths under 'src/' in docs 2024-08-29 15:23:12 +02:00

Repository Tools

Developer tools

Specific tools for developers working on this repository. Additional tools, including the github-merge.py script, are available in the maintainer-tools repository.

Verify-Commits

Tool to verify that every merge commit was signed by a developer using the github-merge.py script.

Linearize

Construct a linear, no-fork, best version of the blockchain.

Qos

A Linux bash script that will set up traffic control (tc) to limit the outgoing bandwidth for connections to the Bitcoin network. This means one can have an always-on bitcoind instance running, and another local bitcoind/bitcoin-qt instance which connects to this node and receives blocks from it.

Seeds

Utility to generate the pnSeed[] array that is compiled into the client.

Build Tools and Keys

Packaging

The Debian subfolder contains the copyright file.

All other packaging related files can be found in the bitcoin-core/packaging repository.

MacDeploy

Scripts and notes for Mac builds.

Test and Verify Tools

TestGen

Utilities to generate test vectors for the data-driven Bitcoin tests.

Verify-Binaries

This script attempts to download and verify the signature file SHA256SUMS.asc from bitcoin.org.

Command Line Tools

Completions

Shell completions for bash and fish.

UTXO Set Tools

UTXO-to-SQLite

This script converts a compact-serialized UTXO set (as generated by Bitcoin Core with dumptxoutset) to a SQLite3 database. For more details, such as the created table name and schema, refer to the module docstring at the top of the script, which is also shown in the command's --help output.