Technical details
Architecture
There are 3 layers in the framework that can be used completely independently of each other:
- view - visualize blockchain data (token supply, balances, prices, …)
- data - work with blockchain data (filter, join, export to csv, …)
- fetcher - fetch and cache raw blockchain data (events, function calls, balances, …)
Behind each layer there’s a powerful open-source Python library.
Configuration
The configuration params can be set in two ways:
- Passed directly to the constructor of any class
- Set globally for all classes via an environment variable
Params:
| __init__ name | env variable | Default | Description |
|---|---|---|---|
| rpc | WEB3_PROVIDER_URI | | Ethereum (or other EVM chain) endpoint |
| cache_path | WEB3_CACHE_PATH | | Path for sqlite3 database cache |
| block_grid_step | WEB3_BLOCK_GRID_STEP | 1000 | Minimum gap between fetched blocks (see … |
| | WEB3_DISABLE_STDOUT | | If set (1, or … |
| w3 | | Instantiated from … | Instance of … |
| conn | | Instantiated from … | Instance of … |
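As a rough illustration of the two configuration routes, the sketch below resolves one parameter from an explicit argument first and from the environment second. The helper name `resolve_block_grid_step` and the precedence order (explicit argument wins) are assumptions made for this example, not the framework's actual code. Only the variable name WEB3_BLOCK_GRID_STEP and the default of 1000 come from the table.

```python
import os
from typing import Optional

DEFAULT_BLOCK_GRID_STEP = 1000  # default listed in the params table


def resolve_block_grid_step(explicit: Optional[int] = None) -> int:
    """Hypothetical helper: pick block_grid_step from an argument or the env.

    Assumed precedence: explicit argument, then WEB3_BLOCK_GRID_STEP,
    then the documented default.
    """
    if explicit is not None:
        return explicit
    raw = os.environ.get("WEB3_BLOCK_GRID_STEP")
    if raw is not None:
        return int(raw)
    return DEFAULT_BLOCK_GRID_STEP


# Global setting for every caller in this process.
os.environ["WEB3_BLOCK_GRID_STEP"] = "500"
step_from_env = resolve_block_grid_step()       # taken from the environment
step_explicit = resolve_block_grid_step(100)    # explicit value overrides it
```

Setting the variable once in the shell or at process start-up keeps all classes consistent, which matters for block_grid_step in particular.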
Block grid
It’s often desirable to convert a block number to a timestamp and vice versa. In a way, blocks are blockchain-readable, and timestamps are human-readable.
However, fetching every single block is impractical in many cases.
That’s why the following algorithm is used for timestamp estimation:
We make a block number grid with a step specified by the block_grid_step parameter.
For each block number, we take the two closest grid blocks (below and above).
We fetch the grid blocks and interpolate linearly between them:
- Let \(a_n\) and \(a_t\) be the number and the timestamp of the grid block above
- Let \(b_n\) and \(b_t\) be the number and the timestamp of the grid block below
- Let \(c_n\) and \(c_t\) be the number and the timestamp of the block we’re looking for
The interpolation weight is \(w = (c_n - b_n) / (a_n - b_n)\)
Then \(c_t = b_t \cdot (1 - w) + a_t \cdot w\)
This algorithm gives a reasonably good approximation for the block timestamp and considerably reduces the number of block fetches. For example, if we have 500 events happening in the 1000 - 2000 block range, then we fetch only two blocks (1000, 2000) instead of 500.
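The interpolation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the framework's implementation: `estimate_timestamp` and `fake_fetch` are hypothetical names, the toy chain produces a block every 12 seconds, and the case where the grid block above does not exist yet (beyond the chain head) is not handled.

```python
from functools import lru_cache
from typing import Callable


def estimate_timestamp(
    block_number: int,
    grid_step: int,
    fetch_timestamp: Callable[[int], int],
) -> float:
    """Estimate a block timestamp from the two surrounding grid blocks."""
    b_n = (block_number // grid_step) * grid_step  # grid block below
    if b_n == block_number:
        # The block sits on the grid, so its timestamp is exact.
        return float(fetch_timestamp(b_n))
    a_n = b_n + grid_step                          # grid block above
    b_t = fetch_timestamp(b_n)
    a_t = fetch_timestamp(a_n)
    w = (block_number - b_n) / (a_n - b_n)
    return b_t * (1 - w) + a_t * w


calls = []  # records which blocks were actually "fetched"


@lru_cache(maxsize=None)
def fake_fetch(block_number: int) -> int:
    """Stand-in for an RPC call: toy chain with one block every 12 seconds."""
    calls.append(block_number)
    return 1_600_000_000 + 12 * block_number


estimates = [estimate_timestamp(n, 1000, fake_fetch) for n in (1250, 1500, 1750)]
# Only the grid blocks 1000 and 2000 end up in `calls`.
```

On this perfectly regular toy chain the estimates are exact. On a real chain the error depends on how unevenly blocks are spaced between the two grid points.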
If you still need exact timestamps, use block_grid_step = 1.
- Warning:
It’s highly advisable to use a single block_grid_step for all data. Otherwise (in theory) the happens-before relationship might be violated for the data points.
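A toy example makes the risk concrete. The chain below is invented for illustration (slow blocks up to 1500, fast blocks afterwards), and `true_timestamp` and `estimate` are hypothetical helpers. Mixing two grid steps gives the later block an earlier estimated timestamp, while a single step preserves the order.

```python
def true_timestamp(block_number: int) -> int:
    """Invented chain: 1.8 s per block up to block 1500, 0.2 s per block after."""
    if block_number <= 1500:
        return (block_number - 1000) * 18 // 10
    return 900 + (block_number - 1500) * 2 // 10


def estimate(block_number: int, step: int) -> float:
    """Linear interpolation between the grid blocks below and above."""
    below = block_number // step * step
    if below == block_number:
        return float(true_timestamp(below))
    above = below + step
    w = (block_number - below) / step
    return true_timestamp(below) * (1 - w) + true_timestamp(above) * w


# Mixed grid steps: block 1400 lands on the 100-grid, block 1500 is interpolated.
early_block = estimate(1400, step=100)    # 720.0
late_block = estimate(1500, step=1000)    # 500.0
order_broken = late_block < early_block   # later block appears to happen first

# Single grid step: the estimates stay ordered.
order_kept = estimate(1400, step=1000) < estimate(1500, step=1000)
```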
Caching
All fetched data is logically cached inside the sqlite3 database.
The caching technique is incremental: additional data is fetched
in increments, and nothing that is already cached is refetched.
For example, if you fetched ERC20 Transfer events from block
10_000 to block 20_000, then a subsequent request from
block 15_000 to block 21_000 will return 15_000 - 20_000 from cache
and fetch only 20_000 - 21_000 from web3 and save to cache.
A subsequent request for blocks 15_000 - 21_000 will read
everything from cache.
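The incremental behaviour boils down to computing which block ranges are not cached yet. The sketch below shows one way to do that, assuming half-open ranges `[start, end)`. The function name `missing_ranges` and the range representation are assumptions for this example and say nothing about how the cache is actually stored.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open block range [start, end), an assumption


def missing_ranges(cached: List[Range], start: int, end: int) -> List[Range]:
    """Return the parts of [start, end) not covered by any cached range."""
    missing = []
    cursor = start  # everything before the cursor is known to be covered
    for c_start, c_end in sorted(cached):
        if c_end <= cursor:
            continue                  # cached range lies entirely behind us
        if c_start >= end:
            break                     # cached range lies entirely ahead
        if c_start > cursor:
            missing.append((cursor, c_start))
        cursor = max(cursor, c_end)
        if cursor >= end:
            break
    if cursor < end:
        missing.append((cursor, end))
    return missing


cached = [(10_000, 20_000)]
to_fetch = missing_ranges(cached, 15_000, 21_000)    # only the uncached tail
cached.extend(to_fetch)                              # fetched data joins the cache
to_fetch_again = missing_ranges(cached, 15_000, 21_000)
```

With the numbers from the example above, the first request needs only blocks 20_000 to 21_000 from web3, and the repeated request needs nothing.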
Additionally, it supports nesting of the argument filters. Imagine you:
- Fetched all Transfer events from blocks 2000 - 4000
- Fetched Transfer events with filter {"from": "0x1234..."} for blocks 4000 - 6000
- Fetched Transfer events with filter {"to": "0x5678..."} for blocks 6000 - 8000
Now when you query events with filter {"from": "0x1234...", "to": "0x5678..."}
for blocks 2000 - 8000, the cache is smart enough
to figure out that all events are already cached and
serves the cached result.
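One way to reason about this: a cached entry can serve a query when its filter is at least as broad as the query's filter, meaning every key/value pair in the cached filter also appears in the query. The sketch below checks whether such entries cover the whole block range. The names `filter_is_broader` and `is_covered`, the entry layout, and the half-open ranges are assumptions for illustration, not the framework's internals.

```python
from typing import Dict, List, Tuple

Entry = Tuple[Dict[str, str], Tuple[int, int]]  # (argument filter, [start, end))


def filter_is_broader(cached_filter: Dict[str, str], query_filter: Dict[str, str]) -> bool:
    """True if every constraint of the cached filter is also in the query."""
    return all(
        key in query_filter and query_filter[key] == value
        for key, value in cached_filter.items()
    )


def is_covered(entries: List[Entry], query_filter: Dict[str, str], start: int, end: int) -> bool:
    """True if usable cached entries cover the whole range [start, end)."""
    usable = sorted(rng for flt, rng in entries if filter_is_broader(flt, query_filter))
    cursor = start
    for r_start, r_end in usable:
        if r_start > cursor:
            break                      # gap in coverage
        cursor = max(cursor, r_end)
    return cursor >= end


entries = [
    ({}, (2000, 4000)),
    ({"from": "0x1234..."}, (4000, 6000)),
    ({"to": "0x5678..."}, (6000, 8000)),
]
query = {"from": "0x1234...", "to": "0x5678..."}
fully_cached = is_covered(entries, query, 2000, 8000)
```

The cached rows would still need to be narrowed to the query filter before being returned, since each usable entry may contain more events than the query asks for.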
The other good thing is that sharing cache is easy - just transfer the sqlite3 file to another device and you have all the data.
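If the cache file may be in use while you copy it, Python's standard `sqlite3` module offers `Connection.backup`, which produces a consistent copy. The sketch below uses only the standard library. The helper names `copy_cache` and `list_tables` are hypothetical, and the table layout of the real cache file is not assumed.

```python
import sqlite3
from typing import List


def copy_cache(src_path: str, dst_path: str) -> None:
    """Copy a sqlite3 database file using the built-in online backup."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)  # available since Python 3.7
    finally:
        dst.close()
        src.close()


def list_tables(path: str) -> List[str]:
    """List table names in a sqlite3 file, e.g. to sanity-check a copy."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Point cache_path (or WEB3_CACHE_PATH) on the other device at the copied file to reuse the data.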