Technical details

Architecture

The framework has three layers, each of which can be used completely independently of the others:

view - visualize blockchain data (token supply, balances, prices, …)
data - work with blockchain data (filter, join, export to csv, …)
fetcher - fetch and cache raw blockchain data (events, function calls, balances, …)

Each layer is backed by an open-source Python library.

Configuration

Configuration parameters can be set in two ways:

Passed directly to the constructor of any class
Set globally for all classes via an environment variable

Params:

__init__ name	env variable	Default	Description
rpc	WEB3_PROVIDER_URI		Ethereum (or other EVM chain) endpoint
cache_path	WEB3_CACHE_PATH		Path for sqlite3 database cache
block_grid_step	WEB3_BLOCK_GRID_STEP	1000	Minimum gap between fetched blocks (see `fetcher.core.Core`)
	WEB3_DISABLE_STDOUT		If set (`1` or `true`), standard output is disabled
w3		Instantiated from `rpc` param	Instance of `web3.Web3`
conn		Instantiated from `cache_path` param	Instance of `sqlite3.Connection`
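
The two ways of setting a parameter can be illustrated with a minimal sketch. The `resolve_param` helper below is hypothetical and not part of the framework, and the lookup order (constructor argument first, then environment variable, then default) is an assumption that the table above does not state explicitly.

```python
import os


def resolve_param(explicit, env_name, default=None, cast=str, env=None):
    """Hypothetical helper: pick a value for one configuration parameter.

    Assumed order: explicit constructor argument, then environment
    variable, then the default.
    """
    if env is None:
        env = os.environ
    if explicit is not None:
        return explicit
    raw = env.get(env_name)
    if raw is not None:
        # Environment variables are strings, so convert them if needed.
        return cast(raw)
    return default


# A plain dict stands in for os.environ to keep the example side-effect free.
fake_env = {"WEB3_BLOCK_GRID_STEP": "500"}

# No constructor argument: the environment variable is used.
step_from_env = resolve_param(
    None, "WEB3_BLOCK_GRID_STEP", default=1000, cast=int, env=fake_env
)

# Constructor argument given: it wins over the environment variable.
step_explicit = resolve_param(
    250, "WEB3_BLOCK_GRID_STEP", default=1000, cast=int, env=fake_env
)
```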

Block grid

It’s often desirable to convert a block number to a timestamp and vice versa. In a way, block numbers are blockchain-readable, and timestamps are human-readable.

However, fetching every single block is impractical in many cases.

That’s why the following algorithm is used for timestamp estimation:

We make a block number grid with a width specified by the block_grid_step parameter.

For each block number, we take the two closest grid blocks (below and above).

Fetch the grid blocks

Let \(a_n\) and \(a_t\) be the number and timestamp of the grid block above.

Let \(b_n\) and \(b_t\) be the number and timestamp of the grid block below.

Let \(c_n\) and \(c_t\) be the number and timestamp of the block we’re looking for.

Compute the interpolation weight \(w = (c_n - b_n) / (a_n - b_n)\).

Then \(c_t = b_t \cdot (1 - w) + a_t \cdot w\).

This algorithm gives a reasonably good approximation for the block timestamp and considerably reduces the number of block fetches. For example, if we have 500 events happening in the 1000 - 2000 block range, then we fetch only two blocks (1000, 2000) instead of 500.

If you need exact timestamps, use block_grid_step = 1.
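
The estimation steps above can be sketched in a few lines of Python. This is an illustration, not the library’s implementation: the function names are hypothetical, grid blocks are assumed to be the multiples of block_grid_step, and `get_block_timestamp` stands in for whatever call actually fetches a block. A block close to the chain head, where the grid block above does not exist yet, is not handled.

```python
def grid_neighbors(block_number, grid_step):
    """Return the grid blocks at-or-below and above `block_number`.

    Assumption: grid blocks are the multiples of `grid_step`.
    """
    below = (block_number // grid_step) * grid_step
    above = below + grid_step
    return below, above


def estimate_timestamp(block_number, grid_step, get_block_timestamp):
    """Linearly interpolate a timestamp between the two nearest grid blocks."""
    b_n, a_n = grid_neighbors(block_number, grid_step)
    b_t = get_block_timestamp(b_n)
    if block_number == b_n:
        # The block lies on the grid, so its timestamp is exact.
        return b_t
    a_t = get_block_timestamp(a_n)
    w = (block_number - b_n) / (a_n - b_n)
    return b_t * (1 - w) + a_t * w


# Made-up timestamps for two grid blocks (12 seconds per block).
GRID_TIMESTAMPS = {1000: 12_000, 2000: 24_000}
fetched = []


def fake_fetch(block_number):
    # Records every fetch so the number of block requests is visible.
    fetched.append(block_number)
    return GRID_TIMESTAMPS[block_number]


estimate = estimate_timestamp(1250, 1000, fake_fetch)
print(estimate, fetched)
```

With block_grid_step = 1 every block lies on the grid, so the sketch always takes the exact branch and never interpolates.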

Warning: It’s highly advisable to use a single block_grid_step for all data. Otherwise, the happens-before relationship between data points might (in theory) be violated.

Caching

All fetched data is cached in the sqlite3 database. The cache is range-aware: additional data is fetched in increments, and nothing that is already cached is refetched. For example, if you fetched ERC20 Transfer events from block 10_000 to block 20_000, then a subsequent request from block 15_000 to block 21_000 returns blocks 15_000 - 20_000 from the cache, fetches only blocks 20_000 - 21_000 from web3, and saves them to the cache. A further request for blocks 15_000 - 21_000 reads everything from the cache.
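
The incremental behaviour can be pictured as a gap computation over block ranges. The sketch below is a simplified model and not the library’s code: `missing_ranges` is a hypothetical name, and ranges are assumed to be half-open `(from_block, to_block)` pairs, which matches the 20_000 - 21_000 remainder in the example above.

```python
def missing_ranges(requested, cached):
    """Return the parts of `requested` that no range in `cached` covers.

    Ranges are (from_block, to_block) tuples, treated as half-open:
    from_block is included, to_block is not.
    """
    start, end = requested
    gaps = []
    cursor = start  # first block not yet known to be cached
    for c_from, c_to in sorted(cached):
        if c_to <= cursor:
            continue  # entirely before the uncovered part
        if c_from >= end:
            break  # entirely after the request
        if c_from > cursor:
            gaps.append((cursor, c_from))
        cursor = max(cursor, c_to)
        if cursor >= end:
            break
    if cursor < end:
        gaps.append((cursor, end))
    return gaps


cached = [(10_000, 20_000)]

# Second request: only the tail has to be fetched.
to_fetch = missing_ranges((15_000, 21_000), cached)
cached.extend(to_fetch)

# Third request: everything is already cached.
to_fetch_again = missing_ranges((15_000, 21_000), cached)
```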

Additionally, the cache supports nesting of argument filters. Imagine that you have:

Fetched all Transfer events from blocks 2000 - 4000
Fetched Transfer events with filter {"from": "0x1234..."} for blocks 4000 - 6000
Fetched Transfer events with filter {"to": "0x5678..."} for blocks 6000 - 8000

Now, when you query events with filter {"from": "0x1234...", "to": "0x5678..."} for blocks 2000 - 8000, the cache recognizes that every matching event is already stored and serves the result without fetching anything.
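
One way to reason about filter nesting: a cached entry can serve a query if its filter is less restrictive than, or equal to, the query filter. The sketch below models that idea only; the function names and the `(filter, from_block, to_block)` entry layout are hypothetical, ranges are assumed to be half-open, and the real cache schema may differ.

```python
def filter_covers(cached_filter, query_filter):
    """True if every condition of the cached filter is also in the query filter.

    An empty cached filter (all events) covers any query.
    """
    return all(query_filter.get(key) == value for key, value in cached_filter.items())


def is_fully_cached(query_filter, query_range, cache_entries):
    """True if usable cache entries cover the whole block range without gaps."""
    start, end = query_range
    usable = sorted(
        (c_from, c_to)
        for c_filter, c_from, c_to in cache_entries
        if filter_covers(c_filter, query_filter)
    )
    cursor = start
    for c_from, c_to in usable:
        if c_from > cursor:
            break  # gap: these blocks would have to be fetched
        cursor = max(cursor, c_to)
    return cursor >= end


# The three fetches from the list above, as (filter, from_block, to_block).
entries = [
    ({}, 2000, 4000),
    ({"from": "0x1234..."}, 4000, 6000),
    ({"to": "0x5678..."}, 6000, 8000),
]

both = {"from": "0x1234...", "to": "0x5678..."}
served_from_cache = is_fully_cached(both, (2000, 8000), entries)

# A query with only the "from" filter cannot reuse the "to"-filtered entry.
from_only_cached = is_fully_cached({"from": "0x1234..."}, (2000, 8000), entries)
```

In this model the cached rows still have to be filtered down to the query’s conditions before they are returned.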

Sharing the cache is also easy: just transfer the sqlite3 file to another device and you have all the data.