5.1.0
Migration from 5.0 (optional)
- Run the `init` command. Now you have two conflicting hooks: `on_rollback` and `on_index_rollback`. Follow the guide below to perform the migration; a `ConflictingHooksError` exception will be raised until you do.
What's New
Per-index rollback hook
In this release, we continue to improve the rollback-handling experience, which became much more important since the Ithaca protocol reached mainnet. Let's briefly recap how DipDup currently processes chain reorgs before calling a rollback hook:
- If the `buffer_size` option of a TzKT datasource is set to a non-zero value and there are enough data messages buffered when a rollback occurs, the data is just dropped from the buffer, and indexing continues.
- If all indexes in the config are `operation` ones, we can attempt to process a single-level rollback. All operations from the rolled back block must be present in the next one for the rollback to succeed. If some operations are missing, the `on_rollback` hook will be called as usual.
- Finally, we can safely ignore indexes with a level lower than the rollback target. The index level is updated either on synchronization or when at least one related operation or big map diff has been extracted from a realtime message.
If none of these tricks works, we can't process a rollback without custom logic. Here's where the changes begin. Before this release, every project contained the `on_rollback` hook, which receives a `datasource: IndexDatasource` argument and `from`/`to` levels. Even if your deployment has thousands of indexes and only a couple of them are affected by a rollback, there was no easy way to find out which ones.
Now the `on_rollback` hook is deprecated and superseded by the `on_index_rollback` one. Choose one of the following options:
- You haven't touched the `on_rollback` hook since project creation. Run the `init` command and remove the `hooks/on_rollback` and `sql/on_rollback` directories in the project root. The default action (reindexing) has not changed.
- You have some custom logic in the `on_rollback` hook and want to leave it as-is for now. You can ignore the introduced changes at least until the next major release.
- You have implemented per-datasource rollback logic and are ready to switch to the per-index one. Run `init`, move your code to the `on_index_rollback` hook (see the sketch below), and delete the `on_rollback` one. Note that you can access the rolled back datasource via `index.datasource`.
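For reference, the new hook looks roughly like this - a sketch based on the description above, not the exact generated stub (run `init` to get the real one):

```python
from dipdup.context import HookContext
from dipdup.index import Index


async def on_index_rollback(
    ctx: HookContext,
    index: Index,  # the affected index; its datasource is at index.datasource
    from_level: int,
    to_level: int,
) -> None:
    # Default action, same as before: trigger full reindexing.
    await ctx.reindex()
```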
Token transfer index
Sometimes implementing an `operation` index is overkill for a specific task. An existing alternative is to use a `big_map` index to process only the diffs of selected big map paths. However, you still need a separate index for each contract of interest, which is very resource-consuming. A widespread case is indexing FA1.2/FA2 token contracts. So, this release introduces a new `token_transfer` index:

```yaml
indexes:
  transfers:
    kind: token_transfer
    datasource: tzkt
    handlers:
      - callback: transfers
```

The `TokenTransferData` object is passed to the handler on each token transfer, containing only the information needed to process it.
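A matching handler stub could look like this (a sketch; the callback follows the usual `(ctx, data)` signature, and the model import path is assumed):

```python
from dipdup.context import HandlerContext
from dipdup.models import TokenTransferData


async def transfers(
    ctx: HandlerContext,
    token_transfer: TokenTransferData,
) -> None:
    # Persist or aggregate the transfer; the payload carries the token,
    # the parties involved, and the amount - nothing else is fetched.
    ...
```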
config env command to generate env-files
Generally, it's good to separate a project config from deployment parameters, and DipDup has multiple options to achieve this. First, multiple configs can be chained successively, overriding top-level sections. Second, the DipDup config can contain docker-compose-style environment variable declarations. Let's say your config contains the following:

```yaml
database:
  kind: postgres
  host: db
  port: 5432
  user: ${POSTGRES_USER:-dipdup}
  password: ${POSTGRES_PASSWORD:-changeme}
  database: ${POSTGRES_DB:-dipdup}
```

You can generate an env-file to use with this exact config:

```shell
$ dipdup -c dipdup.yml -c dipdup.docker.yml config env
POSTGRES_USER=dipdup
POSTGRES_PASSWORD=changeme
POSTGRES_DB=dipdup
```

The environment of your current shell is also taken into account:

```shell
$ POSTGRES_DB=foobar dipdup -c dipdup.yml -c dipdup.docker.yml config env
POSTGRES_USER=dipdup
POSTGRES_PASSWORD=changeme
POSTGRES_DB=foobar  # <- set from the current env
```

Use the `-f <filename>` option to save the output to disk instead of printing it to stdout. After you have modified the env-file according to your needs, apply it in whichever way is more convenient for you:
With the dipdup `--env-file`/`-e` option:

```shell
dipdup -e prod.env <...> run
```

When using docker-compose:

```yaml
services:
  indexer:
    ...
    env_file: prod.env
```

Keeping framework up-to-date
A bunch of new tags are now pushed to Docker Hub on each release in addition to the X.Y.Z one: X.Y and X. That way, you can stick to a specific release without the risk of leaving a minor/major update unattended (friends don't let friends use `latest` 😉). The `-pytezos` flavor is also available for each tag.

```Dockerfile
FROM dipdup/dipdup:5.1
...
```

In addition, DipDup will poll GitHub for new releases on each command that runs reasonably long and print a warning when you're running an outdated version. You can disable these checks with the `advanced.skip_version_check` flag.
Pro tip: you can also enable notifications on the GitHub repo page with 👁 Watch -> Custom -> tick Releases -> Apply to never miss a fresh DipDup release.
Changelog
See the full 5.1.0 changelog here.
5.0.0
⚠ Breaking Changes
- Python versions 3.8 and 3.9 are no longer supported.
- `bcd` datasource has been removed.
- Two internal tables were added, `dipdup_contract_metadata` and `dipdup_token_metadata`.
- Some methods of the `tzkt` datasource have changed their signatures and behavior.
- Dummy `advanced.oneshot` config flag has been removed.
- Dummy `schema approve --hashes` command flag has been removed.
- `docker init` command has been removed.
- `ReindexingReason` enumeration items have been changed.
Migration from 4.x
- Ensure that you have a `python = "^3.10"` dependency in `pyproject.toml` (see the example below).
- Remove `bcd` datasources from the config. Use the `metadata` datasource instead to fetch contract and token metadata.
- Update `tzkt` datasource method calls as described below.
- Run the `dipdup schema approve` command on every database you use with 5.0.0.
- Update usage of the `ReindexingReason` enumeration if needed.
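For the first item, assuming a Poetry-managed project (the caret constraint is Poetry syntax; the dipdup version line is illustrative), `pyproject.toml` would contain something like:

```toml
[tool.poetry.dependencies]
python = "^3.10"
dipdup = "^5.0"
```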
What's New
Process realtime messages with lag
Chain reorgs have occurred much more often since the Ithaca protocol reached mainnet. The preferred way to deal with rollbacks is the `on_rollback` hook. But if the logic of your indexer is too complex, you can buffer an arbitrary number of levels before processing to avoid reindexing:

```yaml
datasources:
  tzkt_mainnet:
    kind: tzkt
    url: https://api.tzkt.io
    buffer_size: 2
```

DipDup tries to remove backtracked operations from the buffer instead of emitting a rollback. Ithaca guarantees operation finality after one block and block finality after two blocks, so to completely avoid reorgs, set `buffer_size` to 2.
BCD API takedown
The Better Call Dev API was officially deprecated in February. Thus, it's time for the `bcd` datasource to go. In DipDup, it served the only purpose of fetching contract and token metadata. Now there's a separate `metadata` datasource which does the same thing, but better. If you have used the `bcd` datasource for custom requests, see the How to migrate from BCD to TzKT API article.
TzKT batch request pagination
Historically, most `TzktDatasource` methods had page iteration logic hidden inside. The number of items returned by TzKT in a single request is configured in `HTTPConfig.batch_size` and defaults to 10,000. Before this release, the `get_big_map` method would transparently perform three requests to fetch 25,000 big map keys, leading to performance degradation and extensive memory usage.
| affected method | response size in 4.x | response size in 5.x |
|---|---|---|
| `get_similar_contracts` | unlimited | max. `datasource.request_limit` |
| `get_originated_contracts` | unlimited | max. `datasource.request_limit` |
| `get_big_map` | unlimited | max. `datasource.request_limit` |
| `get_contract_big_maps` | unlimited | max. `datasource.request_limit` |
| `get_quotes` | first `datasource.request_limit` | max. `datasource.request_limit` |
All paginated methods now behave the same way. You can either iterate over pages manually or use `iter_...` helpers:

```python
datasource = ctx.get_tzkt_datasource('tzkt_mainnet')
batch_iter = datasource.iter_big_map(
    big_map_id=big_map_id,
    level=last_level,
)
async for key_batch in batch_iter:
    for key in key_batch:
        ...
```

Metadata interface for TzKT integration
Starting with 5.0, you can store and expose custom contract and token metadata in the same format the DipDup Metadata service does for TZIP-compatible metadata.
Enable this feature with the `advanced.metadata_interface` flag, then update metadata in any callback:

```python
await ctx.update_contract_metadata(
    network='mainnet',
    address='KT1...',
    metadata={'foo': 'bar'},
)
```

Metadata is stored in the `dipdup_contract_metadata` and `dipdup_token_metadata` tables and is available via GraphQL and REST APIs.
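The token counterpart follows the same pattern; a sketch, with the method name and arguments assumed by analogy with the contract one:

```python
await ctx.update_token_metadata(
    network='mainnet',
    address='KT1...',
    token_id='0',
    metadata={'name': 'FooToken', 'decimals': '8'},
)
```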
Prometheus integration
This version introduces initial Prometheus integration. It can help you set up monitoring, find performance issues in your code, and so on. To enable it, add the following lines to the config:

```yaml
prometheus:
  host: 0.0.0.0
```

Changes since 4.2.7
Added
- config: Added `custom` section to store arbitrary user data.
- metadata: Added `metadata_interface` feature flag to expose metadata in TzKT format.
- prometheus: Added ability to expose Prometheus metrics.
- tzkt: Added ability to process realtime messages with lag.
- tzkt: Added missing fields to the `HeadBlockData` model.
- tzkt: Added `iter_...` methods to iterate over item batches.
Fixed
- config: Fixed default SQLite path (`:memory:`).
- prometheus: Fixed invalid metric labels.
- tzkt: Fixed pagination in several getter methods.
- tzkt: Fixed data loss when the `skip_history` option is enabled.
- tzkt: Fixed crash in methods that do not support cursor pagination.
- tzkt: Fixed possible OOM while calling methods that support pagination.
- tzkt: Fixed possible data loss in the `get_originations` and `get_quotes` methods.
Changed
- tzkt: Added `offset` and `limit` arguments to all methods that support pagination.
Removed
- bcd: Removed `bcd` datasource and config section.
- cli: Removed `docker init` command.
- cli: Removed dummy `schema approve --hashes` flag.
- config: Removed dummy `advanced.oneshot` flag.
Performance
- dipdup: Use fast `orjson` library instead of built-in `json` where possible.
4.2.0
What's new
ipfs datasource
While working with contract/token metadata, a typical scenario is to fetch it from IPFS. DipDup now has a separate datasource to perform such requests:

```yaml
datasources:
  ipfs:
    kind: ipfs
    url: https://ipfs.io/ipfs
```

You can use this datasource within any callback. Output is either JSON or binary data:

```python
ipfs = ctx.get_ipfs_datasource('ipfs')

file = await ipfs.get('QmdCz7XGkBtd5DFmpDPDN3KFRmpkQHJsDgGiG16cgVbUYu')
assert file[:4].decode()[1:] == 'PDF'

file = await ipfs.get('QmSgSC7geYH3Ae4SpUHy4KutxqNH9ESKBGXoCN4JQdbtEz/package.json')
assert file['name'] == 'json-buffer'
```

You can tune HTTP connection parameters with the `http` config field, just like for any other datasource.
Sending arbitrary requests
DipDup datasources do not cover all available methods of the underlying APIs. Let's say you want to fetch the current protocol of the chain you're indexing from TzKT:

```python
tzkt = ctx.get_tzkt_datasource('tzkt_mainnet')
protocol_json = await tzkt.request(
    method='get',
    url='v1/protocols/current',
    cache=False,
    weight=1,  # ratelimiter leaky-bucket drops
)
assert protocol_json['hash'] == 'PtHangz2aRngywmSRGGvrcTyMbbdpWdpFKuS4uMWxg2RaH9i1qx'
```

Datasource HTTP connection parameters (ratelimit, backoff, etc.) are applied to every request.
Firing hooks outside of the current transaction
When configuring a hook, you can instruct DipDup to wrap it in a single database transaction:

```yaml
hooks:
  my_hook:
    callback: my_hook
    atomic: True
```

Until now, such hooks could only be fired according to `jobs` schedules, but not from a handler or another atomic hook using the `ctx.fire_hook` method. This limitation is now eliminated - use the `wait` argument to escape the current transaction:

```python
async def handler(ctx: HandlerContext, ...) -> None:
    await ctx.fire_hook('atomic_hook', wait=False)
```

Spin up a new project with a single command
Cookiecutter is an excellent jinja2 wrapper for initializing hello-world templates of various frameworks and toolkits interactively. Install the python-cookiecutter package systemwide, then call:

```shell
cookiecutter https://github.com/dipdup-io/cookiecutter-dipdup
```

Advanced scheduler configuration
DipDup utilizes the apscheduler library to run hooks according to schedules in the `jobs` config section. In the following example, apscheduler will spawn up to three instances of the same job every time the trigger is fired, even if previous runs are still in progress:

```yaml
advanced:
  scheduler:
    apscheduler.job_defaults.coalesce: True
    apscheduler.job_defaults.max_instances: 3
```

See the apscheduler docs for details.
Note that you can't use executors from the `apscheduler.executors.pool` module - a `ConfigurationError` exception will be raised. If you're into multiprocessing, I'll explain why in the next paragraph.
About the present and future of multiprocessing
It's impossible to use apscheduler pool executors with hooks because `HookContext` is not pickle-serializable. So, they are forbidden now in the `advanced.scheduler` config. However, thread/process pools can come in handy in many situations, and it would be nice to have them in the DipDup context. For now, I can suggest implementing custom commands as a workaround to perform any resource-hungry tasks within them. Put the following code in `dipdup_indexer/cli.py`:

```python
from contextlib import AsyncExitStack

import asyncclick as click

from dipdup.cli import cli, cli_wrapper
from dipdup.config import DipDupConfig
from dipdup.utils.database import tortoise_wrapper


@cli.command(help='Run heavy calculations')
@click.pass_context
@cli_wrapper
async def do_something_heavy(ctx):
    config: DipDupConfig = ctx.obj.config
    url = config.database.connection_string
    models = f'{config.package}.models'

    async with AsyncExitStack() as stack:
        await stack.enter_async_context(tortoise_wrapper(url, models))
        ...


if __name__ == '__main__':
    cli(prog_name='dipdup', standalone_mode=False)
```

Then use `python -m dipdup_indexer.cli` instead of `dipdup` as an entrypoint. Now you can call `do-something-heavy` like any other dipdup command. The `dipdup.cli:cli` group handles arguments and config parsing, graceful shutdown, and other boilerplate. The rest is on you; use `dipdup.dipdup:DipDup.run` as a reference. And keep in mind that Tortoise ORM is not thread-safe. I aim to implement `ctx.pool_apply` and `ctx.pool_map` methods to execute code in pools with magic within existing DipDup hooks, but no ETA yet.
That's all, folks! As always, your feedback is very welcome 🤙
4.1.0
Migration from 4.0 (optional)
- Run `dipdup schema init` on the existing database to enable the `dipdup_head_status` view and REST endpoint.
What's New
Index only the current state of big maps
`big_map` indexes allow achieving faster processing times than `operation` ones when storage updates are the only on-chain data your dapp needs to function. With this DipDup release, you can go even further and index only the current storage state, ignoring historical changes:

```yaml
indexes:
  foo:
    kind: big_map
    ...
    skip_history: never|once|always
```

When this option is set to `once`, DipDup will skip historical changes only on initial sync and switch to regular indexing afterward. When the value is `always`, DipDup will fetch all big map keys on every restart. The preferable mode depends on your workload.
All big map diffs DipDup passes to handlers during fast sync have the `action` field set to `BigMapAction.ADD_KEY`. Keep in mind that DipDup fetches all keys in this mode, including ones removed from the big map. If needed, you can filter out the latter by the `BigMapDiff.data.active` field, as shown below.
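For example, a handler that only cares about live keys can bail out early; a minimal sketch (the handler name and the untyped `BigMapDiff` annotation are illustrative):

```python
from dipdup.context import HandlerContext
from dipdup.models import BigMapDiff


async def on_update(ctx: HandlerContext, diff: BigMapDiff) -> None:
    # In skip_history mode every key arrives as ADD_KEY, including keys
    # already removed from the big map; skip the inactive ones.
    if not diff.data.active:
        return
    ...
```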
New datasource for contract and token metadata
Since the first version, DipDup has allowed fetching token metadata from the Better Call Dev API with the `bcd` datasource. Now it's time for a better solution. Firstly, BCD is far from reliable in terms of metadata indexing. Secondly, spinning up your own instance of BCD requires significant effort and computing power. Lastly, we plan to deprecate the Better Call Dev API soon (but do not worry - it won't affect the explorer frontend).
Luckily, we have dipdup-metadata, a standalone companion indexer for DipDup written in Go. Configure a new datasource in the following way:

```yaml
datasources:
  metadata:
    kind: metadata
    url: https://metadata.dipdup.net
    network: mainnet|ghostnet|limanet
```

Now you can use it anywhere in your callbacks:

```python
datasource = ctx.datasources['metadata']
token_metadata = await datasource.get_token_metadata(address, token_id)
```

The `bcd` datasource will remain available for a while, but we discourage using it for metadata processing.
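Contract metadata can be fetched the same way; a sketch, assuming a `get_contract_metadata` counterpart method:

```python
contract_metadata = await datasource.get_contract_metadata(address)
```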
Nested packages for hooks and handlers
Callback modules no longer have to live in top-level `hooks`/`handlers` directories. Add one or multiple dots to the callback name to define nested packages:

```yaml
package: indexer
hooks:
  foo.bar:
    callback: foo.bar
```

After running the `init` command, you'll get the following directory tree (shortened for readability):

```text
indexer
├── hooks
│   ├── foo
│   │   ├── bar.py
│   │   └── __init__.py
│   └── __init__.py
└── sql
    └── foo
        └── bar
            └── .keep
```

The same rules apply to handler callbacks. Note that the `callback` field must be a valid Python package name - lowercase letters, underscores, and dots.
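Dotted callback names are used as-is elsewhere in DipDup; e.g., firing the `foo.bar` hook from a callback would presumably look like:

```python
await ctx.fire_hook('foo.bar')
```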
New CLI commands and flags
- `schema init` is a new command to prepare a database for running DipDup. It will create tables based on your models, then call the `on_reindex` SQL hook to finish preparation - the same things DipDup does when run on a clean database.
- `hasura configure --force` flag allows configuring Hasura even if the metadata hash matches the one saved in the database. It may come in handy during development.
- `init --keep-schemas` flag makes DipDup preserve contract JSONSchemas. Usually, they are removed after generating typeclasses with `datamodel-codegen`, but you can keep them to convert to other formats or troubleshoot codegen issues.
Built-in dipdup_head_status view and REST endpoint
DipDup maintains several internal models to keep its state. As Hasura generates GraphQL queries and REST endpoints for those models, you can use them for monitoring. However, some SaaS monitoring solutions can only check whether an HTTP response contains a specific word or not. For such cases, the `dipdup_head_status` view was added - a simplified representation of the `dipdup_head` table. It returns `OK` when the datasource received a head less than two minutes ago and `OUTDATED` otherwise. The latter means that something's stuck - either DipDup (e.g., because of a database deadlock) or the TzKT instance. Or maybe the whole Tezos blockchain, but in that case, you have problems bigger than indexing.

```shell
$ curl "http://127.0.0.1:41000/api/rest/dipdupHeadStatus?name=https%3A%2F%2Fapi.tzkt.io"
{"dipdupHeadStatus":[{"status":"OUTDATED"}]}
```

Note that the `dipdup_head` update may be delayed during sync even if the `--early-realtime` flag is enabled, so don't rely exclusively on this endpoint.
Changelog
Added
- cli: Added `schema init` command to initialize database schema.
- cli: Added `--force` flag to the `hasura configure` command.
- codegen: Added support for subpackages inside callback directories.
- hasura: Added `dipdup_head_status` view and REST endpoint.
- index: Added an ability to skip historical data while synchronizing `big_map` indexes.
- metadata: Added `metadata` datasource.
- tzkt: Added `get_big_map` and `get_contract_big_maps` datasource methods.
4.0.0
⚠ Breaking Changes
- `run --oneshot` option is removed. The oneshot mode (DipDup stops after the sync is finished) applies automatically when the `last_level` field is set in the index config.
- `clear-cache` command is removed. Use `cache clear` instead.
Migration from 3.x
- Run the `dipdup init` command to generate `on_synchronized` hook stubs.
- Run the `dipdup schema approve` command on every database you want to use with 4.0.0. Running `dipdup migrate` is not necessary since `spec_version` hasn't changed in this release.
What's New
Performance optimizations
Overall indexing performance has been significantly improved. Key highlights:
- Configuration files are loaded 10x faster. The more indexes in the project, the more noticeable the difference is.
- Significantly reduced CPU usage in realtime mode.
- Datasource default HTTP connection options optimized for a reasonable balance between resource consumption and indexing speed.
Also, two new flags were added to improve DipDup performance in several scenarios: `merge_subscriptions` and `early_realtime`. See this paragraph for details.
Configurable action on reindex
There are several reasons that trigger reindexing:
| reason | description |
|---|---|
| manual | Reindexing triggered manually from a callback with `ctx.reindex`. |
| migration | Applied migration requires reindexing. Check release notes before switching between major DipDup versions to be prepared. |
| rollback | Reorg message received from TzKT and cannot be processed. |
| config_modified | One of the index configs has been modified. |
| schema_modified | Database schema has been modified. Try to avoid manual schema modifications in favor of SQL hooks. |
Now it is possible to configure desirable action on reindexing triggered by the specific reason.
| action | description |
|---|---|
| exception (default) | Raise `ReindexingRequiredError` and quit with an error code. The safest option, since you can trigger reindexing accidentally, e.g., by a typo in the config. Don't forget to set up the correct restart policy when using it with containers. |
| wipe | Drop the whole database and start indexing from scratch. Be careful with this option! |
| ignore | Ignore the event and continue indexing as usual. It can lead to unexpected side effects up to data corruption; make sure you know what you are doing. |
To configure the action for each reason, add the following section to the DipDup config:

```yaml
advanced:
  reindex:
    manual: wipe
    migration: exception
    rollback: ignore
    config_modified: exception
    schema_modified: exception
```

New CLI commands and flags
| command or flag | description |
|---|---|
| `cache show` | Get information about file caches used by DipDup. |
| `config export` | Print config after resolving all links and variables. Add the `--unsafe` option to substitute environment variables; default values from the config will be used otherwise. |
| `run --early-realtime` | Establish a realtime connection before all indexes are synchronized. |
| `run --merge-subscriptions` | Subscribe to all operations/big map diffs during realtime indexing. This flag helps to avoid reaching the TzKT subscriptions limit (currently 10,000 channels). Keep in mind that this option could significantly increase RAM consumption depending on the time required to perform a sync. |
| `status` | Print the current status of indexes from the database. |
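For example, to dump the fully resolved config with environment variables substituted (the output redirect is just illustrative):

```shell
$ dipdup -c dipdup.yml config export --unsafe > dipdup.resolved.yml
```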
advanced top-level config section
This config section allows users to tune system-wide options, either experimental or unsuitable for generic configurations.
| field | description |
|---|---|
| `early_realtime`, `merge_subscriptions`, `postpone_jobs` | Another way to set `run` command flags. Useful for maintaining per-deployment configurations. |
| `reindex` | Configure the action to take when reindexing is triggered. See this paragraph for details. |
CLI flags take priority over the identically named `AdvancedConfig` fields.
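For instance, a per-deployment config mirroring the `run` flags from the table above might look like this (values are illustrative):

```yaml
advanced:
  early_realtime: True
  merge_subscriptions: True
  postpone_jobs: False
```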
aiosignalrcore replaced with pysignalr
It may not be the most noticeable improvement for the end user, but it still deserves a separate paragraph in this article.
Historically, DipDup used our own fork of the signalrcore library named aiosignalrcore. This project aimed to replace the synchronous websocket-client library with asyncio-ready websockets. Later, we discovered that the required changes would make it hard to maintain backward compatibility, so we decided to rewrite the library from scratch. Now you have both a modern and reliable library for the SignalR protocol and a much more stable DipDup. Ain't it nice?
Changes since 3.1.3
This is a combined changelog of the -rc versions released between the last stable release and this one.
Added
- cli: Added `run --early-realtime` flag to establish a realtime connection before all indexes are synchronized.
- cli: Added `run --merge-subscriptions` flag to subscribe to all operations/big map diffs during realtime indexing.
- cli: Added `status` command to print the current status of indexes from the database.
- cli: Added `config export [--unsafe]` command to print config after resolving all links and variables.
- cli: Added `cache show` command to get information about file caches used by DipDup.
- config: Added `first_level` and `last_level` optional fields to `TemplateIndexConfig`. These limits are applied after the ones from the template itself.
- config: Added `daemon` boolean field to `JobConfig` to run a single callback indefinitely. Conflicts with `crontab` and `interval` fields.
- config: Added `advanced` top-level section.
- hooks: Added `on_synchronized` hook, which fires each time all indexes reach realtime state.
Fixed
- cli: Fixed config not being verified when invoking some commands.
- cli: Fixed crashes and output inconsistency when piping DipDup commands.
- cli: Fixed missing `schema approve --hashes` argument.
- cli: Fixed `schema wipe --immune` flag being ignored.
- codegen: Fixed contract address being used instead of an alias when typename is not set.
- codegen: Fixed generating callback arguments for untyped operations.
- codegen: Fixed missing imports in handlers generated during `init`.
- coinbase: Fixed possible data inconsistency caused by caching enabled for the `get_candles` method.
- hasura: Fixed unnecessary reconfiguration on restart.
- http: Fixed increasing sleep time between failed request attempts.
- index: Fixed `CallbackError` raised instead of `ReindexingRequiredError` in some cases.
- index: Fixed crash while processing storage of some contracts.
- index: Fixed incorrect log messages, removed duplicate ones.
- index: Fixed invocation of head index callback.
- index: Fixed matching of untyped operations filtered by the `source` field (@pravin-d).
- tzkt: Fixed filtering of big map diffs by path.
- tzkt: Fixed `get_originated_contracts` and `get_similar_contracts` methods whose output was limited by the `HTTPConfig.batch_size` field.
- tzkt: Fixed lots of SignalR bugs by replacing the `aiosignalrcore` library with `pysignalr`.
- tzkt: Fixed processing of operations with the `default` entrypoint.
- tzkt: Fixed regression in processing migration originations.
- tzkt: Fixed resubscribing when realtime connectivity is lost for a long time.
- tzkt: Fixed sending useless subscription requests when adding indexes at runtime.
Changed
- cli: `schema wipe` command now requires confirmation when invoked in the interactive shell.
- cli: `schema approve` command now also causes a recalculation of schema and index config hashes.
- index: DipDup will recalculate the respective hashes if reindexing is triggered with `config_modified: ignore` or `schema_modified: ignore` in the advanced config.
Removed
- cli: Removed deprecated `run --oneshot` argument and `clear-cache` command.
Performance
- config: Configuration files are loaded 10x faster.
- index: Checks performed on each iteration of the main DipDup loop are slightly faster now.
- index: Number of operations processed by matcher reduced by 40%-95% depending on the number of addresses and entrypoints used.
- tzkt: Improved performance of response deserialization.
- tzkt: Rate limit was increased. Try setting `connection_timeout` to a higher value if requests fail with a `ConnectionTimeout` exception.