Introduction
To design a token transaction indexer, I will compare two existing indexers currently in operation at Helius, DAS (Digital Asset RPC Infrastructure) and Photon, and cherry-pick features from each.
- DAS is used to index all digital assets on Solana, including Tokens, NFTs, and cNFTs. You can explore more about DAS here.
- Photon is a new indexer launched by Helius to store compressed accounts. More details can be found here.
DAS vs Photon
Streaming Mechanism
- DAS: Uses a custom Geyser plugin to stream accounts and transactions into its indexer. This plugin is specific to DAS and can be found here.
- Photon: Utilizes the yellowstone-grpc gRPC Geyser plugin to stream accounts and transactions.
Message Buffering
- DAS: Uses Redis streams as a message buffer, which requires running a Redis server.
- Photon: Leverages Rust’s stream primitives to stream data from Geyser. It fetches blocks concurrently and indexes them serially.
Data Backfilling
- DAS: Requires a Geyser restart to perform data backfilling.
- Photon: Employs polling via RPC calls and auto-healing techniques to fill gaps automatically.
Code Structure
- DAS: Has multiple binaries, one for the indexer and another for the API.
- Photon: Consolidates functionality into a single binary, which can be run in multiple modes.
Database and APIs
- Database: Both use Postgres as their database.
- API: Both use JSON-RPC for their APIs.
Design Choices for the Transaction Indexer
Streaming Mechanism
- The Transaction Indexer will use the gRPC Geyser plugin to stream transactions from the TokenKeg and Token2022 programs. This is a natural choice given the existing infrastructure.
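As a rough illustration, the subscription filter might look like the sketch below. It is built on the yellowstone-grpc-proto types only; the filter label "token_txs" is arbitrary, and the client connection and handshake are omitted since they vary between yellowstone-grpc versions.

```rust
use std::collections::HashMap;
use yellowstone_grpc_proto::geyser::{SubscribeRequest, SubscribeRequestFilterTransactions};

// SPL Token (TokenKeg) and Token-2022 program IDs.
const TOKENKEG: &str = "TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA";
const TOKEN_2022: &str = "TokenzQdBNbLqP5VEhdkAS6EPFLC1PHnBqCXEpPxuEb";

/// Build the gRPC subscription filter: all non-vote transactions that
/// touch either token program. Failed transactions are left in on
/// purpose, since the schema keeps a nullable `error` column.
fn token_subscribe_request() -> SubscribeRequest {
    let mut transactions = HashMap::new();
    transactions.insert(
        "token_txs".to_string(), // arbitrary filter label
        SubscribeRequestFilterTransactions {
            vote: Some(false), // exclude vote transactions
            account_include: vec![TOKENKEG.to_string(), TOKEN_2022.to_string()],
            ..Default::default()
        },
    );
    SubscribeRequest {
        transactions,
        ..Default::default()
    }
}
```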
Polling and Backfilling
- Like Photon, the Transaction Indexer will use polling via RPC as a fallback mechanism to fill in any gaps in the data.
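A minimal sketch of the gap check, assuming the blocking `RpcClient` from `solana_client`; `find_missing_slots` and `indexed_slots` are illustrative names, and in practice the already-indexed slots would come from the blocks table described later.

```rust
use solana_client::rpc_client::RpcClient;

/// Ask the RPC node which slots had confirmed blocks in a range, and
/// return the ones the indexer has not stored yet.
fn find_missing_slots(
    rpc: &RpcClient,
    indexed_slots: &[u64],
    start_slot: u64,
    end_slot: u64,
) -> Result<Vec<u64>, Box<dyn std::error::Error>> {
    // get_blocks returns the confirmed slots in [start_slot, end_slot].
    let confirmed = rpc.get_blocks(start_slot, Some(end_slot))?;
    Ok(confirmed
        .into_iter()
        .filter(|slot| !indexed_slots.contains(slot))
        .collect())
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let rpc = RpcClient::new("https://api.mainnet-beta.solana.com".to_string());
    let indexed = vec![250_000_000, 250_000_001]; // would come from the blocks table
    let missing = find_missing_slots(&rpc, &indexed, 250_000_000, 250_000_010)?;
    println!("missing slots to backfill: {missing:?}");
    Ok(())
}
```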
Message Passing
- Given the expected volume of updates, I will use `mpsc` channels for message passing between the polling logic and the indexing logic. This approach is similar to using Redis streams and will help minimize indexing latencies.
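For illustration, here is a minimal sketch of the channel wiring using Tokio's bounded `mpsc` (assuming a Tokio runtime; `TokenTransfer` is a placeholder type):

```rust
use tokio::sync::mpsc;

/// Placeholder for a decoded transfer; the real type would mirror the
/// token_transfers schema described later.
struct TokenTransfer {
    signature: Vec<u8>,
    slot: u64,
    amount: u64,
}

#[tokio::main]
async fn main() {
    // A bounded channel applies backpressure to the producer if the
    // indexer falls behind, playing the role Redis streams play in DAS.
    let (tx, mut rx) = mpsc::channel::<TokenTransfer>(10_000);

    // Producer: the Geyser/polling side pushes decoded transfers.
    tokio::spawn(async move {
        let transfer = TokenTransfer {
            signature: vec![0u8; 64],
            slot: 0,
            amount: 1,
        };
        // send() awaits when the channel is full, throttling the producer.
        let _ = tx.send(transfer).await;
    });

    // Consumer: the indexing side drains the channel and writes to the DB.
    while let Some(transfer) = rx.recv().await {
        println!("indexing transfer at slot {}", transfer.slot);
        let _ = (transfer.signature, transfer.amount);
    }
}
```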
Code Structure
- The indexer will be split into two binaries: one for indexing and another for the API. This separation keeps the API and indexing logic isolated and easier to manage.
Database Choice
- I will use TimescaleDB instead of vanilla Postgres. The reasons for this choice are detailed in the next section.
API Format
- Like both DAS and Photon, the Transaction Indexer will expose a JSON-RPC API.
TimescaleDB
What is TimescaleDB?
TimescaleDB is a time-series database built on top of Postgres. It offers powerful features like automatic partitioning, compression, and scaling, making it well-suited for handling time-series data.
Why Use TimescaleDB for the Transaction Indexer?
- Performance: TimescaleDB’s optimizations for time-series data make it ideal for indexing transactions, which inherently involve time-based data.
- Scalability: As transaction volume grows, TimescaleDB can scale efficiently, handling large datasets with ease.
- Flexibility: TimescaleDB allows for preserving history for a defined period (e.g., 3 months) while still enabling custom access to data from the genesis block.
Why Not Use ClickHouse?
- While ClickHouse is another popular choice for time-series data, TimescaleDB’s integration with Postgres provides a smoother transition for projects already using Postgres. Additionally, TimescaleDB’s native support for SQL and compatibility with Postgres tooling are advantageous.
I also referred to [this benchmark](https://www.timescale.com/blog/what-is-clickhouse-how-does-it-compare-to-postgresql-and-timescaledb-and-how-does-it-perform-for-time-series-data/), which compares the performance of TimescaleDB and ClickHouse; based on its findings, I will be using TimescaleDB.
DB Schema
Blocks Table
Purpose: Tracks the blocks indexed and identifies gaps for automatic backfilling.
Columns:
- `slot`: bigint
- `parent_slot`: bigint
- `block_height`: bigint
- `block_time`: bigint

Indexes:
- Primary Key: `slot, block_time`
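Since each row stores its `parent_slot`, gaps can be detected by looking for parents that were never indexed. A sketch of such a query follows; the exact query text is an assumption, not the production implementation.

```rust
/// A slot is missing if some indexed block points to it as a parent but
/// it has no row of its own. Because Solana skips slots, consecutive
/// slot numbers are not enough; parent_slot gives the real chain links.
const FIND_GAPS_SQL: &str = "
    SELECT child.parent_slot AS missing_slot
    FROM blocks AS child
    LEFT JOIN blocks AS parent
           ON parent.slot = child.parent_slot
    WHERE parent.slot IS NULL;
";
```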
Token Transfers Table
Purpose: Indexes all token transfer instructions and associated data.
Columns:
- `signature`: bytea
- `source_address`: bytea
- `program_id`: bytea
- `destination_address`: bytea
- `source_ata`: bytea (nullable)
- `destination_ata`: bytea (nullable)
- `mint_address`: bytea (nullable)
- `slot`: bigint
- `amount`: bigint
- `error`: text (nullable)
- `block_time`: timestamp with time zone
- `created_at`: timestamp without time zone

Indexes:
- Primary Key: `signature, source_address, destination_address, block_time`
The `blocks` table keeps track of indexed blocks and is used to automatically backfill any gaps. The `token_transfers` table records token transfer transactions: the transaction `signature`, the addresses involved (`source_address`, `destination_address`), and details such as `program_id`, `slot`, `amount`, any error message, and the `block_time` timestamp. The focus is on indexing only the Token and Token Extensions programs, keeping the schema simple and targeted.

Also, the `token_transfers` table will be a hypertable, and the compression feature will be used to compress data older than 3 months.
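Concretely, the setup could look like the following, with the SQL shown as Rust constants to match the other sketches. `create_hypertable` and `add_compression_policy` are TimescaleDB built-ins; partitioning on `block_time` and segmenting compressed chunks by `mint_address` are choices made for this sketch.

```rust
/// One-time TimescaleDB setup for token_transfers, to be run by the
/// indexer's migration step after the table is created.
const CREATE_HYPERTABLE_SQL: &str =
    "SELECT create_hypertable('token_transfers', 'block_time');";

const ENABLE_COMPRESSION_SQL: &str = "
    ALTER TABLE token_transfers SET (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'mint_address'
    );
";

// Compress chunks whose data is older than 3 months.
const ADD_COMPRESSION_POLICY_SQL: &str =
    "SELECT add_compression_policy('token_transfers', INTERVAL '3 months');";
```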
APIs
We’re going to expose just one API to retrieve transaction data:
GetTransactionsByAddress
This endpoint retrieves a list of transactions filtered by a source address and/or destination address or mint address. The results can be paginated and sorted based on the provided parameters.
Request
```json
{
  "jsonrpc": "2.0",
  "id": "0",
  "method": "get_transactions_by_address",
  "params": {
    "source": "string",      // Public key of the source address (optional)
    "destination": "string", // Public key of the destination address (optional)
    "mint": "string",        // Public key of the mint address (optional)
    "limit": "u32",          // Limit on the number of transactions returned (optional)
    "page": "u32",           // Page number for pagination (optional)
    "before": "string",      // Filter transactions before this date (dd/mm/yyyy format, optional)
    "after": "string",       // Filter transactions after this date (dd/mm/yyyy format, optional)
    "sort_by": {             // Sorting options (optional)
      "sort_by": "TransactionSortBy",              // Field to sort by, default is "slot"
      "sort_direction": "TransactionSortDirection" // Sorting direction, default is "desc"
    }
  }
}
```
Response
```json
{
  "jsonrpc": "2.0",
  "result": {
    "total": "u32",  // Total number of transactions found
    "limit": "u32",  // Limit on the number of transactions returned
    "page": "u32",   // Current page number
    "items": [       // List of transaction items
      {
        "signature": "string",           // Transaction signature
        "program_address": "string",     // Public key of the program address
        "mint_address": "string",        // Public key of the mint address
        "source_address": "string",      // Public key of the source address
        "destination_address": "string", // Public key of the destination address
        "source_ata": "string",          // Associated token account of the source address
        "destination_ata": "string",     // Associated token account of the destination address
        "amount": "u64",                 // Amount transferred in the transaction
        "slot": "u64",                   // Slot number of the transaction
        "block_time": "string"           // Block time of the transaction (ISO 8601 format)
      }
    ]
  },
  "id": "0"
}
```
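For a sense of how the handler might translate these parameters into a query against `token_transfers`, here is an illustrative, simplified statement; the real implementation would also need dynamic sorting, the before/after date filters, and the total count.

```rust
/// Illustrative statement behind get_transactions_by_address. Optional
/// filters are handled with NULL checks; ORDER BY is fixed to the
/// default (slot, descending) for brevity.
const GET_TRANSFERS_SQL: &str = "
    SELECT signature, program_id, mint_address, source_address,
           destination_address, source_ata, destination_ata,
           amount, slot, block_time
    FROM token_transfers
    WHERE ($1::bytea IS NULL OR source_address      = $1)
      AND ($2::bytea IS NULL OR destination_address = $2)
      AND ($3::bytea IS NULL OR mint_address        = $3)
    ORDER BY slot DESC
    LIMIT $4 OFFSET $5;
";
```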