A Comparison of Crypto Data Providers and APIs

Comparison of features between popular blockchain data APIs and providers

As the number of blockchains and their usage have grown in recent years, so has the volume of data these platforms produce. Every transaction, NFT mint, and smart contract call adds to an ever-growing ocean of data that can be tapped for a variety of analytical purposes. Whether it’s a VC or hedge fund looking to make smarter crypto investments, developers tracking key product metrics for their dApps, or Web3 security firms monitoring smart contract protocols, easily accessible blockchain data is quickly becoming a necessity.

In this post we look at some of the most popular ways to access blockchain data and compare what each crypto data provider has to offer (data coverage, API accessibility, SQL support, etc.). We will examine the following sources:

  • DIY: building custom ETL pipelines
  • Dune Analytics
  • Nansen
  • The Graph
  • BigQuery public crypto datasets
  • Luabase

DIY: Custom ETL on Hosted Nodes


For those willing to get their hands dirty, or who have a team of data engineers, doing it yourself is an option. You can either self-host a node or spin one up quickly using a service such as QuickNode or Alchemy. From there, most blockchains have client libraries that connect to a node via RPC and return data in JSON format.
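To make the RPC step concrete, here is a minimal sketch of the "extract" and "transform" halves of such a pipeline for Ethereum. It builds a JSON-RPC request for `eth_getBlockByNumber` and flattens the JSON reply into rows. The helper names and the sample response are made up for illustration, and the sketch leaves out the HTTP call to a node (which needs a provider URL and key) as well as any loading into storage.

```python
import json


def build_block_request(block_number: int, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 request body for eth_getBlockByNumber."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getBlockByNumber",
        # The block number is hex-encoded; True asks for full transaction objects.
        "params": [hex(block_number), True],
    }
    return json.dumps(payload)


def flatten_block(response_body: str) -> list[dict]:
    """Turn one block response into flat transaction rows, decoding hex quantities."""
    block = json.loads(response_body)["result"]
    return [
        {
            "block_number": int(block["number"], 16),
            "block_timestamp": int(block["timestamp"], 16),
            "tx_hash": tx["hash"],
            "from_address": tx["from"],
            "to_address": tx["to"],  # None for contract-creation transactions
            "value_wei": int(tx["value"], 16),
        }
        for tx in block["transactions"]
    ]


# Hand-written sample with the shape of a node reply; all values are invented.
SAMPLE_RESPONSE = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "number": "0xe4e1c0",
        "timestamp": "0x62b00000",
        "transactions": [
            {"hash": "0xaaa", "from": "0x111", "to": "0x222",
             "value": "0xde0b6b3a7640000"},
            {"hash": "0xbbb", "from": "0x333", "to": None, "value": "0x0"},
        ],
    },
})

if __name__ == "__main__":
    print(build_block_request(15_000_000))
    for row in flatten_block(SAMPLE_RESPONSE):
        print(row)
```

In a real pipeline the request body would be POSTed to the node endpoint once per block, and the rows written to a warehouse table. That is where most of the operational work described in the cons below comes from.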

Pros of DIY ETL:

  • Complete control over your data pipeline; utmost flexibility with what is extracted and how it is stored

Cons of DIY ETL:

  • A lot of work to set up and maintain infrastructure
  • Expensive - you have to pay a node provider for billions of API calls just to get started. Storing and processing the entirety of a blockchain carries high storage and compute costs, and it can take weeks to sync the full history
  • May require a full-time effort or team to keep up with multiple chains and protocol changes (e.g. forks)
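A rough back-of-envelope calculation shows how a backfill can reach billions of calls. The sketch below assumes one call per block plus one call per transaction (for example, to fetch each receipt). The function names, block count, transactions per block, and throughput are illustrative round numbers, not measurements of any particular chain or provider.

```python
SECONDS_PER_DAY = 86_400


def estimate_backfill_calls(num_blocks: int, avg_txs_per_block: float,
                            calls_per_block: int = 1, calls_per_tx: int = 1) -> int:
    """Estimate RPC calls needed to backfill a chain's history."""
    block_calls = num_blocks * calls_per_block
    tx_calls = round(num_blocks * avg_txs_per_block * calls_per_tx)
    return block_calls + tx_calls


def backfill_days(total_calls: int, calls_per_second: float) -> float:
    """Days needed to issue total_calls at a sustained request rate."""
    return total_calls / calls_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical inputs: 15 million blocks, 100 transactions per block on average.
    calls = estimate_backfill_calls(num_blocks=15_000_000, avg_txs_per_block=100)
    print(f"{calls:,} calls")  # 1,515,000,000 calls
    # Hypothetical sustained throughput of 500 requests per second.
    print(f"{backfill_days(calls, calls_per_second=500):.1f} days")
```

Under these assumptions the backfill alone takes about five weeks of continuous requests, before any storage or compute costs are counted.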

Dune Analytics


A post on blockchain data sources wouldn’t be complete without mentioning the OG of crypto data, Dune. One of the earliest entrants in the blockchain data space, Dune has built a considerable following among crypto data analysts thanks to its SQL-queryable datasets, its graphing/charting UI, and its social-media sharing features. A large library of public, community-built dashboards and datasets makes it easy for new queries to build on existing work.

Pros of Dune Analytics:

  • SQL friendly
  • Free community tier
  • Graphical interface for building charts
  • Decoded data for select contracts
  • Large community makes getting started and asking for help easy

Cons of Dune Analytics:

  • No API - a Dune API is in the works, but there is no release date yet
  • Uncurated datasets - many community-built views of the same data or project can make it difficult to determine which one is correct or what the source of truth is
  • Slow - Dune’s v1 infrastructure is built on Postgres and can be slow for analytical queries (e.g. summing over billions of rows). A query such as "select count(1) from ethereum.transactions" takes over 5 minutes to run. The Dune v2 engine (beta), running on Databricks/Spark, addresses some of this, but many of Dune’s datasets and dashboards still run on v1.

Nansen


Described by its CEO as a service that “aggregates data on millions of addresses, overlaying it on all the activity that happens on the blockchain, and then presenting easy-to-digest analytics,” Nansen is another leader in the blockchain data space. Nansen offers precomputed metrics that can be sliced and diced by analysts. The platform is akin to BI for blockchain data with out-of-the-box dimensions and measures readily available.

Pros of Nansen:

  • Great low-code interface
  • Deep datasets on NFTs
  • Integrates a lot of off-chain data to enrich on-chain data (e.g. wallet labeling)

Cons of Nansen:

  • Limited flexibility if your use case requires raw blockchain data or complex joins
  • Nansen API access is available but locked behind an enterprise sales process
  • Data is read-only on the majority of plans. CSV export is available on the $3990 / quarter plan
  • Certain Nansen datasets are on Google BigQuery, which incurs additional costs to query

The Graph


In keeping with the ethos of crypto, The Graph is a decentralized protocol for indexing and querying blockchain data. It lets developers and dApps query blockchain data through an API using GraphQL. The Graph offers two versions of its protocol: Hosted and Decentralized. The Hosted version, a centralized service run by The Graph Foundation, is scheduled to be sunset by Q3 2023, with the goal of moving all operations to the Decentralized service by then.
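For readers coming from SQL, a GraphQL request is a JSON document holding a query string and its variables, which is POSTed to a subgraph's endpoint. The sketch below builds such a body and reads rows out of a reply. The `transfers` entity and its fields are hypothetical, since every subgraph defines its own schema, and the sample response is invented. The `first`, `orderBy`, and `orderDirection` arguments are the pagination and sorting arguments The Graph exposes on collection queries.

```python
import json

# Hypothetical entity and fields; a real subgraph's schema will differ.
TRANSFERS_QUERY = """
query RecentTransfers($first: Int!) {
  transfers(first: $first, orderBy: timestamp, orderDirection: desc) {
    id
    from
    to
    value
  }
}
"""


def build_graphql_body(query: str, variables: dict) -> str:
    """Serialize a GraphQL query and its variables as a JSON request body."""
    return json.dumps({"query": query, "variables": variables})


def extract_rows(response_body: str, entity: str) -> list[dict]:
    """Return the rows for one entity, raising if the server reported errors."""
    payload = json.loads(response_body)
    if "errors" in payload:
        raise RuntimeError(payload["errors"])
    return payload["data"][entity]


# Invented reply in the standard GraphQL response shape.
SAMPLE_RESPONSE = json.dumps({
    "data": {
        "transfers": [
            {"id": "0xabc-0", "from": "0x111", "to": "0x222", "value": "1000"},
            {"id": "0xabc-1", "from": "0x222", "to": "0x333", "value": "250"},
        ]
    }
})

if __name__ == "__main__":
    print(build_graphql_body(TRANSFERS_QUERY, {"first": 2}))
    for row in extract_rows(SAMPLE_RESPONSE, "transfers"):
        print(row)
```

Unlike a SQL table, the result comes back as nested JSON shaped like the query, so aggregations such as sums or counts are usually done client-side or defined in the subgraph itself.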

Pros of The Graph:

  • Decentralized - if decentralization is key (data not sourced from a single off-chain entity), The Graph may be your best bet
  • Supports a GraphQL based API

Cons of The Graph:

  • Queries are paid for in GRT, The Graph’s native token, which can be volatile
  • The Graph Decentralized network currently only supports Ethereum data (Hosted supports more)
  • The learning curve for GraphQL can be steep if you’re coming from a data science / analyst background
  • Less support than what you might get from a more centralized service

BigQuery Public Crypto Datasets


Google BigQuery hosts public datasets containing raw data for several popular chains, including Bitcoin and Ethereum, that are routinely updated. The data tables are accessible to anyone with a Google Cloud account.

Pros of BigQuery Public Crypto Data:

  • Easy to get started if you’re already familiar with BigQuery and have a GCP account
  • Good performance on large queries thanks to the BigQuery engine

Cons of BigQuery Public Crypto Data:

  • Limited number of chains supported and no decoded data
  • It’s unclear if these datasets are being actively maintained and there is no support if you have questions about the data
  • Expensive for large or frequent queries (Google charges by “bytes scanned” per query)
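The "bytes scanned" pricing model can be sketched with simple arithmetic. BigQuery is columnar, so a query is billed for the full size of every column it references, and selecting fewer columns scans fewer bytes. The column sizes and the price per TiB below are hypothetical placeholders, and the function names are ours; check Google's current pricing page for real rates.

```python
TIB = 2 ** 40  # bytes in one tebibyte
GIB = 2 ** 30  # bytes in one gibibyte

# Hypothetical per-column sizes for a transactions table (invented for illustration).
COLUMN_BYTES = {
    "hash": 120 * GIB,
    "from_address": 80 * GIB,
    "to_address": 80 * GIB,
    "value": 40 * GIB,
    "input": 700 * GIB,
}

ASSUMED_USD_PER_TIB = 5.0  # placeholder rate, not a quoted price


def bytes_scanned(selected_columns) -> int:
    """A columnar engine reads each referenced column in full."""
    return sum(COLUMN_BYTES[name] for name in selected_columns)


def estimate_query_cost(num_bytes: int, usd_per_tib: float) -> float:
    """Cost of one on-demand query billed by bytes scanned."""
    return num_bytes / TIB * usd_per_tib


if __name__ == "__main__":
    narrow = bytes_scanned(["from_address", "value"])  # SELECT from_address, value
    wide = bytes_scanned(COLUMN_BYTES)                 # SELECT *
    print(f"narrow: ${estimate_query_cost(narrow, ASSUMED_USD_PER_TIB):.2f}")
    print(f"wide:   ${estimate_query_cost(wide, ASSUMED_USD_PER_TIB):.2f}")
```

A dashboard that reruns the wide query every few minutes multiplies that per-query cost accordingly, which is why frequent queries get expensive.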

Luabase


Luabase is a SQL and API-first blockchain data stack. Built with developers and data scientists/analysts in mind, Luabase offers clean on-chain and off-chain data in an easily accessible format via both API and SQL. Queries on Luabase are fast thanks to infrastructure built largely around ClickHouse by a team of modern data stack experts. Luabase also attempts to maximize use case flexibility by offering both raw blockchain data and the ability for users to define and save their own metrics/aggregations.

Pros of Luabase:

  • Data accessible via both SQL and API
  • Fast queries thanks to ClickHouse-based infrastructure
  • Raw blockchain data plus the ability to define and save your own metrics/aggregations
  • Clean on-chain and off-chain data

Cons of Luabase:

  • Currently supports only EVM chains and Bitcoin, with others to be added in the coming months
  • Limited charting/dashboarding capability, as the primary goal is to provide developers and data scientists with a better API experience

| Features | DIY | Dune | Nansen | The Graph | BigQuery Public Data | Luabase |
| --- | --- | --- | --- | --- | --- | --- |
| SQL/GraphQL Support | Depends on setup | SQL | None | GraphQL | SQL | SQL |
| API | (Build your own) | N | (Enterprise Only) | Y | (Build your own) | Y |
| Raw Data | Y | Y | N | Y | Y | Y |
| Decoded Data | Depends on setup | Y (Certain contracts) | Y | Y | N | Y |
| Off-Chain Data | Depends on setup | (community curated) | Y | N | N | Y |
| Chains Supported | Depends on setup | EVM, Solana | EVM, Solana | Decentralized: EVM; Hosted: EVM, Solana, NEAR | List | EVM, Solana Q3 2022 |
| Data Export Integrations | Depends on setup | CSV Download | CSV Download (only available with $3990 / quarter plan) | Depends on setup | Depends on setup | S3, Google Cloud Storage, Webhook, Slack, Google Sheets, + more |
