Industry: Web3 / Blockchain

Service: Databricks Genie Implementation

Refine and Expand High-Intent Marketing Audiences in Real Time, at Scale

Stack: Google Cloud, BigQuery, Dataproc, Vertex AI, Spark

The Challenge

With hundreds of millions of users, marketing teams at digital wallet platforms face a major personalization challenge. Each user holds a unique mix of tokens and NFTs (blockchain-based digital assets) and behaves differently. Adding to the complexity, a single wallet can contain more than 20,000 distinct data points. Together, these factors put audience segmentation and targeting far beyond the reach of simple filtering and make it a hard technical problem.
The client wanted an AI system that could take the marketing team's segment criteria from the CRM, identify similar wallets, and instantly generate a target audience for each marketing initiative.

The Product

A real-time recommendation engine was designed and deployed on Google Cloud, using a ScaNN (Scalable Nearest Neighbors) index for similarity matching across 65 million wallets.

The system takes wallet data from BigQuery, including token holdings, NFT assets, and behavioral signals, and passes it through a dimensionality reduction model that compresses a massive feature space of 100,000 dimensions into a compact, meaningful representation. These reduced vectors are then indexed with Vertex AI Matching Engine, so the system can query any wallet ID and return its nearest neighbors instantly.
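
As a rough illustration of this flow, the NumPy sketch below reduces a tiny wallet-by-feature matrix with PCA and answers a nearest-neighbor query by wallet ID. It rests on several assumptions: a small dense array stands in for the BigQuery extract, PCA is computed with a plain SVD, and an exact brute-force cosine search stands in for the ScaNN index in Vertex AI Matching Engine, which performs approximate search at far larger scale. The function names and toy data are hypothetical.

```python
import numpy as np


def fit_pca(features: np.ndarray, n_components: int):
    """Fit PCA with an SVD of the mean-centered wallet-by-feature matrix."""
    mean = features.mean(axis=0)
    # Rows of vt are principal axes, sorted by decreasing explained variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]


def embed(features: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project wallets onto the principal axes and L2-normalize for cosine search."""
    z = (features - mean) @ components.T
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    return z / np.maximum(norms, 1e-12)


def nearest_neighbors(embeddings, wallet_ids, query_id, k):
    """Exact cosine top-k: a brute-force stand-in for an ANN index such as ScaNN."""
    q = embeddings[wallet_ids.index(query_id)]
    scores = embeddings @ q
    ranked = [(wallet_ids[i], float(scores[i])) for i in np.argsort(-scores)]
    # Drop the query wallet itself, then keep the k most similar wallets.
    return [(w, s) for w, s in ranked if w != query_id][:k]


# Toy data: two portfolio "types" over five hypothetical features.
wallet_ids = ["w0", "w1", "w2", "w3", "w4", "w5"]
features = np.array([
    [5, 4, 0, 0, 0],
    [4, 5, 0, 0, 0],
    [5, 5, 0, 0, 1],
    [0, 0, 5, 4, 0],
    [0, 0, 4, 5, 0],
    [0, 1, 5, 5, 0],
], dtype=float)
mean, components = fit_pca(features, n_components=2)
embeddings = embed(features, mean, components)
print(nearest_neighbors(embeddings, wallet_ids, "w0", k=2))
```

Normalizing the reduced vectors means the dot product equals cosine similarity, which is also how many vector indexes are configured for this kind of look-alike search.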

The output is a target group of wallets that share similar characteristics, which can be used directly for personalized recommendations or audience segmentation.
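
Turning a CRM segment into a target group can be framed as look-alike expansion: start from the seed wallets that match the segment criteria, collect each seed's nearest neighbors, and rank candidates by their strongest similarity to any seed. The sketch below assumes a hypothetical `lookup(wallet_id, k)` callable that returns `(wallet_id, similarity)` pairs from any nearest-neighbor query; the merge-by-maximum rule is one reasonable choice, not necessarily the one used in production.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical index query: (wallet_id, k) -> [(neighbor_wallet_id, similarity), ...]
Lookup = Callable[[str, int], List[Tuple[str, float]]]


def expand_audience(
    seed_wallets: Iterable[str],
    lookup: Lookup,
    per_seed_k: int,
    audience_size: int,
) -> List[Tuple[str, float]]:
    """Grow a seed segment into a ranked look-alike audience."""
    seeds = set(seed_wallets)
    best: Dict[str, float] = {}
    for seed in seeds:
        for wallet_id, score in lookup(seed, per_seed_k):
            if wallet_id in seeds:
                continue  # already in the segment, so not a new target
            # Keep each candidate's strongest similarity to any seed wallet.
            if score > best.get(wallet_id, float("-inf")):
                best[wallet_id] = score
    # Sort by similarity, breaking ties by wallet ID for deterministic output.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:audience_size]
```

Taking the maximum rewards candidates that closely resemble at least one seed; averaging over seeds instead would favor wallets that resemble the segment as a whole.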

Architecture and Data Processing

The pipeline used Dataproc for large-scale data processing, with Vertex AI managing index creation and updates. Setup included configuring a VPC network and storage buckets, and enabling the APIs needed to support the full pipeline.

Principal Component Analysis (PCA) Modeling

We worked with both dense and sparse matrices to handle the variability in wallet data, ensuring that the model represented token and NFT holdings accurately across very different portfolio types.
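
The write-up does not specify how sparse wallets were fed to PCA. One standard approach, sketched here as an assumption, keeps the matrix sparse and applies mean-centering implicitly, because subtracting the column means from a sparse matrix would make it dense. Principal axes are then found by subspace (block power) iteration, which only needs products with the centered matrix and its transpose. The `SparseWallets` class and `sparse_pca` function are hypothetical; per-row NumPy index/value arrays stand in for a real sparse format such as SciPy CSR or Spark sparse vectors.

```python
import numpy as np


class SparseWallets:
    """Wallet-by-feature matrix X kept as per-row (feature indices, values) arrays."""

    def __init__(self, rows, n_features):
        self.rows = [(np.asarray(i, dtype=int), np.asarray(v, dtype=float)) for i, v in rows]
        self.n_features = n_features
        # Column means accumulated from nonzeros only.
        self.mean = np.zeros(n_features)
        for idx, val in self.rows:
            np.add.at(self.mean, idx, val)
        self.mean /= len(self.rows)

    def centered_matvec(self, v):
        """Compute (X - 1 mean^T) @ v without densifying X; v may be 1-D or 2-D."""
        xv = np.array([val @ v[idx] for idx, val in self.rows])
        return xv - self.mean @ v

    def centered_rmatvec(self, u):
        """Compute (X - 1 mean^T)^T @ u without densifying X; u may be 1-D or 2-D."""
        out = np.zeros((self.n_features,) + u.shape[1:])
        for (idx, val), u_i in zip(self.rows, u):
            np.add.at(out, idx, np.multiply.outer(val, u_i))
        return out - np.multiply.outer(self.mean, u.sum(axis=0))


def sparse_pca(data: SparseWallets, n_components: int, n_iter: int = 50, seed: int = 0):
    """Top principal axes via subspace iteration on the implicitly centered matrix."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((data.n_features, n_components)))
    for _ in range(n_iter):
        # Multiply by X_c^T X_c (covariance up to a constant), then re-orthonormalize.
        q, _ = np.linalg.qr(data.centered_rmatvec(data.centered_matvec(q)))
    return q.T  # each row is one principal axis
```

The same two products are what iterative or randomized PCA solvers need, so narrow portfolios with a handful of nonzeros stay cheap while broad portfolios are handled by the same code path.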

Deployment

The ScaNN index was deployed as a Cloud Function, accessible through a UI, the CLI, and Workbench, and was then integrated directly with BigQuery queries for real-time execution.
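
The source does not describe the exact contract between BigQuery and the Cloud Function. If the function were registered as a BigQuery remote function, BigQuery would send a JSON body whose `calls` field holds one argument list per input row and would expect a `replies` list of the same length back. The handler below sketches that contract in plain Python; the name `handle_remote_call`, the `lookup` callable, the `(wallet_id, k)` argument order, and the JSON-string reply format are all assumptions, and the HTTP entry point that parses the request and sets the status code is omitted.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical index query: (wallet_id, k) -> [(neighbor_wallet_id, similarity), ...]
Lookup = Callable[[str, int], List[Tuple[str, float]]]


def handle_remote_call(payload: Dict[str, Any], lookup: Lookup, max_k: int = 1000) -> Dict[str, Any]:
    """Answer a batch of (wallet_id, k) rows in the BigQuery remote-function JSON shape."""
    try:
        replies = []
        for wallet_id, k in payload["calls"]:
            k = max(1, min(int(k), max_k))  # clamp the requested result size
            neighbors = lookup(str(wallet_id), k)
            # One reply per input row, serialized as a JSON string (assumes a STRING return type).
            replies.append(json.dumps([{"wallet_id": w, "score": s} for w, s in neighbors]))
        return {"replies": replies}
    except (KeyError, TypeError, ValueError) as exc:
        # The HTTP wrapper should return this with a non-200 status so BigQuery reports it.
        return {"errorMessage": f"bad request: {exc!r}"}
```

Batching matters here: BigQuery groups many rows into one request, so a single call to the function can serve a whole query's worth of wallet lookups.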

Technical Challenges Worth Noting

Sparse vector problem
Wallets with high-value but narrow portfolios were influencing recommendations toward similar high-value wallets, skewing results. We addressed this by analyzing PCA feature distributions and adjusting the distance calculations to produce more balanced, relevant matches.
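
The source does not spell out how the distance calculation was adjusted. One plausible adjustment, shown here purely as an illustration, is to rescale each PCA component to unit variance (so a few components driven by very large balances do not dominate) and then compare L2-normalized vectors by cosine similarity, so that portfolio composition matters more than portfolio size. In the toy example, raw Euclidean distance matches a high-value query wallet to another large wallet with a different mix, while the adjusted similarity picks the smaller wallet with a similar mix. All names and values are hypothetical.

```python
import numpy as np


def raw_distance(embeddings: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Plain Euclidean distance in PCA space, which is dominated by portfolio magnitude."""
    return np.linalg.norm(embeddings - query, axis=1)


def adjusted_similarity(embeddings: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Cosine similarity after rescaling each PCA component to unit variance."""
    scale = embeddings.std(axis=0)
    scale[scale == 0] = 1.0  # avoid dividing by zero for constant components
    z = embeddings / scale
    q = query / scale
    # L2-normalize so similarity reflects portfolio composition, not total size.
    z = z / np.maximum(np.linalg.norm(z, axis=1, keepdims=True), 1e-12)
    q = q / max(float(np.linalg.norm(q)), 1e-12)
    return z @ q


# Toy 2-D PCA embeddings; row 0 is a high-value wallet concentrated on component 0.
emb = np.array([[10.0, 1.0], [9.0, -3.0], [2.0, 0.3], [-5.0, 0.5], [-6.0, 1.2]])
query = emb[0]
print(int(np.argmin(raw_distance(emb, query)[1:])) + 1)        # 1: large wallet, different mix
print(int(np.argmax(adjusted_similarity(emb, query)[1:])) + 1)  # 2: small wallet, similar mix
```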
Scaling from prototype to production
The initial proof of concept ran on a single compute node. Moving to Spark let the pipeline scale to the full dataset efficiently, making the system production-ready rather than a demo.
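
One reason PCA moves cleanly to Spark is that its inputs reduce to statistics that can be computed per partition and summed: a row count, column sums, and the Gram matrix X^T X. The NumPy sketch below simulates that pattern with in-memory partitions standing in for Spark partitions; in practice one would more likely use Spark MLlib's `pyspark.ml.feature.PCA`, and the source does not say which Spark API the pipeline used. Function names are hypothetical. Note that the Gram matrix is d x d, so at 100,000 features it becomes very large, which is one reason sparse or randomized methods are attractive at this scale.

```python
from functools import reduce

import numpy as np


def partition_stats(block: np.ndarray):
    """Sufficient statistics for one partition: row count, column sums, Gram matrix."""
    return block.shape[0], block.sum(axis=0), block.T @ block


def merge_stats(a, b):
    """Combine two partitions' statistics; the sum is associative, so order does not matter."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def pca_from_partitions(partitions, n_components: int):
    """Principal axes from merged statistics, without collecting raw rows in one place."""
    n, col_sum, gram = reduce(merge_stats, (partition_stats(p) for p in partitions))
    mean = col_sum / n
    # Sample covariance: (X^T X - n * mean mean^T) / (n - 1).
    cov = (gram - n * np.outer(mean, mean)) / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    return mean, cov, eigvecs[:, top].T
```

Only the small per-partition statistics travel between workers, so the cost of the merge step does not grow with the number of wallets.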
End-to-end verification
Before scaling up, we ran the full pipeline on a sample dataset to validate accuracy at each step. These checks surfaced issues early and gave the client much more confidence in the results before committing to full deployment.
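
The write-up does not list the checks used during sample validation. A common one for an approximate index such as ScaNN is recall@k: for a sample of query wallets, compute the exact neighbors by brute force and measure what fraction of them the index also returns in its top k. The helper names below are hypothetical, and any pass threshold would be a project-specific choice.

```python
from typing import Mapping, Sequence


def recall_at_k(approx: Sequence[str], exact: Sequence[str], k: int) -> float:
    """Fraction of the exact top-k neighbors that also appear in the index's top-k."""
    truth = set(exact[:k])
    if not truth:
        return 1.0  # nothing to find, so nothing was missed
    return len(truth & set(approx[:k])) / len(truth)


def mean_recall_at_k(
    approx_results: Mapping[str, Sequence[str]],
    exact_results: Mapping[str, Sequence[str]],
    k: int,
) -> float:
    """Average recall@k over every sampled query wallet."""
    scores = [recall_at_k(approx_results.get(q, []), exact, k) for q, exact in exact_results.items()]
    if not scores:
        raise ValueError("no sampled queries to evaluate")
    return sum(scores) / len(scores)
```

A query missing from the index results counts as zero recall, so gaps in the approximate pipeline show up in the score rather than being silently skipped.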

The Result

The deployed engine can query any wallet in the database and return a ranked list of its nearest neighbors in real time, enabling dynamic audience generation at a scale that was not possible before.
Marketing and product teams can now generate highly targeted audiences based on wallet behavior and holdings rather than broad demographic segments or manual curation. These segments are accurate, live, and configurable at massive scale, enabling campaigns to reach the right users with far less manual work and in a fraction of the time. The result is improved conversion and ROI, reduced costs, and full end-to-end data governance.
Turn your data into high-intent audiences—instantly.
Let’s build a real-time AI engine tailored to your product and scale.