Industry: Blockchain

JetBrains products used: Datalore

Organization Size: 500-1000

Country: United States

Chainalysis

Chainalysis provides data, software, services, and research to government agencies, exchanges, financial institutions, and insurance and cybersecurity companies in over 70 countries. Our data platform powers investigation, compliance, and risk management tools that have been used to solve some of the world’s most high-profile cases of cyber crimes and safely expand consumer access to cryptocurrency.

How Chainalysis Uses Datalore for Blockchain Analytics

About Chainalysis

Could you please introduce yourself?

Hi, I am Surya Rastogi, a Senior Staff Data Scientist at Chainalysis. I spend a lot of time analyzing a variety of blockchain data and providing analytical tools to many teams within the company. Currently, I lead the development of our research tools.

What kind of projects is Chainalysis involved in?

Chainalysis provides data, software, services, and research to government agencies, exchanges, financial institutions, and insurance and cybersecurity companies in over 70 countries. Our data platform powers investigation, compliance, and risk management tools that have been used to solve some of the world’s most high-profile cases of cyber crimes and safely expand consumer access to cryptocurrency.


Problems to solve

What made you look for Datalore or alternative solutions? What challenges did you face?

One of our biggest challenges is that the blockchain space is rapidly expanding and there is always new data to be acquired and analyzed. As a company we have a lot of data acquisition and processing functions, and we expect them to keep growing.


“Datalore provides us with a top-level interface over all that data, one where data scientists can poke around each of these different data sources and combine them to derive insights.”

— Surya Rastogi, Senior Staff Data Scientist, Chainalysis


The Datalore experience

Who uses Datalore in your team?

Overall there are 35 people at Chainalysis who have access to Datalore. The Research department, which focuses on R&D and deep tech, manages the Datalore installation and provides access to other data science functions. We have product data scientists analyzing data to ship to the product and auditing this data to look for any potential outliers and intricacies. Data science engineers have started using Datalore more than ever since the recent introduction of the Scheduling feature. Traditionally these engineers would write Airflow DAGs, but we’ve been transitioning to using scheduled runs for some of our use cases.

Your team has expanded quite a lot in the past year. Has the onboarding process changed after adopting Datalore?

The onboarding process has matured and become streamlined with Datalore. Before, our documentation was spread across Git repos and autodoc webpages, but now with Datalore we can give newcomers a “Getting Started” notebook that they can copy and work from. Additionally, since static reports can include code cells, we can easily create documentation reports that analysts can copy and paste example snippets from.


“Datalore has been really useful for reducing the friction of onboarding and documenting our workflows.”


When we first adopted Datalore, we expected to use the real-time collaboration features a lot more for onboarding new people, but oddly enough we don’t. We do, however, use real-time collaboration for multi-person calls (effectively mob programming), although in most scenarios one particular person is driving the code.

What kind of data do you work with?

We have binary “scratch” data that sits in stores such as S3 or MinIO, and we also leverage S3 as a data lake layer upstream of our data warehouses and lakehouses. We also have numerous classic SQL databases like Postgres. Database integrations, which weren’t initially present as a feature, have been a really nice addition in Datalore. As the feature developed, a lot of our SQL analysts were empowered to use Datalore more, as they had access to the features they relied on from DataGrip.

When we started, Datalore was not installed in AWS, but we migrated it to AWS so that we could benefit from some of the services we were already leveraging, like Athena. Since then it has been pretty easy to add all of our data sources and even more AWS stores.

How do you share the results of your work?

When sharing the results of our work, we mainly like to leverage the Reports feature. It lets us annotate our workflows with markdown, allowing us to publish Reports walking through the data sources and transformations that were applied to achieve certain results.

Additionally, we’ve started leveraging Datalore to populate analytical databases with the results of our work. Traditionally we utilized Airflow for these use cases, but with the addition of Scheduling, we have been able to use Datalore instead. We used to have a DAG that was responsible for some database population, but we’ve replaced that with a Datalore notebook that runs every hour. Initially we mainly used Datalore as a read-only tool for data sources, but since the addition of Scheduling, we have started populating some databases purely through Datalore. This workflow is easier than starting with an investigation and then migrating code to a DAG for Airflow.
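To illustrate the pattern described above, here is a minimal sketch of a notebook cell that an hourly schedule could run to populate an analytical table. It is not Chainalysis's actual code: the table name `hourly_volume`, the function `populate_hourly_summary`, and the sample transfers are all hypothetical, and an in-memory SQLite database stands in for a real analytical database such as Postgres. The write is idempotent, so re-running the same hour overwrites rows instead of duplicating them.

```python
import sqlite3
from datetime import datetime, timezone


def populate_hourly_summary(conn, transfers, run_at):
    """Aggregate (timestamp, asset, amount) rows per hour and asset, then write them.

    Hypothetical helper for illustration; returns the number of rows written.
    """
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS hourly_volume (
            hour         TEXT NOT NULL,
            asset        TEXT NOT NULL,
            total_amount REAL NOT NULL,
            updated_at   TEXT NOT NULL,
            PRIMARY KEY (hour, asset)
        )
        """
    )

    # Sum amounts per (hour bucket, asset).
    totals = {}
    for ts, asset, amount in transfers:
        hour = ts.replace(minute=0, second=0, microsecond=0).isoformat()
        totals[(hour, asset)] = totals.get((hour, asset), 0.0) + amount

    rows = [
        (hour, asset, total, run_at.isoformat())
        for (hour, asset), total in sorted(totals.items())
    ]

    # INSERT OR REPLACE keyed on (hour, asset) keeps repeated runs idempotent.
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO hourly_volume "
            "(hour, asset, total_amount, updated_at) VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)


# Made-up sample data standing in for an upstream data source.
conn = sqlite3.connect(":memory:")
sample = [
    (datetime(2023, 1, 1, 10, 5, tzinfo=timezone.utc), "BTC", 1.5),
    (datetime(2023, 1, 1, 10, 40, tzinfo=timezone.utc), "BTC", 0.5),
    (datetime(2023, 1, 1, 11, 15, tzinfo=timezone.utc), "ETH", 3.0),
]
run_at = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)

written = populate_hourly_summary(conn, sample, run_at)
populate_hourly_summary(conn, sample, run_at)  # a second run does not add rows
```

In a real setup the connection would point at the target database and the schedule itself would be configured in the notebook tool rather than in code, so the cell only needs to be safe to run repeatedly.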


“Scheduling is my personal favorite new feature.”


Lastly, there are investigators and analysts who perform domain-specific analyses. Their work is shared as analytical “runbooks” for investigations by publishing an Interactive report for their peers. Whenever a similar analysis is needed, the report can be reused by just sharing the link.

Could you give an example of how your team collaborates?

In our core research team there are project-based groups. These groups will have meetings where they all open shared notebooks in Datalore and go through them together. As I’ve mentioned before, data engineers have recently begun to collaborate with data scientists, utilizing scheduled runs to populate data.

What’s next?

First, we are continuing to consolidate some of our data science infrastructure. Datalore lets us remove the need for tools like nbviewer (to showcase notebooks) and Google Colab (to collaborate on notebooks). And now with the Scheduling feature, we have started to consolidate some of our Airflow use cases into Datalore.

Second, when I initially introduced Datalore at Chainalysis, I just brought together everyone who uses Python for Data Science and then more SQL-centric analysts. In the future we might also want to expand our installation to handle Business Intelligence use cases (e.g. business dashboards).

And last but not least, we’ve started focusing on UIs for data science and we’ve built an internal tool with links to the most important interactive reports and other dashboards. We’ve been able to embed this within Datalore, allowing us to create navigational iframes between our various data science frontends.

Similar Customer Stories

Hunters

Netanel Golani, a Threat Hunting Expert at Hunters

It has only been a month since the data science team at Hunters started using Datalore, and we have already seen productivity and usability improvements in our daily workflow – especially when working with numerous customer data sources.

The Center for New Data

Chad Rosenberg, Head of Technology, The Center for New Data

Datalore just gives us ways to work on our data that we won’t get in Airflow, like debugging the pipeline results, trying the webhooks, and quickly visualizing the data with automatic plotting features. Being able to use the native Snowflake connector in Datalore, as well as the programmatic ones in pandas, has definitely been a time saver when working on shared notebooks.

TrueLayer

Moreno Raimondo Vendra, Senior Machine Learning Engineer, TrueLayer

Datalore enabled our team to ergonomically access our data while meeting the security requirements, which was a game changer for us. As a result, we could collaborate much more easily both within our Machine Learning team and with our stakeholders.
