Industry: Research

JetBrains products used: Datalore

Organization Size: 15

Country: United States

The Center for New Data

The Center for New Data works with civic groups, technology companies, and academics to bring pioneering methodologies, all powered by big data, into the democracy movement.

How The Center for New Data Processes 300 GB of Data Daily With Datalore and Airflow

About The Center for New Data

Could you please introduce yourself?

Hi, I am Chad Rosenberg, Head of Technology at The Center for New Data (newdata.org). I lead infrastructure operations and manage New Data’s national volunteer corps of data scientists, engineers, and statisticians.

What kind of projects is New Data involved in?

Our flagship program uses big data to measure voting access in the United States. We analyze wait times for millions of voters at tens of thousands of polling locations across the country, correlate them with region and individual socioeconomic status, and show that access to voting differs from one location to another. Another program uses mobility data, rather than purely conceptual approaches, to identify communities of interest. Learning how to better organize these often marginalized communities into voting districts helps reduce polarization and build a healthier democracy.


Problems to solve

What made you look for Datalore or alternative solutions? What challenges did you face?

Our previous notebook solution was really hard to maintain. It had Kubernetes dependency issues that were hard to solve, and migrating from one version to the next was very difficult. As a volunteer organization, we have to spend DevOps time carefully, so we love that Datalore was a turnkey solution we could easily set up in our Kubernetes cluster on AWS. We need things to just work, and having support included makes things easier as well.


The Datalore experience

Who uses Datalore in your team?

We have around 15 seats in Datalore, and most of the team works on data quality. The data quality team uses Datalore to troubleshoot the results of scheduled Apache Airflow runs, do exploratory analysis, and build reports on the data.

What kind of data do you work with?

We currently use Snowflake as our main database. Every day, we ingest around 300 GB of anonymized cell phone location data from our data providers, calculate the major metrics with Apache Airflow, and load the resulting datasets into Snowflake.
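Because a new batch lands every day, one simple check when troubleshooting scheduled runs is to look for calendar days that have no loaded data. The sketch below is a hypothetical illustration, not New Data's pipeline code: it assumes the set of load dates has already been fetched (for example with a query such as a hypothetical `SELECT DISTINCT load_date ...`), and the dates in the usage example are made up.

```python
from datetime import date, timedelta


def missing_days(loaded: set[date], start: date, end: date) -> list[date]:
    """Return every day in [start, end] (inclusive) with no loaded batch.

    `loaded` is assumed to hold the dates that actually have data in the
    warehouse; how it is queried is left out of this sketch.
    """
    gaps = []
    day = start
    while day <= end:
        if day not in loaded:
            gaps.append(day)  # a scheduled daily run produced nothing for this day
        day += timedelta(days=1)
    return gaps


if __name__ == "__main__":
    # Made-up load dates for illustration only.
    loaded_dates = {date(2022, 9, 1), date(2022, 9, 2), date(2022, 9, 4)}
    print(missing_days(loaded_dates, date(2022, 9, 1), date(2022, 9, 5)))
```

A gap list like this only says where to look; the Airflow task logs for those dates would still explain why a run failed or produced no output.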

What key benefits do you get from using Datalore?

Datalore gives us ways to work with our data that we don’t get in Airflow, like debugging pipeline results, testing webhooks, and quickly visualizing data with automatic plotting. Being able to use both the native Snowflake connector in Datalore and programmatic access through pandas has definitely saved us time when working on shared notebooks.

We also love the report publishing feature, which lets a broader audience view the results of our work. We can assemble a quick report, publish it, and say, “here’s a URL,” without giving viewers the ability to download the data.

When do you use the native Snowflake database connection and SQL cells? And when do you access your database via Python?

The native SQL cells and the Snowflake connection make it very easy to start assembling a query. When we need to run SQL in a loop, we switch to pandas and paste the SQL strings into Python code.
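Running one SQL string per loop iteration can be sketched as follows. This is a minimal, hypothetical illustration rather than New Data’s code: the `visits` table, its columns, the query, and the rows are all invented, and Python’s built-in `sqlite3` stands in for Snowflake so the sketch runs anywhere. In a Datalore notebook the same loop would typically pass the query to `pandas.read_sql_query` together with a Snowflake connection. Binding the loop value as a query parameter, instead of pasting it into the SQL string, avoids quoting bugs; note that placeholder syntax (`?` here) varies by driver.

```python
import sqlite3

# Hypothetical query, as it might be pasted from a SQL cell.
# "?" is sqlite3's placeholder style; other drivers may use %s or :name.
HYPOTHETICAL_QUERY = """
    SELECT state, COUNT(*) AS n_visits, AVG(wait_minutes) AS avg_wait
    FROM visits
    WHERE state = ?
    GROUP BY state
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the warehouse with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (state TEXT, wait_minutes REAL)")
    conn.executemany(
        "INSERT INTO visits VALUES (?, ?)",
        [("GA", 30.0), ("GA", 50.0), ("TX", 20.0), ("OH", 10.0), ("OH", 14.0)],
    )
    return conn


def run_query_per_state(conn: sqlite3.Connection, states: list[str]) -> list[dict]:
    """Run the same SQL once per state and collect one result row per state."""
    results = []
    for state in states:
        # Bind the loop value as a parameter instead of formatting it into the SQL.
        row = conn.execute(HYPOTHETICAL_QUERY, (state,)).fetchone()
        if row is not None:  # with GROUP BY, a state with no rows returns nothing
            results.append({"state": row[0], "n_visits": row[1], "avg_wait": row[2]})
    return results


if __name__ == "__main__":
    demo = build_demo_db()
    for result in run_query_per_state(demo, ["GA", "TX", "OH", "NV"]):
        print(result)
```

When the loop only varies a filter value, a single query with `GROUP BY state` is often cheaper; a loop earns its keep when each iteration needs a genuinely different query or date window.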

Could you give an example of how your team collaborates?

Someone imports the data using SQL cells and prepares the resulting dataframe. Other team members then investigate the data in the same notebook and produce data quality reports, and we compare the results with previous runs.
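Comparing a run with a previous one can be reduced to a simple drift check over summary metrics. The sketch below is a hypothetical illustration, not New Data’s actual data quality report: the metric names, their values, and the 10% tolerance are assumptions, and each run is represented as a plain dictionary of numbers (for example, values computed from the shared dataframe).

```python
import math


def compare_runs(current: dict[str, float], previous: dict[str, float],
                 rel_tol: float = 0.10) -> dict[str, str]:
    """Label each metric as "ok", "drift", "new", or "missing".

    "drift" means the values differ by more than rel_tol relative to the
    larger of the two (the rule math.isclose applies); "new" and "missing"
    mark metrics present in only one of the two runs.
    """
    report = {}
    for name in sorted(set(current) | set(previous)):
        if name not in previous:
            report[name] = "new"
        elif name not in current:
            report[name] = "missing"
        elif math.isclose(current[name], previous[name], rel_tol=rel_tol):
            report[name] = "ok"
        else:
            report[name] = "drift"
    return report


if __name__ == "__main__":
    # Hypothetical metrics for two runs; the numbers are made up.
    previous_run = {"row_count": 1_000_000, "null_share": 0.02, "distinct_devices": 50_000}
    current_run = {"row_count": 1_030_000, "null_share": 0.05, "late_pings": 120}
    for metric, status in compare_runs(current_run, previous_run).items():
        print(f"{metric}: {status}")
```

A relative tolerance suits large counts, but for shares near zero an absolute threshold (the `abs_tol` argument of `math.isclose`) is often the more meaningful test.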

Have you noticed any improvements in your data team’s workflow?

Datalore allows our team to rapidly prototype and share results with anyone on the team. It’s become a game-changing tool for collaboration across our organization.

What’s next?

We haven’t had time to configure centralized authentication in Datalore yet, but we plan to work on it in the coming months. We also want to set up horizontal scaling in our Kubernetes (K8s) cluster to save compute time.

In the meantime, we are actively preparing for the midterm elections this fall, and Datalore will be an integral part of our preparations.

Similar Customer Stories

Hunters

Netanel Golani, a Threat Hunting Expert at Hunters

It has only been a month since the data science team at Hunters started using Datalore, and we have already seen productivity and usability improvements in our daily workflow – especially when working with numerous customer data sources.

Chainalysis

Surya Rastogi, Senior Staff Data Scientist, Chainalysis

One of our biggest challenges is that the blockchain space is rapidly expanding and there is always new data to be acquired and analyzed. As a company we have a lot of data acquisition and processing functions, and we expect them to keep growing.

TrueLayer

Moreno Raimondo Vendra, Senior Machine Learning Engineer, TrueLayer

Datalore enabled our team to ergonomically access our data while meeting the security requirements, which was a game changer for us. As a result, we could collaborate much more easily both within our Machine Learning team and with our stakeholders.
