Hi, I am Chad Rosenberg, Head of Technology at The Center for New Data (newdata.org). I lead infrastructure operations and manage New Data’s national volunteer corps of data scientists, engineers, and statisticians.
Our flagship program uses big data to measure voting access in the United States. We analyze the wait times for millions of voters at tens of thousands of polling locations across the country, correlate them with regions and individual socioeconomic status, and show that different locations have different levels of access to voting. Another program has us finding communities of interest using mobility data rather than purely conceptual approaches. Finding out how we can better organize these often marginalized communities into voting districts helps to reduce polarization and build a healthier democracy.
The previous notebook solution was really hard to maintain. It had some Kubernetes dependency issues that were hard to solve, and migrating from one version to another was very difficult. DevOps time is very important for us as a volunteer organization, and we love that Datalore was a turnkey solution that we were able to easily set up in our Kubernetes cluster on AWS. We need things to just work, and having support included makes things easier as well.
We have around 15 seats in Datalore, and most of the team is working on data quality. The data quality team uses Datalore to troubleshoot Apache Airflow schedule results, do exploratory analysis, and build reports on data.
We are currently using Snowflake as the main database. We ingest around 300 GB of anonymized cell phone location data from our data providers, calculate the major metrics using Apache Airflow, and then put the resulting datasets into Snowflake.
Datalore gives us ways to work on our data that we don't get in Airflow, like debugging the pipeline results, trying the webhooks, and quickly visualizing the data with automatic plotting features. Being able to use the native Snowflake connector in Datalore, as well as the programmatic ones in pandas, has definitely been a time saver when working on shared notebooks.
We also love the publishing reports feature. This allows the broader audience to view the results of our work. We can just assemble a quick report, publish it, and say, “here’s a URL,” without giving viewers the ability to download data.
It is very easy to use the native SQL cells and the Snowflake connection when you are starting to assemble a query. When we have to run the same SQL in a loop, we switch to pandas and copy-paste the SQL strings there.
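As a rough illustration of the "loop on SQL" pattern described above, here is a minimal sketch that runs one parameterized query per value and stacks the results with pandas. Everything specific in it is an assumption made for the example: the `wait_times` table, its columns, and the helper names are hypothetical, and an in-memory SQLite database stands in for Snowflake so the snippet runs anywhere. With a real Snowflake connection the placeholder syntax may differ, since it depends on the driver's parameter style.

```python
import sqlite3

import pandas as pd

# Hypothetical query; table and column names are illustrative only.
# "?" is the SQLite placeholder style; other drivers may use a different one.
QUERY = """
SELECT region, AVG(wait_minutes) AS avg_wait_minutes
FROM wait_times
WHERE election_year = ?
GROUP BY region
ORDER BY region
"""


def run_query_per_year(conn, years):
    """Run the same SQL once per year and concatenate the resulting frames."""
    frames = []
    for year in years:
        df = pd.read_sql_query(QUERY, conn, params=(year,))
        df["election_year"] = year  # tag each row with the loop value
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


def demo_connection():
    """Build a small in-memory database as a stand-in for the real warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wait_times "
        "(region TEXT, election_year INTEGER, wait_minutes REAL)"
    )
    conn.executemany(
        "INSERT INTO wait_times VALUES (?, ?, ?)",
        [
            ("north", 2018, 10.0),
            ("north", 2018, 20.0),
            ("south", 2018, 30.0),
            ("north", 2020, 40.0),
            ("south", 2020, 50.0),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    print(run_query_per_year(demo_connection(), [2018, 2020]))
```

Passing the loop value through `params` instead of formatting it into the SQL string keeps the copied query text unchanged between iterations.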
Someone will import the data using SQL cells and prepare the resulting dataframe. Other team members then start investigating the data in the same notebook, produce data quality reports, and then we compare the results with previous runs.
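Comparing a data quality report with a previous run could look something like the sketch below. The choice of metric (the fraction of missing values per column) and the function and column names are assumptions for illustration, not the checks the team actually runs.

```python
import pandas as pd


def compare_runs(previous, current):
    """Compare the per-column fraction of missing values between two runs."""
    prev = previous.isna().mean().rename("null_fraction_previous")
    curr = current.isna().mean().rename("null_fraction_current")
    # Align on column name; a column missing from one run shows up as NaN.
    report = pd.concat([prev, curr], axis=1)
    report["change"] = (
        report["null_fraction_current"] - report["null_fraction_previous"]
    )
    return report


if __name__ == "__main__":
    # Hypothetical data standing in for two pipeline runs.
    previous_run = pd.DataFrame(
        {"device_id": [1, 2, 3, 4], "wait_minutes": [5.0, None, 7.0, 8.0]}
    )
    current_run = pd.DataFrame(
        {"device_id": [1, 2, 3, 4], "wait_minutes": [5.0, None, None, 8.0]}
    )
    print(compare_runs(previous_run, current_run))
```

A positive `change` value flags a column whose share of missing values grew since the previous run, which is the kind of signal worth investigating together in the shared notebook.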
Datalore allows our team to rapidly prototype and share results with anyone on the team. It’s become a game-changing tool for collaboration across our organization.
We haven’t had time to configure centralized authentication in Datalore yet, but we will work on it in the coming months. We also want to take care of horizontal scaling in our Kubernetes (K8s) cluster so that we save some compute time.
In the meantime, we are actively preparing for the midterm elections this fall, and Datalore will be an integral part of our preparations.