
The Data Science & Infrastructure (DSI) team at Plaid has grown significantly over the past few months into a team whose mission is to empower Plaid with a data-first culture. This post is a look at how we rebuilt internal analytics around rollup tables and materialized views at Plaid.

DSI: It's a lifestyle

After building a scalable monitoring pipeline with Kinesis, Prometheus, & Grafana and beefing up our ETL efforts with Airflow, Plaid was in the midst of a transition to Periscope Data as a business intelligence tool. After testing and analysis, we decided to use Periscope for tracking metrics around our core product usage, go-to-market strategy, and internal operations for customer support and project management.

Snapshot of a Periscope Dashboard

At Plaid, we take a lot of pride in being a data-driven company, and as such, the DSI team took on the responsibility for getting the data into our AWS Redshift data warehouse, which powers the charts in Periscope. We also decided to own the query runtime performance of the SQL statements being written in Periscope, to ensure our data collection efforts were maximally useful to the company.

The tool was being rapidly adopted internally, both by a collection of power users who were proficient in SQL and by less experienced folks who had just started to get their feet wet. The original use case for our Redshift cluster wasn't centered around an organization-wide analytics deployment, so initial query performance was fairly volatile: the tables hadn't been set up with sort and distribution keys matching the query patterns in Periscope. These are important table configuration settings for controlling how data is organized on disk, and they have a huge impact on performance. We also hadn't broadly invested in Redshift settings like workload management (WLM) queues, and the data stored in the cluster was a comprehensive dumping ground, not a schema tailored for analytics.

With "data" in our team name, we couldn't go further in this post without establishing some key metrics. Fortunately, Periscope offers a great set of meta-tables related to usage of the tool, for example tables showing which charts were being run by which users and how long each query ran from Periscope's point of view.

Weekly Redshift Query performance at Plaid from Dec.

Looking into the data, we saw that the p90 runtime (the sparkline in the top-right corner of the image above) was fairly volatile, ranging from high single digits to tens of seconds week to week. This first view also looked at all the queries being run, but we wanted to focus on the queries that were important to the success of the business. We counted only queries run by a user, not queries that Periscope was running in the background to keep charts up to date. We also excluded weekends, because the cluster usage pattern differed too much from weekdays: since we were tracking our statistics daily, the drop in query volume on weekends created visual noise that detracted from analyzing normal weekday patterns. In the end, we considered only weekday data points, and only after the entire day's worth of data was available.
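The sort and distribution keys mentioned above are declared in the table DDL. A minimal Redshift sketch, using a hypothetical `events` table with illustrative column names (not Plaid's actual schema):

```sql
-- Hypothetical fact table; names are for illustration only.
-- DISTKEY(user_id) co-locates rows with the same user_id on one slice,
-- so joins on user_id avoid shuffling data between nodes.
-- SORTKEY(created_at) keeps rows ordered on disk by time, so the
-- time-bounded range scans typical of dashboards read far fewer blocks.
CREATE TABLE events (
    event_id   BIGINT,
    user_id    BIGINT,
    event_type VARCHAR(64),
    created_at TIMESTAMP
)
DISTSTYLE KEY
DISTKEY (user_id)
SORTKEY (created_at);
```

Keys should be chosen to match the dominant join and filter patterns of the dashboard queries, which is exactly what hadn't been done for this cluster's original workload.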

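The weekday-only, user-initiated p90 metric described above could be computed directly in SQL. A sketch assuming a hypothetical meta-table `query_log(started_at, runtime_seconds, is_user_run)`; Periscope's actual meta-table names and columns differ:

```sql
-- p90 runtime per week, counting only user-run queries on weekdays.
-- In Redshift, DATE_PART(dow, ...) returns 0 = Sunday ... 6 = Saturday,
-- so BETWEEN 1 AND 5 keeps Monday through Friday.
SELECT
    DATE_TRUNC('week', started_at)              AS week,
    APPROXIMATE PERCENTILE_DISC(0.9)
        WITHIN GROUP (ORDER BY runtime_seconds) AS p90_runtime_seconds
FROM query_log
WHERE is_user_run
  AND DATE_PART(dow, started_at) BETWEEN 1 AND 5
GROUP BY 1
ORDER BY 1;
```

A query like this, restricted to complete days, would produce the weekly series behind the sparkline discussed above.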