I’ll be honest: when I first started working with Samasource’s SamaUSA team, I had no clue who Eric Ries was, nor had I heard of The Lean Startup. And when our CEO assigned the book as required reading, placing a copy on my desk, I figured I would read it mostly to save face during our next staff meeting. What I didn’t see coming was how much the principles and stories in The Lean Startup would directly inform how I do my job as a data analyst and impact evaluator.
SamaUSA is a non-profit startup with the goal of bringing work opportunities to high-poverty areas in the U.S. We train low-income community college students in digital literacy and the technical skills they need to find and succeed in online work, an industry with promising growth in both available jobs and wages. I didn’t know it at the time, but I had signed on to a super lean organization. Our team started with one member and scaled to four over the course of four months. In that same time, we created a ten-week minimum viable program (MVP) from scratch.
What’s Lean Got To Do With It?
Our MVP will be tested throughout this year in small batches: cohorts of 20-30 students at our two pilot locations. According to our current schedule, each cohort follows the previous by about a month, giving us just enough time to collect and implement feedback before getting the next program going with improvements. The thinking here is that iterating an MVP with small cohorts is the most efficient and effective way to create a program that we know works and can be replicated.
But is the same true on the data side? As the team’s Data & Evaluation Lead, I determine the types of data that need to be collected, when they need to be collected, and how. What, if any, influence does the lean approach have on how we collect and use data? Below are three responses based on my experiences with our inaugural cohort.
I. Lean helps determine which data to collect
Just like the MVP, the data that we collect and, even more so, the processes by which we collect it can be improved greatly through multiple iterations. When I first started working with our students, I imagined that collecting information such as ‘total number of credits the student is currently taking’ or ‘has ever taken’ would be relatively simple. In my experience, most students have a pretty solid grasp on these numbers throughout their college careers, so I didn’t give it much thought beyond including those questions in my data intake process.

Little did I know that these two data points alone would prove incredibly difficult to collect and verify. The students in our program have typically been in and out of college for a number of years and are dealing with many other life issues, so their credit count is not at the top of their minds. On top of that, when I tried to confirm the verbal data by asking those same students again in our web-based surveys, the numbers didn’t match at all! I never would have known this without engaging with our target communities on the ground. Now, after just one iteration of the program, I have a very good sense of which kinds of data can be relied on via self-reporting and which should require verification. I also better understand how to build that data collection into the student application, enrollment, and onboarding process.
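To make that cross-checking step concrete, here is a minimal sketch of the idea. The student IDs, field values, and tolerance threshold are all my own invented assumptions, not SamaUSA’s actual schema; the point is simply that two self-reports of the same number can be compared automatically, and only the records that disagree get routed to (costlier) transcript verification:

```python
# Hypothetical sketch: cross-check credit counts self-reported at intake
# against the same question asked later in a web-based survey.
# All IDs, numbers, and the tolerance are invented for illustration.

intake = {"s01": 45, "s02": 30, "s03": 62}   # credits reported verbally at intake
survey = {"s01": 45, "s02": 24, "s03": 70}   # credits reported in the web survey

def flag_for_verification(intake, survey, tolerance=3):
    """Return student IDs whose two self-reports disagree by more than
    `tolerance` credits; only these records need transcript verification."""
    flagged = []
    for sid in intake:
        if sid in survey and abs(intake[sid] - survey[sid]) > tolerance:
            flagged.append(sid)
    return sorted(flagged)

print(flag_for_verification(intake, survey))  # IDs needing follow-up
```

The design choice mirrors the lesson in the text: rather than verifying everything (burdensome) or nothing (unreliable), the mismatch between two cheap self-reports tells you where verification is actually worth the students’ time.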
Testing and verifying what works before scaling and codifying the program prevents us from burdening our students with verification requirements where they are not necessary. It also helps us require verification where it is more helpful. The opportunity to correct and enhance my data collection and verification methods based on actual student experiences gets me closer to a reliable data collection system, which is any impact evaluator’s dream.
II. Lean allows me to confirm data points
The iterative nature of the build-measure-learn cycle allows me to confirm data points. By running small batches more often, I can confirm learnings across multiple experiments rather than infer conclusions from one large data set. We recently graduated our inaugural class, and going into the program our team had no rigorous, data-based expectations for our graduation rate. We can now say that our first cohort graduated at a rate of 78%. With multiple smaller cohorts planned for the rest of the year, we can test whether that rate, and other relevant statistics, can be replicated. This goes against the evaluative methods championed in more academic settings, where large sample sizes and random assignment are stressed. But in the absence of access to large pools of candidates and the ability to deny services via random assignment, I need real-world solutions. My sample sizes may be small, but at this early stage of the program I’d much rather see consistent numbers across small batches than one set of stats from one group.
III. Lean makes me present data in a way people understand
Because the data collected from each cohort is so critical to shaping the next cohort’s experience, as well as the “final” version of the program, and given the quick turnaround time between cohorts, I have to think much harder about how I present the stories the data tell and how to convey that information effectively to various audiences. In a lean non-profit, data is king. That has forced me to create data collection and presentation tools that are user-friendly and understandable to my whole team, not just to me as the data analyst. As a team, this pushes us to be much more critical and specific about the questions we are trying to answer and why.
I don’t know where the SamaUSA program will end up and in what form. But I’m okay with that because I know whatever that may be, it will be backed by relevant data collected over multiple runs of the program.
Want to learn more? Visit us here.