At Code for America, we have spent nearly four years building GetCalFresh, a service that makes it as easy as possible for all eligible Californians to successfully enroll in CalFresh (also known as SNAP, or food stamps) and get the help they need. To date, we've helped about 360,000 people across some 33 California counties, and we now handle roughly 15,000 applications per month statewide.
But getting from 0 to 1 user, then to 10, and then to 100 in 2014 and 2015 was a period of significant uncertainty, during which we had to be ruthlessly efficient in testing our ideas. And government programs are a very different context for hypothesis testing than consumer technology generally. This is Part 2 of 2 describing that work.
Test 2: A Simplified Application in One County (One to Tens)
With what we had learned, we moved to the next level of testing. We added the additional questions we heard were high-value and switched the app to send the PDF via secure email.
Version 2: A few more questions, one county.
The question we needed to answer was not "is this easier for clients?" but "will this actually get people approved?" If we had something easier, but only 10 percent of applications were approved, we were off the mark.
So we set up another test:
- Users would be recruited via a low level of Google AdWords, one county only
- Applications (with documents) would be sent to the county as PDFs via secure email
- One eligibility worker was given time weekly to give us status (approved, denied, pending) and denial reasons on the applications we submitted.
And we were able to get real data on the critical hypotheses:
- Even with a simplified application (15 questions rather than 200), about 60% of applications were approved. According to the county, this was close to the overall approval rate, and somewhat higher than the approval rate for the existing online application.
- There were basically three denial reasons, in roughly equal shares:
- one-third for missed interviews
- one-third for missing verification documents
- one-third for ineligibility, e.g. income too high or ineligible college student
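To make that tallying concrete, here is a minimal sketch, in Python, of how approval rates and denial-reason shares like these can be computed. The records and field names below are entirely our illustration, not the actual data schema used by GetCalFresh:

```python
from collections import Counter

# Hypothetical status records like those the eligibility worker
# reported back weekly (IDs, statuses, and reasons are made up).
applications = [
    {"id": 1, "status": "approved", "denial_reason": None},
    {"id": 2, "status": "denied", "denial_reason": "missed_interview"},
    {"id": 3, "status": "denied", "denial_reason": "missing_verification"},
    {"id": 4, "status": "denied", "denial_reason": "ineligible"},
    {"id": 5, "status": "approved", "denial_reason": None},
]

# Overall approval rate across all submitted applications.
approved = sum(1 for a in applications if a["status"] == "approved")
approval_rate = approved / len(applications)

# Share of each denial reason among denied applications only.
denial_reasons = Counter(
    a["denial_reason"] for a in applications if a["status"] == "denied"
)
total_denied = sum(denial_reasons.values())
denial_shares = {
    reason: count / total_denied for reason, count in denial_reasons.items()
}
```

With these five toy records, each denial reason comes out to one-third of denials, mirroring the breakdown described above.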
In some ways we were trying to reject a null hypothesis: "simplified applications will have an incredibly low approval rate."
Running real applications through the actual business process, end to end, gave us high-quality data rejecting it.
The denial data also brought a new insight. When asked, county staff generally gave two major reasons people were denied: ineligibility or missing documents. Few mentioned missed interviews. Finding that missed-interview denials were just as prevalent as the others let us focus on that under-discussed barrier.
Test 3: Simplified application, submitted via online portals, in multiple counties (Tens to a Thousand)
Once we had evidence our simplified application service was working, we started talking to other counties.
From these conversations we learned something new: no county was interested in receiving applications as a secure email with a PDF. This made sense. Our original county partner already had a trusted relationship with us, and we were operating at a small scale. Other counties were very reluctant to engage a new business process. It could require staff reallocation and new protocols.
So we tested a different approach: submitting via the existing online client portals, just like any other user would do. This tested a few hypotheses:
- The roughly 60 percent approval rate from our initial county would hold in other counties
- Submitting online would not change the approval rate significantly
- Submitting online with our existing questions was feasible! (unlike paper, a web site can have field requirements)
- Submitting online could be automated eventually, e.g. we learned about a non-beatable CAPTCHA in one system.
Designing the test, we made a critical decision: we threw away our existing Rails app. Instead, we used an off-the-shelf “web forms as a service” tool to take our existing questions and get something up quickly.
Version 2, using an off-the-shelf form tool.
Why did we throw the original version away? First, it would have been significant effort to modify the existing Rails app. Using an off-the-shelf service took a few hours. More importantly, none of our hypotheses were served by having a custom app. As long as we used the same questions, we were testing our hypothesis. And submitting users’ data manually was far less work at our scale than writing automation code.
The only question modifying the existing app would have answered was, can we build a web form? And, yes, of course we can. An off-the-shelf web form worked perfectly well to test our hypotheses.
The test went like this:
- We recruited users across a number of counties via Google AdWords ads
- The user would fill out the form (our simplified subset of questions)
- We would manually fill out the online application via the existing client portals
- We would follow up with users 10, 20, and 30 days into the process (again, manually) to find out whether they had been approved and, if not, why they were denied
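That follow-up cadence is simple to mechanize. A minimal sketch, assuming only a submission date; the function and constant names are ours, not from GetCalFresh's codebase:

```python
from datetime import date, timedelta

# Check-in cadence described in the test: 10, 20, and 30 days in.
FOLLOW_UP_DAYS = (10, 20, 30)

def follow_up_dates(submitted_on: date) -> list[date]:
    """Return the dates on which to check in with an applicant."""
    return [submitted_on + timedelta(days=d) for d in FOLLOW_UP_DAYS]

# Example: an application submitted on 2015-03-02.
print(follow_up_dates(date(2015, 3, 2)))
# [datetime.date(2015, 3, 12), datetime.date(2015, 3, 22), datetime.date(2015, 4, 1)]
```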
From this, we learned:
- Barring a few details, submitting the fields we collected online was absolutely feasible
- We saw a similar approval rate to our first county (60 percent)
- We did not receive any reports from users of county staff being confused by the information submitted online (even though it was less than the full set of fields)
With this, we could now explain to interested counties that our applications would come in just like any other online application, and we had confidence the outcomes were about the same.
GetCalFresh: Lessons from the Early Days
From this early hypothesis testing, we’ve developed and scaled a service that helps about 15,000 low-income Californians apply for benefits per month. Scaling has been its own journey of years, but our root assumptions were validated in days and weeks.
A number of lessons can be drawn from our early hypothesis testing experience in GetCalFresh:
- A real, working thing, even if not scalable, changes the conversation from "should we" to "how can we" (especially in government). Even though our very first prototype was clearly not scalable, it was real enough to shift the conversation from "should we do this?" to "how do we make this work?" While it did not test assumptions directly, it created a context for informed stakeholders to reveal information only they had.
- The best way to learn about a complex service is to actually instrument it. Everyone working in a large organization has their own mental model of how things work. But often the size and division of labor involved in delivering services at government scale mean that no single individual's model is truly complete. By instrumenting the process from input to outcome, we got empirical data on how such black boxes actually work. (And real data wins arguments.)
- Major patterns emerge with a small amount of (real) data. We found the third-third-third breakdown of denial reasons on the scale of fewer than 50 applications. As we scaled to thousands, we saw a similar distribution. We didn’t require massive scale to find the biggest problems.
- Build only what you need to learn. A common mistake is writing code when an off-the-shelf service gives you what you need to test your idea. Building is costly in effort and time. And for many problems, the least uncertain hypothesis is whether you can build it.
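As a rough back-of-the-envelope check (ours, not from the original analysis) of why fewer than 50 applications was enough to trust the third-third-third pattern: at roughly 50 denials, an observed one-third share has a margin of error of about thirteen percentage points, small enough to distinguish three comparable buckets from one dominant reason:

```python
import math

n = 48       # hypothetical number of denials observed in an early test
p = 1 / 3    # observed share for one denial reason

# Normal-approximation 95% interval for a proportion.
se = math.sqrt(p * (1 - p) / n)
margin = 1.96 * se
print(f"one-third share at n={n}: +/- {margin:.0%}")
```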
Thank you to Dave Guarino, the Director of GetCalFresh, for contributing this piece. Please check out Part 1 of this series, From Root Hypothesis to Functional Software with Code for America.
If you’re excited about the potential of Lean Startup for social good, check out the newly released Lean Impact: How to Innovate for Radically Greater Social Good. Also, check out the rest of our blog series for more inspiring stories.