WE TESTED, BUT NOBODY USES OUR SITE. NOW WHAT?
Eric Ries | Photo: The Lean Startup Conference
In our “Now What?” series, experienced entrepreneurs discuss issues that real-life startups face. In this piece, Eric Ries talks about testing two-sided markets and gets real about usability testing, too (if you’re new to those terms, we’ve defined them below). If you have a startup challenge, and you’d like insight from an experienced entrepreneur, let us know in this short form. – Eds
The startup’s problem
We’re trying to create a marketplace for consumers (think: eBay, Etsy, or UrbanSitter—but a little more specialized). We’ve talked on the phone or in person to 100 buyers and 100 sellers who’ve told us they’d use the site. We’ve gotten 20 sellers to give us basic info on what they’re offering, and at what price. We’ve tested the idea by email, matching up two buyers with sellers; the transactions were completed, everyone had good experiences and was enthusiastic about using the service again, so we built a bare-bones site. But after having contacted 150 more potential buyers by email and after having run a Google ad to draw buyers from outside our own network, nobody is buying. Now what?
About Eric Ries
In addition to serving as Editor at Large for The How, Eric Ries is an entrepreneur and author of the New York Times bestseller The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses, published by Crown Business. He graduated in 2001 from Yale University with a B.S. in Computer Science. While an undergraduate, he co-founded Catalyst Recruiting. Ries continued his entrepreneurial career as a Senior Software Engineer at There.com, leading efforts in agile software development and user-generated content. He later co-founded and served as CTO of IMVU, his third startup. In 2007, BusinessWeek named Ries one of the Best Young Entrepreneurs of Tech. In 2008, he served as a venture advisor at Kleiner Perkins Caufield & Byers before moving on to advise startups independently. Today he serves on the board of directors for Code for America and on the advisory board of a number of technology startups and venture capital firms. In 2009, Ries was honored with a TechFellow award in the category of Engineering Leadership. In 2010, he was named entrepreneur-in-residence at Harvard Business School and is currently an IDEO Fellow. The Lean Startup methodology has been written about in the New York Times, the Wall Street Journal, Harvard Business Review, Inc., Wired, Fast Company, and countless blogs. He lives in San Francisco with his wife, Tara.
Interview with Eric Ries, August 2014. Edited and condensed here.
Sarah Milstein & Mercedes Kraus: What would be your first step here?
Eric: We don’t have a lot of detail on this situation, so here’s what I’m going to assume: This entrepreneur thinks that they’re going to be able to replace eBay by creating a much better buying and selling experience for some kind of product category that they’re very passionate about. I’m going to make a further assumption that this is what we call a sticky engine of growth business. The idea here is that once you start using this product, you pretty much can’t stop. Products that have that character to them have a very specific kind of growth pattern. They have network effects, just like viral products have network effects, and they’re theoretically very similar, but the phenomena that you measure in the world are very different.
In the case of a company like eBay, the network effect is that, once you start using it, you can’t stop because everybody else is there, and it becomes the de facto place where you buy or sell the product in question. So if you’re a Beanie Baby buyer, you can’t go anywhere else because all the product inventory is there. If you’re a Beanie Baby seller, then you can’t go anywhere else because all the customers are there. You’re stuck. But just because you buy your Beanie Babies on eBay doesn’t mean you’re going to go tell your friends about it. You may be very private about the fact that you’re a Beanie Baby collector, and that’s fine. No problem. A PayPal or a Facebook is very different; it doesn’t work if people you know don’t participate. So I’m going to assume that the eBay model is the goal, this product is for people who are obsessively buying and selling a collectible, like anime collectibles and Star Wars dolls and stuff like that, those kind of collectibles. Classic two-sided market.
Now, here’s the issue. Rule number one in a situation like this is always: Have you facilitated a transaction to show it can be done? We have, so we know that we can create some value. Now, what I want to know is: Can we get someone to stick to this, whatever the experience is that we’re trying to create? We want someone who’s going to use our product to say, “This is my place to be.” So we have to ask ourselves, “OK, now, what do I have to accomplish in order to make that a reality?”
As soon as the words came out of my mouth, I’m thinking, “I’m screwed.” Because buyers want the maximum inventory, and of course they’re going to check a lot of sites. How can we make this the place they want to go? We might try to figure out how to create a massive amount of inventory, so that they don’t need to go anywhere else. Here’s a great example: Reverb.com is a company I’m an investor in (in Chicago) that sells musical equipment like vintage guitars and amps, pedals, and stuff. The founder of Reverb had a guitar store already. So when the site launched, it had unbelievable inventory that you couldn’t find anywhere else. But the best part is that the store also buys used equipment. So if you were a customer, and you listed something, he would buy it, and you’d have a great experience. If you were looking, you’d find cool stuff, and you’d have a great experience, too.
That actually might be a model we could do with this startup. We could say, “I want to be a general purpose collectible site, but I can’t corner the market in all collectibles. But maybe I could corner the market in some specific kind where I could build up an inventory. I personally could put my own capital to work buying stuff.”
A lot of people who want to create two-sided markets are chicken. They want to do e-commerce, but they don’t want to hold inventory. So just get over it. Or you can try the Airbnb trick of finding existing inventory at Craigslist and porting it over. There are a million different ways to create that additional inventory, and you can go crazy with it. But what startups forget is the goal: We want one customer to feel like they have to stick to our product. This is the cool thing about network effects. Very few people, if any, in a network experience the whole network at once. My telephone has value for me because I can call the other people in the network. The larger the network, the more people I call, the more valuable it is. But, how many people do I actually call in a day as an individual customer? For me as a customer, the value of the network is the number of other nodes in the network that I actually interact with, which, in a lot of cases, can be very, very, very small.
“YOU COULD IMAGINE THAT WE’RE GOING TO TARGET THIS CUSTOMER AND TRY TO MAKE THEIR LIFE PERFECT.”
It’s possible in the early days of a network-effects business to simulate the experience of network saturation for an individual customer, especially if you identify that customer in advance and cheat. You could imagine that we’re going to target this customer and try to make their life perfect. If you knew everyone I’m supposed to call tomorrow on a new phone network, and you went and signed them all up and made them available for me to call, I would have a great experience because everyone I need to call is there. I would be like, “Wow, this product is awesome.”
But most entrepreneurs are too chicken to actually do the work to create that good experience for the initial customer. If that’s your situation, you can cheat by penetrating an extremely dense network subnode [i.e., a small, tight-knit group within the larger population -Eds] and get all those people signed up at once. That’s why so many people love college-campus products like Facebook. It can be incredibly valuable with just one school signed up. So back to our case. We don’t want to start by going after all collectibles. We’re going to go after vintage Star Wars dolls. There’s only 25 people in the world who buy and sell those things because they’re psychotic collectors, and we’re going to go sign all twenty-five of them up and make this the place where they interact with each other.
SM & MK: OK, so our entrepreneur has some inventory, and let’s say they’ve cornered a small market, but nothing is closing. You were facing a similar problem at IMVU, what was the first thing you did? [IMVU, a company that Eric co-founded, lets users create 3D avatars and exchange virtual goods. Successful now—and the basis for much of Eric’s Lean Startup methodology—it started off poorly. Wikipedia has some history on it. – Eds]
ER: I hit my head against the wall for months, so I wouldn’t recommend that. It was so unbelievably hard. Like, you have on the order of 100 customers or something, and you’re not seeing traction. Starting small is no problem. But having 100 customers try your product and only three of them think it’s all right, that’s different. I’ve heard a lot about startups’ spending $5 a day on AdWords and bringing in 100 clicks with that, of which one person would stick. That feels awful. You just want to die pretty much.
“PEOPLE CALL ME ALL THE TIME, ‘HEY, I NEED TO TELL YOU ABOUT MY STORY AND THEN YOU HELP ME DECIDE IF IT’S TIME TO PIVOT.’ IT’S VERY EASY: IF YOU’RE CALLING ME, IT’S TIME.”
So what I wished I’d understood then is that that means you have to pivot. It’s not working. It’s not ambiguous. People call me all the time, “Hey, I need to tell you about my story and then you help me decide if it’s time to pivot.” It’s very easy: If you’re calling me, it’s time. People who know it’s working, you absolutely 100 percent will know that it’s working. You will not be asking an expert whether you should pivot or not. It will be clear. So something’s not right in the value proposition here. I can’t tell you what’s wrong. All you can do is investigate.
For us at IMVU, we eventually brought people in for usability tests and watched them use the product.
SM & MK: So in your case, that was bringing people in and having them physically sitting next to you and they’re trying to use the product?
ER: That’s right. Listen, when in doubt, that is always a good thing to do, because when you watch people trying to use your product, it’s extremely educational. And yet it took me months to see the problem. The reason I wasted so much time was because when people came in and tried to use our product and failed, I assumed we had a usability problem. So I tried to make the product easier to use. Now, making the wrong product easier to use just makes it easier for people to realize they don’t want to use it. So it’s like the definition of a lose, lose, lose. It’s worse for the customer, worse for you, worse for the metrics. It’s bad, and it’s very frustrating.
Every entrepreneur is like, “I’ll just explain it better. I’ll get better marketing. If customers just weren’t quite so stupid, then all these good things will happen.” But unless you’re looking at the right non-vanity metrics, you can never figure it out. When I was having this experience, I didn’t know the term “vanity metrics.” I didn’t know about pivots. I didn’t know about any of this stuff. So I spent a lot of time banging my head against the wall, trying to get my vanity metrics to go up and failing.
SM & MK: So how did you refine your metrics?
ER: We could easily have solved this problem for ourselves if we’d had a very simple conversion-rate metric and a very simple retention metric. For example: What percentage of people who try to use the product succeed? And what percentage of those people came back two days later? That would have been enough to show us that our improvements were not making the situation better. We didn’t need some fancy analytic software. It could have been really very basic.
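The two numbers Eric describes can be worked out from a very plain activity log. Here is a minimal sketch in Python, assuming a hypothetical log of (user, day, succeeded) records. The record format, the function name, and the reading of "came back two days later" as "any visit at least two days after the first success" are illustrative assumptions, not IMVU's actual instrumentation.

```python
from collections import defaultdict

def conversion_and_retention(events, return_after_days=2):
    """events: iterable of (user_id, day_number, succeeded) tuples (hypothetical format)."""
    days_seen = defaultdict(set)   # every day each user showed up
    first_success = {}             # first day each user completed the task
    for user, day, succeeded in events:
        days_seen[user].add(day)
        if succeeded and (user not in first_success or day < first_success[user]):
            first_success[user] = day

    tried = len(days_seen)
    converted = len(first_success)
    # A converted user counts as retained if they showed up again
    # at least `return_after_days` days after their first success.
    retained = sum(
        1 for user, day in first_success.items()
        if any(d >= day + return_after_days for d in days_seen[user])
    )
    conversion_rate = converted / tried if tried else 0.0
    retention_rate = retained / converted if converted else 0.0
    return conversion_rate, retention_rate

# Made-up sample data for illustration only.
events = [
    ("ann", 0, True), ("ann", 2, False),   # succeeded, came back two days later
    ("bob", 0, False),                     # tried and failed
    ("cat", 1, True),                      # succeeded, never returned
    ("dan", 0, False), ("dan", 1, False),  # tried twice, never succeeded
]
conv, ret = conversion_and_retention(events)
print(f"conversion: {conv:.0%}, retention: {ret:.0%}")
```

Tracking these two percentages before and after each product change is enough to see whether an "improvement" moved anything; a spreadsheet would do the same job.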
SM & MK: Tell us more about your usability tests.
ER: Usability tests are in a category of qualitative testing, as opposed to quantitative testing. Quantitative testing is for validating a hypothesis. The classic entrepreneur belief is: I believe that everyone in the whole world is going to love my product. Easy experiment to run. If it’s true that everyone in the world will love your product, it’s also true that a hundred people will love your product. So you launch it to a hundred people, and if all one hundred love the product, then keep going. Have fun. People in that situation tend not to be very interested in qualitative testing because they’re too busy turning the crank. No problem.
But if you’re doing a usability test, it’s probably because something is not going as well as you expected. Like you thought everybody in the world would want to use your product and only 10 percent of people do. Or in the case here, we did 100 interviews and the first couple of customers seemed to work, and then it died. What people told you and what they’re doing are very different things. Happens all the time, because nobody wants to tell you your baby isn’t beautiful.
Now that we know people won’t use our product, we can have a usability test where we watch them very carefully to understand why not. You can ask them questions about what they’re doing. You can try to really understand their mindset at the time that the action takes place. We always say, “Metrics are people, too.” Everything we measure in Lean Startup is the behavior of an individual person. If you want to change that behavior, the most important thing to understand is: What is in the mind of the person the moment before the behavior happens? What were they thinking at the time? It doesn’t matter why they’re not signing up or buying the product.
If somebody uses your product every day, that means that there was literally a moment when they were looking at their phone, and there were hundreds of apps on their phone, and they chose to use your app. What were they thinking in the minute before? Those are the kind of things we want to understand, and that data is very useful if it’s around a specific quantitative result that we already know.
I remember one time when I was doing usability testing around a software product, and in the usability test, customers always succeeded at the task in question, yet the metrics said very few customers would succeed. It was a paradox. We knew from the data in the real world that nobody was able to figure out how to use this particular feature, but in the usability test everything seemed fine. Customers used it, no problem.
We finally realized that, in the focus groups before the quantitative testing, we gave them a nudge so small we didn’t even notice it. It was an unconscious tip like, “Hey, just look here.” Just very subtle, and that totally messed up the results. Once we resolved to sit there, and we knew what we were looking for, then we could ask, “What were you thinking during the 25 minutes you just stared there, not clicking any buttons?” And the customer’s like, “I’m pissed. I was afraid, I didn’t know what to do.” They would express what they were actually feeling.
SM & MK: So how long should we expect usability testing to take? Is this something that is days’ worth of work, and how long does the usability test run for anyway?
ER: An individual test is usually something like 20 minutes long. But how many you do is 100 percent context-dependent. You keep doing them until you get the result you need. Sometimes the answer is completely evident in one usability test. You say, “Eureka, I’ve found it.” You make the change, the numbers move, and you move on, and that’s a day’s work. Sometimes, though, you say, “Eureka. Ah, it’s so obvious.” You go fix the thing, and then you do another usability test and realize it didn’t make a damn difference. “Eureka, I have found it again and again and again.” I have gone through hundreds of Eurekas before I finally said, “Wait a minute.”
The reason I think that happens is because we’re optimistic, and we think that we’re almost there. So we often wind up focusing on micro-optimization when the problem is bigger. We’re improving usability instead of focusing on the core value proposition. This kind of testing is great because you eventually run out of things to test, and it forces you to take a step back and say, “Wait a minute, am I asking a big enough question about my strategy here?” Then you can get out of micro-optimization and enter a strategic conversation. But depending on how smart you are and how much experience you have, that could take days, weeks or unfortunately, months or years. And how much money you have, which is part of being a startup.
People say raising too much money is dangerous for startups. This is one of the mechanisms by which that danger becomes manifest because, if you have unlimited runway, you often don’t have the motivation to think bigger about what’s actually going on. You just keep micro-optimizing your product.
SM & MK: On usability testing, what do you wish you’d done differently in the past?
ER: God. So many things. I wish I had understood this quantitative/qualitative thing I was just talking about. That would have saved me so much time. The number one biggest thing that I wish I had known in the past was how to reconcile vision with customer feedback. It sounds like a big picture, abstract, lofty, philosophical thing, but it’s where the rubber hits the road of the usability test. This is a place where if you don’t have confidence in your vision and an understanding of what that vision means, you just get so screwed.
“SO YOU’VE GOT TO FIND THIS PLACE WHERE YOU CAN SAY, ‘WHAT IS THE RIGHT SYNTHESIS OF WHAT I BELIEVE AND WHAT REALITY WILL ACCEPT?’ A USABILITY TEST IS THE PLACE WHERE YOU’RE THE MOST EMOTIONALLY CHALLENGED TO DO THAT.”
Let’s say I just produced the most amazing album that’s going to be the biggest hit record of all time. I played it for one person, and they’re like, “It sucks. Your music sucks, dude.” What do you do? Do you give up? Are you not a musician anymore? If your goal is to produce pure art, then you can just say, “You don’t like my music, you’re a moron.” But most entrepreneurs want to have a very specific impact on the world. As a matter of fact, so do most artists. So you’ve got to find this place where you can say, “What is the right synthesis of what I believe and what reality will accept?” A usability test is the place where you’re the most emotionally challenged to do that. Okay, someone doesn’t like your product, but what does that really mean? Does it mean you have the wrong product? Does it mean they’re not the target market? If you don’t have solid conviction, you can wind up on a weekly cycle of pivoting from idea to idea to idea too soon.
But if you have too strong a conviction, you can watch stubbornly and not take in the feedback that you need, and therefore never get to the next level. How do you find the confidence that you were on the right track—while also listening to the person to really understand what they were saying?
The critical thing to understand is that feedback tells you about the person giving the feedback, not about yourself. If someone says your product sucks, that doesn’t mean anything about your product. You’ve learned zero about your product. All you’ve learned is that this person doesn’t like your product. Then the question is: What do I extrapolate from that? You have a data point—this kind of person doesn’t like your product very much—so let’s try again.
It used to take me dozens and dozens of interactions. Now I’ve gotten much better at it. But it used to be that, until I saw every kind of person in the world not like my product, I could still be like, “Well, that’s just a type thing. Once we find the right kind of person, then they’re going to like my product.” But when young and old, and every race, ethnicity, gender, and age all consistently didn’t like it, I started to be like, “Hmm, maybe I don’t have the right product after all.”
Now that I’ve done this a bit more, I’m much better at drawing more reasonable inferences. I can see, “You know what, even from just three data points here, I know something’s not right.”
SM & MK: What’s the pattern that you see now that helps you recognize that it is a pattern?
ER: That’s a great question. The pattern is simply that people say the same thing. This teenager and this 45-year-old—who have nothing in the whole world in common and can’t agree on anything—happen to agree that my product is terrible. That’s odd. In fact, it’s pretty impressive that they agree about the specific thing that’s wrong with it.
Of course, sometimes people give different reasons. For one person, it’s too expensive; for another person, it’s too hard to use. So it’s confusing, and you have to get more information. But if you wait to get a definitive answer, you’ll be waiting forever. You have to make the best decision you can on the basis of the information you have now.
The good news is, it’s a process of continuous experimentation. So your new hypothesis will be immediately put to the test tomorrow. And a cumulative sample size of the information you collect actually turns out to be quite large over the course of a month or year. You can make relatively rash decisions based on small data points, knowing that if you make a wrong decision, you can backtrack.
SM & MK: With usability testing, what steps would somebody else be tempted to take that you’d say they can ignore?
ER: This is going to sound totally contradictory. The first is insisting that every person you do the test with absolutely matches your customer archetype: being a prima donna and not accepting anybody’s feedback. I know some teams that never do a usability test because they can never find a target customer. It’s like, “Hmmm. Maybe the fact it’s so hard to find a target customer is indicative of a problem.” I mean, you can be wrong about your target customer. I once had a product where the kind of customer that was using it was different than my target customer, and I kept trying to kick them out. Until somebody else pointed out I was being really dumb.
The opposite is accepting feedback indiscriminately from everybody. That’s also a mistake. Take a random person off the street and say, “Hey, here’s my new high-tech medical device.” And they’re like, “Huh?” Not valuable.
It all comes back to remembering that the true goal of all this testing and data collection is to validate the hypothesis that underlies the vision. If I have a strong belief that every doctor in the world will understand this new medical breakthrough, then it makes sense to talk to doctors and to validate and to take their information seriously. The fact that a person on the street does not understand it, that’s irrelevant. But if I really believe I have a product that’s for everybody, then I should accept the validity of everybody’s feedback.
You can scale the feedback you get by how close to the archetype a person is. If they’re not too close, you can say, “I’ll take this 10 percent seriously.” I’ll need 10 times more data points from that kind of person before I consider it valid. But the other mistake I made was—and this is common among engineers especially—whatever number of data points we had, I would say that it wasn’t a statistically significant sample. That’s the ultimate excuse to get you out of any data. It’s almost always wrong.
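One way to read the "10 percent" rule is as a weight on each data point. The sketch below is a minimal Python illustration, assuming each piece of feedback carries a hypothetical `fit` score between 0 and 1 for how closely the person matches the customer archetype. The scores, the data, and the function name are made up for illustration; Eric describes the idea, not this particular calculation.

```python
def weighted_dislike_share(feedback):
    """feedback: list of (fit, liked) pairs, where fit is 0..1 closeness to the archetype."""
    total_weight = sum(fit for fit, _ in feedback)
    if total_weight == 0:
        return 0.0
    disliked_weight = sum(fit for fit, liked in feedback if not liked)
    return disliked_weight / total_weight

# Two on-target customers like the product; ten far-off-target people do not.
feedback = [(1.0, True), (1.0, True)] + [(0.1, False)] * 10
share = weighted_dislike_share(feedback)
print(round(share, 2))
```

With a weight of 0.1, ten off-target complaints add up to about the same weight as one complaint from a customer who fits the archetype exactly, which matches the "10 times more data points" rule of thumb.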
SM & MK: Why?
ER: The intuition people have about sample size comes from things like presidential polling, where the thing we’re trying to detect is actually a relatively small change in preference. So we’re looking at the tenth of a percent because it matters a lot. A candidate that gets 50.1% of the vote wins; 49.9% of the vote could lose. We’re really looking for the minute changes, so we need a big sample to detect that thing. In startups, what we need to know is: How big is the underlying signal? If I want to know if people like to breathe air or not, I don’t actually need that big of a sample to figure that out because it’s very, very, very obvious. Do people walk with their feet or with their hands? I don’t have to sample a million people to find that out. It’s very, very, very obvious, because the signal is quite strong. The kind of tests we do in a startup are supposed to be high signal-to-noise ratio things. Like: Are we on the right track? Do people like our product at all? Given that we’re looking for high-signal things, if the signal is ever ambiguous, the answer is no. [To really understand this issue, check out Nate Silver’s book, The Signal and the Noise. – Eds]
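A rough back-of-the-envelope calculation shows why strong signals need only small samples. The sketch below uses the textbook normal-approximation formula for the margin of error of a proportion, n ≈ z² · p(1 − p) / d², with z = 1.96 for roughly 95 percent confidence. The formula is brought in here for illustration and is not from the interview; the function name is hypothetical, and the approximation is crude for very small samples.

```python
import math

def sample_size_for_margin(margin, p=0.5, z=1.96):
    """Approximate respondents needed so the margin of error on a proportion is `margin`."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

poll = sample_size_for_margin(0.001)   # telling 50.1% apart from 50.0%
startup = sample_size_for_margin(0.4)  # telling "90% love it" apart from a coin flip
print(poll, startup)
```

Under these assumptions the polling case needs on the order of a million respondents, while the strong-signal case needs fewer than ten. That gap is the point of the paragraph above: if a handful of customers cannot show you the signal, the signal probably is not strong.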
I’ve never heard an entrepreneur say, “We have a small number of customers, but they absolutely adore us or think we’re awesome. But it’s an insignificant sample, so we need to do more testing.”
To learn more about how to do usability testing yourself, check out Andre Glusman’s straightforward slide deck, “Lean Usability,” David Peter Simon’s useful post on guerrilla usability testing, and Laura Klein’s practical book, UX for Lean Startups.
If you have a story to share about the actionable metrics you use to measure your value to customers, join the discussion over here.
Sarah Milstein is Editor in Chief of The How.
Mercedes Kraus is Startup Managing Editor of The How.
If there was a term you didn’t know that we haven’t defined, please let us know—we want to help! Also, if you have a better definition or an addition to a definition, shoot us a note.
Sticky, viral, and paid engines of growth. As Eric explains above: The idea with a sticky product is that once you start using it, you pretty much can’t stop. A news site that you check daily is sticky. In this handy post on engines of growth, David Link explains that the viral engine “depends on users acquiring and activating other users as a mere and necessary consequence of normal product use… Modern examples are Hotmail (with the viral hook being the footer in every e-mail), Facebook (with the viral hooks being the friend suggestions and others), and Zynga (with the viral hooks being the various opportunities for social interactions within its games).” The paid engine of growth is the other basic approach, and it relies on ads, referral bonuses, and other cash outlays. For more on engines of growth, check out David’s piece and this post from Eric.
Network effects. When a product or service becomes more valuable the more people use it. Common examples include the phone system, markets like eBay, and platforms like Twitter, all of which are fundamentally more useful when more people are connected to them. This Wikipedia article is a little geeky, but it gets into some good detail.
Two-sided market. A product or service that brings together two distinct groups of people for a shared transaction of some sort. eBay brings together buyers and sellers. Uber brings together riders and drivers. App stores bring together mobile developers and phone users. Etcetera. Harvard Business Review has a tidy summary of two-sided markets.
Pivot. A pivot is a change in strategy based on what you’ve learned so far. They’re super-common in startups, even though the stories aren’t always well known. For example, YouTube started as a video-dating site. When the dating part didn’t take off, the company pivoted to focus on video sharing, which seemed to hold promise. Here, Eric explains pivots in depth, and this Forbes piece has a nice rundown of common kinds of pivots.
Usability testing. “Usability testing refers to evaluating a product or service by testing it with representative users. Typically, during a test, participants will try to complete typical tasks while observers watch, listen, and takes notes. The goal is to identify any usability problems, collect qualitative and quantitative data, and determine the participant’s satisfaction with the product.” That straightforward definition is, surprisingly, from a U.S. Department of Health & Human Services site, Usability.gov, which has the clearest basic info around about usability testing.
Metrics. A fancy term for measurements. Actionable metrics—those you can make meaningful decisions around—measure specific customer behaviors and patterns. During the interview above, Eric described them like this: “Anything denominated on per customer or per human being basis tends to be the right thing. The percentage of customers who subscribe to our article and become long term readers. The percentage of customers who read the article today and come back to read a new article, versus the ones that come back and read a new article tomorrow. The average revenue per customer. Those kinds of numbers tend to be really useful. Say we have 10 customers come in, and three of them love our product. We have 10 more customers look at the next version. Four of them like it, and then five, and then six, then seven, then eight. Eventually 10 out of 10 people like our product. You can see progress is being made even if the total number of customers might only be 100, because we chunk them up 10 at a time.” (That kind of chunking up is called cohort analysis.) Here’s Eric’s cornerstone post on actionable vs vanity metrics.
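The "chunking up 10 at a time" that Eric describes can be sketched in a few lines. This is a minimal Python illustration, assuming a hypothetical list of like/dislike responses in arrival order; the data and the function name are invented for the example.

```python
def cohort_like_rates(responses, cohort_size=10):
    """responses: list of booleans in arrival order, True if the customer liked the product."""
    rates = []
    for start in range(0, len(responses), cohort_size):
        cohort = responses[start:start + cohort_size]
        rates.append(sum(cohort) / len(cohort))
    return rates

# Hypothetical data: 3 of the first 10 like it, then 4 of the next 10, then 5.
responses = (
    [True] * 3 + [False] * 7
    + [True] * 4 + [False] * 6
    + [True] * 5 + [False] * 5
)
rates = cohort_like_rates(responses)
overall = sum(responses) / len(responses)
print(rates, overall)
```

The per-cohort rates show the upward trend from one version to the next, while the single overall figure for all 30 customers hides it.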
Vanity metrics. Measurements that are appealing to look at—and that shout for attention—but don’t tell you anything meaningful about your value to customers. For example, it’s fun to watch your number of Twitter followers increase or focus on how much total revenue you’ve taken. But Twitter followers aren’t necessarily customers, and gross revenue without contextual information doesn’t tell you whether you’re looking at sustained growth or scattershot injections of cash.
During our interview (but not quoted above), Eric explained: “The shorthand is that vanity metrics are gross numbers and large quantities. So: total revenue, total customers, number of clicks—any number that’s big, the kind of thing you like to brag about. “Oh, my god, we have 2,000 page views! And now we’ve hit 20,000. Lo! We have 2 million page views!” That could be 2 million people looked at our site one time and hit close. Could be that one guy loves our product way too much. He just started hitting refresh, refresh, refresh. Or it could be anything in-between. You actually really don’t know what’s going on. It means that you are subject to all kinds of variation and gyration due to external factors, and those numbers are not necessarily correlated to anything that you did. So they’re totally worthless.”
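To see why a gross page-view count says so little, compare it with per-visitor numbers. The sketch below is a minimal Python illustration, assuming a hypothetical log with one visitor ID per page view; the log format and the function name are assumptions made for the example.

```python
from collections import Counter

def summarize_page_views(visitor_ids):
    """visitor_ids: one entry per page view (hypothetical log format)."""
    views_per_visitor = Counter(visitor_ids)
    total_views = len(visitor_ids)
    unique_visitors = len(views_per_visitor)
    # Visitors who loaded more than one page.
    multi_view_visitors = sum(1 for n in views_per_visitor.values() if n > 1)
    return total_views, unique_visitors, multi_view_visitors

one_fan = ["fan"] * 1000                                # one person hitting refresh
many_bounces = [f"visitor-{i}" for i in range(1000)]    # 1,000 people, one view each
print(summarize_page_views(one_fan))
print(summarize_page_views(many_bounces))
```

Both logs report the same total of 1,000 page views, yet the per-visitor numbers describe two completely different situations.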
Conversion rate. The percentage of people who perform a desired action, like filling out a form or completing a purchase. Mashable gives a good overview of the term.
Retention rate. The percentage of customers who return in a period. Retention takes into account churn—those who leave—so you can see overall growth. Inc explains how to calculate it.
Hypothesis. What you think will happen when customers come into contact with your product. The basic structure looks like this: “I believe [customers like this] will [behave like this] in [this measurable way].” “Validating a hypothesis” means you’re running experiments that prove it true; “invalidating a hypothesis” means your experiments are proving it false. Ben Yoskovitz has a clear write-up on how to craft a useful hypothesis.
Runway. The amount of time a startup has until it runs out of cash. The term is most often applied to companies that have cash in the bank from investors but bring in little or no revenue.