Thoughts on Metric Design

I have noticed that when I bring up the topic of “metrics” in Silicon Valley people seem to either love me or hate me. Tracking and possibly acting on some measure that you believe captures an important dimension of your business or product is a productivity panacea for some and a myopic creativity killer for others. Regardless of how strongly you feel about the benefit of defining and tracking metrics, if you decide to do so it is worth spending the time to think about the why and how of metric design. We’ve recently been doing a bit of this at Wealthfront and I thought I’d share what I’ve learned from that process.

Resources on metrics

There are myriad books and TED Talks (and even companies) out there devoted to the topic of metrics. Different people use different names and initialisms, and they define and use metrics in slightly different ways, but they ultimately get at the same point. Rather than summarize all the details I’ll leave a list of the resources I consulted as we were going through the process of defining business and product metrics at Wealthfront.

What to use metrics for

Metrics can be used prospectively and retrospectively.

Prospective

By prospectively, I mean using a metric as a framework for assessing the potential impact of a decision. One problem that this use of metrics helps remedy is disagreement among a team as to what they are trying to optimize. Having agreed-upon metrics enables teams to engage in productive discussion about the relative priority of different projects. Without metrics, debates about what a team should do can derail because there isn’t a common understanding of what everyone is working towards. For example, I chose “number of experiments run” as an internal metric for the data science team, which helped me prioritize the work we did to correct arrival rate bias. An additional step in the prospective use of metrics is to set goals for moving them, although this is not necessary and does not imply the metric is tied to performance evaluation.

Retrospective

Once a team is using metrics prospectively, they can consider using them retrospectively. This means measuring changes in metrics as a mechanism for evaluating a project or team. This is the use case that people who are anti-metric tend to have a problem with, especially when the measurement is used as a basis for individual or team recognition and promotion. This use case can be challenging when a team does not have a high degree of control over their metrics, or if external factors that impact their metrics can’t be properly accounted for. An example of this at Wealthfront might be a metric that is dependent on the market, such as changes in assets under management (AUM).
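One way to account for an external factor like the market is to separate the part of a change the team can influence from the part it cannot. The sketch below is a minimal illustration, assuming the simplified identity that a period's change in AUM equals net deposits plus a market effect (fees and other flows are ignored). The function name and all figures are hypothetical and are not Wealthfront's actual accounting.

```python
def decompose_aum_change(start_aum, end_aum, net_deposits):
    """Split a period's AUM change into net deposits and a residual.

    Assumption: end_aum = start_aum + net_deposits + market_effect,
    so whatever net deposits do not explain is attributed to the market.
    """
    total_change = end_aum - start_aum
    market_effect = total_change - net_deposits
    return {
        "total_change": total_change,
        "net_deposits": net_deposits,
        "market_effect": market_effect,
    }


# Hypothetical quarter, in millions: AUM rose by 4, but clients added 6,
# so the market actually worked against the team by 2.
result = decompose_aum_change(start_aum=100.0, end_aum=104.0, net_deposits=6.0)
print(result)
```

Evaluating the team on net deposits rather than on the raw AUM change keeps the market-driven portion out of the retrospective.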

What a good metric is

A useful metric should be meaningful, measurable and movable.

Meaningful

First, a metric should capture something that matters to the business, either directly or indirectly. For Wealthfront, a directly important metric could be something like net deposits or the number of new clients funding accounts. A metric that matters indirectly might be something that feeds into either of those higher-level metrics. For example, reducing churn would mean that fewer clients withdraw money, which would increase net deposits, all else equal. While this seems obvious, it is often the piece that teams have the hardest time agreeing on.

Measurable

If a metric can’t be measured then it can’t be used retrospectively. It can also be challenging to use prospectively if it isn’t clear whether the metric can or should be moved. For example, word-of-mouth is notoriously challenging to measure or estimate accurately. If it turned out your clients were already talking about your product a lot, it might be futile to try to increase this as a metric. Some metrics can’t be directly measured but may still have prospective value. For example, Customer Lifetime Value (CLV) is estimated rather than measured (at least on useful timescales). However, it can be a healthy way to frame a discussion about which products and features to prioritize.
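To make "estimated rather than measured" concrete, here is a minimal sketch of one common simplification of Customer Lifetime Value: constant revenue per client per month and a constant monthly churn rate, so that the expected client lifetime is 1 / churn. This is an illustrative assumption, not the model we use, and the function name and numbers are made up.

```python
def estimate_clv(monthly_revenue, monthly_churn_rate):
    """Estimate lifetime value under a constant-churn assumption.

    With a constant monthly churn rate, the expected lifetime of a
    client is 1 / churn months (the mean of a geometric distribution).
    Discounting and changing revenue over time are ignored.
    """
    if not 0 < monthly_churn_rate <= 1:
        raise ValueError("monthly_churn_rate must be in (0, 1]")
    expected_lifetime_months = 1 / monthly_churn_rate
    return monthly_revenue * expected_lifetime_months


# Hypothetical inputs: 10 per month in revenue at two churn levels.
clv_at_2_percent = estimate_clv(monthly_revenue=10.0, monthly_churn_rate=0.02)
clv_at_1_percent = estimate_clv(monthly_revenue=10.0, monthly_churn_rate=0.01)
print(clv_at_2_percent, clv_at_1_percent)
```

Even though neither number can be observed directly on a useful timescale, the comparison frames the discussion: under these assumptions, halving churn doubles the estimate.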

Movable

Finally, a good metric needs to be movable, at least in theory. If a team can’t influence it, a metric is useless both prospectively and retrospectively. For example, pre-tax, pre-fee returns would be a bad metric for Wealthfront because we don’t control global investment markets and there is nothing we can do to move them. Retention, however, would be a valid candidate for a metric because we can ostensibly impact it by making changes to our product that delight our clients and solve their problems, thereby causing them to stick around. Even if a metric is estimated rather than measured, it can still be movable, as is the case with CLV.

The price of VTI (“the market”) would make a poor metric because it is not movable (unless you control Donald Trump’s Twitter account).

Metrics have limitations

A metric doesn’t need to do everything.

Counterbalance

One challenge that came up with many of the teams I worked with at Wealthfront was the concern that their proposed metric had some blind spot or area of weakness. For example, our financial planning team wanted to have a metric that captured engagement with Path. However, they were concerned that this metric could be over-optimized to the degree that clients would be logging in more often than is beneficial for them. When I pointed out that one of their other area metrics effectively measured the quality of our clients’ financial outcomes, we agreed this would hedge against the risk of over-indexing on engagement. Keith Rabois refers to the practice of measuring counterbalancing metrics as “pairing indicators”.
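The idea of a counterbalancing metric can be sketched as a simple check: a change counts as a win only if the primary metric improves and its paired metric does not get worse by more than an agreed tolerance. The function, metric names, and values below are hypothetical illustrations, not a description of how any Wealthfront team evaluates its work.

```python
def passes_paired_check(primary_before, primary_after,
                        paired_before, paired_after, tolerance=0.0):
    """Return True if the primary metric improved without the paired
    (counterbalancing) metric dropping by more than `tolerance`.

    Both metrics are assumed to be "higher is better".
    """
    primary_improved = primary_after > primary_before
    paired_held = paired_after >= paired_before - tolerance
    return primary_improved and paired_held


# Hypothetical: engagement rises and the outcome-quality score holds steady.
healthy = passes_paired_check(
    primary_before=0.30, primary_after=0.35,
    paired_before=0.80, paired_after=0.80,
)

# Hypothetical: engagement rises, but outcome quality falls well past tolerance.
over_indexed = passes_paired_check(
    primary_before=0.30, primary_after=0.45,
    paired_before=0.80, paired_after=0.70,
    tolerance=0.02,
)
print(healthy, over_indexed)
```

The second case is the one the financial planning team worried about: the engagement number looks great in isolation, and only the paired metric reveals the cost.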

Culture

The other factor that mitigates the risk of metric blind spots is the culture in which they are used. For example, we have company values and product principles at Wealthfront that outline how we behave and how we build products, respectively. These artifacts and the culture they represent make it unlikely that a team will do something that is bad for our clients or our business because it is an easy way to optimize one of their metrics. For example, we could probably reduce withdrawals by making it extremely difficult to close an account with us (like Bank of America, Chase, Wells Fargo, and other big banks do) but this would be in violent contradiction of our principles and values and would therefore never happen.

Measure actions, not words

Choose metrics that capture behavior, not sentiment.

Proximity

Metrics are proxies for something you value. If you think about what it is you really want to capture with your metric, you should find that some candidates adhere more closely to this ideal than others. For example, referrals made by a client are probably a better measure of their delight than the length of their session. This is because a referral is the result of a moment of delight, and these two events are closely connected in the causal diagram of events. It is not a stretch to imagine that a longer session could be driven by myriad factors totally unrelated to delight. In this sense, referrals are “closer” to delight than is session length. Proximity can be a useful dimension for choosing between two metrics or for asking yourself if there is something that better captures what you really want to measure (or, if you’re really lost, what you value to begin with; see the section on “meaningful” above).

“Never measure user happiness” -Cassie Kozyrkov

Animus

It is said that actions speak louder than words. I take the more severe view that words are mute. What your clients tell you is a red herring unless it correlates strongly and consistently with what they do (I have yet to encounter this in the wild). For example, if you survey users after your onboarding flow and get an average rating of 4.9, but fewer than 10% of them convert to paying clients, you’d be a sucker to use a survey score such as NPS as your metric. There are plenty of reasons for clients to lie to you and to themselves, and the selection bias in these samples can be grounds enough to render them effectively useless. Furthermore, it is doubly foolish to listen to the vocal minority of (maybe) users on social media who complain loudest. Whether you’re using it as an explicit metric or not, relinquishing decision-making authority to perceived Twitter sentiment is an abdication of duty. The only thing that matters is what your clients do.
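The gap between what users say and what they do is easy to see once both are computed from the same group of people. The sketch below uses entirely made-up survey responses (20 hypothetical users, each a rating out of 5 and whether they went on to become a paying client) to mirror the example above.

```python
from statistics import mean

# Made-up data: (rating out of 5, converted to paying client?)
responses = (
    [(5, False)] * 17   # glowing rating, never converted
    + [(5, True)]       # glowing rating, converted
    + [(4, False)] * 2  # good rating, never converted
)

# What users say: the average stated rating.
avg_rating = mean(rating for rating, _ in responses)

# What users do: the share who actually became paying clients.
conversion_rate = sum(converted for _, converted in responses) / len(responses)

print(f"average rating: {avg_rating:.1f} / 5")
print(f"conversion rate: {conversion_rate:.0%}")
```

In this invented sample the rating averages 4.9 while only one user in twenty converts, which is exactly the situation where the rating tells you nothing about behavior.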

Simply put

Whether you plan to use metrics prospectively or retrospectively, remember to make sure they are meaningful, measurable and movable. Don’t worry if there are gaps between your metrics as long as they counterbalance one another and you use them in the context of a culture and set of values that you believe in. And whatever you do, please, measure actions, not words.