To explore or exploit?

Navigating the explore-exploit dilemma.

May 10, 2023

two roads between trees — Photo by Jens Lelie on Unsplash

In mathematical programming, we were introduced to two types of algorithms: explorative and exploitative. Exploration means collecting new information in order to inform a new decision, whereas exploitation means making the best use of the information you already have.

Here is a simple example. You take the same route to the office every day, and you know it takes 30 mins. One day you decide to leave your known route and try one of the other possible routes. The new route may take 35 mins or 23 mins; you’ll only know once you actually take it. Sticking with the usual route is exploitation: you know it will take 30 mins, and you just want to reach your office within that time. Trying the new route is exploration.
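The commute example can be played out as a small simulation. The sketch below uses epsilon-greedy, one common textbook strategy for mixing the two modes; it is my choice for illustration, not something the example above prescribes. The route names, the ±2 minute noise, and the `epsilon` value are all assumptions.

```python
import random

def simulate_commutes(true_minutes, known_route, days=500, epsilon=0.1, seed=0):
    """Epsilon-greedy commuting: mostly take the best known route, sometimes try another.

    true_minutes: hypothetical average duration of each route, hidden from the commuter.
    known_route:  the only route whose duration is known at the start.
    """
    rng = random.Random(seed)
    estimates = {known_route: true_minutes[known_route]}  # what the commuter believes
    counts = {known_route: 1}                             # trips behind each estimate
    for _ in range(days):
        if rng.random() < epsilon:
            # Explore: pick any route, including ones never tried before.
            route = rng.choice(sorted(true_minutes))
        else:
            # Exploit: take the route with the lowest estimated duration.
            route = min(estimates, key=estimates.get)
        # Assumed noise: each trip varies by up to 2 minutes either way.
        observed = true_minutes[route] + rng.uniform(-2, 2)
        counts[route] = counts.get(route, 0) + 1
        previous = estimates.get(route, observed)
        # Running average of the durations observed on this route.
        estimates[route] = previous + (observed - previous) / counts[route]
    return estimates, counts

# Durations taken from the example: 30 mins known, 35 or 23 mins for the unknown routes.
routes = {"usual": 30, "highway": 35, "back_road": 23}
estimates, counts = simulate_commutes(routes, known_route="usual")
print(min(estimates, key=estimates.get), counts)
```

With `epsilon=0` the commuter never leaves the usual route, which is pure exploitation. Any positive `epsilon` trades a few slower trips for the chance of discovering the 23-minute route.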

A lot of the choices we make come down to this exploration vs exploitation framework, especially when we have full authority over the decision.

Recently I was talking to a peer who has just entered the arranged marriage set-up. His first-ever match was highly compatible with him, and the two of them had even started to like each other. But one fine morning, he thought of “exploring” other options to see if someone was more compatible. It has been more than 6 months, and he is yet to find another partner as suitable as the first one.

So the question is: when to explore and when to exploit?

I think one key component to look at here is the opportunity cost of a decision. Simply put, it is the difference in value between the best option not chosen and the option chosen. Take the case of driving. Suppose there exists a path that would get you to your office in 25 mins, but you never explored it and keep following the same old 30-minute path. The opportunity cost here is 5 mins.
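The arithmetic is small enough to write down. This is only a sketch of the definition above applied to commute times, where less time is better; the function name and its arguments are made up for illustration.

```python
def opportunity_cost(chosen_minutes, alternative_minutes):
    """Minutes given up by taking the chosen route instead of the best route not chosen.

    A positive result means a better alternative existed; zero or negative
    means the chosen route was already at least as good as any alternative.
    """
    best_alternative = min(alternative_minutes)  # shorter commute = better option
    return chosen_minutes - best_alternative

# The driving example: the usual route takes 30 mins, the unexplored one 25 mins.
print(opportunity_cost(30, [25]))  # 5
```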

So, in cases where the opportunity cost is very low, you can afford to explore more and more. Even if, instead of the 25-minute route, you end up on a route that takes 40 mins, the extra 10 mins (over your current known duration) won’t harm you much.

But, on the other hand, the opportunity cost of losing a compatible partner is quite high. In cases like these, it is best to exploit the already-known state.

Opportunity cost is also something that has helped me tackle anxiety in daily life, but more on that later.

There are areas in life where you need to both explore and exploit. Take, for example, the standard machine learning roles in the industry. What generally happens is that you’re presented with a problem statement. You narrow it down to 2-3 suitable algorithms (if you don’t know the math, this number will be 7-8), and you first explore by trying out the different algos. Once you finalise the most suitable one, you go ahead with exploiting it (with hyperparameter tuning and other improvements).

The kind of work my current team and I do requires exploiting first and then exploring. We first send out our initial analyses and reports to the stakeholders. Then we receive new information (feedback and abuses), and we use it to improve the current model and analysis.

My current manager summed up the exploration vs exploitation dilemma eloquently (knowingly or unknowingly) a few months ago: “Don’t let the prospect of the perfect work disrupt your current good work.”

Often in life, the good is all we need.

Ordinary Analysis
