[This is about finding ways to quickly learn from past experiences to inform future actions. We briefly touch upon different learning models. This post is also fairly Obvious, so it’s perhaps not as insight-dense as other posts. [META: This and the next few weeks’ posts may not be as high-quality / on-time, as I expect to have my time taken up by other projects.]]
A commonly used distinction in psychology is that between goal-directed action and habitual action. As the name suggests, goal-directed action involves deliberate decision-making. Habitual action, by contrast, is more automatic and happens without much conscious input.
This seems to mirror a distinction in reinforcement learning between model-based and model-free reinforcement learning.
Admittedly, I know very little about RL, but my cursory understanding of the two is this: Model-based RL tries to build up an internal map of the environment so it can sketch out the consequences of taking different actions. Model-free RL, on the other hand, caches values for actions based on past reward history, leading to a more habit-like response to stimuli.
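To make the model-free idea concrete, here’s a toy sketch of value caching in Python. The action names and numbers are purely illustrative (not from any real RL library): the agent keeps a cached value per action and nudges it toward each observed reward, so repeated good experiences slowly shift its habit-like preference.

```python
def update_q(q, action, reward, learning_rate=0.1):
    """Nudge the cached value of `action` toward the observed reward."""
    q[action] += learning_rate * (reward - q[action])
    return q

# Hypothetical actions, both starting with no cached preference.
q = {"approach": 0.0, "avoid": 0.0}

# Fifty good experiences with "approach" raise its cached value,
# without the agent ever building a model of *why* it went well.
for _ in range(50):
    update_q(q, "approach", reward=1.0)

print(q["approach"] > q["avoid"])  # prints True
```

This is the “learning from past history” half of the distinction; a model-based agent would instead plan by simulating outcomes before acting.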
Several friends of mine have found it useful to model humans as reinforcement learning agents. Given that I’m not well-versed in the field, I don’t think I can say the same for myself, but I’ve definitely tried to take some cues from what I currently understand.
One general example of this is trying to assume a sort of telos with regards to apparent bugs I face. This typically cashes out into asking myself something like, “What purposes does this behavior serve? How is it trying to help me?” This is one of the approaches that I think is useful to take if you’re consistently breaking your own commitments.
I’ve also been able to borrow useful models from the broader field of learning theory. For one, the idea of shaping can be used to “scale up” habits: in shaping, you gradually reward the agent as it gets closer to the target behavior. For example, if the end goal is to get someone to swim, you might first reward getting into the water, then staying underwater, and finally propelling oneself.
This is roughly the approach Mark Bao takes in sustainable habit design, where he suggests starting off with a small action and ramping it up over time as you become comfortable with the current level. It’s the basic idea of trying not to bite off more than you can chew. I think people recognize that this is obvious in some domains like exercise (where it’s basically the norm), but not everyone generalizes it as a principle in other areas.
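The ramp-up idea above can be sketched as a simple schedule. This is just an illustration (the function name and numbers are mine, not Bao’s): pick a small starting dose, a target, and a number of weeks, and increase linearly.

```python
def ramp_schedule(start, target, weeks):
    """Linearly ramp a habit's weekly 'dose' from start to target."""
    step = (target - start) / (weeks - 1)
    return [round(start + i * step, 1) for i in range(weeks)]

# e.g. ramping a daily run from 5 to 30 minutes over 6 weeks
print(ramp_schedule(5, 30, 6))  # prints [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
```

In practice you’d probably only move to the next level once the current one feels comfortable, rather than on a fixed clock.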
I think one of the interesting things in Bao’s post is how he points out that you can update your System 1 intuitions about potentially aversive actions. His solution, though, is willpower, which I think is a little misguided; as most MLU readers probably know by now, I’m not a fan of willpower as a cure-all. The standard view in rationality (and mine) seems to favor the Internal Double Crux model, where you act as a mediator facilitating communication between your conflicting sides.
However, I think that aversions don’t cover the entire space of “actions we don’t take”. Aside from things we consciously avoid because of internal conflict, there are also actions we’re biased against because of cached computations and our current habits.
For example, I used to feel like I was the sort of person who didn’t approach strangers. I think fear was part of it, but I also just didn’t feel like it was compatible with my self-image. Recently, though, I’ve had several great experiences where chatting with strangers has turned out well.
Still, when I see new people, my cached thoughts about shyness come to mind a lot quicker than any recent, updated encounters.
I think this distinction of “aversion vs cached” is useful because there are times where being able to bring up the memory of the updated experience can serve as a sufficient intuition pump to get you to take action.
Another example of this is remembering that the last time you ate food, your portion sizes were actually smaller than your appetite could handle. Thus, it’s important to replace your cached expectations of how much you can eat with a more accurate number. There do seem to be situations where you could act more optimally, but you’re stopped by pesky habits of mind.
Given that my short-term goal is to try and explore more, I think this is very important because I’m currently trying to bias towards change. I’d like a setup that looks like:
- Explore something new / a twist on something old.
- Get new information about the event.
- Cache the memory of step 2 as a salient example to guide future action.
In a way, this is a little similar to both exposure therapy and Comfort Zone Expansion, CFAR’s take on the concept.
I guess the actual thing I want is a systematized way of keeping track of new experiences and caching them. I want them to trigger, in a habit-like way, when I encounter situations where I’d like the updated intuition to be in mind.
A TAP like [Feel surprised by an action] → [Write down a to-do to write down the encounter in detail] could work.
At its core, the real question here seems to be something like, “How do you actually update your gut-level intuitions in response to new data?”
At which point, we might just end up coming back to the question of how to remember / believe things in the first place, which feels like a tricky topic. In addition, there seems to be something here about how my self-image plays into this, which is entangled with my beliefs. So I think there’s at least some hidden depth to this framing here.
Suggested Exercise: Learning from Past Experiences
[META: I’ll likely be including this with every new MLU post as a way of adding more practicality. If you don’t think the exercise will help, no offense taken. It’s just an affordance given to you if you think you can use it.]
- Set a 10-minute timer.
- What is one thing that surprised your models this week? How did the situation go? Write down a paragraph describing the encounter.
- How do you expect to do better in the future? Write down 3 examples.