Hey everyone,
I’ve been playing around with some small reinforcement learning environments, and I started wondering if there’s any practical benefit to using unsupervised clustering (like K-Means or DBSCAN) to shape rewards or better define states.
For example, could clustering be used to detect patterns in state trajectories and somehow influence the reward function dynamically? Or maybe as a preprocessing step to reduce state complexity?
Thanks!
Hi @SiteDiane,
Excellent question!
Clustering can help identify structure in the state space that isn’t obvious from raw state features. For example, the Clustered Reinforcement Learning method uses clustering to divide the collected states into several clusters, then gives the agent a bonus reward that reflects the novelty and quality of the cluster around the current state.
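To make the bonus idea concrete, here is a minimal sketch. It is not the Clustered Reinforcement Learning formula: the count-based bonus, the `beta` parameter, and the helper names `kmeans` and `cluster_bonus` are my own assumptions, and the tiny NumPy K-Means just stands in for a library implementation.

```python
import numpy as np

def kmeans(states, k, iters=20, seed=0):
    """Tiny K-Means (Lloyd's algorithm); a stand-in for a library version."""
    rng = np.random.default_rng(seed)
    # Start from k distinct collected states (fancy indexing returns a copy).
    centers = states[rng.choice(len(states), size=k, replace=False)]
    for _ in range(iters):
        # Assign every state to its nearest center.
        dists = np.linalg.norm(states[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members (leave it if empty).
        for j in range(k):
            if np.any(labels == j):
                centers[j] = states[labels == j].mean(axis=0)
    return centers

def cluster_bonus(state, centers, counts, beta=0.1):
    """Hypothetical count-based bonus: rarely visited clusters pay more."""
    j = int(np.linalg.norm(centers - state, axis=1).argmin())
    counts[j] += 1
    return beta / np.sqrt(counts[j]), j

# Toy data: states collected around two separate regions.
rng = np.random.default_rng(0)
states = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
                    rng.normal(5.0, 0.1, (50, 2))])
centers = kmeans(states, k=2)
counts = np.zeros(len(centers))

s = np.array([0.0, 0.0])
first_bonus, j = cluster_bonus(s, centers, counts)
second_bonus, _ = cluster_bonus(s, centers, counts)  # smaller: cluster seen twice
# In a training loop you would use: shaped_reward = env_reward + bonus
```

The bonus shrinks each time the agent revisits the same cluster, so it nudges exploration toward regions it has seen less often. You would probably want to re-fit the clusters periodically as new states come in.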
This paper discusses a problem with applying clustering directly to reinforcement learning: states grouped into the same cluster may follow different transition dynamics under the same action, which leads to poor policy performance.
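One rough way to check whether your own clusters suffer from this is sketched below. This is my own heuristic, not something taken from the paper: for every (cluster, action) pair it measures the entropy of the next-cluster distribution. The function name `transition_entropy` and the tuple format are assumptions.

```python
from collections import Counter, defaultdict
import math

def transition_entropy(transitions):
    """transitions: iterable of (cluster, action, next_cluster) tuples.

    Returns {(cluster, action): entropy in bits of the next-cluster distribution}.
    """
    next_counts = defaultdict(Counter)
    for cluster, action, next_cluster in transitions:
        next_counts[(cluster, action)][next_cluster] += 1

    entropies = {}
    for key, counter in next_counts.items():
        total = sum(counter.values())
        entropies[key] = -sum(
            (n / total) * math.log2(n / total) for n in counter.values()
        )
    return entropies

# Toy log: cluster 0 always moves to cluster 1, cluster 1 is split 50/50.
log = [(0, "right", 1)] * 4 + [(1, "right", 2), (1, "right", 0)]
entropies = transition_entropy(log)
```

A high value for a pair suggests that cluster may be lumping together states that behave differently under that action, so it might need splitting. Keep in mind that a stochastic environment also produces high entropy, so treat this as a hint rather than proof.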