The visualization begins with randomly distributed particles (red coordinate axes) scattered across the map.
As the ego-vehicle (green car) begins to move, SparseLoc's observation likelihood refines the particle distribution
by matching correspondences between detected landmarks and map points.
Watch how quickly the particles converge toward the actual position, with the red car representing our system's estimated pose.
This demonstrates SparseLoc's ability to efficiently narrow down location hypotheses using minimal semantic cues in a highly sparse map, ultimately achieving reliable global localization.
Use the slider to visualize the particle filter's convergence process on KITTI Sequence 00.
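To make the update step concrete, here is a minimal sketch of how a landmark-matching observation likelihood can re-weight particles. It assumes a Gaussian score over the distance to the nearest same-class map landmark; the function and parameter names (e.g. `update_weights`, `sigma`) are illustrative and not SparseLoc's actual API.

```python
import numpy as np

def update_weights(particles, weights, detections, map_landmarks, sigma=2.0):
    """Re-weight pose hypotheses by how well detections match the sparse map.

    particles:     (N, 3) array of pose hypotheses (x, y, yaw)
    detections:    list of (range, bearing, label) landmark observations
    map_landmarks: dict mapping label -> (M, 2) array of landmark positions
    """
    for i, (x, y, yaw) in enumerate(particles):
        log_lik = 0.0
        for rng, brg, label in detections:
            # Project the detection into the map frame from this hypothesis.
            lx = x + rng * np.cos(yaw + brg)
            ly = y + rng * np.sin(yaw + brg)
            # Score against the nearest map landmark of the same class.
            candidates = map_landmarks.get(label)
            if candidates is None or len(candidates) == 0:
                continue
            d = np.min(np.linalg.norm(candidates - np.array([lx, ly]), axis=1))
            log_lik += -0.5 * (d / sigma) ** 2  # Gaussian log-likelihood
        weights[i] *= np.exp(log_lik)
    weights /= weights.sum()  # normalize so weights form a distribution
    return weights
```

Particles whose projected landmarks land far from any map point receive vanishing weight, which is what drives the convergence seen in the visualization.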
Our framework taps into the capabilities of open-world perception models for localization through intuitive,
zero-shot prompting to identify static landmarks in the scene. We use Llama-3.2-Vision to generate the language-landmark database.
The VLM showed impressive semantic understanding, automatically producing
a landmark database (shown in the image below) that proved both sufficient and distinctive,
working effectively across multiple KITTI sequences without any changes.
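The sketch below illustrates what such zero-shot database generation might look like. The prompt wording is illustrative, and the helper `query_vlm` is a hypothetical stand-in for whatever Llama-3.2-Vision inference stack is available; neither reflects SparseLoc's exact prompt or pipeline.

```python
# Illustrative zero-shot prompt; the actual wording used by SparseLoc may differ.
PROMPT = (
    "List the permanent, static landmarks visible in this street-level image "
    "(e.g. traffic signs, poles, trees, benches). Return one label per line; "
    "exclude vehicles, pedestrians, and anything movable."
)

def build_landmark_database(image_paths, query_vlm):
    """Aggregate zero-shot landmark labels over a set of frames.

    query_vlm: hypothetical callable wrapping a Llama-3.2-Vision endpoint,
               taking an image path and a text prompt, returning a string.
    """
    labels = set()
    for path in image_paths:
        response = query_vlm(image=path, prompt=PROMPT)
        labels.update(
            line.strip().lower()
            for line in response.splitlines()
            if line.strip()
        )
    return sorted(labels)
```

Because the prompt asks only for static, permanent objects, a single database built this way can transfer across sequences without per-sequence tuning.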
The inherent sparsity of our landmark-based mapping and localization approach introduces a challenge: landmark classes that appear with
high frequency, such as trees, dominate the others. This uneven distribution leads to perceptual aliasing, continually challenging the particle filter.
Despite these challenges, our framework achieves impressive localization accuracy by exploiting the particle filter's ability to maintain multiple, multimodal pose hypotheses.
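One standard way a particle filter preserves several modes through ambiguous stretches is low-variance systematic resampling, sketched below. The source does not specify which resampler SparseLoc uses, so this is an assumption offered for intuition.

```python
import numpy as np

def systematic_resample(particles, weights, rng=None):
    """Low-variance systematic resampling.

    Drawing all samples from a single random offset keeps every
    sufficiently weighted mode represented, which lets the filter carry
    competing pose hypotheses through periods of perceptual aliasing
    until a distinctive landmark resolves the ambiguity.
    """
    rng = rng or np.random.default_rng()
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n   # evenly spaced draws
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0  # guard against floating-point round-off
    indices = np.searchsorted(cumulative, positions)
    return particles[indices], np.full(n, 1.0 / n)
```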