Multi-Modality

VisionTraj: A Noise-Robust Trajectory Recovery Framework based on Large-scale Camera Network

Trajectory recovery based on the snapshots from the city-wide multi-camera network facilitates urban mobility sensing and driveway optimization. The state-of-the-art solutions devoted to such a vision-based scheme typically incorporate predefined …

PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning

Designing better deep networks and designing better reinforcement learning (RL) algorithms are both important for deep RL. This work studies the former. Specifically, the Perception and Decision-making Interleaving Transformer (PDiT) network is proposed, which …

Relation-Aware Distribution Representation Network for Person Clustering With Multiple Modalities

Person clustering with multi-modal clues, including faces, bodies, and voices, is critical for various tasks, such as movie parsing and identity-based movie editing. Related methods such as multi-view clustering mainly project multi-modal features …