facebook/sam3.1 · Hugging Face
Key Points
- SAM 3 (Segment Anything with Concepts) is a unified foundation model from Meta for promptable segmentation, capable of detecting, segmenting, and tracking objects using text or visual prompts.
- SAM 3 notably introduced the ability to exhaustively segment all instances of an open-vocabulary concept specified by a short text phrase, handling over 50 times more unique concepts than existing benchmarks.
- SAM 3.1 builds on this with "Object Multiplex," a shared-memory approach that achieves approximately 7x faster multi-object tracking inference without sacrificing accuracy, alongside improved VOS performance.
SAM 3 (Segment Anything with Concepts) is a unified foundation model developed by Meta for promptable segmentation across both images and videos. It detects, segments, and tracks objects from diverse prompt types: text descriptions as well as visual cues such as points, bounding boxes, and masks. A core advance in SAM 3 is exhaustive segmentation of all instances of an open-vocabulary concept, specified via a concise text phrase. This expands the model's conceptual coverage to over 50 times more unique concepts than previous benchmarks.
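The distinction between concept prompts and visual prompts can be sketched with a toy example. The dataclasses below are illustrative only and are not the SAM 3 API (see the SAM 3 GitHub repository for the actual interface); they merely model the prompt types the card describes and how a text-phrase prompt implies exhaustive, all-instance segmentation while a geometric prompt targets a single object.

```python
from dataclasses import dataclass

# Hypothetical prompt containers for illustration — NOT the SAM 3 API.

@dataclass
class TextPrompt:
    phrase: str           # open-vocabulary concept, e.g. "yellow school bus"

@dataclass
class PointPrompt:
    points: list          # [(x, y), ...] in image coordinates
    labels: list          # 1 = foreground click, 0 = background click

@dataclass
class BoxPrompt:
    box: tuple            # (x0, y0, x1, y1)

def describe(prompt):
    """Return which segmentation behavior a prompt type selects."""
    if isinstance(prompt, TextPrompt):
        # Text prompts ask for ALL instances of the concept (exhaustive).
        return f"segment every instance of '{prompt.phrase}'"
    # Geometric prompts target one object, as in earlier SAM models.
    return "segment the single object indicated by the visual cue"

print(describe(TextPrompt("yellow school bus")))
print(describe(PointPrompt(points=[(320, 240)], labels=[1])))
```

The point is only the split in behavior: one concept phrase fans out to every matching instance, whereas points, boxes, and masks each resolve to a single target.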
SAM 3.1 iterates on SAM 3 with enhancements aimed primarily at multi-object tracking and inference efficiency. Its key addition is "Object Multiplex," a shared-memory approach for joint multi-object tracking: many objects are processed concurrently by sharing memory access and computation across them. The practical benefit is substantial, with approximately 7 times faster inference when tracking 128 objects on a single NVIDIA H100 GPU, and no loss in segmentation or tracking accuracy. SAM 3.1 also improves Video Object Segmentation (VOS) performance, with gains on 6 of the 7 evaluated benchmarks.

This repository hosts only the SAM 3.1 model checkpoints; installation instructions, code, usage examples, and full documentation are available in the dedicated SAM 3 GitHub repository.
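Why a shared-memory, joint formulation speeds up multi-object tracking can be sketched with a toy NumPy example. This is not Meta's implementation — the real Object Multiplex design is not published in this card — but it illustrates the general idea the description implies: instead of running one tracking pass per object, each re-reading the same per-frame memory, all object queries attend to one shared memory bank in a single batched operation, amortizing the memory reads across objects.

```python
import numpy as np

# Toy illustration only, NOT Meta's Object Multiplex implementation.
rng = np.random.default_rng(0)
D, N, M = 64, 128, 16                  # feature dim, tracked objects, memory slots
memory = rng.standard_normal((M, D))   # one shared memory bank for the frame
queries = rng.standard_normal((N, D))  # one query vector per tracked object

def track_per_object(queries, memory):
    # Baseline: an independent attention read per object (N separate passes).
    outs = []
    for q in queries:
        attn = np.exp(memory @ q)      # (M,) unnormalized attention scores
        attn /= attn.sum()
        outs.append(attn @ memory)     # (D,) memory readout for this object
    return np.stack(outs)

def track_multiplexed(queries, memory):
    # Joint pass: all N objects attend to the shared memory at once.
    scores = np.exp(queries @ memory.T)           # (N, M)
    scores /= scores.sum(axis=1, keepdims=True)
    return scores @ memory                        # (N, D)

# Both formulations produce identical readouts; the batched one replaces
# N small matrix-vector products with one large matrix-matrix product,
# which is where a wall-clock speedup on a GPU would come from.
assert np.allclose(track_per_object(queries, memory),
                   track_multiplexed(queries, memory))
```

The accuracy claim in the card is consistent with this picture: batching over objects changes the execution schedule, not the computed result.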