The core vision of the Segment Anything Model:




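What the Segment Anything Model can already do:

⭐ SAM has learned a general notion of what an object is
⭐ It can generate masks for objects in images or video, even objects it has never seen before
⭐ It generalizes well, whether the input is an underwater photo or cell microscopy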






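The task, inspired by the prompt mechanism in NLP, is called promptable segmentation. The model that implements this task is called the Segment Anything Model (SAM), and the dataset used to train it is called SA-1B (Segment Anything 1 Billion).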



Data annotation, and zero-shot transfer to other tasks.


From the paper: Large language models pre-trained on web-scale datasets are revolutionizing NLP with strong zero-shot and few-shot generalization [10]. These “foundation models” [8] can generalize to tasks and data distributions beyond those seen during training. This capability is often implemented with prompt engineering in which hand-crafted text is used to prompt the language model to generate a valid textual response for the task at hand. When scaled and trained with abundant text corpora from the web, these models’ zero and few-shot performance compares surprisingly well to (even matching in some cases) fine-tuned models. Empirical trends show this behavior improving with model scale, dataset size, and total training compute.

🟨 Model scale, dataset size, and training compute all drive model performance, which is the case for building a large model.

From the paper: In this work, our goal is to build a foundation model for image segmentation. That is, we seek to develop a promptable model and pre-train it on a broad dataset using a task that enables powerful generalization. With this model, we aim to solve a range of downstream segmentation problems on new data distributions using prompt engineering.


3️⃣ Task: promptable segmentation


performing well at this task is challenging and requires specialized modeling and training loss choices


focal loss

dice loss
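The SAM paper reports supervising mask prediction with a linear combination of focal loss and dice loss, weighted 20:1 in favor of focal loss. Below is a minimal NumPy sketch of that combination. The function names, the `alpha` and `gamma` defaults (the usual values from the focal loss literature), and the `smooth` term are assumptions for illustration; this is not the authors' training code.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def focal_loss(logits, target, alpha=0.25, gamma=2.0):
    """Binary focal loss averaged over pixels; easy pixels are down-weighted."""
    p = sigmoid(logits)
    eps = 1e-7  # avoids log(0)
    ce = -(target * np.log(p + eps) + (1 - target) * np.log(1 - p + eps))
    p_t = p * target + (1 - p) * (1 - target)          # probability of the true class
    alpha_t = alpha * target + (1 - alpha) * (1 - target)
    return float(np.mean(alpha_t * (1 - p_t) ** gamma * ce))


def dice_loss(logits, target, smooth=1.0):
    """1 - Dice coefficient between the soft prediction and the target mask."""
    p = sigmoid(logits)
    intersection = np.sum(p * target)
    dice = (2.0 * intersection + smooth) / (np.sum(p) + np.sum(target) + smooth)
    return float(1.0 - dice)


def combined_mask_loss(logits, target, focal_weight=20.0, dice_weight=1.0):
    """Linear combination of the two losses (20:1 by default)."""
    return focal_weight * focal_loss(logits, target) + dice_weight * dice_loss(logits, target)


# Toy example: a 4x4 square object on an 8x8 image.
target = np.zeros((8, 8))
target[2:6, 2:6] = 1.0
good_logits = np.where(target == 1, 6.0, -6.0)  # confident and correct
bad_logits = -good_logits                        # confident and wrong
print("good prediction:", combined_mask_loss(good_logits, target))
print("bad prediction: ", combined_mask_loss(bad_logits, target))
```

Focal loss handles the heavy foreground/background imbalance pixel by pixel, while dice loss scores the overlap of the mask as a whole, which is why the two are commonly paired for segmentation.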

Downstream tasks can be solved by engineering appropriate prompts. For example, if one has a bounding box detector for cats, cat instance segmentation can be solved by providing the detector’s box output as a prompt to our model.
To perform instance segmentation, a promptable segmentation model is combined with an existing object detector.
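The detector-plus-SAM composition described above might look like the sketch below. It assumes `predictor` behaves like `segment_anything.SamPredictor`, whose `predict` accepts a length-4 `box` in XYXY format. The helper names `xywh_to_xyxy` and `segment_detections` are hypothetical, and the detector itself is out of scope, so its boxes are simply passed in.

```python
import numpy as np


def xywh_to_xyxy(box_xywh):
    """Convert an (x, y, width, height) box to the XYXY layout SAM expects."""
    x, y, w, h = box_xywh
    return np.array([x, y, x + w, y + h], dtype=np.float32)


def segment_detections(predictor, image_rgb, boxes_xyxy):
    """Return one boolean mask per detector box.

    `predictor` is assumed to expose set_image() and predict() like
    segment_anything.SamPredictor.
    """
    predictor.set_image(image_rgb)  # the image embedding is computed once
    masks = []
    for box in boxes_xyxy:
        # multimask_output=False: a box is rarely ambiguous, so ask for one mask
        mask, _scores, _low_res = predictor.predict(
            box=np.asarray(box), multimask_output=False
        )
        masks.append(mask[0])  # predict() returns masks shaped (C, H, W)
    return masks


# Usage with the real model (not executed here; needs a checkpoint on disk):
#   from segment_anything import SamPredictor, sam_model_registry
#   sam = sam_model_registry["vit_l"](checkpoint="./sam_vit_l_0b3195.pth")
#   cat_masks = segment_detections(SamPredictor(sam), image, cat_boxes)
# where `cat_boxes` is whatever your own cat detector produced.
```

Each returned mask is the instance mask for the corresponding box, so the detector's class label can be attached to it directly.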




We anticipate that composable system design, powered by techniques such as prompt engineering, will enable a wider variety of applications than systems trained specifically for a fixed set of tasks.




4️⃣ Model: Segment Anything Model

From the paper: The promptable segmentation task and the goal of real-world use impose constraints on the model architecture. In particular, the model must support flexible prompts, needs to compute masks in amortized real-time to allow interactive use, and must be ambiguity-aware. Surprisingly, we find that a simple design satisfies all three constraints: a powerful image encoder computes an image embedding, a prompt encoder embeds prompts, and then the two information sources are combined in a lightweight mask decoder that predicts segmentation masks. We refer to this model as the Segment Anything Model, or SAM (see Fig. 1b). By separating SAM into an image encoder and a fast prompt encoder / mask decoder, the same image embedding can be reused (and its cost amortized) with different prompts. Given an image embedding, the prompt encoder and mask decoder predict a mask from a prompt in ∼50ms in a web browser. We focus on point, box, and mask prompts, and also present initial results with free-form text prompts. To make SAM ambiguity-aware, we design it to predict multiple masks for a single prompt allowing SAM to naturally handle ambiguity, such as the shirt vs. person example.
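Two of the design points in the passage above, reusing one image embedding across prompts and returning several candidate masks for an ambiguous prompt, can be sketched with the `SamPredictor` interface from the `segment_anything` package. The helper name `best_masks_for_clicks` is hypothetical, and choosing the highest-scoring candidate is just one reasonable policy, not something the paper prescribes.

```python
import numpy as np


def best_masks_for_clicks(predictor, image_rgb, clicks_xy):
    """For each (x, y) foreground click, return (best_mask, score).

    `predictor` is assumed to behave like segment_anything.SamPredictor.
    """
    # Expensive step: the image encoder runs once and the embedding is cached.
    predictor.set_image(image_rgb)

    results = []
    for x, y in clicks_xy:
        # Cheap step: only the prompt encoder and mask decoder run per click.
        masks, scores, _low_res = predictor.predict(
            point_coords=np.array([[x, y]], dtype=np.float32),
            point_labels=np.array([1]),   # 1 marks a foreground point
            multimask_output=True,        # several candidates for one prompt
        )
        best = int(np.argmax(scores))     # keep the candidate SAM rates highest
        results.append((masks[best], float(scores[best])))
    return results
```

In an interactive tool you would usually show all candidates instead of only the top one, since a single click on a shirt could mean the shirt or the whole person.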


5️⃣ Data: data engine & dataset

From the paper: Data engine (§4). To achieve strong generalization to new data distributions, we found it necessary to train SAM on a large and diverse set of masks, beyond any segmentation dataset that already exists. While a typical approach for foundation models is to obtain data online [82], masks are not naturally abundant and thus we need an alternative strategy. Our solution is to build a “data engine”, i.e., we co-develop our model with model-in-the-loop dataset annotation (see Fig. 1c). Our data engine has three stages: assisted-manual, semi-automatic, and fully automatic. In the first stage, SAM assists annotators in annotating masks, similar to a classic interactive segmentation setup. In the second stage, SAM can automatically generate masks for a subset of objects by prompting it with likely object locations and annotators focus on annotating the remaining objects, helping increase mask diversity. In the final stage, we prompt SAM with a regular grid of foreground points, yielding on average ∼100 high-quality masks per image.
Dataset (§5). Our final dataset, SA-1B, includes more than 1B masks from 11M licensed and privacy-preserving images (see Fig. 2). SA-1B, collected fully automatically using the final stage of our data engine, has 400× more masks than any existing segmentation dataset [66, 44, 117, 60], and as we verify extensively, the masks are of high quality and diversity. Beyond its use in training SAM to be robust and general, we hope SA-1B becomes a valuable resource for research aiming to build new foundation models.
Responsible AI (§6). We study and report on potential fairness concerns and biases when using SA-1B and SAM. Images in SA-1B span a geographically and economically diverse set of countries and we found that SAM performs similarly across different groups of people. Together, we hope this will make our work more equitable for real-world use cases. We provide model and dataset cards in the appendix.

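🟨 With this approach, across three stages (model-assisted manual annotation, semi-automatic annotation, and fully automatic mask generation), SA-1B reached 11 million images and more than 1 billion high-quality masks, 400× more than any existing segmentation dataset, with annotation 6.5× faster than COCO's fully manual polygon-based mask annotation.
🟨 SA-1B is not only faster, larger, and easier to collect, it is also more balanced, with images drawn from countries across different regions and income levels.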
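The fully automatic stage prompts SAM with a regular grid of foreground points. A minimal sketch of building such a grid in pixel coordinates follows. The function name `regular_point_grid` is hypothetical, and placing points at cell centers is an assumption; 32 points per side matches the default `points_per_side` of `SamAutomaticMaskGenerator`.

```python
import numpy as np


def regular_point_grid(points_per_side, width, height):
    """Return an (N, 2) array of (x, y) pixel coordinates on a regular grid.

    Points sit at the centers of points_per_side x points_per_side cells,
    so none of them falls exactly on the image border.
    """
    offset = 1.0 / (2 * points_per_side)
    ticks = np.linspace(offset, 1.0 - offset, points_per_side)  # in [0, 1]
    xs, ys = np.meshgrid(ticks * width, ticks * height)
    return np.stack([xs.ravel(), ys.ravel()], axis=1)


grid = regular_point_grid(32, width=1024, height=768)
print(grid.shape)  # (1024, 2): one single-point prompt per grid cell
```

Each grid point becomes its own foreground prompt, and overlapping or low-quality masks are filtered afterwards, which is how a 32×32 grid ends up yielding on the order of 100 masks per image.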


1️⃣ Set up the environment


python>=3.8, as well as pytorch>=1.7 and torchvision>=0.8.


pip install opencv-python pycocotools-windows matplotlib onnxruntime onnx
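Before running anything, it can help to confirm that the packages above are actually visible to the interpreter you will use. The sketch below relies only on the standard library; the `REQUIRED` list and the function name `environment_report` are assumptions for illustration, so adjust the package names to match what you installed.

```python
import sys
from importlib import metadata

# Distribution names as they appear on PyPI (assumed list, edit as needed).
REQUIRED = ["torch", "torchvision", "opencv-python", "matplotlib", "onnx", "onnxruntime"]


def environment_report(packages=REQUIRED):
    """Map each package name to its installed version, or None if missing."""
    report = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


for name, version in environment_report().items():
    print(f"{name:15s} {version or 'NOT INSTALLED'}")

if sys.version_info < (3, 8):
    print("Warning: the requirement stated above is python>=3.8")
```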


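I used vit_l. Even with the L model, the accuracy is perfectly acceptable. Remember where you saved the checkpoint, because the path goes into the code below.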

  • default or vit_h: ViT-H SAM model.
  • vit_l: ViT-L SAM model.
  • vit_b: ViT-B SAM model.
(pretrained model download link)


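import cv2
import numpy as np
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

# Path to the downloaded checkpoint (must match the model type, here vit_l)
sam = sam_model_registry["vit_l"](checkpoint="./sam_vit_l_0b3195.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

# OpenCV loads images as BGR, while SAM expects RGB
image_bgr = cv2.imread("./11.png")
image = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

# generate() returns a list of dicts, one per mask. Unlike an NLP model that
# returns a single answer, SAM returns many candidate masks.
masks = mask_generator.generate(image)
# Keep only the first mask and convert the boolean array to 0/255 uint8
mask = (masks[0]["segmentation"] * 255).astype(np.uint8)
print(mask)
print(mask.shape)   # shape is an attribute, not a method
print(image.shape)

cv2.namedWindow("image", cv2.WINDOW_NORMAL)
cv2.namedWindow("mask", cv2.WINDOW_NORMAL)
cv2.imshow("image", image_bgr)  # imshow expects BGR, so show the original
cv2.imshow("mask", mask)
cv2.imwrite("C://Users//rg16x//Desktop//seg_result/11.png", mask)
cv2.waitKey(0)  # keep the windows open until a key is pressed
cv2.destroyAllWindows()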
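The script above looks at a single mask. To see everything the generator found, the masks can be blended onto the image in one pass. This is a small NumPy-only sketch: `overlay_masks` is a hypothetical helper, the random colors and `alpha` value are arbitrary choices, and it relies only on the `"segmentation"` and `"area"` keys that each dict returned by `SamAutomaticMaskGenerator.generate` contains.

```python
import numpy as np


def overlay_masks(image_rgb, masks, alpha=0.5, seed=0):
    """Blend every mask onto a copy of the image with a random color.

    `masks` is a list of dicts with a boolean "segmentation" array and an
    integer "area", as returned by SamAutomaticMaskGenerator.generate().
    """
    rng = np.random.default_rng(seed)
    out = image_rgb.astype(np.float32)
    # Draw large masks first so that small ones stay visible on top.
    for m in sorted(masks, key=lambda d: d["area"], reverse=True):
        color = rng.integers(0, 256, size=3).astype(np.float32)
        seg = m["segmentation"].astype(bool)
        out[seg] = (1 - alpha) * out[seg] + alpha * color
    return out.astype(np.uint8)


# Usage with the variables from the script above:
#   blended = overlay_masks(image, masks)
#   cv2.imwrite("overlay.png", cv2.cvtColor(blended, cv2.COLOR_RGB2BGR))
```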



It should go smoothly. If you get stuck, leave a comment.


🍏Segmenting anything also Detect anything

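Paper download link ❗❗❗ This paper is a preprint. Preprints have not been peer reviewed, so they may contain errors or incomplete information. It is for personal academic research and discussion only and should not be used commercially. A preprint may also be revised before final publication.
Left: original image. Middle: SAM segmentation. Right: detection boxes. Segmenting anything also Detect anything embeds SAM into the object detection task, where it acts as a downstream model of detection, and uses SAM's fine-grained segmentation result to guide the generation of the coarser-grained bounding boxes. In other words, SAM segments out every sea creature (🐋🐬🐟🐠🐡🐙🐟), which is the fine-grained result, while the detector is more selective and only has to find the small pink octopus 🐙, which is the coarse-grained result. Finding all the creatures first and then looking for the pink octopus among them should be more accurate.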
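One simple way a pixel-level mask can guide a bounding box is to take the tightest box around the mask. The sketch below shows only that step. `mask_to_xyxy` is a hypothetical helper written for this note and is not taken from the paper.

```python
import numpy as np


def mask_to_xyxy(mask):
    """Tightest (x0, y0, x1, y1) box around a boolean mask, or None if empty.

    x1 and y1 are exclusive, so the box width is x1 - x0.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1


mask = np.zeros((10, 10), dtype=bool)
mask[2:5, 3:8] = True            # rows 2..4, columns 3..7
print(mask_to_xyxy(mask))        # (3, 2, 8, 5)
```

Note that each dict returned by `SamAutomaticMaskGenerator.generate` already carries a `"bbox"` entry in XYWH format, so this conversion is mainly useful for masks that come from other sources or have been edited.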











Segmenting anything also Detect anything makes three contributions

From the paper:
  1. To improve the accuracy of object detection annotation bounding box generation, we suggest utilizing a pixel-level classification model named SAM as a guide.
  2. We propose high fine grain fill-in augmentation (HFGFA) using SAM, which reduces image augmentation information redundancy, resolves data imbalance and small object problems.
  3. We propose to use SAM to guide the development of the open world object detection (OWOD) task, breaking the closed world assumption.

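  1. Use SAM as a pixel-level classification model to guide the detection task: in the example figure above, SAM first picks out all the fish at a high level, and the detection model then precisely detects the big ugly purple fish.
  2. High fine grain fill-in augmentation, which reduces redundancy in augmented image information and addresses data imbalance and small objects: the objects SAM cuts out with its masks are combined with a randomly chosen background to form new images. This increases the diversity and complexity of the training set and improves the model's generality and performance.
  3. Application to the open world. Open world means identifying unknown instances without explicit supervision, and updating knowledge without forgetting previously learned instances.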
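The fill-in augmentation in item 2 is described as combining a SAM cut-out with a random background. A rough copy-paste sketch of that idea follows. It is an illustration under assumptions, not the paper's HFGFA implementation: `paste_object` is a hypothetical helper, the mask is assumed to be non-empty, and the paste position is supplied by the caller instead of being sampled.

```python
import numpy as np


def paste_object(background, image, mask, top_left):
    """Paste the masked object from `image` onto a copy of `background`.

    top_left is the (x, y) position of the object's tight crop in the
    background. Returns the new image and the pasted crop's XYXY box,
    which can serve as the bounding-box label for the new sample.
    """
    ys, xs = np.nonzero(mask)  # assumes at least one True pixel
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    crop = image[y0:y1, x0:x1]
    crop_mask = mask[y0:y1, x0:x1].astype(bool)

    out = background.copy()
    px, py = top_left
    region = out[py:py + crop.shape[0], px:px + crop.shape[1]]  # view into out
    h, w = region.shape[:2]            # smaller than the crop if it hits the edge
    visible = crop_mask[:h, :w]
    region[visible] = crop[:h, :w][visible]  # only object pixels are copied
    return out, (px, py, px + w, py + h)


# Toy example: a 2x3 bright object pasted onto a black 10x10 background.
image = np.full((6, 6, 3), 200, dtype=np.uint8)
mask = np.zeros((6, 6), dtype=bool)
mask[1:3, 2:5] = True
background = np.zeros((10, 10, 3), dtype=np.uint8)
augmented, box = paste_object(background, image, mask, top_left=(4, 5))
print(box)
```

Because only pixels inside the mask are copied, the pasted object keeps its exact outline and carries none of its original surroundings, which is the point of cutting it out with SAM first.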


🍐Segment Everything Everywhere All at Once

🍊SegGPT: Segmenting Everything In Context

🍋Anything-3D: Towards Single-view Anything Reconstruction in the Wild

🍌SAM Fails to Segment Anything? -SAM-Adapter: Adapting SAM in Underperformed Scenes: Camouflage, Shadow, and More

🍒Segment Anything Is Not Always Perfect: An Investigation of SAM on Different Real-world Applications



🍇SAM.MD: Zero-shot medical image segmentation capabilities of the Segment Anything Model

🍓Accuracy of Segment-Anything Model (SAM) in Medical Image Segmentation Tasks

🍈When SAM Meets Medical Images: An Investigation of Segment Anything Model (SAM) on Multi-phase Liver Tumor Segmentation

🥭Segment Anything Model (SAM) for Digital Pathology: Assess Zero-shot Segmentation on Whole Slide Imaging

🍈Can SAM Segment Polyps?

