目录
了解如何设置Azure中 JSONL 文件格式,以便在训练和推理期间在计算机视觉任务的自动化 ML 实验中使用数据。
关注TechLead,分享AI全维度知识。作者拥有10+年互联网服务架构、AI产品研发经验、团队管理经验,同济本复旦硕,复旦机器人智能实验室成员,阿里云认证的资深架构师,项目管理专业人士,上亿营收AI产品研发负责人。
一、用于训练的数据架构
Azure 机器学习的图像 AutoML 要求以 JSONL(JSON 行)格式准备输入图像数据。 本部分介绍多类图像分类、多标签图像分类、对象检测和实例分段的输入数据格式或架构。 我们还将提供最终训练或验证 JSON 行文件的示例。
图像分类(二进制/多类)
每个 JSON 行中的输入数据格式/架构:
{
"image_url":"AmlDatastore://data_directory/../Image_name.image_format",
"image_details":{
"format":"image_format",
"width":"image_width",
"height":"image_height"
},
"label":"class_name",
}
密钥说明示例
image_url
Azure 机器学习数据存储中的图像位置
Required, String
"AmlDatastore://data_directory/Image_01.jpg"
image_details
图像详细信息
Optional, Dictionary
"image_details":{"format": "jpg", "width": "400px", "height": "258px"}
format
图像类型(支持 Pillow 库中所有可用的图像格式)
Optional, String from {"jpg", "jpeg", "png", "jpe", "jfif","bmp", "tif", "tiff"}
"jpg" or "jpeg" or "png" or "jpe" or "jfif" or "bmp" or "tif" or "tiff"
width
图像的宽度
Optional, String or Positive Integer
"400px" or 400
height
图像的高度
Optional, String or Positive Integer
"200px" or 200
label
图像的类/标签
Required, String
"cat"
多类图像分类的 JSONL 文件示例:
{"image_url": "AmlDatastore://image_data/Image_01.jpg", "image_details":{"format": "jpg", "width": "400px", "height": "258px"}, "label": "can"}
{"image_url": "AmlDatastore://image_data/Image_02.jpg", "image_details": {"format": "jpg", "width": "397px", "height": "296px"}, "label": "milk_bottle"}
.
.
.
{"image_url": "AmlDatastore://image_data/Image_n.jpg", "image_details": {"format": "jpg", "width": "1024px", "height": "768px"}, "label": "water_bottle"}
多标签图像分类
下面是每个 JSON 行中用于图像分类的输入数据格式/架构示例。
{
"image_url":"AmlDatastore://data_directory/../Image_name.image_format",
"image_details":{
"format":"image_format",
"width":"image_width",
"height":"image_height"
},
"label":[
"class_name_1",
"class_name_2",
"class_name_3",
"...",
"class_name_n"
]
}
密钥说明示例
image_url
Azure 机器学习数据存储中的图像位置
Required, String
"AmlDatastore://data_directory/Image_01.jpg"
image_details
图像详细信息
Optional, Dictionary
"image_details":{"format": "jpg", "width": "400px", "height": "258px"}
format
图像类型(支持 Pillow 库中所有可用的图像格式)
Optional, String from {"jpg", "jpeg", "png", "jpe", "jfif", "bmp", "tif", "tiff"}
"jpg" or "jpeg" or "png" or "jpe" or "jfif" or "bmp" or "tif" or "tiff"
width
图像的宽度
Optional, String or Positive Integer
"400px" or 400
height
图像的高度
Optional, String or Positive Integer
"200px" or 200
label
图像中的类/标签列表
Required, List of Strings
["cat","dog"]
多标签图像分类的 JSONL 文件示例:
{"image_url": "AmlDatastore://image_data/Image_01.jpg", "image_details":{"format": "jpg", "width": "400px", "height": "258px"}, "label": ["can"]}
{"image_url": "AmlDatastore://image_data/Image_02.jpg", "image_details": {"format": "jpg", "width": "397px", "height": "296px"}, "label": ["can","milk_bottle"]}
.
.
.
{"image_url": "AmlDatastore://image_data/Image_n.jpg", "image_details": {"format": "jpg", "width": "1024px", "height": "768px"}, "label": ["carton","milk_bottle","water_bottle"]}
对象检测
下面是用于对象检测的示例 JSONL 文件。
{
"image_url":"AmlDatastore://data_directory/../Image_name.image_format",
"image_details":{
"format":"image_format",
"width":"image_width",
"height":"image_height"
},
"label":[
{
"label":"class_name_1",
"topX":"xmin/width",
"topY":"ymin/height",
"bottomX":"xmax/width",
"bottomY":"ymax/height",
"isCrowd":"isCrowd"
},
{
"label":"class_name_2",
"topX":"xmin/width",
"topY":"ymin/height",
"bottomX":"xmax/width",
"bottomY":"ymax/height",
"isCrowd":"isCrowd"
},
"..."
]
}
其中:
xmin
= 边界框左上角的 x 坐标ymin
= 边界框左上角的 y 坐标xmax
= 边界框右下角的 x 坐标ymax
= 边界框右下角的 y 坐标
密钥说明示例
Azure 机器学习数据存储中的图像位置image_url
Required, String
"AmlDatastore://data_directory/Image_01.jpg"
image_details
图像详细信息
Optional, Dictionary
"image_details":{"format": "jpg", "width": "400px", "height": "258px"}
format
图像类型(支持 Pillow 库中提供的所有图像格式。但对于 YOLO,仅支持 opencv 允许的图像格式)
Optional, String from {"jpg", "jpeg", "png", "jpe", "jfif", "bmp", "tif", "tiff"}
"jpg" or "jpeg" or "png" or "jpe" or "jfif" or "bmp" or "tif" or "tiff"
width
图像的宽度
Optional, String or Positive Integer
"499px" or 499
height
图像的高度
Optional, String or Positive Integer
"665px" or 665
label
(外部键)边界框列表,其中每个框都是其左上方和右下方坐标的
label, topX, topY, bottomX, bottomY, isCrowd
字典
Required, List of dictionaries
[{"label": "cat", "topX": 0.260, "topY": 0.406, "bottomX": 0.735, "bottomY": 0.701, "isCrowd": 0}]
label
(内部键)边界框中对象的类/标签
Required, String
"cat"
topX
边界框左上角的 x 坐标与图像宽度的比率
Required, Float in the range [0,1]
0.260
topY
边界框左上角的 y 坐标与图像高度的比率
Required, Float in the range [0,1]
0.406
bottomX
边界框右下角的 x 坐标与图像宽度的比率
Required, Float in the range [0,1]
0.735
bottomY
边界框右下角的 y 坐标与图像高度的比率
Required, Float in the range [0,1]
0.701
isCrowd
指示边界框是否围绕对象群。 如果设置了此特殊标志,我们在计算指标时将跳过此特定边界框。
Optional, Bool
0
用于对象检测的 JSONL 文件示例:
{"image_url": "AmlDatastore://image_data/Image_01.jpg", "image_details": {"format": "jpg", "width": "499px", "height": "666px"}, "label": [{"label": "can", "topX": 0.260, "topY": 0.406, "bottomX": 0.735, "bottomY": 0.701, "isCrowd": 0}]}
{"image_url": "AmlDatastore://image_data/Image_02.jpg", "image_details": {"format": "jpg", "width": "499px", "height": "666px"}, "label": [{"label": "carton", "topX": 0.172, "topY": 0.153, "bottomX": 0.432, "bottomY": 0.659, "isCrowd": 0}, {"label": "milk_bottle", "topX": 0.300, "topY": 0.566, "bottomX": 0.891, "bottomY": 0.735, "isCrowd": 0}]}
.
.
.
{"image_url": "AmlDatastore://image_data/Image_n.jpg", "image_details": {"format": "jpg", "width": "499px", "height": "666px"}, "label": [{"label": "carton", "topX": 0.0180, "topY": 0.297, "bottomX": 0.380, "bottomY": 0.836, "isCrowd": 0}, {"label": "milk_bottle", "topX": 0.454, "topY": 0.348, "bottomX": 0.613, "bottomY": 0.683, "isCrowd": 0}, {"label": "water_bottle", "topX": 0.667, "topY": 0.279, "bottomX": 0.841, "bottomY": 0.615, "isCrowd": 0}]}
实例分段
对于实例分段,自动化 ML 仅支持多边形作为输入和输出,不支持掩码。
下面是实例分段的示例 JSONL 文件。
{
"image_url":"AmlDatastore://data_directory/../Image_name.image_format",
"image_details":{
"format":"image_format",
"width":"image_width",
"height":"image_height"
},
"label":[
{
"label":"class_name",
"isCrowd":"isCrowd",
"polygon":[["x1", "y1", "x2", "y2", "x3", "y3", "...", "xn", "yn"]]
}
]
}
密钥说明示例
image_url
Azure 机器学习数据存储中的图像位置
Required, String
"AmlDatastore://data_directory/Image_01.jpg"
image_details
图像详细信息
Optional, Dictionary
"image_details":{"format": "jpg", "width": "400px", "height": "258px"}
format
映像类型
Optional, String from {"jpg", "jpeg", "png", "jpe", "jfif", "bmp", "tif", "tiff" }
"jpg" or "jpeg" or "png" or "jpe" or "jfif" or "bmp" or "tif" or "tiff"
width
图像的宽度
Optional, String or Positive Integer
"499px" or 499
height
图像的高度
Optional, String or Positive Integer
"665px" or 665
label
(外部键)掩码列表,其中每个掩码都是
label, isCrowd, polygon coordinates
的字典
Required, List of dictionaries
[{"label": "can", "isCrowd": 0, "polygon": [[0.577, 0.689,
0.562, 0.681,
0.559, 0.686]]}]
label
(内部键)掩码中对象的类/标签
Required, String
"cat"
isCrowd
指示掩码是否围绕对象群
Optional, Bool
0
polygon
对象的多边形坐标
Required, List of list for multiple segments of the same instance. Float values in the range [0,1]
[[0.577, 0.689, 0.567, 0.689, 0.559, 0.686]]
实例分段的 JSONL 文件示例:
{"image_url": "AmlDatastore://image_data/Image_01.jpg", "image_details": {"format": "jpg", "width": "499px", "height": "666px"}, "label": [{"label": "can", "isCrowd": 0, "polygon": [[0.577, 0.689, 0.567, 0.689, 0.559, 0.686, 0.380, 0.593, 0.304, 0.555, 0.294, 0.545, 0.290, 0.534, 0.274, 0.512, 0.2705, 0.496, 0.270, 0.478, 0.284, 0.453, 0.308, 0.432, 0.326, 0.423, 0.356, 0.415, 0.418, 0.417, 0.635, 0.493, 0.683, 0.507, 0.701, 0.518, 0.709, 0.528, 0.713, 0.545, 0.719, 0.554, 0.719, 0.579, 0.713, 0.597, 0.697, 0.621, 0.695, 0.629, 0.631, 0.678, 0.619, 0.683, 0.595, 0.683, 0.577, 0.689]]}]}
{"image_url": "AmlDatastore://image_data/Image_02.jpg", "image_details": {"format": "jpg", "width": "499px", "height": "666px"}, "label": [{"label": "carton", "isCrowd": 0, "polygon": [[0.240, 0.65, 0.234, 0.654, 0.230, 0.647, 0.210, 0.512, 0.202, 0.403, 0.182, 0.267, 0.184, 0.243, 0.180, 0.166, 0.186, 0.159, 0.198, 0.156, 0.396, 0.162, 0.408, 0.169, 0.406, 0.217, 0.414, 0.249, 0.422, 0.262, 0.422, 0.569, 0.342, 0.569, 0.334, 0.572, 0.320, 0.585, 0.308, 0.624, 0.306, 0.648, 0.240, 0.657]]}, {"label": "milk_bottle", "isCrowd": 0, "polygon": [[0.675, 0.732, 0.635, 0.731, 0.621, 0.725, 0.573, 0.717, 0.516, 0.717, 0.505, 0.720, 0.462, 0.722, 0.438, 0.719, 0.396, 0.719, 0.358, 0.714, 0.334, 0.714, 0.322, 0.711, 0.312, 0.701, 0.306, 0.687, 0.304, 0.663, 0.308, 0.630, 0.320, 0.596, 0.32, 0.588, 0.326, 0.579]]}]}
.
.
.
{"image_url": "AmlDatastore://image_data/Image_n.jpg", "image_details": {"format": "jpg", "width": "499px", "height": "666px"}, "label": [{"label": "water_bottle", "isCrowd": 0, "polygon": [[0.334, 0.626, 0.304, 0.621, 0.254, 0.603, 0.164, 0.605, 0.158, 0.602, 0.146, 0.602, 0.142, 0.608, 0.094, 0.612, 0.084, 0.599, 0.080, 0.585, 0.080, 0.539, 0.082, 0.536, 0.092, 0.533, 0.126, 0.530, 0.132, 0.533, 0.144, 0.533, 0.162, 0.525, 0.172, 0.525, 0.186, 0.521, 0.196, 0.521 ]]}, {"label": "milk_bottle", "isCrowd": 0, "polygon": [[0.392, 0.773, 0.380, 0.732, 0.379, 0.767, 0.367, 0.755, 0.362, 0.735, 0.362, 0.714, 0.352, 0.644, 0.352, 0.611, 0.362, 0.597, 0.40, 0.593, 0.444, 0.494, 0.588, 0.515, 0.585, 0.621, 0.588, 0.671, 0.582, 0.713, 0.572, 0.753 ]]}]}
二、用于推理的数据格式
在本部分中,我们将记录在使用部署的模型时进行预测所需的输入数据格式。 可以接受内容类型为
application/octet-stream
的任何上述图像格式。
输入格式
下面是使用特定于任务的模型终结点对任何任务生成预测所需的输入格式。 部署模型后,我们可以使用以下代码段来获取所有任务的预测。
# input image for inference
sample_image = './test_image.jpg'
# load image data
data = open(sample_image, 'rb').read()
# set the content type
headers = {'Content-Type': 'application/octet-stream'}
# if authentication is enabled, set the authorization header
headers['Authorization'] = f'Bearer {key}'
# make the request and display the response
response = requests.post(scoring_uri, data, headers=headers)
输出格式
根据任务类型,对模型终结点进行的预测遵循不同的结构。 本部分将探讨多类、多标签图像分类、对象检测和实例分段任务的输出数据格式。
图像分类
图像分类的终结点返回数据集中的所有标签及其在输入图像中的概率分数,格式如下:
{
"filename":"/tmp/tmppjr4et28",
"probs":[
2.098e-06,
4.783e-08,
0.999,
8.637e-06
],
"labels":[
"can",
"carton",
"milk_bottle",
"water_bottle"
]
}
多标签图像分类
对于多标签图像分类,模型终结点返回标签及其概率。
{
"filename":"/tmp/tmpsdzxlmlm",
"probs":[
0.997,
0.960,
0.982,
0.025
],
"labels":[
"can",
"carton",
"milk_bottle",
"water_bottle"
]
}
对象检测
对象检测模型返回多个框,其中包含缩放后的左上角和右下角坐标,以及框标签和置信度分数。
{
"filename":"/tmp/tmpdkg2wkdy",
"boxes":[
{
"box":{
"topX":0.224,
"topY":0.285,
"bottomX":0.399,
"bottomY":0.620
},
"label":"milk_bottle",
"score":0.937
},
{
"box":{
"topX":0.664,
"topY":0.484,
"bottomX":0.959,
"bottomY":0.812
},
"label":"can",
"score":0.891
},
{
"box":{
"topX":0.423,
"topY":0.253,
"bottomX":0.632,
"bottomY":0.725
},
"label":"water_bottle",
"score":0.876
}
]
}
实例分段
在实例分段中,输出包含多个框,其中包含缩放后的左上角和右下角坐标、标签、置信度和多边形(非掩码)。 此处,多边形值与我们在“架构”部分中讨论的格式相同。
{
"filename":"/tmp/tmpi8604s0h",
"boxes":[
{
"box":{
"topX":0.679,
"topY":0.491,
"bottomX":0.926,
"bottomY":0.810
},
"label":"can",
"score":0.992,
"polygon":[
[
0.82, 0.811, 0.771, 0.810, 0.758, 0.805, 0.741, 0.797, 0.735, 0.791, 0.718, 0.785, 0.715, 0.778, 0.706, 0.775, 0.696, 0.758, 0.695, 0.717, 0.698, 0.567, 0.705, 0.552, 0.706, 0.540, 0.725, 0.520, 0.735, 0.505, 0.745, 0.502, 0.755, 0.493
]
]
},
{
"box":{
"topX":0.220,
"topY":0.298,
"bottomX":0.397,
"bottomY":0.601
},
"label":"milk_bottle",
"score":0.989,
"polygon":[
[
0.365, 0.602, 0.273, 0.602, 0.26, 0.595, 0.263, 0.588, 0.251, 0.546, 0.248, 0.501, 0.25, 0.485, 0.246, 0.478, 0.245, 0.463, 0.233, 0.442, 0.231, 0.43, 0.226, 0.423, 0.226, 0.408, 0.234, 0.385, 0.241, 0.371, 0.238, 0.345, 0.234, 0.335, 0.233, 0.325, 0.24, 0.305, 0.586, 0.38, 0.592, 0.375, 0.598, 0.365
]
]
},
{
"box":{
"topX":0.433,
"topY":0.280,
"bottomX":0.621,
"bottomY":0.679
},
"label":"water_bottle",
"score":0.988,
"polygon":[
[
0.576, 0.680, 0.501, 0.680, 0.475, 0.675, 0.460, 0.625, 0.445, 0.630, 0.443, 0.572, 0.440, 0.560, 0.435, 0.515, 0.431, 0.501, 0.431, 0.433, 0.433, 0.426, 0.445, 0.417, 0.456, 0.407, 0.465, 0.381, 0.468, 0.327, 0.471, 0.318
]
]
}
]
}
关注TechLead,分享AI全维度知识。作者拥有10+年互联网服务架构、AI产品研发经验、团队管理经验,同济本复旦硕,复旦机器人智能实验室成员,阿里云认证的资深架构师,项目管理专业人士,上亿营收AI产品研发负责人。
版权归原作者 TechLead KrisChang 所有, 如有侵权,请联系我们删除。