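The NSL-KDD dataset is a revised version of the well-known KDD'99 dataset. It consists of four sub-datasets: KDDTest+, KDDTest-21, KDDTrain+ and KDDTrain+_20Percent. KDDTrain+_20Percent is a subset of KDDTrain+, and KDDTest-21 is a subset of KDDTest+. Each record has 43 fields. The first 41 are features describing the traffic input itself. The last two are the label (normal, or the name of the attack) and a score, which the feature table below lists as Difficulty Level (0–21) and which indicates how hard the record is to classify.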
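Each record is a plain comma-separated line of 43 fields, so a small parser makes the layout concrete. The sketch below uses only the Python standard library and assumes the `.txt` files have no header row. The snake_case column names are hypothetical: they are derived from the feature table in this post, not taken from an official header. The sample line is an illustrative record in this format.

```python
import csv
import io

# Hypothetical snake_case names for the 41 features, in table order.
# The NSL-KDD .txt files are assumed to have no header row.
FEATURES = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins", "logged_in",
    "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files", "num_outbound_cmds",
    "is_hot_login", "is_guest_login", "count", "srv_count", "serror_rate",
    "srv_serror_rate", "rerror_rate", "srv_rerror_rate", "same_srv_rate",
    "diff_srv_rate", "srv_diff_host_rate", "dst_host_count", "dst_host_srv_count",
    "dst_host_same_srv_rate", "dst_host_diff_srv_rate",
    "dst_host_same_src_port_rate", "dst_host_srv_diff_host_rate",
    "dst_host_serror_rate", "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate",
]
# 41 traffic features, followed by the label and the difficulty level.
COLUMNS = FEATURES + ["label", "difficulty"]
STRING_COLUMNS = {"protocol_type", "service", "flag", "label"}


def parse_record(fields):
    """Map the 43 fields of one record onto named, typed columns."""
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    record = {}
    for name, raw in zip(COLUMNS, fields):
        if name in STRING_COLUMNS:
            record[name] = raw
        elif "." in raw:
            record[name] = float(raw)  # rate features such as 0.17
        else:
            record[name] = int(raw)    # counts, byte totals, binary flags
    return record


def read_records(lines):
    """Yield parsed records from an open file or any iterable of CSV lines."""
    for fields in csv.reader(lines):
        if fields:  # tolerate blank lines
            yield parse_record(fields)


# Illustrative record in the 43-field format (normal ftp_data traffic).
SAMPLE = ",".join(
    ["0", "tcp", "ftp_data", "SF", "491"] + ["0"] * 17 + ["2", "2"]
    + ["0.00"] * 4
    + ["1.00", "0.00", "0.00", "150", "25", "0.17", "0.03", "0.17",
       "0.00", "0.00", "0.00", "0.05", "0.00", "normal", "20"]
)
record = next(read_records(io.StringIO(SAMPLE + "\n")))
print(record["service"], record["label"], record["difficulty"])  # ftp_data normal 20
```

With a real file, `open("KDDTrain+.txt", newline="")` can be passed to `read_records` in place of the `io.StringIO` object.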
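The dataset contains four types of attacks: **Denial of Service (DoS), Probe, User to Root (U2R) and Remote to Local (R2L)**. Each is briefly described below:
- DoS is an attack that tries to shut down traffic flowing to and from the target system. The system is flooded with abnormal traffic that it cannot handle and shuts down to protect itself, which prevents normal traffic from reaching the network. An example would be an online retailer flooded with orders on a big sale day: because the network cannot handle all the requests, it shuts down and paying customers cannot buy anything. This is the most common attack in the dataset.
- Probe (or surveillance) is an attack that tries to gather information from a network. The goal here is to act like a thief and steal important information, whether personal information about customers or banking information.
- U2R is an attack that starts from a normal user account and tries to access the system or network as the superuser (root). The attacker exploits vulnerabilities in the system to gain root privileges/access.
- R2L is an attack that tries to gain local access to a remote machine. The attacker has no local access to the system/network and tries to "hack" their way into the network.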
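To train on these four categories rather than on individual attack names, the label column has to be mapped to a category. The sketch below assumes the standard KDD'99 grouping of the 22 attack names found in KDDTrain+. The test files contain further attack names that are not listed here, and this hypothetical helper reports those as `unknown`.

```python
# Assumed KDD'99 grouping of the 22 attack names that occur in KDDTrain+.
ATTACK_CATEGORIES = {
    "DoS": {"back", "land", "neptune", "pod", "smurf", "teardrop"},
    "Probe": {"ipsweep", "nmap", "portsweep", "satan"},
    "R2L": {"ftp_write", "guess_passwd", "imap", "multihop", "phf", "spy",
            "warezclient", "warezmaster"},
    "U2R": {"buffer_overflow", "loadmodule", "perl", "rootkit"},
}

# Invert the grouping once so that each lookup is a single dict access.
LABEL_TO_CATEGORY = {
    label: category
    for category, labels in ATTACK_CATEGORIES.items()
    for label in labels
}


def attack_category(label):
    """Return 'normal', 'DoS', 'Probe', 'R2L', 'U2R' or 'unknown'."""
    label = label.strip()
    if label == "normal":
        return "normal"
    # Attack names missing from the grouping above (e.g. test-only attacks).
    return LABEL_TO_CATEGORY.get(label, "unknown")


for name in ["normal", "neptune", "satan", "guess_passwd", "rootkit", "mailbomb"]:
    print(name, "->", attack_category(name))
```

Returning `unknown` rather than raising an error keeps unseen attack names visible in the output, so you can decide later how to group them.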
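The breakdown of each attack type into its subclasses is shown in the table below:
The data distribution of each attack type is as follows: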
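A distribution like this can be recomputed from the raw files by counting the label column, which is the second-to-last field of every record. The following is a minimal sketch that assumes lines in the 43-field format described above. `label_distribution` and its `group` parameter are hypothetical names, and by default the function only separates normal traffic from attacks.

```python
from collections import Counter


def label_of(line):
    """Return the label, i.e. the second-to-last of the 43 fields."""
    return line.strip().split(",")[-2]


def binary_group(label):
    """Default grouping: collapse every attack name into 'attack'."""
    return "normal" if label == "normal" else "attack"


def label_distribution(labels, group=binary_group):
    """Return {class: (count, share)} ordered from most to least frequent."""
    counts = Counter(group(label) for label in labels)
    total = sum(counts.values())
    return {cls: (n, n / total) for cls, n in counts.most_common()}


# Shortened, made-up lines; only the last two fields matter here.
lines = ["x,normal,20", "x,neptune,21", "x,normal,15", "x,smurf,21"]
dist = label_distribution(label_of(line) for line in lines)
print(dist)  # {'normal': (2, 0.5), 'attack': (2, 0.5)}
```

To get per-category shares, pass a function that maps attack names to DoS/Probe/U2R/R2L as `group`.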
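The features in the dataset fall into four categories: intrinsic, content, host-based and time-based. Each category is described below:
- Intrinsic features can be derived from the packet header without looking at the payload itself, and they hold basic information about the packet. This category covers features 1–9.
- Content features hold information about the original packets, which are sent in several pieces rather than as one. This information gives the system access to the payload. This category covers features 10–22.
- Time-based features analyze the traffic input over a two-second window. They contain information such as how many connections were attempted to the same host. These features are mainly counts and rates rather than information about the content of the traffic input. This category covers features 23–31.
- Host-based features are similar to time-based features. However, instead of a two-second window, they analyze a series of connections (how many requests were made to the same host over x connections). These features are designed to capture attacks that span longer than the two-second window. This category covers features 32–41.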
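Because the categories map to fixed feature numbers, you can select one category at a time, for example to compare how well each group separates the attack types. The following is a minimal sketch that assumes a record is available as a list of the 41 feature values in table order. The group names and helper functions are illustrative.

```python
# 1-based feature numbers for each category, as listed above.
FEATURE_GROUPS = {
    "intrinsic": range(1, 10),    # features 1-9
    "content": range(10, 23),     # features 10-22
    "time-based": range(23, 32),  # features 23-31
    "host-based": range(32, 42),  # features 32-41
}


def feature_group(number):
    """Return the category of a 1-based feature number."""
    for group, numbers in FEATURE_GROUPS.items():
        if number in numbers:
            return group
    raise ValueError(f"{number} is not one of the 41 traffic features")


def select_group(features, group):
    """Pick one category's values from a 41-element feature list."""
    if len(features) != 41:
        raise ValueError("expected the 41 traffic features in table order")
    return [features[n - 1] for n in FEATURE_GROUPS[group]]


row = list(range(1, 42))  # stand-in row whose values equal their feature numbers
print(select_group(row, "time-based"))  # [23, 24, 25, 26, 27, 28, 29, 30, 31]
```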
The possible values of the categorical features are broken down in the table below. Protocol type has 3 possible values, service has 60 and flag has 11.
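Most learning algorithms need these categorical columns converted to numbers. One common option, though the dataset does not prescribe it, is one-hot encoding. With the counts above, this produces 3 + 60 + 11 = 74 indicator columns. The sketch below builds the vocabulary from the training data only. A value that appears only in the test set is then encoded as all zeros instead of shifting column positions. The helper names and the tiny training sample are hypothetical.

```python
def fit_vocabulary(values):
    """Assign each distinct training value a fixed column index (sorted order)."""
    return {value: index for index, value in enumerate(sorted(set(values)))}


def one_hot(value, vocabulary):
    """Encode one value; values never seen during fitting become all zeros."""
    vector = [0] * len(vocabulary)
    index = vocabulary.get(value)
    if index is not None:
        vector[index] = 1
    return vector


def encode_categoricals(record, vocabularies):
    """Concatenate the one-hot vectors of every categorical column."""
    encoded = []
    for column, vocabulary in vocabularies.items():
        encoded.extend(one_hot(record[column], vocabulary))
    return encoded


# Hypothetical training values; in practice they come from KDDTrain+.
# "service" is left out only to keep the example short.
train = [
    {"protocol_type": "tcp", "flag": "SF"},
    {"protocol_type": "udp", "flag": "SF"},
    {"protocol_type": "icmp", "flag": "REJ"},
]
vocabularies = {
    column: fit_vocabulary(r[column] for r in train)
    for column in ("protocol_type", "flag")
}
print(encode_categoricals({"protocol_type": "udp", "flag": "S0"}, vocabularies))
# [0, 0, 1, 0, 0]  ("S0" never occurs in this tiny training sample)
```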
Each value of Flag represents the state of a connection. The meaning of each value is explained below:
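The flag names appear to follow the connection-state codes of the Bro (now Zeek) network monitor. Under that assumption, the short descriptions below are paraphrased from Zeek's `conn_state` documentation and are not an official NSL-KDD definition. The two sets at the end follow the feature table: the serror rates count flags S0, S1, S2 and S3, and the rerror rates count REJ.

```python
# The 11 flag values, described under the assumption that they follow
# Bro/Zeek's conn_state codes (descriptions paraphrased, not official).
FLAG_MEANINGS = {
    "SF": "normal establishment and termination",
    "S0": "connection attempt seen, no reply",
    "S1": "connection established, not terminated",
    "S2": "established; originator's close attempt seen, no reply from responder",
    "S3": "established; responder's close attempt seen, no reply from originator",
    "REJ": "connection attempt rejected",
    "RSTO": "established; originator aborted with a RST",
    "RSTR": "responder sent a RST",
    "RSTOS0": "originator sent SYN then RST; no SYN-ACK seen from responder",
    "SH": "originator sent SYN then FIN; no SYN-ACK seen (half-open)",
    "OTH": "no SYN seen, only midstream traffic",
}

# Flag groups used by the rate features in the feature table.
SERROR_FLAGS = frozenset({"S0", "S1", "S2", "S3"})  # Serror Rate features
REJ_FLAGS = frozenset({"REJ"})                       # Rerror Rate features


def error_kind(flag):
    """Classify a flag as 'serror', 'rerror' or None (neither)."""
    if flag in SERROR_FLAGS:
        return "serror"
    if flag in REJ_FLAGS:
        return "rerror"
    return None


for flag in ("S0", "REJ", "SF"):
    print(flag, error_kind(flag), "-", FLAG_MEANINGS[flag])
```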
Each feature is described in the table below, together with a breakdown of its values:
| # | Feature Name | Description | Type | Value Type | Ranges (between both train and test) |
|---|---|---|---|---|---|
| 1 | Duration | Length of time duration of the connection | Continuous | Integers | 0 - 54451 |
| 2 | Protocol Type | Protocol used in the connection | Categorical | Strings | |
| 3 | Service | Destination network service used | Categorical | Strings | |
| 4 | Flag | Status of the connection (normal or error) | Categorical | Strings | |
| 5 | Src Bytes | Number of data bytes transferred from source to destination in a single connection | Continuous | Integers | 0 - 1379963888 |
| 6 | Dst Bytes | Number of data bytes transferred from destination to source in a single connection | Continuous | Integers | 0 - 309937401 |
| 7 | Land | 1 if the source and destination IP addresses and port numbers are equal; 0 otherwise | Binary | Integers | {0, 1} |
| 8 | Wrong Fragment | Total number of wrong fragments in this connection | Discrete | Integers | {0, 1, 3} |
| 9 | Urgent | Number of urgent packets (packets with the urgent bit set) in this connection | Discrete | Integers | 0 - 3 |
| 10 | Hot | Number of "hot" indicators in the content, such as entering a system directory, creating programs and executing programs | Continuous | Integers | 0 - 101 |
| 11 | Num Failed Logins | Count of failed login attempts | Continuous | Integers | 0 - 4 |
| 12 | Logged In | Login status: 1 if successfully logged in; 0 otherwise | Binary | Integers | {0, 1} |
| 13 | Num Compromised | Number of "compromised" conditions | Continuous | Integers | 0 - 7479 |
| 14 | Root Shell | 1 if a root shell is obtained; 0 otherwise | Binary | Integers | {0, 1} |
| 15 | Su Attempted | 1 if the "su root" command was attempted or used; 0 otherwise | Discrete (the dataset also contains the value 2) | Integers | 0 - 2 |
| 16 | Num Root | Number of "root" accesses, or number of operations performed as root in the connection | Continuous | Integers | 0 - 7468 |
| 17 | Num File Creations | Number of file creation operations in the connection | Continuous | Integers | 0 - 100 |
| 18 | Num Shells | Number of shell prompts | Continuous | Integers | 0 - 2 |
| 19 | Num Access Files | Number of operations on access control files | Continuous | Integers | 0 - 9 |
| 20 | Num Outbound Cmds | Number of outbound commands in an ftp session | Continuous | Integers | {0} |
| 21 | Is Hot Logins | 1 if the login belongs to the "hot" list, i.e. root or admin; 0 otherwise | Binary | Integers | {0, 1} |
| 22 | Is Guest Login | 1 if the login is a "guest" login; 0 otherwise | Binary | Integers | {0, 1} |
| 23 | Count | Number of connections to the same destination host as the current connection in the past two seconds | Discrete | Integers | 0 - 511 |
| 24 | Srv Count | Number of connections to the same service (port number) as the current connection in the past two seconds | Discrete | Integers | 0 - 511 |
| 25 | Serror Rate | Fraction of the connections aggregated in count (23) whose flag (4) is S0, S1, S2 or S3 | Discrete | Floats (two decimal places) | 0 - 1 |
| 26 | Srv Serror Rate | Fraction of the connections aggregated in srv_count (24) whose flag (4) is S0, S1, S2 or S3 | Discrete | Floats (two decimal places) | 0 - 1 |
| 27 | Rerror Rate | Fraction of the connections aggregated in count (23) whose flag (4) is REJ | Discrete | Floats (two decimal places) | 0 - 1 |
| 28 | Srv Rerror Rate | Fraction of the connections aggregated in srv_count (24) whose flag (4) is REJ | Discrete | Floats (two decimal places) | 0 - 1 |
| 29 | Same Srv Rate | Fraction of the connections aggregated in count (23) that were to the same service | Discrete | Floats (two decimal places) | 0 - 1 |
| 30 | Diff Srv Rate | Fraction of the connections aggregated in count (23) that were to different services | Discrete | Floats (two decimal places) | 0 - 1 |
| 31 | Srv Diff Host Rate | Fraction of the connections aggregated in srv_count (24) that were to different destination machines | Discrete | Floats (two decimal places) | 0 - 1 |
| 32 | Dst Host Count | Number of connections having the same destination host IP address | Discrete | Integers | 0 - 255 |
| 33 | Dst Host Srv Count | Number of connections having the same port number | Discrete | Integers | 0 - 255 |
| 34 | Dst Host Same Srv Rate | Fraction of the connections aggregated in dst_host_count (32) that were to the same service | Discrete | Floats (two decimal places) | 0 - 1 |
| 35 | Dst Host Diff Srv Rate | Fraction of the connections aggregated in dst_host_count (32) that were to different services | Discrete | Floats (two decimal places) | 0 - 1 |
| 36 | Dst Host Same Src Port Rate | Fraction of the connections aggregated in dst_host_srv_count (33) that were to the same source port | Discrete | Floats (two decimal places) | 0 - 1 |
| 37 | Dst Host Srv Diff Host Rate | Fraction of the connections aggregated in dst_host_srv_count (33) that were to different destination machines | Discrete | Floats (two decimal places) | 0 - 1 |
| 38 | Dst Host Serror Rate | Fraction of the connections aggregated in dst_host_count (32) whose flag (4) is S0, S1, S2 or S3 | Discrete | Floats (two decimal places) | 0 - 1 |
| 39 | Dst Host Srv Serror Rate | Fraction of the connections aggregated in dst_host_srv_count (33) whose flag (4) is S0, S1, S2 or S3 | Discrete | Floats (two decimal places) | 0 - 1 |
| 40 | Dst Host Rerror Rate | Fraction of the connections aggregated in dst_host_count (32) whose flag (4) is REJ | Discrete | Floats (two decimal places) | 0 - 1 |
| 41 | Dst Host Srv Rerror Rate | Fraction of the connections aggregated in dst_host_srv_count (33) whose flag (4) is REJ | Discrete | Floats (two decimal places) | 0 - 1 |
| 42 | Class | Classification of the traffic input | Categorical | Strings | |
| 43 | Difficulty Level | Difficulty level | Discrete | Integers | 0 - 21 |
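The time-based features in the table (Count, Srv Count and the rates derived from them) are sliding-window statistics. The released files contain only the precomputed values, so the sketch below works on hypothetical connection records that carry a timestamp and a destination host. It also assumes the window holds the earlier connections that started at most two seconds before the current one, excluding the current connection itself. The original extraction tool may define the window boundary differently.

```python
from collections import deque
from dataclasses import dataclass

WINDOW_SECONDS = 2.0
SERROR_FLAGS = {"S0", "S1", "S2", "S3"}  # flags counted by Serror Rate (25)


@dataclass
class Connection:
    timestamp: float  # hypothetical: start time in seconds (not in the files)
    dst_host: str     # hypothetical: destination host (not in the files)
    service: str
    flag: str


def time_based_features(connections):
    """Compute count (23), srv_count (24) and serror_rate (25) per connection.

    `connections` must be sorted by timestamp.
    """
    window = deque()  # earlier connections inside the two-second window
    results = []
    for conn in connections:
        # Evict connections that started more than two seconds earlier.
        while window and conn.timestamp - window[0].timestamp > WINDOW_SECONDS:
            window.popleft()
        same_host = [c for c in window if c.dst_host == conn.dst_host]
        same_srv = [c for c in window if c.service == conn.service]
        count = len(same_host)
        serrors = sum(c.flag in SERROR_FLAGS for c in same_host)
        results.append({
            "count": count,
            "srv_count": len(same_srv),
            "serror_rate": round(serrors / count, 2) if count else 0.0,
        })
        window.append(conn)
    return results


conns = [
    Connection(0.0, "10.0.0.1", "http", "S0"),
    Connection(0.5, "10.0.0.1", "http", "SF"),
    Connection(1.0, "10.0.0.2", "http", "SF"),
    Connection(3.0, "10.0.0.1", "smtp", "REJ"),
]
for row in time_based_features(conns):
    print(row)
```

The sketch rescans the window for every connection, so the cost is proportional to the window size. That is acceptable for illustration. Per-host and per-service counters, updated as connections enter and leave the deque, would avoid the rescan.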
Dataset download: https://www.unb.ca/cic/datasets/nsl.html
For a more detailed introduction to the dataset, see: https://towardsdatascience.com/a-deeper-dive-into-the-nsl-kdd-data-set-15c753364657
Copyright belongs to the original author 迷人的派大星. In case of infringement, please contact us for removal.