Compared with Python's set | How to Compute the Intersection, Union, Difference, and Symmetric Difference of Pandas DataFrames

1. Introduction

    """
    @Author : 叶庭云
    @WeChat Official Account : AI庭云君
    @CSDN : https://yetingyun.blog.csdn.net/
    """

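Python's set data type: an unordered collection of distinct elements. Every element must be hashable (in practice, an immutable type), which is the same requirement that applies to dictionary keys. A set itself is mutable and therefore cannot serve as a dictionary key; a frozenset can.

The pandas DataFrame: a tabular data structure that can be thought of as a two-dimensional array with labelled rows and columns.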
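
As a quick illustration of the two definitions above, here is a minimal sketch. The variable names (langs, lookup, df) are made up for this example and are not part of the original article.

```python
import pandas as pd

# Set elements must be hashable: strings and tuples work, lists do not.
langs = {"Python", "Go", ("C", "C++")}
try:
    bad = {["Python", "Go"]}  # a list is mutable, hence unhashable
    list_allowed = True
except TypeError:
    list_allowed = False

# A set is itself mutable, so it cannot be a dict key; a frozenset can.
lookup = {frozenset({"Go", "C++"}): "shared"}

# A DataFrame is a two-dimensional table with labelled rows and columns.
df = pd.DataFrame([["1", "Python"], ["2", "Go"]], columns=["id", "name"])
print(list_allowed)        # whether the list was accepted as a set element
print(df.shape)            # (rows, columns)
print(list(df.columns))    # column labels
print(df.loc[1, "name"])   # cell addressed by row label and column label
```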

The common set operations covered here are intersection, union, difference, and symmetric difference.

2. Intersection

  • pandas' merge performs an inner join by default (how="inner"), which yields the intersection.
  • For a set, the & operator gives the intersection directly.

    import pandas as pd

    set1 = {"Python", "Go", "C++", "Java"}
    set2 = {"Go", "C++", "JavaScript", "C"}
    set1 & set2  # intersection of the two sets

    df1 = pd.DataFrame([['1', 'Python'], ['2', 'Go'], ['3', 'C++'], ['4', 'Java']], columns=['id', 'name'])
    df2 = pd.DataFrame([['2', 'Go'], ['3', 'C++'], ['5', 'JavaScript'], ['6', 'C']], columns=['id', 'name'])
    pd.merge(df1, df2, on=['id', 'name'])  # inner join on both columns

Both operations return the two shared entries, Go and C++.
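
One way to convince yourself that the inner join really is a set intersection is to view each row as a tuple and compare with plain Python sets. This is only a sketch; the names inter and inter_rows are introduced here for illustration.

```python
import pandas as pd

set1 = {"Python", "Go", "C++", "Java"}
set2 = {"Go", "C++", "JavaScript", "C"}

df1 = pd.DataFrame([["1", "Python"], ["2", "Go"], ["3", "C++"], ["4", "Java"]],
                   columns=["id", "name"])
df2 = pd.DataFrame([["2", "Go"], ["3", "C++"], ["5", "JavaScript"], ["6", "C"]],
                   columns=["id", "name"])

# how="inner" is the default, so only rows present in both frames survive.
inter = pd.merge(df1, df2, on=["id", "name"])

# Turn each row into a plain tuple so the frame can be compared as a set.
inter_rows = set(inter.itertuples(index=False, name=None))

print(inter_rows)
print(set(inter["name"]) == (set1 & set2))
```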

3. Union

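  • The how parameter of pandas' merge accepts values such as "left", "right", "inner", and "outer"; the default is "inner". An outer join yields the union. Alternatively, stack the two frames with pd.concat([df1, df2]) and drop duplicates, keeping the first occurrence. (DataFrame.append, formerly used for the stacking step, was removed in pandas 2.0.)
  • For a set, the | operator gives the union directly.

    set1 = {"Python", "Go", "C++", "Java"}
    set2 = {"Go", "C++", "JavaScript", "C"}
    set1 | set2  # union of the two sets

    df1 = pd.DataFrame([['1', 'Python'], ['2', 'Go'], ['3', 'C++'], ['4', 'Java']], columns=['id', 'name'])
    df2 = pd.DataFrame([['2', 'Go'], ['3', 'C++'], ['5', 'JavaScript'], ['6', 'C']], columns=['id', 'name'])

    # Option 1: outer join on both columns
    pd.merge(df1, df2, on=['id', 'name'], how='outer')

    # Option 2: stack the frames, then keep the first occurrence of each id
    # (id uniquely identifies a row in this example)
    df3 = pd.concat([df1, df2])
    df3.drop_duplicates(subset=['id'], keep="first")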
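
The two routes to a union can be checked against each other. The sketch below also passes indicator=True to merge, which adds a "_merge" column recording where each row came from; as_row_set is a hypothetical helper written for this example, not something from the original article.

```python
import pandas as pd

df1 = pd.DataFrame([["1", "Python"], ["2", "Go"], ["3", "C++"], ["4", "Java"]],
                   columns=["id", "name"])
df2 = pd.DataFrame([["2", "Go"], ["3", "C++"], ["5", "JavaScript"], ["6", "C"]],
                   columns=["id", "name"])

# Route 1: outer join. "_merge" is "left_only", "right_only", or "both".
outer = pd.merge(df1, df2, on=["id", "name"], how="outer", indicator=True)

# Route 2: stack the frames and drop repeated rows (all columns compared).
stacked = pd.concat([df1, df2], ignore_index=True).drop_duplicates(keep="first")


def as_row_set(df):
    """Hypothetical helper: view the id/name rows of a frame as a set of tuples."""
    return set(df[["id", "name"]].itertuples(index=False, name=None))


same = as_row_set(outer) == as_row_set(stacked)
print(same)
print(outer)
```

Note that ignore_index=True gives the stacked frame a fresh 0..n-1 index; without it, the row labels of df1 and df2 would repeat.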

4. Difference

    set1 = {"Python", "Go", "C++", "Java"}
    set2 = {"Go", "C++", "JavaScript", "C"}
    set1 - set2  # elements in set1 but not in set2
    set2 - set1  # elements in set2 but not in set1

    df1 = pd.DataFrame([['1', 'Python'], ['2', 'Go'], ['3', 'C++'], ['4', 'Java']], columns=['id', 'name'])
    df2 = pd.DataFrame([['2', 'Go'], ['3', 'C++'], ['5', 'JavaScript'], ['6', 'C']], columns=['id', 'name'])

    # df1 - df2: stack df2 on twice so each of its rows occurs at least twice,
    # then drop every row that occurs more than once (keep=False).
    # pd.concat stands in for DataFrame.append, which was removed in pandas 2.0.
    stacked = pd.concat([df1, df2])
    stacked = pd.concat([stacked, df2])
    set_diff_df = stacked.drop_duplicates(subset=stacked.columns, keep=False)
    set_diff_df

    # df2 - df1: the same idea with the roles swapped
    stacked = pd.concat([df2, df1])
    stacked = pd.concat([stacked, df1])
    set_diff_df = stacked.drop_duplicates(subset=stacked.columns, keep=False)
    set_diff_df

    # The same two results, each in a single call
    pd.concat([df1, df2, df2]).drop_duplicates(keep=False)  # df1 - df2
    pd.concat([df2, df1, df1]).drop_duplicates(keep=False)  # df2 - df1
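
The concat-and-drop trick assumes the left frame has no repeated rows of its own, since keep=False removes those as well. As an alternative that is not part of the original article, a left join with indicator=True keeps such rows. The helper name df_difference below is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame([["1", "Python"], ["2", "Go"], ["3", "C++"], ["4", "Java"]],
                   columns=["id", "name"])
df2 = pd.DataFrame([["2", "Go"], ["3", "C++"], ["5", "JavaScript"], ["6", "C"]],
                   columns=["id", "name"])


def df_difference(left, right):
    """Hypothetical helper: rows of `left` that do not appear in `right`."""
    # With no `on` argument, merge joins on all columns the frames share.
    # De-duplicating `right` first avoids multiplying matching rows.
    merged = left.merge(right.drop_duplicates(), how="left", indicator=True)
    only_left = merged[merged["_merge"] == "left_only"]
    return only_left.drop(columns="_merge").reset_index(drop=True)


print(df_difference(df1, df2))  # df1 - df2
print(df_difference(df2, df1))  # df2 - df1

# A left frame that contains the Python row twice.
dup = pd.concat([df1, df1.iloc[[0]]], ignore_index=True)
concat_trick = pd.concat([dup, df2, df2]).drop_duplicates(keep=False)
print(list(concat_trick["name"]))            # the repeated row is dropped too
print(list(df_difference(dup, df2)["name"]))  # the repeated row is kept
```

Which behaviour is right depends on whether the frames are meant to be true sets (no repeated rows) or may legitimately contain repeats.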

5. Symmetric Difference

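    set1 = {"Python", "Go", "C++", "Java"}
    set2 = {"Go", "C++", "JavaScript", "C"}
    set1 ^ set2  # symmetric difference

    df1 = pd.DataFrame([['1', 'Python'], ['2', 'Go'], ['3', 'C++'], ['4', 'Java']], columns=['id', 'name'])
    df2 = pd.DataFrame([['2', 'Go'], ['3', 'C++'], ['5', 'JavaScript'], ['6', 'C']], columns=['id', 'name'])

    # Stack the frames and drop every row whose id occurs more than once
    # (keep=False); what remains is the symmetric difference.
    df3 = pd.concat([df1, df2])
    df3.drop_duplicates(subset=['id'], keep=False)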
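
The symmetric difference can also be read off an outer join by discarding the rows found on both sides. The following is a sketch under that assumption, cross-checked against the identity A ^ B == (A | B) - (A & B); the names outer, sym_diff, and names are introduced for this example.

```python
import pandas as pd

set1 = {"Python", "Go", "C++", "Java"}
set2 = {"Go", "C++", "JavaScript", "C"}

df1 = pd.DataFrame([["1", "Python"], ["2", "Go"], ["3", "C++"], ["4", "Java"]],
                   columns=["id", "name"])
df2 = pd.DataFrame([["2", "Go"], ["3", "C++"], ["5", "JavaScript"], ["6", "C"]],
                   columns=["id", "name"])

# Outer join with provenance, then keep the rows found on one side only.
outer = pd.merge(df1, df2, on=["id", "name"], how="outer", indicator=True)
sym_diff = outer[outer["_merge"] != "both"].drop(columns="_merge")

names = set(sym_diff["name"])
print(names == (set1 ^ set2))
print(names == (set1 | set2) - (set1 & set2))
```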

This article is reposted from: https://blog.csdn.net/fyfugoyfa/article/details/122588761
Copyright belongs to the original author, 叶庭云. In case of infringement, please contact us for removal.