Matching identical content between two files in Python

The contents of data_small.txt are as follows:
343 0
5258 1
3973 2
The contents of data_big.txt are as follows:
343 2009-05-30T17:01:58Z 39.04183745 -94.5914053833 9191
343 2009-05-28T23:40:31Z 39.0523183095 -94.6074986458 8904
23 2009-05-28T23:40:31Z 39.0523183095 -94.6074986458 8904
56 2009-05-27T18:59:50Z 39.0424168 -94.59061145 9188
5258 2009-05-15T00:09:42Z 38.9920234667 -94.5920920333 10927
5258 2009-05-27T18:59:50Z 39.0424168 -94.59061145 9188
545 2009-05-15T00:09:42Z 38.9920234667 -94.5920920333 10927
3973 2009-05-14T20:47:20Z 39.0142536 -94.5928215833 12305
3973 2009-05-14T20:43:05Z 39.0146281324 -94.5907831192 9627

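Requirement: write every line of data_big.txt whose first column also appears in the first column of data_small.txt to a new text file.
That is, produce new_data.txt as follows: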
343 2009-05-30T17:01:58Z 39.04183745 -94.5914053833 9191
343 2009-05-28T23:40:31Z 39.0523183095 -94.6074986458 8904
5258 2009-05-15T00:09:42Z 38.9920234667 -94.5920920333 10927
5258 2009-05-27T18:59:50Z 39.0424168 -94.59061145 9188
3973 2009-05-14T20:47:20Z 39.0142536 -94.5928215833 12305
3973 2009-05-14T20:43:05Z 39.0146281324 -94.5907831192 9627

Code:

'''
Filter the large dataset by data_small to obtain a new, smaller dataset.
'''
with open(r'./data_small.txt', mode='r', encoding='utf8') as rf1, \
        open(r'./data_big.txt', mode='r', encoding='utf8') as rf2, \
        open('data_new', mode='w', encoding='utf8') as fid:
    content1 = rf1.readlines()  # read all lines
    content2 = rf2.readlines()
    for i in content1:
        x_1 = i.split()
        for j in content2:
            x_2 = j.split()
            # skip blank lines; write the line if the first columns match
            if x_1 and x_2 and x_1[0] == x_2[0]:
                fid.write(j)

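The nested loops above compare every line of data_small.txt with every line of data_big.txt, so the running time grows with the product of the two file sizes. The following version is faster: it collects the IDs first and then makes a single pass over data_big.txt: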

with open(r'./data_small.txt', mode='r', encoding='utf8') as rf1, \
        open(r'./data_big.txt', mode='r', encoding='utf8') as rf2, \
        open('data_new_1', mode='w', encoding='utf8') as fid:
    content1 = rf1.readlines()  # read all lines
    # a set makes each membership test O(1) on average (a list would be O(n))
    user_id = {line.split()[0] for line in content1 if line.strip()}
    for j in rf2:  # read the big file line by line
        x_2 = j.split()
        if x_2 and x_2[0] in user_id:
            fid.write(j)
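
The same filtering can be wrapped in a reusable function. The following is only a minimal sketch: the name `filter_by_first_column` and its parameters are made up for illustration and do not come from the original post. It assumes whitespace-separated columns, skips blank lines, and keeps the IDs in a set so that each lookup takes constant time on average.

```python
def filter_by_first_column(small_path, big_path, out_path):
    """Copy the lines of big_path whose first column appears in the
    first column of small_path. Returns the number of lines written.

    Hypothetical helper for illustration; not part of the original post.
    """
    with open(small_path, encoding='utf8') as small:
        # Collect the IDs once; set membership is O(1) on average.
        wanted = {line.split()[0] for line in small if line.strip()}

    written = 0
    with open(big_path, encoding='utf8') as big, \
            open(out_path, 'w', encoding='utf8') as out:
        for line in big:  # stream the large file instead of loading it whole
            fields = line.split()
            if fields and fields[0] in wanted:
                out.write(line)
                written += 1
    return written
```

With the sample files shown earlier, `filter_by_first_column('data_small.txt', 'data_big.txt', 'new_data.txt')` should write the six matching lines in the order they appear in data_big.txt.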

tips:

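r: read-only; r+: read and write; both raise an error if the file does not exist
w: write-only; w+: read and write; both create the file if it does not exist and discard any existing content when the file is opened
a: append-only, not readable; a+: append and read; both create the file if it does not exist, and writes are added to the end without overwriting existing content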
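
The differences between these modes can be checked with a short experiment. The sketch below is illustrative only; the file name `demo.txt` is arbitrary, and the file lives in a temporary directory so no real data is touched.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are affected.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# 'r' on a file that does not exist raises FileNotFoundError.
try:
    open(path, 'r')
    missing_raises = False
except FileNotFoundError:
    missing_raises = True

# 'w' creates the file; reopening with 'w' discards what was there.
with open(path, 'w', encoding='utf8') as f:
    f.write('first\n')
with open(path, 'w', encoding='utf8') as f:
    f.write('second\n')

# 'a' adds to the end without discarding existing content.
with open(path, 'a', encoding='utf8') as f:
    f.write('third\n')

with open(path, 'r', encoding='utf8') as f:
    content = f.read()

print(missing_raises)  # True
print(repr(content))   # 'second\nthird\n'
```

The line 'first' is gone because the second open with 'w' truncated the file, while the write in 'a' mode left 'second' in place.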

Tags: python

This article is reposted from: https://blog.csdn.net/qq_41570866/article/details/118700792
Copyright belongs to the original author, 走过路过要错过. In case of infringement, please contact us for removal.
