Example of Scraped Data
{
  "userId": "95092020",
  "isBlueVerified": true,
  "following": false,
  "canDm": false,
  "canMediaTag": false,
  "createdAt": "Sun Dec 06 23:33:02 +0000 2009",
  "defaultProfile": false,
  "defaultProfileImage": false,
  "description": "Best-Selling Author | Clinical Psychologist | #1 Education Podcast | Enroll to @petersonacademy now:",
  "fastFollowersCount": 0,
  "favouritesCount": 161,
  "followersCount": 5613000,
  "friendCount": 1686,
  "hasCustomTimelines": true,
  "isTranslator": false,
  "listedCount": 14572,
  "location": "",
  "mediaCount": 7318,
  "name": "Dr Jordan B Peterson",
  "normalFollowersCount": 5613000,
  "pinnedTweetIdsStr": [
    "1849105729438790067"
  ],
  "possiblySensitive": false,
  "profileImageUrlHttps": "https://pbs.twimg.com/profile_images/1407056014776614923/TKBC60e1_normal.jpg",
  "profileInterstitialType": "",
  "username": "jordanbpeterson",
  "statusesCount": 51343,
  "translatorType": "none",
  "verified": false,
  "wantRetweets": false,
  "withheldInCountries": []
}
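The createdAt field above uses Twitter's legacy timestamp format. As a minimal sketch, a scraped record like this one could be normalized with the standard library; the helper names parse_created_at and summarize_user are hypothetical and not part of the original code.

```python
from datetime import datetime, timezone
from typing import Any, Dict

# Twitter's legacy timestamp format, e.g. "Sun Dec 06 23:33:02 +0000 2009"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value: str) -> datetime:
    """Parse a createdAt string into a timezone-aware UTC datetime."""
    return datetime.strptime(value, TWITTER_TIME_FORMAT).astimezone(timezone.utc)


def summarize_user(record: Dict[str, Any]) -> Dict[str, Any]:
    """Keep a few commonly used fields from one scraped user record."""
    return {
        "username": record.get("username"),
        "name": record.get("name"),
        "followers": record.get("followersCount", 0),
        # friendCount is the number of accounts this user follows
        "following": record.get("friendCount", 0),
        "created_at": parse_created_at(record["createdAt"]),
    }
```

For the sample record, summarize_user(record)["created_at"].year is 2009. Note that %a and %b in strptime depend on the locale, so this assumes an English (C) locale.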
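Run the Code Directly, No Setup Needed
Our guide provides complete, ready-to-use code for scraping Twitter following data. Using Python and Selenium, it automates data collection by capturing Chrome's performance logs, from which the page's API responses can be read. Beyond the steps below, no extra setup is needed to unlock Twitter insights!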
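Step 1: Set Up Your Environment
First, install Selenium for browser automation, along with the other dependencies listed in requirements.txt: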
pip install -r requirements.txt
Step 2: Download ChromeDriver
Download ChromeDriver so that Selenium can control the Chrome browser. It is available from the official ChromeDriver download page.
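ChromeDriver's major version should match the installed Chrome version. A small pre-flight check like the sketch below can confirm that a driver is on PATH and report its version; the function names are hypothetical. Selenium 4.6 and later also ships Selenium Manager, which can download a matching driver automatically when none is found.

```python
import shutil
import subprocess
from typing import Optional


def find_chromedriver() -> Optional[str]:
    """Return the path of a chromedriver executable on PATH, or None."""
    return shutil.which("chromedriver")


def chromedriver_version(path: str) -> str:
    """Run `<path> --version` and return the first line of its output."""
    completed = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True, timeout=10
    )
    lines = completed.stdout.strip().splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    driver_path = find_chromedriver()
    if driver_path is None:
        print("chromedriver not found on PATH")
    else:
        print(chromedriver_version(driver_path))
```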
Step 3: Configure Chrome Options
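from selenium import webdriver

# Runs inside the scraper class; remote_debugging_port is defined elsewhere in the full code
self.options = webdriver.ChromeOptions()
# Present a regular desktop Chrome user agent
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36'
self.options.add_argument(f'user-agent={user_agent}')
# Flags commonly used when running Chrome on servers and in containers
self.options.add_argument('--disable-gpu')
self.options.add_argument('--no-sandbox')
self.options.add_argument('--disable-dev-shm-usage')
self.options.add_argument(f"--remote-debugging-port={remote_debugging_port}")
# modify_random_canvas_js() and get_browser() are helpers from the full code on GitHub:
# the first appears to generate a canvas-randomizing script, the second starts Chrome with network logging
js_script_name = modify_random_canvas_js()
self.browser = self.get_browser(script_files=[js_script_name], record_network_log=True, headless=True)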
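get_browser() is not shown in the article. For browser.get_log("performance") in Step 5 to return any DevTools events, Chrome must be started with the goog:loggingPrefs capability set to {"performance": "ALL"}; we assume that is what record_network_log=True does. The sketch below collects the Step 3 switches and that capability as plain data so it runs without Chrome; build_chrome_arguments and PERFORMANCE_LOGGING_CAPABILITY are hypothetical names.

```python
from typing import Dict, List

# Capability that makes ChromeDriver record DevTools events in the "performance" log
PERFORMANCE_LOGGING_CAPABILITY: Dict[str, Dict[str, str]] = {
    "goog:loggingPrefs": {"performance": "ALL"}
}


def build_chrome_arguments(user_agent: str, remote_debugging_port: int, headless: bool = True) -> List[str]:
    """Collect the Chrome command-line switches used in Step 3, plus headless mode."""
    args = [
        f"--user-agent={user_agent}",
        "--disable-gpu",
        "--no-sandbox",
        "--disable-dev-shm-usage",
        f"--remote-debugging-port={remote_debugging_port}",
    ]
    if headless:
        # "--headless=new" selects Chrome's newer headless mode (Chrome 109+)
        args.append("--headless=new")
    return args
```

With Selenium, each argument would be passed to options.add_argument(arg) and the capability to options.set_capability(name, value) before creating webdriver.Chrome(options=options).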
Step 4: Access the Target Page
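import time

result_list = []  # collected user entries
# Open a new tab and load the account's "following" page
self.browser.switch_to.new_window('tab')
url = 'https://x.com/1_usd_promotion/following'
self.browser.get(url=url)
# Give the page time to fire its API requests
time.sleep(2)
# IDs of entries already collected, so duplicates can be skipped
exist_entry_id = []
# get_network() is a helper from the full code that reads the performance log into result_list
self.get_network(exist_entry_id, result_list)
print(f'tweet result length = {len(result_list)}')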
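get_network() is not shown, but the name exist_entry_id suggests it remembers which timeline entries were already stored, so the same user is not saved twice when the log contains overlapping responses. A hypothetical sketch of that bookkeeping (add_new_entries is our name, not the author's):

```python
from typing import Any, Dict, Iterable, List


def add_new_entries(
    entries: Iterable[Dict[str, Any]],
    exist_entry_id: List[str],
    result_list: List[Dict[str, Any]],
) -> int:
    """Append entries whose entryId has not been seen yet; return how many were added."""
    seen = set(exist_entry_id)  # set for fast lookups; the list keeps the original interface
    added = 0
    for entry in entries:
        entry_id = entry.get("entryId")
        if entry_id is None or entry_id in seen:
            continue  # skip malformed entries and duplicates from repeated responses
        seen.add(entry_id)
        exist_entry_id.append(entry_id)
        result_list.append(entry)
        added += 1
    return added
```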
Step 5: Get the Browser Performance Log
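import json
from selenium.common.exceptions import WebDriverException

# Requires the goog:loggingPrefs {"performance": "ALL"} capability (set up by get_browser)
performance_log = self.browser.get_log("performance")
for packet in performance_log:
    msg = packet.get("message")
    message = json.loads(msg).get("message")
    packet_method = message.get("method")
    # Only fetch bodies for responses of the Following API call, not every Network event
    if packet_method == "Network.responseReceived" and 'Following' in msg:
        request_id = message.get("params").get("requestId")
        try:
            resp = self.browser.execute_cdp_cmd('Network.getResponseBody', {'requestId': request_id})
        except WebDriverException:
            continue  # body not (or no longer) available for this request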
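Each entry returned by get_log("performance") is a dict whose "message" value is a JSON string wrapping a DevTools event as {"message": {"method": ..., "params": ...}}. The filtering step can be sketched as a pure function that keeps only Network.responseReceived events whose response URL contains the endpoint name; the "/Following" fragment is an assumption based on the filter in the original code, and find_following_request_ids is a hypothetical name. Strictly, a body is complete only after the matching Network.loadingFinished event, but since the log is read after the page has had time to load, it is normally available by then.

```python
import json
from typing import Any, Dict, List


def find_following_request_ids(performance_log: List[Dict[str, Any]], endpoint: str = "/Following") -> List[str]:
    """Return requestIds of responses whose URL contains `endpoint`."""
    request_ids = []
    for packet in performance_log:
        # The "message" value is itself a JSON string wrapping the DevTools event
        message = json.loads(packet["message"]).get("message", {})
        if message.get("method") != "Network.responseReceived":
            continue
        params = message.get("params", {})
        url = params.get("response", {}).get("url", "")
        if endpoint in url:
            request_ids.append(params["requestId"])
    return request_ids
```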
Step 6: Extract Data from the Response
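# Continues inside the `if` block of the Step 5 loop, after `resp` is obtained
body = resp.get("body")
body = json.loads(body)
instructions = body['data']['user']['result']['timeline']['timeline'].get('instructions', None)
if not instructions:
    continue
for instruction in instructions:
    entries = instruction.get('entries', None)
    if not entries:
        continue
    # Each entry is one followed user or a pagination cursor; the full code parses them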
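Network.getResponseBody returns a body string plus a base64Encoded flag, so a robust parser should handle both cases. The sketch below decodes the body, walks the instructions, and also picks out the bottom cursor that a follow-up request would need for the next page. The field names itemContent and user_results and the "user-" and "cursor-bottom-" entryId prefixes are assumptions based on X's GraphQL responses observed at the time of writing and may change; all function names are hypothetical.

```python
import base64
import json
from typing import Any, Dict, Iterator, List, Optional


def decode_body(resp: Dict[str, Any]) -> Dict[str, Any]:
    """Decode the result of Network.getResponseBody into a dict."""
    body = resp.get("body", "")
    if resp.get("base64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    return json.loads(body)


def iter_user_results(instructions: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield the user objects from the timeline instructions (assumed layout)."""
    for instruction in instructions:
        for entry in instruction.get("entries") or []:
            if not entry.get("entryId", "").startswith("user-"):
                continue  # skip cursors and other non-user entries
            item = entry.get("content", {}).get("itemContent", {})
            result = item.get("user_results", {}).get("result")
            if result:
                yield result


def find_bottom_cursor(instructions: List[Dict[str, Any]]) -> Optional[str]:
    """Return the cursor value for the next page, if present (assumed layout)."""
    for instruction in instructions:
        for entry in instruction.get("entries") or []:
            if entry.get("entryId", "").startswith("cursor-bottom-"):
                return entry.get("content", {}).get("value")
    return None
```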
Step 7: Important Notes
- Log in to Twitter and get your auth_token.
- Alternatively, use the ready-made APIs on Apify.
- Get the full code from GitHub
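Rather than hardcoding the auth_token in the script, it can be read from an environment variable and turned into a cookie dict for Selenium. This is a minimal sketch; the variable name TWITTER_AUTH_TOKEN and the function build_auth_cookie are hypothetical.

```python
import os
from typing import Dict


def build_auth_cookie(env_var: str = "TWITTER_AUTH_TOKEN") -> Dict[str, str]:
    """Build a Selenium cookie dict for the auth_token read from an environment variable."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set {env_var} to your auth_token value first")
    # No domain key: WebDriver then scopes the cookie to the currently loaded page's host
    return {"name": "auth_token", "value": token, "path": "/"}
```

WebDriver only accepts cookies for the domain currently loaded, so the usual order is browser.get("https://x.com"), then browser.add_cookie(build_auth_cookie()), then reloading the page.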
This article is reprinted from: https://blog.csdn.net/joy357692577/article/details/143267031
Copyright belongs to the original author 江先森. In case of any infringement, please contact us for removal.