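Practice 2: Web Crawler

1. Generating fake data

Source page for surnames: Hundred Family Surnames (百家姓_诗词_百度汉语, Baidu Hanyu)

Source page for boys' given names: recommended poetic names for boys (naming boys born in the Year of the Dragon)

Source page for girls' given names: fresh, poetic names for girls in 2024 (names for girls born in the Year of the Dragon)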

    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLConnection;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class test1 {
        public static void main(String[] args) throws IOException {
            // 1. Define variables that hold the source URLs
            String familyName = "https://hanyu.baidu.com/shici/detail?pid=0b2f26d4c0ddb3ee693fdb1137ee1b0d&from=kg0";
            String boyName = "http://www.haoming8.cn/baobao/10881.html";
            String girlName = "http://www.haoming8.cn/baobao/7641.html";
            // 2. Crawl each page and concatenate everything it returns into one string
            String FamilyName = webCrawler(familyName);
            String BoyName = webCrawler(boyName);
            String GirlName = webCrawler(girlName);
            // System.out.println(FamilyName); // uncomment to inspect the raw page text
            // 3. Extract the names: group 1 is the run of Chinese characters,
            //    group 2 is the delimiter that must follow it on the page
            ArrayList<String> FamilyNameList = getData(FamilyName, "([\\u4e00-\\u9fa5]{4})(,|。)", 1);
            ArrayList<String> boyNameList = getData(BoyName, "([\\u4e00-\\u9fa5]{2})(、|。)", 1);
            ArrayList<String> girlNameList = getData(GirlName, "([\\u4e00-\\u9fa5]{2})( )", 1);
            System.out.println(FamilyNameList);
            System.out.println(boyNameList);
            System.out.println(girlNameList);
        }

        private static ArrayList<String> getData(String str, String regex, int index) {
            // 1. Create a list to hold the extracted data
            ArrayList<String> list = new ArrayList<>();
            // 2. Compile the regular expression that describes the data to extract
            Pattern pattern = Pattern.compile(regex);
            // Apply the pattern to str and collect the requested group of every match
            Matcher matcher = pattern.matcher(str);
            while (matcher.find()) {
                list.add(matcher.group(index));
            }
            return list;
        }

        public static String webCrawler(String net) throws IOException {
            StringBuilder sb = new StringBuilder();
            // Create a URL object and open a connection to it
            URL url = new URL(net);
            URLConnection conn = url.openConnection();
            // Decode as UTF-8 (an assumption about these pages); without an explicit
            // charset the platform default is used and Chinese text may be garbled
            try (InputStreamReader isr = new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8)) {
                int ch;
                while ((ch = isr.read()) != -1) {
                    sb.append((char) ch);
                }
            }
            return sb.toString();
        }
    }
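
To see what getData returns without touching the network, the sketch below runs the same matching loop on a hard-coded sample string. The class name GetDataDemo and the sample text are assumptions made for illustration: the sample is the first eight surnames of the Hundred Family Surnames, written with Unicode escapes so the file stays ASCII-only. The real pages may use different punctuation between names, in which case the delimiter group in the regular expression has to be adjusted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GetDataDemo {
    // Same matching loop as getData in the listing above:
    // collect one capture group from every match of regex in str.
    public static List<String> getData(String str, String regex, int index) {
        List<String> list = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(str);
        while (matcher.find()) {
            list.add(matcher.group(index));
        }
        return list;
    }

    public static void main(String[] args) {
        // Hypothetical sample: two groups of four surnames, the first followed by
        // an ASCII comma and the second by the ideographic full stop (U+3002).
        String sample = "\u8d75\u94b1\u5b59\u674e,\u5468\u5434\u90d1\u738b\u3002";
        // Four Chinese characters followed by a comma or an ideographic full stop.
        String regex = "([\\u4e00-\\u9fa5]{4})(,|\u3002)";

        // Group 1: only the four characters of each match.
        System.out.println(getData(sample, regex, 1));
        // Group 0: the whole match, including the trailing delimiter.
        System.out.println(getData(sample, regex, 0));
    }
}
```

Passing index 0 instead of 1 returns each match together with its trailing delimiter, which is why the listing above asks for group 1.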

This article is reposted from: https://blog.csdn.net/mzz715/article/details/143175716
Copyright belongs to the original author, tian-ming. In case of infringement, please contact us for removal.