

Running pyspark on Windows throws Py4JJavaError

The following error appears when running pyspark:

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_34188/971983411.py in <module>
----> 1 df_na.show()

D:\Autism\python\Anaconda\lib\site-packages\pyspark\sql\dataframe.py in show(self, n, truncate, vertical)
    604
    605         if isinstance(truncate, bool) and truncate:
--> 606             print(self._jdf.showString(n, 20, vertical))
    607         else:
    608             try:

D:\Autism\python\Anaconda\lib\site-packages\py4j\java_gateway.py in __call__(self, *args)
   1319
   1320         answer = self.gateway_client.send_command(command)
-> 1321         return_value = get_return_value(
   1322             answer, self.gateway_client, self.target_id, self.name)
   1323

D:\Autism\python\Anaconda\lib\site-packages\pyspark\sql\utils.py in deco(*a, **kw)
    188     def deco(*a: Any, **kw: Any) -> Any:
    189         try:
--> 190             return f(*a, **kw)
    191         except Py4JJavaError as e:
    192             converted = convert_exception(e.java_exception)

D:\Autism\python\Anaconda\lib\site-packages\py4j\protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325             if answer[1] == REFERENCE_TYPE:
--> 326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
    328                     format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling o41.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (windows10.microdone.cn executor driver): org.apache.spark.SparkException: Python worker failed to connect back.
    at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:189)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:109)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:124)
    at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:164)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:136)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.net.SocketTimeoutException: Accept timed out
    at java.net.DualStackPlainSocketImpl.waitForNewConnection(Native Method)
    at java.net.DualStackPlainSocketImpl.socketAccept(DualStackPlainSocketImpl.java:131)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:535)
    at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:189)
    at java.net.ServerSocket.implAccept(ServerSocket.java:545)
    at java.net.ServerSocket.accept(ServerSocket.java:513)
    at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:176)
    ... 29 more
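The lines that matter here are "org.apache.spark.SparkException: Python worker failed to connect back" and the underlying "Caused by: java.net.SocketTimeoutException: Accept timed out": the JVM driver launches a Python worker process and waits for it to connect back over a local socket, and it gives up when the worker never does, which usually means the worker could not find a usable Spark or Python installation. As a quick diagnostic (not part of the original post; the environment variables below are the ones PySpark reads, but whether they are set correctly depends on your machine), you can print the relevant settings from the same Python session:

import os
import sys

# Environment variables PySpark consults when launching Python workers.
# On a broken Windows setup these are often unset or point at the wrong interpreter.
for var in ("SPARK_HOME", "JAVA_HOME", "PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON"):
    print(f"{var} = {os.environ.get(var)}")

# The interpreter the driver itself is running under.
print("driver python =", sys.executable)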

Solution:

# pip install findspark
import findspark
findspark.init()
# The error occurs because Spark cannot be located when the session starts;
# findspark.init() finds the Spark installation before anything else runs.
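For context, here is a minimal end-to-end sketch of where the fix sits (the column names and the df_na data below are illustrative assumptions, not from the original post): findspark.init() has to run before the SparkSession is created. On Windows it often also helps to pin the worker interpreter to the driver's Python, a separate, commonly used remedy for the same "Python worker failed to connect back" error:

import os
import sys

import findspark
findspark.init()  # must run before the SparkSession is created

# Optional extra step (not in the original post): make the Python workers
# use the same interpreter as the driver.
os.environ["PYSPARK_PYTHON"] = sys.executable

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()

# Illustrative DataFrame standing in for the post's df_na.
df_na = spark.createDataFrame([(1, None), (2, "b")], ["id", "value"])
df_na.show()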
Tags: windows spark scala

Reposted from: https://blog.csdn.net/wzy_xd666/article/details/127648944
Copyright belongs to the original author 赫桃. In case of infringement, please contact us for removal.
