
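Fixing DolphinScheduler tasks that produce data but are marked as failed (offline data warehouse jobs on CDH 6.3.2)

DolphinScheduler suddenly started misbehaving in production: tasks produced their data, but their status was shown as failed, so the workflow could not continue to the downstream tasks.

A strange problem

The ZooKeeper-related errors in the DolphinScheduler logs looked like this: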

4edf77f387, negotiated timeout = 60000
[INFO] 2025-03-25 10:17:50.845 org.apache.curator.framework.state.ConnectionStateManager:[251] - State change: RECONNECTED
[ERROR] 2025-03-25 10:17:50.847 org.apache.curator.framework.recipes.cache.TreeCache:[779] - 
java.lang.IllegalStateException: unexpected NodeCreated on non-root node
	at org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:507)
	at org.apache.curator.framework.recipes.cache.TreeCache$TreeNode.process(TreeCache.java:372)
	at org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:77)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:533)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)
[ERROR] 2025-03-25 10:17:50.847 org.apache.curator.framework.recipes.cache.TreeCache:[779] - 
java.lang.IllegalStateException: unexpected NodeCreated on non-root node
	at org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:507)
	at org.apache.curator.framework.recipes.cache.TreeCache$TreeNode.process(TreeCache.java:372)
	at org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:77)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:533)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)
[ERROR] 2025-03-25 10:17:50.847 org.apache.curator.framework.recipes.cache.TreeCache:[779] - 
java.lang.IllegalStateException: unexpected NodeCreated on non-root node
	at org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:507)
	at org.apache.curator.framework.recipes.cache.TreeCache$TreeNode.process(TreeCache.java:372)
	at org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:77)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:533)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)
[ERROR] 2025-03-25 10:17:50.848 org.apache.curator.framework.recipes.cache.TreeCache:[779] - 
java.lang.IllegalStateException: unexpected NodeCreated on non-root node
	at org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:507)
	at org.apache.curator.framework.recipes.cache.TreeCache$TreeNode.process(TreeCache.java:372)
	at org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:77)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:533)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)
[ERROR] 2025-03-25 10:17:50.848 org.apache.curator.framework.recipes.cache.TreeCache:[779] - 
java.lang.IllegalStateException: unexpected NodeCreated on non-root node
	at org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:507)
	at org.apache.curator.framework.recipes.cache.TreeCache$TreeNode.process(TreeCache.java:372)
	at org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:77)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:533)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)
[INFO] 2025-03-25 10:17:50.848 org.apache.dolphinscheduler.service.zk.ZookeeperOperator:[76] - reconnected to zookeeper
[root@cdh03 logs]# tail -f dolphinscheduler-worker.log
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:533)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)
[ERROR] 2025-03-25 10:17:50.848 org.apache.curator.framework.recipes.cache.TreeCache:[779] - 
java.lang.IllegalStateException: unexpected NodeCreated on non-root node
	at org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:507)
	at org.apache.curator.framework.recipes.cache.TreeCache$TreeNode.process(TreeCache.java:372)
	at org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:77)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:533)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508)
[INFO] 2025-03-25 10:17:50.848 org.apache.dolphinscheduler.service.zk.ZookeeperOperator:[76] - reconnected to zookeeper

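This looked like a problem in how DolphinScheduler interacts with ZooKeeper or with YARN on CDH 6.3.2. On an earlier occasion, simply restarting the whole CDH cluster had made the problem go away.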

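Steps to restart DolphinScheduler

1. First, go to the DolphinScheduler installation directory with the following shell command:
cd /opt/dolphinscheduler  # or your actual installation path

2. Stop all services:
./bin/stop-all.sh

3. Wait a few seconds to make sure all services have stopped, then start them all again:
./bin/start-all.sh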
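
The three steps above can be wrapped in one small function. This is only a sketch: the function name `restart_dolphinscheduler`, the default path, and the 10-second default wait are assumptions chosen for illustration. Only `bin/stop-all.sh` and `bin/start-all.sh` come from the steps above.

```shell
#!/usr/bin/env bash
# Sketch: restart DolphinScheduler from its installation directory.
# Arguments (hypothetical defaults, adjust to your environment):
#   $1  installation directory (default /opt/dolphinscheduler)
#   $2  seconds to wait between stop and start (default 10)
restart_dolphinscheduler() {
    local ds_home="${1:-/opt/dolphinscheduler}"
    local wait_seconds="${2:-10}"

    # Run in a subshell so the caller's working directory is not changed.
    (
        cd "$ds_home" || exit 1    # step 1: abort if the directory is missing
        ./bin/stop-all.sh          # step 2: stop all services
        sleep "$wait_seconds"      # step 3: give the services time to exit
        ./bin/start-all.sh         # then start everything again
    )
}
```

It would be called as `restart_dolphinscheduler /opt/dolphinscheduler 10`. A fixed sleep is the simplest form of "wait a few seconds"; polling for the service processes would be more robust, but the process names depend on the installed version.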
 

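After rerunning the tasks, the problem was still there. (Occasionally a task that ran only briefly and did not interact with MySQL was shown as successful.)

With no other option, I went back to the DolphinScheduler worker log, and this time found the following: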

[ERROR] 2025-03-25 13:45:54.049 org.apache.dolphinscheduler.common.utils.HttpUtils:[73] - Connect to cdh02:8088 [cdh02/10.0.0.2] failed: Connection refused (Connection refused)
org.apache.http.conn.HttpHostConnectException: Connect to cdh02:8088 [cdh02/10.0.0.2] failed: Connection refused (Connection refused)
	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:151)
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
	at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
	at org.apache.dolphinscheduler.common.utils.HttpUtils.get(HttpUtils.java:60)
	at org.apache.dolphinscheduler.common.utils.HadoopUtils$YarnHAAdminUtils.getRMState(HadoopUtils.java:645)
	at org.apache.dolphinscheduler.common.utils.HadoopUtils$YarnHAAdminUtils.getAcitveRMName(HadoopUtils.java:620)
	at org.apache.dolphinscheduler.common.utils.HadoopUtils.getAppAddress(HadoopUtils.java:557)
	at org.apache.dolphinscheduler.common.utils.HadoopUtils.getApplicationUrl(HadoopUtils.java:204)
	at org.apache.dolphinscheduler.common.utils.HadoopUtils.getApplicationStatus(HadoopUtils.java:410)
	at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.isSuccessOfYarnState(AbstractCommandExecutor.java:390)
	at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.run(AbstractCommandExecutor.java:230)
	at org.apache.dolphinscheduler.server.worker.task.shell.ShellTask.handle(ShellTask.java:98)
	at org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.run(TaskExecuteThread.java:133)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at org.apache.http.conn.socket.PlainConnectionSocketFactory.connectSocket(PlainConnectionSocketFactory.java:74)
	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:134)
	... 24 common frames omitted
[INFO] 2025-03-25 13:45:54.050 org.apache.dolphinscheduler.common.utils.HadoopUtils:[206] - application url : http://null:8088/ws/v1/cluster/apps/%s

[ERROR] 2025-03-25 13:46:04.065 org.apache.dolphinscheduler.common.utils.HttpUtils:[73] - null: Name or service not known
java.net.UnknownHostException: null: Name or service not known
	at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
	at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
	at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
	at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
	at java.net.InetAddress.getAllByName(InetAddress.java:1192)
	at java.net.InetAddress.getAllByName(InetAddress.java:1126)
	at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:45)
	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:111)
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
	at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
	at org.apache.dolphinscheduler.common.utils.HttpUtils.get(HttpUtils.java:60)
	at org.apache.dolphinscheduler.common.utils.HadoopUtils.getApplicationStatus(HadoopUtils.java:412)
	at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.isSuccessOfYarnState(AbstractCommandExecutor.java:390)
	at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.run(AbstractCommandExecutor.java:230)
	at org.apache.dolphinscheduler.server.worker.task.shell.ShellTask.handle(ShellTask.java:98)
	at org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.run(TaskExecuteThread.java:133)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolE

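DolphinScheduler worker log: error analysis and solution

The DolphinScheduler worker log shows errors connecting to the YARN ResourceManager. These errors are the likely reason the workflow was shown as failed after the Sqoop task had run.

The log points to the following key problem:

1. YARN ResourceManager connection failure:

   Connect to cdh02:8088 [cdh02/10.0.0.2] failed: Connection refused

   DolphinScheduler tried to reach the CDH cluster's YARN ResourceManager on cdh02 (port 8088), but the connection was refused. Apparently unable to find an active ResourceManager, the worker then built the application status URL with a null host (http://null:8088/ws/v1/cluster/apps/%s), which failed with UnknownHostException, so it could not confirm the state of the YARN application.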

Because YARN on CDH 6.3.2 is configured for high availability, I checked the YARN ResourceManager setting in DolphinScheduler's conf/common.properties:

yarn.resourcemanager.ha.rm.ids=cdh01,cdh02

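In the CDH YARN configuration, the ResourceManagers are cdh01 and cdh04, and cdh04 is the active node. DolphinScheduler had been misconfigured with cdh02, which is not a YARN ResourceManager at all, so the connection could never succeed.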
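
One way to confirm this from the DolphinScheduler host is to query each candidate over the YARN REST endpoint `/ws/v1/cluster/info`, whose `clusterInfo.haState` field reports `ACTIVE` or `STANDBY`. The sketch below assumes `curl` is installed. The function names `extract_ha_state` and `rm_state` and the 5-second timeout are illustrative choices, and port 8088 is taken from the log above.

```shell
#!/usr/bin/env bash
# Sketch: ask a host which ResourceManager HA state it reports.

# Extract the haState value from a /ws/v1/cluster/info JSON response on stdin.
# Prints nothing if the field is absent.
extract_ha_state() {
    grep -o '"haState"[[:space:]]*:[[:space:]]*"[A-Z_]*"' \
        | sed 's/.*"\([A-Z_]*\)"$/\1/'
}

# Query one host and print "<host> <state>".
# Prints "<host> UNREACHABLE" when the HTTP request itself fails
# (for example "Connection refused"), and "<host> UNKNOWN" when the
# response carries no haState field.
rm_state() {
    local host="$1" port="${2:-8088}" body state
    if body=$(curl -s --max-time 5 "http://${host}:${port}/ws/v1/cluster/info"); then
        state=$(printf '%s' "$body" | extract_ha_state)
        echo "$host ${state:-UNKNOWN}"
    else
        echo "$host UNREACHABLE"
    fi
}
```

A loop such as `for h in cdh01 cdh02 cdh04; do rm_state "$h"; done` prints one line per host. Given the situation described here, cdh02 would be expected to show up as UNREACHABLE.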

Change it to the correct value:

yarn.resourcemanager.ha.rm.ids=cdh01,cdh04

After restarting DolphinScheduler, everything worked normally again.
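
To double-check which hosts DolphinScheduler is configured with, the property can be read back from the file. This is a minimal sketch; the function name `configured_rm_ids` is hypothetical, and the file path is passed in by the caller.

```shell
#!/usr/bin/env bash
# Sketch: print the hosts listed in yarn.resourcemanager.ha.rm.ids, one per line.
# $1 is the path to DolphinScheduler's common.properties.
configured_rm_ids() {
    local props="$1"
    # Take the last uncommented definition of the key, keep the text after
    # the first "=", split on commas, trim whitespace, and drop empty entries.
    grep -E '^[[:space:]]*yarn\.resourcemanager\.ha\.rm\.ids[[:space:]]*=' "$props" \
        | tail -n 1 \
        | cut -d= -f2- \
        | tr ',' '\n' \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}
```

Running `configured_rm_ids conf/common.properties` from the installation directory lists the configured hosts, which can then be compared with the ResourceManager hosts shown in the CDH YARN configuration.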
