news 2026/5/21 9:59:56

Notes on problems encountered while using Apache Doris


张小明

Front-end development engineer



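This article records in detail the problems encountered while using Apache Doris, including errors when creating tables, changed log file ownership, insufficient disk space, enabling materialized views, importing Hive data, and failed LOAD tasks, together with the corresponding fixes, such as adjusting memory, setting parameters, and repairing permissions.
1, Executing a CREATE statement fails with: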
[Err] 1064 - errCode = 2, detailMessage = Failed to find enough host in all backends. need: 3

Cause:

The statement specifies PROPERTIES("replication_num" = "3");

but only 2 BEs were available.

Check the log on the affected node:

==> ./be.WARNING.log.20200921-141304 <==
W1026 18:13:39.139992 19091 utils.cpp:101] fail to get master client from cache. host=192.168.6.143, port=9020, code=7
W1026 18:13:39.140386 19091 task_worker_pool.cpp:1185] finish report olap table state failed. status:-1, master host:192.168.6.143, port:9020
W1026 18:13:40.391201 19089 utils.cpp:101] fail to get master client from cache. host=192.168.6.143, port=9020, code=7
W1026 18:13:40.391471 19089 task_worker_pool.cpp:1060] finish report task failed. status:-1, master host:192.168.6.143port:9020
W1027 10:00:31.385262 2359 data_dir.cpp:128] open file filed, error: IO error: failed to open cluster id file /wyyt/software/doris/be/storage/cluster_id
W1027 10:00:31.385926 2359 data_dir.cpp:95] _init_cluster_id failed, error: IO error: failed to open cluster id file /wyyt/software/doris/be/storage/cluster_id
W1027 10:00:31.385958 2359 storage_engine.cpp:192] Store load failed, status=IO error: failed to open cluster id file /wyyt/software/doris/be/storage/cluster_id, path=/wyyt/software/doris/be/storage
W1027 10:00:31.386071 2353 storage_engine.cpp:148] _init_store_map failed, error: Internal error: init path failed, error=IO error: failed to open cluster id file /wyyt/software/doris/be/storage/cluster_id;
W1027 10:00:31.386106 2353 storage_engine.cpp:96] open engine failed, error: Internal error: init path failed, error=IO error: failed to open cluster id file /wyyt/software/doris/be/storage/cluster_id;
F1027 10:00:31.386186 2353 doris_main.cpp:189] fail to open StorageEngine, res=init path failed, error=IO error: failed to open cluster id file /wyyt/software/doris/be/storage/cluster_id;

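Once the cause is found, fix it. In my case the cluster_id file could not be opened, so I tried setting its permissions to 755 and then restarted the BE node.

If the restart fails, delete be.pid and restart again.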
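The mismatch behind this error can be checked before the statement is submitted. The following is a minimal sketch only: the function name and the input format (a list of alive flags, such as one might read from the backend list) are assumptions made for illustration and are not part of Doris.

```python
def can_place_replicas(replication_num, backend_alive_flags):
    """Return True if enough alive BEs exist to hold every replica.

    Each replica of a tablet needs its own backend, so a table with
    replication_num = 3 needs at least 3 alive BEs.
    """
    alive = sum(1 for flag in backend_alive_flags if flag)
    return alive >= replication_num


# Situation from the notes: three BEs expected, one failed to start.
print(can_place_replicas(3, [True, True, False]))  # False
print(can_place_replicas(2, [True, True, False]))  # True
```

With one BE down, either the failed BE has to be brought back or the table has to be created with a smaller replication_num.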

2, The owning user of the log files changed


The log files are owned by whichever user started the service.

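3, Error when creating a Doris table


Cause: the declared column lengths must not add up to more than 100,000. The limit can be raised through configuration, but this is not recommended.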

4, Disk full
ErrorReason{code=errCode = 2, msg='failed to create task: errCode = 2, detailMessage = disk 6189104187500640169 on backend 11001 exceed limit usage'

As a result, all tasks are paused.
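The "exceed limit usage" message indicates that a disk has gone above a usage threshold. A small monitoring sketch along those lines is shown here; the threshold of 90 and the path "/" are placeholders chosen for illustration, not values taken from the Doris configuration or from this cluster.

```python
import shutil


def usage_percent(total_bytes, used_bytes):
    """Disk usage as a percentage of total capacity."""
    return 100.0 * used_bytes / total_bytes


def exceeds_limit(total_bytes, used_bytes, limit_percent):
    """True when usage is above the given limit."""
    return usage_percent(total_bytes, used_bytes) > limit_percent


if __name__ == "__main__":
    # ASSUMPTION: "/" stands in for a BE storage path and 90 for the limit.
    usage = shutil.disk_usage("/")
    print(round(usage_percent(usage.total, usage.used), 1))
    print(exceeds_limit(usage.total, usage.used, 90))
```

Running a check like this on every BE storage path before the limit is reached leaves time to free space before load tasks start being rejected.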

5, Enabling materialized views
create materialized view test_p_user_view as select user_id,user_name from test_p_user limit 8;
ERROR 1064 (HY000): errCode = 2, detailMessage = The materialized view is coming soon

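Fix: run this command on the master: ADMIN SET FRONTEND CONFIG ("enable_materialized_view" = "true");

At present materialized views only support duplicate key tables. Version 0.12 supports only part of the feature; version 0.13 will complete it.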

6, Workflow for importing Hive data into Doris
1, Create the corresponding table in Doris

2, Execute the statement

7,type:LOAD_RUN_FAIL; msg:errCode = 2, detailMessage = there is no scanNode Backend
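Importing a large table from HDFS caused the BE node to go down.

Solution: set parameters on the FE

The task must explicitly specify its memory:

Check the BE log and the core file to see whether it was an OOM.

Reference: https://blog.csdn.net/weixin_42135997/article/details/80732658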

https://blog.csdn.net/qq_15437667/article/details/83934113?utm_medium=distribute.pc_aggpage_search_result.none-task-blog-2~all~sobaiduend~default-1-83934113.nonecase&utm_term=linux%20%E6%80%8E%E4%B9%88%E7%9C%8Bcore%E6%96%87%E4%BB%B6&spm=1000.2123.3001.4430
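One way to confirm an OOM is to look for OOM-killer messages in the kernel log. The sketch below scans lines of text for them; the sample lines are made up for illustration (including the process name), and the helper name is hypothetical.

```python
import re

# Kernel OOM-killer messages look like "Out of memory: Kill process 123 (name)"
# on older kernels and "Out of memory: Killed process 123 (name)" on newer ones.
OOM_PATTERN = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log_lines):
    """Return (pid, process_name) for every OOM kill found in the lines."""
    kills = []
    for line in kernel_log_lines:
        match = OOM_PATTERN.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


# Made-up sample lines, not taken from the cluster in these notes.
sample = [
    "[1234.5] Out of memory: Kill process 19091 (palo_be) score 900 or sacrifice child",
    "[1240.1] eth0: link up",
]
print(find_oom_kills(sample))  # [(19091, 'palo_be')]
```

The input lines could come from the output of dmesg or from the system log; if the BE process shows up in the result, the crash was an OOM kill rather than a fault inside the BE.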

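8, Commands suddenly could not be executed


Checked the BE nodes: they were all in Alive state.

Checked the BE logs (be.INFO, be.WARN) and found nothing unusual.

It later turned out that the disk on one node had failed. Next time a problem like this comes up, I will know how to troubleshoot it.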

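9, Rules for broker import of HDFS data
1) I verified broker import of HDFS data into a table using the uniq model. Rows with the same primary key were not overwritten in arrival order. Instead, the replacement was decided by the length of the second column: the row with the longest second column wins, and for equal lengths the most recent row is kept. If the second column is identical, the length of the third column is compared in the same way.

Resulting data: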

10, Doris broker import fails
type:LOAD_RUN_FAIL; msg:errCode = 2, detailMessage = all partitions have no load data

The source table was null; it contained no data.

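11, Running several broker tasks at the same time brings down BE nodes
Cause: most likely the BE died because it ran out of memory.

Solution: limit each broker load on a single node to 1 GB per run, or less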
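Keeping each load under 1 GB means splitting the source files into batches and submitting the batches one after another. A minimal sketch of that grouping follows; the file names and sizes are hypothetical, and the function is an illustration rather than anything Doris provides.

```python
def split_into_batches(file_sizes, max_batch_bytes=1024 ** 3):
    """Group (path, size) pairs into batches whose total size is at most the cap.

    A single file larger than the cap is placed in a batch of its own.
    """
    batches, current, current_size = [], [], 0
    for path, size in file_sizes:
        if current and current_size + size > max_batch_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Hypothetical HDFS files and sizes, for illustration only.
mb = 1024 ** 2
files = [("part-0", 600 * mb), ("part-1", 500 * mb), ("part-2", 300 * mb)]
print(split_into_batches(files))  # [['part-0'], ['part-1', 'part-2']]
```

Each inner list would then become the file list of one broker load, submitted only after the previous one has finished.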

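12, routine load reports: errCode = 2, detailMessage = failed to send task: errCode = 2, detailMessage = failed
By default, the total routine load task concurrency is max_routine_load_task_num_per * number of BEs

For example, with 3 BE nodes the total concurrency is 5 * 3 = 15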
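The arithmetic can be written down as a one-line helper, which is handy when comparing the cap against the number of tasks the running jobs ask for. The function name is hypothetical; the per-BE value of 5 and the BE count of 3 are the ones used in the example above.

```python
def total_routine_load_tasks(tasks_per_be, be_count):
    """Cluster-wide cap on concurrently running routine load tasks."""
    return tasks_per_be * be_count


# Values from the notes: 5 tasks per BE and 3 BE nodes.
print(total_routine_load_tasks(5, 3))  # 15
```

If the jobs together request more concurrent tasks than this cap, some tasks cannot be sent to a BE, which would be consistent with the "failed to send task" error.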

13, Via insert into


14, Import task failed


Not enough memory; increase the memory setting.

15,ETL_QUALITY_UNSATISFIED; msg:quality not good enough to cancel
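Explanation: the data quality is poor, so Doris cannot parse the data or parsing fails, and the import task is cancelled.

Possible causes:

1. A varchar field is too long; separator problems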

2. too_many_filtered_rows

Solution

Do not import long text, or truncate long text on import; handle data that contains the separator
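The too_many_filtered_rows cause comes down to a ratio check: the load is cancelled when the share of rejected rows is above the tolerance set for the load. The sketch below assumes the ratio is filtered / (filtered + loaded) and that the default tolerance is 0; both are assumptions about Doris rather than something stated in these notes, and the function names and counts are made up for illustration.

```python
def filtered_ratio(filtered_rows, loaded_rows):
    """Share of rows rejected during a load."""
    total = filtered_rows + loaded_rows
    return filtered_rows / total if total else 0.0


def passes_quality_check(filtered_rows, loaded_rows, max_filter_ratio=0.0):
    """True when the rejected share does not exceed max_filter_ratio."""
    return filtered_ratio(filtered_rows, loaded_rows) <= max_filter_ratio


# Made-up counts: 50 rows rejected (for example, over-long varchar values) out of 1000.
print(filtered_ratio(50, 950))             # 0.05
print(passes_quality_check(50, 950))       # False
print(passes_quality_check(50, 950, 0.1))  # True
```

Under these assumptions a single bad row is enough to cancel a load with zero tolerance, so the choice is between cleaning the data first and allowing a small ratio of rejected rows.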

16, Memory is not released after importing data into Doris with broker

Solution:

Try upgrading Doris to version 0.13.15 to verify this problem:

Link: https://cloud.baidu.com/doc/PALO/s/Ikivhcwb5

17, An error that occurred

The Doris version is the 0.13.11 patch release.

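18, The data directory on some BE nodes is very large, while on other BE nodes it is normal.

Preliminary diagnosis: the cluster load is the problem; routine load is writing too frequently

Check whether the table is healthy:

Change the routine load parameters, setting the batch interval to 60s
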

(
'desired_concurrent_number'='3',
'max_batch_interval' = '60',
'max_batch_rows' = '300000',
'max_batch_size' = '209715200',
'strict_mode' = 'false',
'format' = 'json'
)

19, Upgrading to Doris 0.14.7 resolved the earlier Too Many Tasks problem


20, With Doris 0.14.7 deployed on the internal network with 3 FEs, an FE node went down after data was written. The log:

2021-08-27 09:09:25,172 ERROR (heartbeat mgr|19) [BDBJEJournal.write():166] catch an exception when writing to database. sleep and retry. journal id 1526718
com.sleepycat.je.rep.InsufficientAcksException: (JE 7.3.7) Transaction: -16160910 VLSN: 31,775,195, initiated at: 09:09:22. Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 1. Missing replica acks: 1. Timeout: 2000ms. FeederState=192.168.7.5_9010_1625132780567(2)[MASTER]
Current feeds:
192.168.7.7_9010_1625192915300: feederVLSN=31,775,198 replicaTxnEndVLSN=31,775,193
192.168.7.4_9010_1625132697001: feederVLSN=31,775,198 replicaTxnEndVLSN=31,775,191

at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureSufficientAcks(DurabilityQuorum.java:205) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:189) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHookInternal(RepImpl.java:1426) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1385) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:226) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.commit(Txn.java:772) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.commit(Txn.java:625) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.operationEnd(Txn.java:1803) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Database.put(Database.java:1506) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Database.put(Database.java:1556) ~[je-7.3.7.jar:7.3.7]
at org.apache.doris.journal.bdbje.BDBJEJournal.write(BDBJEJournal.java:159) [palo-fe.jar:3.4.0]
at org.apache.doris.persist.EditLog.logEdit(EditLog.java:849) [palo-fe.jar:3.4.0]
at org.apache.doris.persist.EditLog.logHeartbeat(EditLog.java:1265) [palo-fe.jar:3.4.0]
at org.apache.doris.system.HeartbeatMgr.runAfterCatalogReady(HeartbeatMgr.java:154) [palo-fe.jar:3.4.0]
at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) [palo-fe.jar:3.4.0]
at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [palo-fe.jar:3.4.0]
2021-08-27 09:09:27,884 WARN (Thread-49|192) [BDBJEMetricHandler.write():117] write metric data into bdb error, key:192.168.7.7:8030_query_err_rate_1630026555000
com.sleepycat.je.rep.InsufficientAcksException: (JE 7.3.7) Transaction: -16160912 VLSN: 31,775,198, initiated at: 09:09:23. Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 1. Missing replica acks: 1. Timeout: 2000ms. FeederState=192.168.7.5_9010_1625132780567(2)[MASTER]
Current feeds:
192.168.7.7_9010_1625192915300: feederVLSN=31,775,199 replicaTxnEndVLSN=31,775,196
192.168.7.4_9010_1625132697001: feederVLSN=31,775,199 replicaTxnEndVLSN=31,775,191

at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureSufficientAcks(DurabilityQuorum.java:205) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:189) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHookInternal(RepImpl.java:1426) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1385) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:226) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.commit(Txn.java:772) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.commit(Txn.java:625) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.operationEnd(Txn.java:1803) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Database.put(Database.java:1506) ~[je-7.3.7.jar:7.3.7]
at org.apache.doris.metric.collector.BDBJEMetricHandler.write(BDBJEMetricHandler.java:115) ~[palo-fe.jar:3.4.0]
at org.apache.doris.metric.collector.BDBJEMetricHandler.writeDouble(BDBJEMetricHandler.java:109) ~[palo-fe.jar:3.4.0]
at org.apache.doris.metric.collector.MetricCollector.parseFeMetricJsonAndWriteMetric(MetricCollector.java:217) ~[palo-fe.jar:3.4.0]
at org.apache.doris.metric.collector.MetricCollector.writeMetric(MetricCollector.java:105) ~[palo-fe.jar:3.4.0]
at org.apache.doris.metric.collector.MetricCollector.lambda$init$0(MetricCollector.java:77) ~[palo-fe.jar:3.4.0]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
2021-08-27 09:09:33,338 WARN (Thread-49|192) [BDBJEMetricHandler.write():117] write metric data into bdb error, key:192.168.7.7:8030_quantile0.75_1630026555000
com.sleepycat.je.rep.InsufficientAcksException: (JE 7.3.7) Transaction: -16160913 VLSN: 31,775,200, initiated at: 09:09:27. Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 1. Missing replica acks: 1. Timeout: 2000ms. FeederState=192.168.7.5_9010_1625132780567(2)[MASTER]
Current feeds:
192.168.7.7_9010_1625192915300: feederVLSN=31,775,202 replicaTxnEndVLSN=31,775,198
192.168.7.4_9010_1625132697001: feederVLSN=31,775,202 replicaTxnEndVLSN=31,775,196

at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureSufficientAcks(DurabilityQuorum.java:205) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:189) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHookInternal(RepImpl.java:1426) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1385) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:226) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.commit(Txn.java:772) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.commit(Txn.java:625) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.operationEnd(Txn.java:1803) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Database.put(Database.java:1506) ~[je-7.3.7.jar:7.3.7]
at org.apache.doris.metric.collector.BDBJEMetricHandler.write(BDBJEMetricHandler.java:115) ~[palo-fe.jar:3.4.0]
at org.apache.doris.metric.collector.BDBJEMetricHandler.writeDouble(BDBJEMetricHandler.java:109) ~[palo-fe.jar:3.4.0]
at org.apache.doris.metric.collector.MetricCollector.parseFeMetricJsonAndWriteMetric(MetricCollector.java:247) ~[palo-fe.jar:3.4.0]
at org.apache.doris.metric.collector.MetricCollector.writeMetric(MetricCollector.java:105) ~[palo-fe.jar:3.4.0]
at org.apache.doris.metric.collector.MetricCollector.lambda$init$0(MetricCollector.java:77) ~[palo-fe.jar:3.4.0]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
2021-08-27 09:09:37,283 ERROR (heartbeat mgr|19) [BDBJEJournal.write():166] catch an exception when writing to database. sleep and retry. journal id 1526718
com.sleepycat.je.rep.InsufficientAcksException: (JE 7.3.7) Transaction: -16160914 VLSN: 31,775,202, initiated at: 09:09:30. Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 1. Missing replica acks: 1. Timeout: 2000ms. FeederState=192.168.7.5_9010_1625132780567(2)[MASTER]
Current feeds:
192.168.7.7_9010_1625192915300: feederVLSN=31,775,205 replicaTxnEndVLSN=31,775,200
192.168.7.4_9010_1625132697001: feederVLSN=31,775,205 replicaTxnEndVLSN=31,775,196

at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureSufficientAcks(DurabilityQuorum.java:205) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:189) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHookInternal(RepImpl.java:1426) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1385) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:226) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.commit(Txn.java:772) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.commit(Txn.java:625) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.operationEnd(Txn.java:1803) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Database.put(Database.java:1506) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Database.put(Database.java:1556) ~[je-7.3.7.jar:7.3.7]
at org.apache.doris.journal.bdbje.BDBJEJournal.write(BDBJEJournal.java:159) [palo-fe.jar:3.4.0]
at org.apache.doris.persist.EditLog.logEdit(EditLog.java:849) [palo-fe.jar:3.4.0]
at org.apache.doris.persist.EditLog.logHeartbeat(EditLog.java:1265) [palo-fe.jar:3.4.0]
at org.apache.doris.system.HeartbeatMgr.runAfterCatalogReady(HeartbeatMgr.java:154) [palo-fe.jar:3.4.0]
at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) [palo-fe.jar:3.4.0]
at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [palo-fe.jar:3.4.0]
2021-08-27 09:09:40,305 WARN (Thread-49|192) [BDBJEMetricHandler.write():117] write metric data into bdb error, key:192.168.7.7:8030_quantile0.95_1630026555000
com.sleepycat.je.rep.InsufficientAcksException: (JE 7.3.7) Transaction: -16160916 VLSN: 31,775,205, initiated at: 09:09:33. Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 1. Missing replica acks: 1. Timeout: 2000ms. FeederState=192.168.7.5_9010_1625132780567(2)[MASTER]

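As shown in the figure below:

My first guess was that the heartbeat timeout was set too short, since no parameters had been tuned when testing this version.
Later I suspected that the write failed, and the retry failed as well, while the FE was syncing metadata to its replicas.

It took 3 restarts before the node came back up.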
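The "Need replica acks: 1" in the log matches a simple-majority rule for a group of 3 FEs. The sketch below spells out that arithmetic and pulls the numbers out of an exception line; the majority formula is my reading of the SIMPLE_MAJORITY policy rather than something stated in the log, and the function names are hypothetical.

```python
import re


def replica_acks_needed(electable_nodes):
    """Replica acks a master needs under a simple-majority policy.

    A majority of the group is n // 2 + 1 nodes; the master counts as one
    of them, so it must hear back from n // 2 replicas.
    """
    return electable_nodes // 2


def parse_ack_failure(log_line):
    """Extract (needed, missing, timeout_ms) from an InsufficientAcksException line."""
    match = re.search(
        r"Need replica acks: (\d+)\. Missing replica acks: (\d+)\. Timeout: (\d+)ms",
        log_line,
    )
    return tuple(int(g) for g in match.groups()) if match else None


line = ("Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 1. "
        "Missing replica acks: 1. Timeout: 2000ms.")
print(replica_acks_needed(3))   # 1
print(parse_ack_failure(line))  # (1, 1, 2000)
```

Read this way, the log says that neither of the two followers acknowledged the write within 2000 ms, which points at slow replication (disk or network on the followers) rather than at the heartbeat setting itself.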

