
[Raspberry Pi] How to Set Up a Watchdog Service on Ubuntu on a Raspberry Pi

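Watchdog services are commonly used in networked IoT devices, such as embedded edge devices, and in physical servers. These devices are often spread across many sites, each performing a specific task such as sensing physical data, monitoring video, or running various Docker services. If a device malfunctions while it is unattended and nobody can restart it by hand, the watchdog service can reboot it automatically according to the configured conditions, bringing it back to a state where it resumes its task.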

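A watchdog works as a timer. As long as the device's system meets the configured conditions, it is considered healthy, and a daemon periodically resets the timer so the system keeps running without a reboot; this is commonly called "feeding the dog". Conversely, if the system misbehaves and the timer is not reset before it expires, the system is rebooted. Watchdogs come in software and hardware forms. Unlike a software watchdog, a hardware watchdog is an independent hardware module embedded in the device's chip, so it can reboot the system automatically even when the system has frozen completely. This article uses the hardware watchdog module built into the Raspberry Pi to configure and run a watchdog service on Ubuntu.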
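The feed cycle described above maps onto the Linux watchdog device interface: opening the device arms the timer, every write resets it, and writing the character 'V' just before closing asks the driver to disarm it ("magic close"). The sketch below is a minimal illustration under stated assumptions: the device is /dev/watchdog, the watchdog daemon is stopped (the device accepts only one opener at a time), and the driver honours magic close. The function name feed_watchdog is hypothetical; the daemon configured later does this job for you.

```shell
#!/bin/sh
# Sketch of "feeding the dog" by hand (hypothetical helper, not part of the
# watchdog package). Assumptions: the watchdog daemon is stopped, because the
# device allows only one opener, and the driver honours magic close ('V').
# Warning: if the kernel was built with nowayout, the timer keeps running
# after the device is closed and the board reboots when it expires.
feed_watchdog() {
  local dev="${1:-/dev/watchdog}"   # watchdog device file
  local interval="${2:-10}"         # seconds between feeds (< hardware timeout)
  local count="${3:-3}"             # number of feeds before disarming
  local i=0
  # The brace group opens the device once; opening it arms the timer.
  {
    while [ "$i" -lt "$count" ]; do
      printf '.'                    # any write resets ("feeds") the timer
      sleep "$interval"
      i=$((i + 1))
    done
    printf 'V'                      # magic close: ask the driver to disarm
  } > "$dev"                        # the device is closed when the group ends
}

# Example (as root, daemon stopped), assuming the function was saved in a
# file named feed_watchdog.sh (hypothetical file name):
#   sudo systemctl stop watchdog
#   sudo sh -c '. ./feed_watchdog.sh; feed_watchdog /dev/watchdog 10 3'
```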

a. Check which watchdog timer module the Raspberry Pi has

The following command shows that the Raspberry Pi 5 is equipped with a Broadcom BCM2835 watchdog timer:

raspberry@rpi5-01:~ $ sudo dmesg | grep wdt
[    0.475696] bcm2835-wdt bcm2835-wdt: Broadcom BCM2835 watchdog timer
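Besides dmesg, the driver's details can usually be read from sysfs. A minimal sketch, assuming the kernel exposes the watchdog sysfs attributes (CONFIG_WATCHDOG_SYSFS) and that the hardware timer is watchdog0; show_watchdog_info is a hypothetical helper, and any attribute that is missing is simply skipped.

```shell
#!/bin/sh
# Print watchdog attributes exposed through sysfs (hypothetical helper).
# Assumption: the kernel was built with CONFIG_WATCHDOG_SYSFS, so files such
# as identity and timeout exist under /sys/class/watchdog/watchdog0; any
# attribute that is missing or unreadable is skipped.
show_watchdog_info() {
  local base="${1:-/sys/class/watchdog/watchdog0}"
  local attr
  for attr in identity timeout max_timeout state; do
    if [ -r "$base/$attr" ]; then
      printf '%s=%s\n' "$attr" "$(cat "$base/$attr")"
    fi
  done
}

# Example:
#   ls -l /dev/watchdog*     # device nodes created by the driver
#   show_watchdog_info       # prints the available attributes, one per line
```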

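b. Install watchdog and load the bcm2835_wdt module

The query above shows that on the Raspberry Pi 5, bcm2835-wdt is the kernel module for the hardware watchdog. Install the watchdog package, then load the module with modprobe; no reboot is required.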

sudo apt update

sudo apt install watchdog

sudo modprobe bcm2835_wdt
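modprobe loads the module only until the next reboot. If the driver is not loaded automatically at boot on your kernel, one way to make it persistent is a systemd modules-load.d entry, which systemd-modules-load reads at startup. A minimal sketch, assuming the standard /etc/modules-load.d directory; persist_module is a hypothetical helper.

```shell
#!/bin/sh
# Write a modules-load.d entry so systemd-modules-load loads the module at
# boot (hypothetical helper). The directory defaults to /etc/modules-load.d
# and can be overridden, e.g. for testing.
persist_module() {
  local name="$1"
  local dir="${2:-/etc/modules-load.d}"
  [ -n "$name" ] || { echo "usage: persist_module MODULE [DIR]" >&2; return 2; }
  printf '%s\n' "$name" > "$dir/$name.conf"   # one module name per line
}

# Example (as root):
#   persist_module bcm2835_wdt
#   lsmod | grep bcm2835_wdt   # listed only if the driver is built as a module
```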

c. Configure watchdog

1. Edit /etc/watchdog.conf so it contains the following:
sudo nano /etc/watchdog.conf
# ====================================================================
# Configuration for the watchdog daemon. For more information on the
# parameters in this file use the command 'man watchdog.conf'
# ====================================================================

# =================== The hardware timer settings ====================
#
# For this daemon to be effective it really needs some hardware timer
# to back up any reboot actions. If you have a server then see if it
# has IPMI support. Otherwise for Intel-based machines try the iTCO_wdt
# module, otherwise (or if that fails) then see if any of the following
# module load and work:
#
# it87_wdt it8712f_wdt w83627hf_wdt w83877f_wdt w83977f_wdt
#
# If all else fails then 'softdog' is better than no timer at all!
# Or work your way through the modules listed under:
#
# /lib/modules/`uname -r`/kernel/drivers/watchdog/
#
# To see if they load, present /dev/watchdog, and are capable of
# resetting the system on time-out.

# Uncomment this to use the watchdog device driver access "file".

watchdog-device = /dev/watchdog

# Uncomment and edit this line for hardware timeout values that differ
# from the default of one minute.

watchdog-timeout = 60

# If your watchdog trips by itself when the first timeout interval
# elapses then try uncommenting the line below and changing the
# value to 'yes'.

#watchdog-refresh-use-settimeout = auto

# If you have a buggy watchdog device (e.g. some IPMI implementations)
# try uncommenting this line and setting it to 'yes'.

#watchdog-refresh-ignore-errors = no

# ====================== Other system settings ========================
#
# Interval between tests. Should be a couple of seconds shorter than
# the hardware time-out value.

interval = 10

# The number of intervals skipped before a log message is written (i.e.
# a multiplier for 'interval' in terms of syslog messages)

#logtick        = 1

# Directory for log files (probably best not to change this)

#log-dir = /var/log/watchdog

# Email address for sending the reboot reason. This needs sendmail to
# be installed and properly configured. Maybe you should just enable
# syslog forwarding instead?

#admin = root

# Lock the daemon in to memory as a real-time process. This greatly
# decreases the chance that watchdog won't be scheduled before your
# machine is really loaded.

realtime = yes
priority = 1

# ====================== How to handle errors  =======================
#
# If you have a custom binary/script to handle errors then uncomment
# this line and provide the path. For 'v1' test binary files they also
# handle error cases.

#repair-binary = /usr/sbin/repair
#repair-timeout = 60

# The retry-timeout and repair limit are used to handle errors in a
# more robust manner. Errors must persist for longer than this to
# action a repair or reboot, and if repair-maximum attempts are
# made without the test passing a reboot is initiated anyway.

#retry-timeout = 60
#repair-maximum = 1

# Configure the delay on reboot from sending SIGTERM to all processes
# and to following up with SIGKILL for any that are ignoring the polite
# request to stop.

#sigterm-delay = 5

# ====================== User-specified tests ========================
#
# Specify the directory for auto-added 'v1' test programs (any executable
# found in the 'test-directory should be listed).

#test-directory = /etc/watchdog.d

# Specify any v0 custom tests here. Multiple lines are permitted, but
# having any 'v1' programs/scripts discovered in the 'test-directory' is
# the better way.

#test-binary =

# Specify the time-out value for a test error to be reported.

#test-timeout = 60

# ====================== Typical tests ===============================
#
# Specify any IPv4 numeric addresses to be probed.
# NOTE: You should check you have permission to ping any machine before
# using it as a test. Also remember if the target goes down then this
# machine will reboot as a result!

#ping = 172.21.0.1
#ping = 192.168.1.1
# Use Google's public DNS IP (8.8.8.8) for the ping test
ping                    = 8.8.8.8

# Set the number of ping attempts in each 'interval' of time. Default
# is 3 and it completes on the first successful ping.
# NOTE: Round-trip delay has to be less than 'interval' / 'ping-count'
# for test success, but this is unlikely to be exceeded except possibly
# on satellite links (very unlikely case!).

ping-count = 3

# Specify any network interface to be checked for activity.

#interface = eth0

# Specify any files to be checked for presence, and if desired, checked
# that they have been updated more recently than 'change' seconds.

#file = /var/log/syslog
#change = 1407

# Uncomment to enable load average tests for 1, 5 and 15 minute
# averages. Setting one of these values to '0' disables it. These
# values will hopefully never reboot your machine during normal use
# (if your machine is really hung, the loadavg will go much higher
# than 25 in most cases).

max-load-1 = 24
max-load-5 = 18
max-load-15 = 12

# Check available memory on the machine.
#
# The min-memory check is a passive test from reading the file
# /proc/meminfo and computed from MemFree + Buffers + Cached
# If this is below a few tens of MB you are likely to have problems.
#
# The allocatable-memory is an active test checking it can be paged
# in to use.
#
# Maximum swap should be based on normal use, probably a large part of
# available swap but paging 1GB of swap can take tens of seconds.
#
# NOTE: This is the number of pages, to get the real size, check how
# large the pagesize is on your machine (typically 4kB for x86 hardware).

min-memory = 1
#allocatable-memory = 1
#max-swap = 0

# Check for over-temperature. Typically the temperature-sensor is a
# 'virtual file' under /sys and it contains the temperature in
# milli-Celsius. Usually these are generated by the 'sensors' package,
# but take care as device enumeration may not be fixed.

temperature-sensor = /sys/class/thermal/thermal_zone0/temp
max-temperature = 95000

# Check for a running process/daemon by its PID file. For example,
# check if rsyslogd is still running by enabling the following line:

#pidfile = /var/run/rsyslogd.pid
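Because the daemon only protects the system if it feeds the timer before the timer expires, a quick check that interval is shorter than watchdog-timeout can catch a typo before the service starts. A minimal sketch using awk; check_watchdog_conf is a hypothetical helper that inspects only these two keys, falling back to the defaults stated in man watchdog.conf (interval 1 s) and in the file's own comment (timeout one minute).

```shell
#!/bin/sh
# Sanity-check watchdog.conf before starting the daemon (hypothetical helper).
# It only checks that 'interval' is shorter than 'watchdog-timeout', as the
# comment in the file recommends. Defaults used when a key is absent:
# interval 1 s (per man watchdog.conf) and watchdog-timeout 60 s (per the
# file's own comment).
check_watchdog_conf() {
  local conf="${1:-/etc/watchdog.conf}"
  awk -F'=' '
    /^[ \t]*#/ { next }                  # skip comment lines
    NF >= 2 {
      key = $1; val = $2
      gsub(/[ \t]+/, "", key); gsub(/[ \t]+/, "", val)
      if (key == "interval") interval = val
      if (key == "watchdog-timeout") timeout = val
    }
    END {
      if (interval == "") interval = 1
      if (timeout == "") timeout = 60
      if (interval + 0 < timeout + 0) {
        printf "ok: interval=%s s, watchdog-timeout=%s s\n", interval, timeout
        exit 0
      }
      printf "error: interval (%s) must be shorter than watchdog-timeout (%s)\n", interval, timeout
      exit 1
    }' "$conf"
}

# Example:
#   check_watchdog_conf /etc/watchdog.conf
```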
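2. Explanation of the settings
  • watchdog-device = /dev/watchdog : the watchdog device file, normally /dev/watchdog. It is the interface to the hardware or software timer; the daemon writes to it periodically to keep the system from rebooting.
  • watchdog-timeout = 60 : the hardware timer's timeout in seconds. The default is 60 seconds, meaning the daemon must feed the dog at least once a minute to prevent a reboot.
  • interval = 10 : the time in seconds between two checks. It should be shorter than the hardware timeout so that the daemon always runs its checks and feeds the timer before it expires.
  • realtime = yes and priority = 1 : lock the daemon into memory as a real-time process with the given priority, so it is still scheduled when the system is under heavy load.
  • ping = 8.8.8.8 : use Google's public DNS server as the ping target. In other words, if the Pi can no longer reach Google, the external network is considered down and the daemon triggers a reboot.
  • ping-count = 3 : the number of ping attempts in each interval. The default is 3, and the test passes on the first successful ping.
  • max-load-1 = 24, max-load-5 = 18, max-load-15 = 12 : check the system load averages over 1, 5 and 15 minutes. If a load average exceeds its limit, the system is rebooted.
  • min-memory = 1 : check whether available memory (MemFree + Buffers + Cached from /proc/meminfo) falls below the given value, measured in pages.
  • temperature-sensor = /sys/class/thermal/thermal_zone0/temp and max-temperature = 95000 : check the system temperature, in milli-degrees Celsius. Exceeding the configured 95 ℃ triggers a reboot.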
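Several of these values are easier to judge with the arithmetic spelled out: per the comment in watchdog.conf, each ping attempt has roughly interval / ping-count seconds (10 / 3 ≈ 3.3 s here); min-memory is counted in pages, whose size depends on the kernel build; and the temperature sensor reports milli-degrees Celsius. The sketch below computes these numbers. The helper names are hypothetical, and check_temperature only mimics a threshold comparison (treating a reading at or above the limit as a failure); it is not the daemon's own logic.

```shell
#!/bin/sh
# Worked numbers behind some of the settings above (hypothetical helpers).

# Per-attempt round-trip budget for the ping test: interval / ping-count,
# in milliseconds (integer arithmetic, so the result is rounded down).
ping_budget_ms() {
  echo $(( $1 * 1000 / $2 ))
}

# min-memory is given in pages; convert to KiB using the page size, which
# defaults to the running system's value from getconf.
pages_to_kib() {
  local pages="$1"
  local pagesize="${2:-$(getconf PAGESIZE)}"
  echo $(( pages * pagesize / 1024 ))
}

# Read a thermal sensor (milli-degrees Celsius), print it in degrees, and
# return non-zero when the reading is at or above the threshold.
check_temperature() {
  local sensor="${1:-/sys/class/thermal/thermal_zone0/temp}"
  local max="${2:-95000}"
  local t
  t=$(cat "$sensor") || return 2
  printf '%d.%03d C\n' $((t / 1000)) $((t % 1000))
  [ "$t" -lt "$max" ]
}

# Examples with the values used in this article:
#   ping_budget_ms 10 3     # interval 10 s, ping-count 3
#   pages_to_kib 1          # min-memory = 1 page
#   check_temperature       # compare against max-temperature = 95000
```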

d. Run the watchdog service

Start watchdog and check that the service is shown as active:

sudo systemctl start watchdog

sudo systemctl status watchdog

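One of the conditions in the configuration above triggers a reboot when the external network is unreachable, so a simple way to test the watchdog service is to disconnect the network and check whether the Pi reboots on its own.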
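Since the ping test reboots the Pi whenever a target stops answering, it is worth confirming that every configured target is reachable before relying on the disconnect test; otherwise a firewalled or mistyped address can cause a reboot loop. A minimal sketch; list_ping_targets and precheck_ping_targets are hypothetical helpers, and the precheck assumes the iputils ping shipped with Ubuntu (-c count, -W timeout in seconds).

```shell
#!/bin/sh
# List the ping targets configured in watchdog.conf (hypothetical helper).
list_ping_targets() {
  local conf="${1:-/etc/watchdog.conf}"
  awk -F'=' '
    /^[ \t]*#/ { next }                  # skip comment lines
    {
      key = $1; gsub(/[ \t]+/, "", key)
      if (key == "ping" && NF >= 2) { val = $2; gsub(/[ \t]+/, "", val); print val }
    }' "$conf"
}

# Ping each target once before relying on it; an unreachable target would
# make the daemon reboot the Pi. Assumes iputils ping (-c count, -W timeout).
precheck_ping_targets() {
  local target rc=0
  for target in $(list_ping_targets "$1"); do
    if ping -c 1 -W 2 "$target" >/dev/null 2>&1; then
      echo "reachable: $target"
    else
      echo "UNREACHABLE: $target"; rc=1
    fi
  done
  return "$rc"
}

# Example, before cutting the network to test the reboot:
#   precheck_ping_targets /etc/watchdog.conf
#   sudo systemctl enable watchdog      # also start the daemon at every boot
#   sudo journalctl -u watchdog -f      # follow the daemon's log messages
```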

http://www.dtcms.com/a/98339.html
