
[Linux Fundamentals Series, Part 159] Disk Health Monitoring: smartctl

news 2025/10/24 12:06:53

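In modern IT environments, data safety and reliability are critical, and because disks are the primary storage medium, their health bears directly on both. S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) is built into drives to monitor their condition and give early warning of likely failures. smartctl, a command-line tool from the smartmontools package, reads and interprets that S.M.A.R.T. data so you can spot disk problems early and act before data is lost. This article covers installing smartctl and using it to monitor disk health on Linux.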

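Core Concepts

S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology)

S.M.A.R.T. tracks a range of drive parameters, such as read/write error rates, head flying height, and temperature. Each parameter is exposed as an "attribute" with a normalized value, a worst recorded value, and a vendor-defined threshold. When an attribute's normalized value falls to or below its threshold, the drive is reporting a likely problem. The raw value carries the underlying count or measurement, and its meaning varies by vendor.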

smartctl

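smartctl is the command-line tool in the smartmontools package that reads a drive's S.M.A.R.T. data and presents it in readable form. With it you can check overall health, attribute values, the error log, and self-test results.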

smartmontools

smartmontools is an open-source tool set that contains smartctl and the smartd monitoring daemon. It supports ATA/SATA, SCSI/SAS, and NVMe drives.

Commands and Examples

Installing smartmontools

On most modern Linux distributions, smartmontools can be installed through the package manager:

Debian-based systems

sudo apt-get update
sudo apt-get install smartmontools

RPM-based systems (on newer Fedora and RHEL releases, dnf takes the same arguments as yum)

sudo yum install smartmontools

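Starting and Stopping the smartd Service

smartd is the smartmontools daemon. It polls disks at a regular interval and logs what it finds. On some Debian and Ubuntu releases the unit is named smartmontools instead of smartd.

Start the smartd service: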

sudo systemctl start smartd

Stop the smartd service:

sudo systemctl stop smartd

Enable the smartd service so that it starts at boot:

sudo systemctl enable smartd

Disable the smartd service so that it does not start at boot:

sudo systemctl disable smartd

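Checking Disk Health with smartctl

Check the disk's overall S.M.A.R.T. health:

sudo smartctl -H /dev/sda

-H: prints the drive's overall health self-assessment.
/dev/sda: the disk device to query.

Sample output: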

SMART overall-health self-assessment test result: PASSED
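
Besides the PASSED/FAILED line, smartctl reports its findings through its exit status, which the smartctl(8) man page documents as a bitmask. The sketch below decodes that bitmask for use in scripts. The function name explain_smartctl_status is hypothetical and the messages are paraphrased from the man page, so treat it as an illustration rather than an official interface.

```shell
# explain_smartctl_status is a hypothetical helper name, not part of smartmontools.
# It decodes the exit-status bitmask described in the smartctl(8) man page.
explain_smartctl_status() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "ok"
        return 0
    fi
    [ $((code & 1)) -ne 0 ]   && echo "command line did not parse"
    [ $((code & 2)) -ne 0 ]   && echo "device open failed"
    [ $((code & 4)) -ne 0 ]   && echo "a SMART command failed"
    [ $((code & 8)) -ne 0 ]   && echo "SMART status: DISK FAILING"
    [ $((code & 16)) -ne 0 ]  && echo "prefail attribute at or below threshold"
    [ $((code & 32)) -ne 0 ]  && echo "attribute was at or below threshold in the past"
    [ $((code & 64)) -ne 0 ]  && echo "error log contains errors"
    [ $((code & 128)) -ne 0 ] && echo "self-test log contains errors"
    return 0
}
```

A typical use would be to run sudo smartctl -H /dev/sda and then call explain_smartctl_status "$?" on the next line.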

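Viewing Detailed S.M.A.R.T. Information

View the disk's detailed S.M.A.R.T. information:

sudo smartctl -a /dev/sda

-a: prints all S.M.A.R.T. information for the drive: identity, health status, attributes, error log, and self-test log.

Sample output (attribute table excerpt):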

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       150
  3 Spin_Up_Time            0x0003   098   098   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       123
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   076   060   030    Pre-fail  Always       -       458589
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       289
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       123
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       123
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       123
194 Temperature_Celsius     0x0022   113   094   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
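
To act on the attribute table in a script, one option is to compare each normalized VALUE with its THRESH column. The following is a minimal sketch that assumes the ATA column layout shown in the sample output; failing_attributes is a hypothetical helper name, and it reads text on stdin so it can be tried on saved output.

```shell
# failing_attributes (hypothetical helper) reads `smartctl -A` output on stdin
# and prints every attribute whose normalized VALUE is at or below THRESH.
# A THRESH of 000 means no failure threshold is defined, so those rows are skipped.
failing_attributes() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 {
        value = $4 + 0
        thresh = $6 + 0
        if (thresh > 0 && value <= thresh)
            printf "%s %s value=%d thresh=%d\n", $1, $2, value, thresh
    }'
}
```

It could be fed with sudo smartctl -A /dev/sda | failing_attributes. No output means no attribute has reached its threshold.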

Viewing the Disk Error Log

View the disk's error log:

sudo smartctl -l error /dev/sda

-l error: prints the error log recorded by the drive.

Sample output:

Error 1 occurred at disk power-on lifetime: 289 hours (12 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 00 00 00 40 00      00:00:00.000  READ FPDMA QUEUED
  ef 10 02 00 00 00 00 00      00:00:00.000  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 00 00      00:00:00.000  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 00 00      00:00:00.000  IDENTIFY DEVICE
  ef 03 45 00 00 00 00 00      00:00:00.000  SET FEATURES [Set transfer mode]

Viewing Self-Test Results

View the disk's self-test log:

sudo smartctl -l selftest /dev/sda

-l selftest: prints the results of previous self-tests.

Sample output:

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     289         -
# 2  Extended offline    Completed without error       00%     288         -
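
When the self-test log is checked from a script, it can help to show only the entries that need attention. The sketch below assumes the log layout shown above, where each entry starts with "#" and a number; selftest_problems is a hypothetical helper name.

```shell
# selftest_problems (hypothetical helper) reads `smartctl -l selftest` output
# on stdin and prints every log entry whose status is anything other than
# "Completed without error". That includes aborted and in-progress tests,
# so the caller decides which of them matter.
selftest_problems() {
    grep -E '^# *[0-9]+ ' | grep -v 'Completed without error' || true
}
```

For example, sudo smartctl -l selftest /dev/sda | selftest_problems prints nothing when every logged test completed cleanly.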

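Running a Self-Test

Start a short self-test:

sudo smartctl -t short /dev/sda

-t short: starts the short self-test, which usually finishes within a few minutes.

Start an extended self-test:

sudo smartctl -t long /dev/sda

-t long: starts the extended self-test, which scans the whole surface and can take hours on large drives. Both tests run in the background inside the drive, so read the outcome later with smartctl -l selftest.

Checking Disk Temperature

Read the disk temperature:

sudo smartctl -A /dev/sda | grep Temperature

Sample output: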

194 Temperature_Celsius 0x0022 113 094 000 Old_age Always - 33
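
For alerting, the temperature can be pulled out of that row as a plain number. The sketch below assumes the drive reports temperature in attribute 194 with the Celsius reading as the first token of the raw value, as in the sample above. Some drives use a different attribute. Both function names and the limit are illustrative choices, not smartctl defaults.

```shell
# disk_temperature (hypothetical helper) reads `smartctl -A` output on stdin
# and prints the first raw-value token of attribute 194.
disk_temperature() {
    awk '$1 == 194 { print $10 }'
}

# temperature_ok succeeds when the temperature read from stdin is at or below
# the limit passed as $1. It fails if attribute 194 is absent.
temperature_ok() {
    limit=$1
    temp=$(disk_temperature)
    [ -n "$temp" ] && [ "$temp" -le "$limit" ]
}
```

A cron job could run sudo smartctl -A /dev/sda | temperature_ok 50 and send a notification when the command fails.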

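Frequently Asked Questions

How do I view S.M.A.R.T. information for every disk?

smartctl has no single command that dumps S.M.A.R.T. data for all disks. First list the devices it can detect with the command below, then run smartctl -a against each device in that list: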

sudo smartctl --scan
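
The scan output can drive a loop over all detected disks. The sketch below assumes each scan line has the form "/dev/sda -d sat # ..." (device, -d, type, comment). The helper names list_scanned_devices and report_all_disks are hypothetical.

```shell
# list_scanned_devices (hypothetical helper) reads `smartctl --scan` output on
# stdin and prints one "device type" pair per line, for example "/dev/sda sat".
list_scanned_devices() {
    awk '$1 ~ /^\/dev\// && $2 == "-d" { print $1, $3 }'
}

# report_all_disks prints the health status of every scanned device.
# It needs root and real hardware, so it is only defined here, not called.
report_all_disks() {
    smartctl --scan | list_scanned_devices | while read -r dev type; do
        echo "== $dev ($type) =="
        smartctl -H -d "$type" "$dev"
    done
}
```

After defining the functions in a root shell, calling report_all_disks prints one health section per disk. Replacing -H with -a gives the full report.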

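How do I run self-tests automatically?

Let the smartd daemon schedule them. Edit /etc/smartd.conf and add a line such as:

/dev/sda -a -o on -S on -d sat -s (S/../.././02|L/../.././03)

-a: turns on the default set of monitoring checks (health status, attribute changes, error log, and self-test log).
-o on: enables automatic offline data collection.
-S on: enables attribute autosave.
-d sat: sets the device type to SAT (SCSI-to-ATA Translation), used for SATA disks reached through the SCSI layer or a USB bridge.
-s (S/../.././02|L/../.././03): schedules self-tests with a regular expression matched against T/MM/DD/d/HH (test type, month, day of month, day of week, hour). This pattern runs a short test every day at 02:00 and a long test every day at 03:00. To run the long test only on Saturdays, use L/../../6/03 instead. Restart smartd after editing the file so the new schedule takes effect.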
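
A schedule pattern can be checked before it goes into smartd.conf by matching it against sample time stamps. The sketch below uses grep -E as a stand-in for smartd's own extended-regex matching and assumes the pattern has to match the whole T/MM/DD/d/HH stamp. schedule_matches is a hypothetical helper name.

```shell
# schedule_matches (hypothetical helper) succeeds when the smartd -s pattern
# in $1 matches the "T/MM/DD/d/HH" stamp in $2. Day of week runs from
# 1 (Monday) to 7 (Sunday). grep -E approximates smartd's regex engine here.
schedule_matches() {
    pattern=$1
    stamp=$2
    printf '%s\n' "$stamp" | grep -Eq "^($pattern)\$"
}
```

For instance, schedule_matches 'S/../.././02|L/../.././03' 'S/10/24/5/02' succeeds, because the short-test branch accepts any date at hour 02. A stamp for the current hour can be produced with date '+S/%m/%d/%u/%H'.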

How do I find a disk's model and serial number?

Use the -i option, which prints the drive's identity information:

sudo smartctl -i /dev/sda

Sample output (excerpt):

Device Model:     WDC WD10EZEX-00WN4A0
Serial Number:    WD-WCC4E5XXXXXX

How do I find a disk's firmware version?

The same -i output includes the firmware version:

sudo smartctl -i /dev/sda

Sample output (excerpt):

Firmware Version: 80.00A80
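
For inventory scripts, individual fields can be extracted from the -i output by label. The sketch below assumes the "Label: value" layout shown in the samples above; info_field is a hypothetical helper name.

```shell
# info_field (hypothetical helper) reads `smartctl -i` output on stdin and
# prints the value of the field whose label equals $1, e.g. "Serial Number".
# It splits on the first ": " run, so values containing ": " would be cut short.
info_field() {
    awk -F': *' -v key="$1" '$1 == key { print $2 }'
}
```

For example, sudo smartctl -i /dev/sda | info_field 'Firmware Version' prints only the firmware string.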

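How do I check whether SMART is enabled on a disk?

The -i output also reports whether SMART is available and enabled (if it is disabled, smartctl -s on /dev/sda turns it on):

sudo smartctl -i /dev/sda

Sample output (excerpt):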

SMART support is: Available - device has SMART support.
SMART support is: Enabled
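
A script can test for the "Enabled" line before relying on any other S.M.A.R.T. output. This is a minimal sketch that assumes the wording shown in the sample above; smart_enabled is a hypothetical helper name.

```shell
# smart_enabled (hypothetical helper) reads `smartctl -i` output on stdin and
# succeeds only if a "SMART support is: Enabled" line is present.
smart_enabled() {
    grep -Eq '^SMART support is:[[:space:]]+Enabled'
}
```

Used as sudo smartctl -i /dev/sda | smart_enabled || echo "SMART is not enabled", it prints the warning only when the line is missing.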

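Practical Recommendations

Run self-tests regularly

Run self-tests on a regular schedule so problems surface early. The smartd daemon can run them automatically.

Monitor disk temperature

Sustained high temperature raises the risk of drive failure. Track disk temperature and keep it within the range the manufacturer specifies.

Check the error log

Review the disk's error log periodically:

sudo smartctl -l error /dev/sda

Check self-test results

Review the self-test log periodically: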

sudo smartctl -l selftest /dev/sda

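Monitoring NVMe disks with `smartctl`

smartctl also supports NVMe disks. NVMe drives report a health log rather than the ATA attribute table, so the output layout differs. View it with: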

sudo smartctl -a /dev/nvme0n1

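Monitoring disks behind RAID with `smartctl`

smartctl can query disks behind many hardware RAID controllers through the -d option. S.M.A.R.T. data belongs to physical drives, so a Linux software RAID device such as /dev/md0 has none; query its member disks (for example /dev/sda) instead. On a MegaRAID controller, for example, physical disk 0 is addressed with:

sudo smartctl -a -d megaraid,0 /dev/sda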

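Summary

This article covered the core concepts behind smartctl, how to install it, and how to use it day to day. smartctl reads and interprets S.M.A.R.T. data so you can monitor disk health and anticipate failures. In practice, run self-tests regularly, monitor temperature, review the error and self-test logs, and apply the same checks to NVMe drives and to the physical disks behind RAID. smartctl is a tool every Linux administrator should know, since it reduces disk health monitoring to a handful of commands.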
