How to check disk health using Smartmontools on Linux

Smartmontools is a set of open source tools for checking disk health.

It can check hard disks, SAS disks, and SSDs, including disks behind RAID controllers such as the HP Smart Array, LSI MegaRAID, and Dell PERC.

How to install Smartmontools on CentOS

# yum install smartmontools

To install Smartmontools on Ubuntu

# apt-get install smartmontools

Start the smartd daemon and enable it at boot

# systemctl start smartd
# systemctl enable smartd

Enable SMART on the disk /dev/sda

# smartctl -s on /dev/sda

To disable SMART on the disk /dev/sda

# smartctl -s off /dev/sda
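
To confirm the current state, you can look at the "SMART support is:" lines that smartctl -i prints. The following is a minimal sketch, not part of smartmontools: the helper name smart_is_enabled is made up for illustration, and it assumes the "SMART support is: Enabled" wording that smartctl 7.0 prints.

```shell
# smart_is_enabled: hypothetical helper, not part of smartmontools.
# Reads `smartctl -i` output on stdin and succeeds (exit 0) only if
# a "SMART support is: Enabled" line is present.
smart_is_enabled() {
    grep -Eq '^SMART support is:[[:space:]]+Enabled'
}
```

With the function defined in your shell, it could be used like this:

# smartctl -i /dev/sda | smart_is_enabled && echo "SMART is enabled"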

Use Smartmontools on a regular drive or a software RAID member disk

# smartctl -i -a /dev/sda

Below is example output for an SSD drive

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     SAMSUNG MZ7LM480HCHP-00003
Serial Number:    S1YJNXAH102923
LU WWN Device Id: 5 002538 c40146fa4
Firmware Version: GXT3003Q
User Capacity:    480,103,981,056 bytes [480 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Oct 27 08:34:29 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
.........
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       29238
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       78
177 Wear_Leveling_Count     0x0013   092   092   005    Pre-fail  Always       -       543
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
180 Unused_Rsvd_Blk_Cnt_Tot 0x0013   100   100   010    Pre-fail  Always       -       2431
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
184 End-to-End_Error        0x0033   100   100   097    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   066   051   000    Old_age   Always       -       34
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
202 Exception_Mode_Status   0x0033   100   100   010    Pre-fail  Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       66
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       271275742255
242 Total_LBAs_Read         0x0032   099   099   000    Old_age   Always       -       73508579082
243 SATA_Downshift_Ct       0x0032   100   100   000    Old_age   Always       -       0
244 Thermal_Throttle_St     0x0032   100   100   000    Old_age   Always       -       0
245 Timed_Workld_Media_Wear 0x0032   100   100   000    Old_age   Always       -       65535
246 Timed_Workld_RdWr_Ratio 0x0032   100   100   000    Old_age   Always       -       65535
247 Timed_Workld_Timer      0x0032   100   100   000    Old_age   Always       -       65535
251 NAND_Writes             0x0032   100   100   000    Old_age   Always       -       565960926336

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     28745         -
# 2  Extended offline    Completed without error       00%     28634         -
# 3  Extended offline    Completed without error       00%     16189         -
# 4  Extended offline    Completed without error       00%      7545         -
# 5  Extended offline    Completed without error       00%      7531         -

On the Samsung SSD above, the normalized value of Wear_Leveling_Count is 092, which suggests that roughly 92% of the drive's rated write endurance remains.
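
To pull that number out without reading the whole table, the normalized VALUE column can be extracted with awk. This is a rough sketch: smart_attr_value is a name invented here, and reading Wear_Leveling_Count as "percent of life left" is, as far as I know, a Samsung convention; other vendors may use a different attribute or scale.

```shell
# smart_attr_value NAME: hypothetical helper, not part of smartmontools.
# Reads a smartctl attribute table on stdin and prints the normalized
# VALUE column (4th field) of the named attribute as a plain number.
# Exits non-zero if the attribute is not present.
smart_attr_value() {
    awk -v name="$1" '
        $2 == name { print $4 + 0; found = 1 }
        END { exit !found }'
}
```

With the function defined in your shell, it could be used like this:

# smartctl -A /dev/sda | smart_attr_value Wear_Leveling_Count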

Power_On_Hours is 29238, which means the SSD has been powered on for 29,238 hours (about 1,218 days, or roughly 3.3 years).
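
The conversion is plain integer arithmetic: divide the hours by 24 to get days and keep the remainder as leftover hours. A small sketch, where hours_to_days is only an illustrative name:

```shell
# hours_to_days HOURS: hypothetical helper that converts a Power_On_Hours
# raw value into whole days plus leftover hours using integer arithmetic.
hours_to_days() {
    hours=$1
    echo "$(( hours / 24 )) days $(( hours % 24 )) hours"
}

hours_to_days 29238   # prints "1218 days 6 hours"
```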

How to use Smartmontools on an HP Smart Array RAID controller

# smartctl -a -d cciss,0 /dev/sda
# smartctl -a -d cciss,1 /dev/sda
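
The number after "cciss," selects the physical disk behind the controller, counting from 0, so checking every disk means repeating the command with a different index. The sketch below only prints the command lines for a given disk count instead of running them; cciss_commands and its arguments are made up for illustration, and the disk count is something you have to supply yourself.

```shell
# cciss_commands DEVICE COUNT: hypothetical helper that prints one
# smartctl command line per physical disk index 0..COUNT-1.
cciss_commands() {
    dev=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "smartctl -a -d cciss,$i $dev"
        i=$(( i + 1 ))
    done
}
```

With the function defined in your shell, the following prints the two commands shown above:

# cciss_commands /dev/sda 2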

Example output for a SAS drive on an HP Smart Array RAID controller

# smartctl -a -d cciss,0 /dev/sda
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-957.21.3.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HP
Product:              EH0300FBQDD
Revision:             HPD2
Compliance:           SPC-3
User Capacity:        300,000,000,000 bytes [300 GB]
Logical block size:   512 bytes
Rotation Rate:        15000 rpm
Form Factor:          2.5 inches
Logical Unit id:      0x5000c5005952102f
Serial number:        6XN1RFAY0000B303B3TU
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Sun Oct 27 10:56:56 2019 WIB
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     35 C
Drive Trip Temperature:        65 C

Manufactured in week 32 of year 2012
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  120
Elements in grown defect list: 76

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       10         0     10687          0     251507.146           0
write:         0        0         0         0          0      53598.375           0
verify:        0        0         0         0          0       4474.826           0

Non-medium error count:      328

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -   47569                 - [-   -    -]
# 2  Background short  Completed                   -      44                 - [-   -    -]
# 3  Background short  Completed                   -      40                 - [-   -    -]
# 4  Background long   Completed                   -       0                 - [-   -    -]

Long (extended) Self-test duration: 1860 seconds [31.0 minutes]
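
For this SAS drive smartctl prints no ATA-style attribute table; the indicators are labelled lines such as "SMART Health Status" and "Elements in grown defect list". As a sketch, a label lookup could look like this. The name sas_field is hypothetical, and the code assumes the "Label: value" layout of the output above.

```shell
# sas_field LABEL: hypothetical helper, not part of smartmontools.
# Reads smartctl output on stdin and prints the text that follows
# "LABEL:" at the start of a line, with surrounding whitespace trimmed.
# Exits non-zero if no line starts with the label.
sas_field() {
    awk -v label="$1" '
        index($0, label ":") == 1 {
            value = substr($0, length(label) + 2)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            print value
            found = 1
        }
        END { exit !found }'
}
```

With the function defined in your shell, it could be used like this:

# smartctl -a -d cciss,0 /dev/sda | sas_field "Elements in grown defect list"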

Checking an SSD drive (sdb) on an HP RAID controller

# smartctl -a -d cciss,4 /dev/sdb
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-957.21.3.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     VK0480GECQP
Serial Number:    S1KGNYAH241630
LU WWN Device Id: 5 002538 50037aa42
Firmware Version: HPG3
User Capacity:    480,103,981,056 bytes [480 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sun Oct 27 10:59:02 2019 WIB
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: Incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                ( 2100) seconds.
Offline data collection
capabilities:                    (0x53) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  35) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   002    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       20928
173 Unknown_Attribute       0x0033   098   098   005    Pre-fail  Always       -       311
175 Program_Fail_Count_Chip 0x0033   100   100   001    Pre-fail  Always       -       0
180 Unused_Rsvd_Blk_Cnt_Tot 0x003b   100   100   097    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   068   053   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0033   100   100   005    Pre-fail  Always       -       0
202 Unknown_SSD_Attribute   0x0033   100   100   010    Pre-fail  Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     17877         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
  255        0    65535  Read_scanning was never started
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

How to use Smartmontools on an LSI MegaRAID SAS RAID controller (Dell PERC)

# smartctl -a -d megaraid,0 /dev/sdX

Smartmontools on LSI 3ware SATA RAID controller

# smartctl -a -d 3ware,0 /dev/twX

Smartmontools on Areca SATA[/SAS] RAID controller

# smartctl -a -d areca,1 /dev/sgX

Smartmontools on Adaptec SAS RAID controller

# smartctl -a -d aacraid,H,L,ID /dev/sdX
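
The controller sections above differ mainly in the -d device type and the device node. A small lookup can keep them in one place. This is a sketch under assumptions: the function name and the short controller keys (hp, megaraid, 3ware, areca, adaptec) are invented for illustration; only the -d type names come from smartctl.

```shell
# smartctl_args CONTROLLER DISK DEVICE: hypothetical helper that prints
# the smartctl arguments for a disk behind a RAID controller.
# DISK is passed through unchanged (for adaptec it is the H,L,ID triple).
smartctl_args() {
    case $1 in
        hp)       dtype=cciss ;;
        megaraid) dtype=megaraid ;;
        3ware)    dtype=3ware ;;
        areca)    dtype=areca ;;
        adaptec)  dtype=aacraid ;;
        *)        echo "unknown controller: $1" >&2; return 1 ;;
    esac
    echo "-a -d $dtype,$2 $3"
}
```

With the function defined in your shell, it could be used like this:

# smartctl $(smartctl_args hp 0 /dev/sda)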

You can read more about Smartmontools at https://www.smartmontools.org

ServerDiary
