nginx fails, then succeeds, then fails again

  • Although I haven't touched my OMV5 setup, about a week ago I started to get monitoring emails about 'connection failed nginx' and 'execution failed nginx', followed by 'execution succeed nginx' and 'connection succeeded nginx'.

    I thought it would resolve itself, but over the past couple of days the nginx failed/succeeded alerts have become more frequent (now roughly every hour). Is this a reason to be concerned? I'm not sure what is happening.

  • This is the output. I'm not sure what it means.


  • Something is killing your nginx process. I have only seen that with OOM (out of memory). Do you have anything in the syslog / dmesg?
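
One way to look for OOM activity is to count the kernel log lines that mention it. This is a minimal sketch, not an OMV tool: the function name `count_oom_lines` is made up for illustration, and the search patterns are an assumption based on the usual wording of Linux OOM-killer messages, which can differ between kernel versions.

```shell
# Count kernel log lines that look like OOM-killer activity.
# Reads log text on stdin, so it works with dmesg output or a saved syslog.
# The patterns are an assumption about typical kernel wording.
count_oom_lines() {
    # grep -c prints 0 and exits non-zero when nothing matches; keep exit status 0.
    grep -ciE 'out of memory|oom-killer|oom_reaper' || true
}

# Typical use on the affected machine (usually needs root):
#   dmesg | count_oom_lines
#   count_oom_lines < /var/log/syslog
```

A result of 0 suggests the kernel did not kill nginx for memory reasons, and that other messages in the log (for example disk errors) deserve a closer look.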

    If you got help in the forum and want to give something back to the project click here (omv) or here (scroll down) (plugins) and write up your solution for others.

  • Thanks, I see a lot of I/O errors. Does this mean I have a failing hard drive (sdc), and that this is what causes the nginx failures?

    root@omv:~# dmesg | grep error
    [157379.641395] blk_update_request: I/O error, dev sdc, sector 19127064 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    [157572.140009] blk_update_request: I/O error, dev sdc, sector 19127064 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    [161358.723272] blk_update_request: I/O error, dev sdc, sector 19127064 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    [161545.161932] blk_update_request: I/O error, dev sdc, sector 19127064 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    [165331.790198] blk_update_request: I/O error, dev sdc, sector 19127064 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    [165518.152838] blk_update_request: I/O error, dev sdc, sector 19127064 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    [167459.557361] blk_update_request: I/O error, dev sdc, sector 14424904 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 0
    res 51/40:28:b8:59:3e/00:00:b7:27:74/e1 Emask 0x9 (media error)
    [167639.271268] ata1.00: error: { UNC }
    [167639.276138] sd 0:0:0:0: [sdc] tag#2 Add. Sense: Unrecovered read error - auto reallocate failed
    [167639.276145] blk_update_request: I/O error, dev sdc, sector 20863416 op 0x0:(READ) flags 0x80700 phys_seg 5 prio class 0
    res 51/40:08:b8:59:3e/00:00:2f:27:74/e1 Emask 0x9 (media error)
    [167643.595139] ata1.00: error: { UNC }
    [167643.600005] sd 0:0:0:0: [sdc] tag#18 Add. Sense: Unrecovered read error - auto reallocate failed
    [167643.600013] blk_update_request: I/O error, dev sdc, sector 20863416 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
    res 51/40:08:b8:59:3e/00:00:27:eb:48/e1 Emask 0x9 (media error)


    This is only part of it. The whole output is more than 10000 characters, so I couldn't paste it all here.
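
When the log is too long to read or paste, it can help to condense it. The sketch below is only an illustration (the function name `summarize_io_errors` is hypothetical): it assumes the `blk_update_request: I/O error, dev X, sector N` line format shown above and reports, per device, how many errors were logged and how many distinct sectors they hit.

```shell
# Summarise "I/O error" lines from dmesg: per device, the number of errors
# and how many distinct sectors were involved. Reads log text on stdin.
# Assumes the "I/O error, dev <name>, sector <number>" format seen in this thread.
summarize_io_errors() {
    awk '
        /I\/O error, dev / {
            dev = ""; sec = ""
            for (i = 1; i <= NF; i++) {
                if ($i == "dev")    { dev = $(i + 1); sub(/,$/, "", dev) }
                if ($i == "sector") { sec = $(i + 1) }
            }
            if (dev == "") next
            errors[dev]++
            # Count each (device, sector) pair only once.
            if (!((dev, sec) in seen)) { seen[dev, sec] = 1; sectors[dev]++ }
        }
        END {
            for (d in errors)
                printf "%s errors=%d distinct_sectors=%d\n", d, errors[d], sectors[d]
        }
    ' | sort
}

# Typical use (usually needs root):
#   dmesg | summarize_io_errors
```

Many errors spread over only a few sectors, as in the excerpt above, are consistent with a small number of unreadable spots on the disk that get hit again and again.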

  • Check the SMART status of that drive and look at attributes 5, 197 and 198 under Information -> Attributes.

  • Unfortunately, I can't log in now. The web interface says 'failed to connect to socket' with this error message:

    Code
    Error #0:
    OMV\Rpc\Exception: Failed to connect to socket: Connection refused in /usr/share/php/openmediavault/rpc/rpc.inc:141
    Stack trace:
    #0 /var/www/openmediavault/rpc/session.inc(57): OMV\Rpc\Rpc::call('UserMgmt', 'authUser', Array, Array, 2, true)
    #1 [internal function]: OMVRpcServiceSession->login(Array, Array)
    #2 /usr/share/php/openmediavault/rpc/serviceabstract.inc(123): call_user_func_array(Array, Array)
    #3 /usr/share/php/openmediavault/rpc/rpc.inc(86): OMV\Rpc\ServiceAbstract->callMethod('login', Array, Array)
    #4 /usr/share/php/openmediavault/rpc/proxy/json.inc(97): OMV\Rpc\Rpc::call('Session', 'login', Array, Array, 3)
    #5 /var/www/openmediavault/rpc.php(45): OMV\Rpc\Proxy\Json->handle()
    #6 {main}
  • Run on the CLI: smartctl -a /dev/sdX (X is the letter of the drive; in your case it's c)


    How to check an hard drive health from the command line using smartctl - Linux Tutorials - Learn Linux Configuration
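
To avoid scrolling through the full report, the attribute rows of interest can be filtered out of the smartctl output. A minimal sketch, assuming the usual ten-column attribute table that `smartctl -A` prints for ATA drives; the function name `show_smart_attrs` is made up for this example.

```shell
# Print only the SMART attribute rows for the given IDs (default: 5, 197, 198).
# Reads the output of `smartctl -A` (or `smartctl -a`) on stdin.
# Assumes the standard table layout: ID, name, flag, value, worst, thresh,
# type, updated, when_failed, raw value.
show_smart_attrs() {
    ids="${1:-5 197 198}"
    awk -v ids="$ids" '
        BEGIN { n = split(ids, a, " "); for (i = 1; i <= n; i++) want[a[i]] = 1 }
        ($1 in want) && $2 ~ /^[A-Za-z]/ && NF >= 10 { print $1, $2, "raw=" $10 }
    '
}

# Typical use (needs root):
#   smartctl -A /dev/sdc | show_smart_attrs
#   smartctl -A /dev/sdc | show_smart_attrs "5 196 197 198"
```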

  • I've attached a monitor and disconnected all external drives. The boot gets to a certain point and goes no further, so I guess the system drive is toast.

    I tried booting into recovery mode to run fsck -p, but it stops at the point below.

    Is there a way I can repair the drive somehow without a full re-install?


  • You could try to do so. Boot from different media and run fsck as stated. I would not trust the disk any more, though.


  • OK, I don't have another Linux system, so it looks like I'm going for a full rebuild. I will probably use a thumb drive as my system drive going forward.

  • Why don't you boot from a USB stick and take a look? A Linux USB stick can be made from any live CD.


  • These are the errors (abridged, as I can't copy all of it), but the output shows the attributes you were after. I tried running fsck -p /dev/sda1, but it doesn't seem to have fixed the error.


    Is the drive toast or can it be recovered?


    root@OMV:~# smartctl -a /dev/sda
    smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.10.0-0.bpo.8-amd64] (local build)
    Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF INFORMATION SECTION ===
    Model Family:     IBM Travelstar 48GH, 30GN, and 15GN
    Device Model:     IC25N020ATDA04-0
    Serial Number:    63A63J51410
    Firmware Version: DA3OA70A
    User Capacity:    20,003,880,960 bytes [20.0 GB]
    Sector Size:      512 bytes logical/physical
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   ATA/ATAPI-5 T13/1321D revision 3
    Local Time is:    Sat Jan 29 19:14:18 2022 GMT
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled

    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    General SMART Values:
    Offline data collection status:  (0x82) Offline data collection activity
                                            was completed without error.
                                            Auto Offline Data Collection: Enabled.
    Self-test execution status:      (   0) The previous self-test routine completed
                                            without error or no self-test has ever
                                            been run.
    Total time to complete Offline
    data collection:                 ( 645) seconds.
    Offline data collection
    capabilities:                    (0x1b) SMART execute Offline immediate.
                                            Auto Offline data collection on/off support.
                                            Suspend Offline collection upon new
                                            command.
                                            Offline surface scan supported.
                                            Self-test supported.
                                            No Conveyance Self-test supported.
                                            No Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                                            power-saving mode.
                                            Supports SMART auto save timer.
    Error logging capability:        (0x01) Error logging supported.
                                            No General Purpose Logging support.
    Short self-test routine
    recommended polling time:        (   2) minutes.
    Extended self-test routine
    recommended polling time:        (  27) minutes.

    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000b   081   081   062    Pre-fail  Always       -       27459770
      2 Throughput_Performance  0x0005   103   103   040    Pre-fail  Offline      -       6012
      3 Spin_Up_Time            0x0007   126   126   033    Pre-fail  Always       -       1
      4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       1250
      5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
      8 Seek_Time_Performance   0x0005   120   120   040    Pre-fail  Offline      -       36
      9 Power_On_Hours          0x0012   048   048   000    Old_age   Always       -       22951
     10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       849
    191 G-Sense_Error_Rate      0x000a   100   100   000    Old_age   Always       -       0
    192 Power-Off_Retract_Count 0x0032   099   099   000    Old_age   Always       -       214
    193 Load_Cycle_Count        0x0012   094   094   000    Old_age   Always       -       69894
    194 Temperature_Celsius     0x0002   157   157   000    Old_age   Always       -       35 (Min/Max 13/66)
    196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       219
    197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       66
    198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0



  • 196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 219

    197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 66

    That disk is not reliable. The meaning of attributes 196 and 197 is explained here:

    https://en.wikipedia.org/wiki/…ATA_S.M.A.R.T._attributes
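
The same judgement can be turned into a quick check. This is only a rule-of-thumb sketch, not a vendor specification: it assumes that any non-zero raw value in attributes 5, 196, 197 or 198 is worth a warning, and the function name `check_sector_health` is hypothetical.

```shell
# Exit 0 if attributes 5, 196, 197 and 198 all have a raw value of 0;
# otherwise print the non-zero ones and exit 1.
# Reads `smartctl -A` output on stdin. Treating "any non-zero raw value"
# as a warning is an assumption, not a threshold defined by the drive vendor.
check_sector_health() {
    awk '
        ($1 == 5 || $1 == 196 || $1 == 197 || $1 == 198) && $2 ~ /^[A-Za-z]/ && NF >= 10 {
            if ($10 + 0 > 0) { printf "WARN %s %s raw=%s\n", $1, $2, $10; bad = 1 }
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Typical use (needs root):
#   smartctl -A /dev/sda | check_sector_health || echo "plan to replace this disk"
```

Fed the table posted above, this would flag attributes 196 and 197, matching the assessment that the disk should not be trusted.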
