LM5 strange behavior
Hey guys!

I have an LM5 Reactor that has been behaving very strangely since last weekend. From Saturday morning it was unreachable on the LAN and wasn't working.

Disconnecting it from power and reconnecting it didn't help.

On Tuesday morning it surprisingly came back on its own, but in the evening it died again. Today is Wednesday, and a few minutes ago it came back once more.

At the moment this device has very little to do: I've moved all CPU-intensive scripts to another device, and most of the time the load is below 0.3. It's also used as a KNX IP router, and every 5 minutes I fetch all objects from another device.

When I check the system logs, they also look strange, because they show entries from today and from 1 January.

I know that this device is a version that doesn't rely on the SD card for system purposes, so the problem shouldn't be caused by a read-only SD card.

From the Alerts tab I see that it started precisely at:
01.07.2022 08:00:16
02.07.2022 06:30:17
02.07.2022 08:00:17
06.07.2022 09:48:16

But I'm sure that it was also unreachable from at least 19:00 yesterday, so I'm not sure the entries above are complete.
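
To see how far apart the start-up entries above are, the timestamps can be turned into numbers and subtracted. This is only a minimal sketch in plain Lua: `parseAlertTime` and `alertGaps` are hypothetical helper names, and it assumes the Alerts tab always uses the `dd.mm.yyyy HH:MM:SS` format shown.

```lua
-- Parse "dd.mm.yyyy HH:MM:SS" into a Unix timestamp (local time).
function parseAlertTime(s)
  local d, m, y, H, M, S = s:match("^(%d+)%.(%d+)%.(%d+) (%d+):(%d+):(%d+)$")
  assert(d, "unexpected timestamp format: " .. tostring(s))
  return os.time({
    year = tonumber(y), month = tonumber(m), day = tonumber(d),
    hour = tonumber(H), min = tonumber(M), sec = tonumber(S),
  })
end

-- Return the gaps, in seconds, between consecutive timestamps.
function alertGaps(stamps)
  local gaps = {}
  for i = 2, #stamps do
    gaps[#gaps + 1] = parseAlertTime(stamps[i]) - parseAlertTime(stamps[i - 1])
  end
  return gaps
end

local stamps = {
  "01.07.2022 08:00:16",
  "02.07.2022 06:30:17",
  "02.07.2022 08:00:17",
  "06.07.2022 09:48:16",
}

for i, gap in ipairs(alertGaps(stamps)) do
  print(string.format("%s -> %s: %.1f h", stamps[i], stamps[i + 1], gap / 3600))
end
```

The gaps only show the time between recorded start-ups, not how long the device was actually down in between.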

How can I debug this or check what is wrong with this device?

Done is better than perfect
Maybe it's a power supply issue of some sort? The device can only keep the internal clock running for several hours. If it's powered off for longer than that, the starting time will be wrong until the NTP service is up and running. The device also saves the current time every 30 minutes and restores this time on boot if the system time differs from it by more than 1 day. So from the logs you can tell that the device was down from 13:00 on 5 July until this morning.
The additional internal flash is used for project storage only. The SD card is still used for the system files.
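
The save-and-restore rule described above can be written down as a small function. This is only an illustration of the rule as stated in this post, not the actual firmware code; `pickBootTime` and `ONE_DAY` are hypothetical names.

```lua
local ONE_DAY = 24 * 60 * 60

-- Hypothetical helper: decide which time to start with at boot.
-- clockTime: what the system clock reports at boot (Unix seconds)
-- savedTime: the last periodically saved time (Unix seconds), or nil if none
function pickBootTime(clockTime, savedTime)
  -- restore the saved time only when the clock is off by more than one day
  if savedTime and math.abs(clockTime - savedTime) > ONE_DAY then
    return savedTime
  end
  return clockTime
end

-- Example: the clock came up at 1 January, the last save was 5 July 13:00.
local saved = os.time({ year = 2022, month = 7, day = 5, hour = 13, min = 0, sec = 0 })
local reset = os.time({ year = 2022, month = 1, day = 1, hour = 0, min = 0, sec = 5 })
print(pickBootTime(reset, saved) == saved)
```

Because the time is only saved every 30 minutes, a restored time is the last save point, not the moment the device actually stopped.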
Every 30 minutes I run a script that checks for SD card errors. A few times it has notified me about such errors on another device, but never on this one.

I will try to figure it out and will take a closer look at the power supply, which is an 8-year-old industrial WAGO unit and is probably OK because the other devices are fine. But I will check the connections.

Does the LM have any watchdogs? On the device, on the nginx server, etc.?

What can cause a restart and what can cause a freeze? Could you suggest a script that checks these values, so that I can use it to send myself an email or save the info in storage?
Done is better than perfect
The real-time clock is handled by a separate chip which has backup power (a supercapacitor) in case of power loss. Even if the main CPU is frozen, the clock should still show a relatively accurate value when the system boots, unless it was powered off for more than several hours. The main CPU has a hardware watchdog and there is also software service monitoring.
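
One way to tell a restart from a stall is a periodic "heartbeat" that remembers the last time and uptime it saw. The sketch below is an assumption-laden example, not an official LM script: `parseUptime` and `checkHeartbeat` are hypothetical names, and the factor of 3 for a stall is an arbitrary choice.

```lua
-- Parse the first field of /proc/uptime ("12345.67 23456.78"): seconds since boot.
function parseUptime(text)
  local up = tostring(text):match("^(%d+%.?%d*)")
  return tonumber(up)
end

-- Compare the current state with the previous heartbeat.
-- prev:     table { time = <unix seconds>, uptime = <seconds> }, or nil on first run
-- interval: how often the heartbeat is expected to run, in seconds
-- Returns "first_run", "reboot", "stall" or "ok", plus the gap since prev in seconds.
function checkHeartbeat(prev, now, uptime, interval)
  if not prev then return "first_run", 0 end
  local gap = now - prev.time
  -- uptime going backwards means the system restarted since the last heartbeat
  if uptime < prev.uptime then return "reboot", gap end
  -- no restart, but the heartbeat was missed several times in a row
  if gap > 3 * interval then return "stall", gap end
  return "ok", gap
end

-- Possible use in a scheduled script. storage.get, storage.set and alert are
-- assumed to be available in the LM scripting environment:
--   local f = assert(io.open("/proc/uptime"))
--   local uptime = parseUptime(f:read("*l"))
--   f:close()
--   local status, gap = checkHeartbeat(storage.get("heartbeat"), os.time(), uptime, 120)
--   if status ~= "ok" then alert("heartbeat: %s, gap %d s", status, gap) end
--   storage.set("heartbeat", { time = os.time(), uptime = uptime })
```

Keep in mind that the gap depends on the system time, so it is only meaningful once the clock has been restored or synchronised by NTP.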
I've added logging of the current CPU load every 2 minutes whenever it's above 1. I'm pretty sure that this isn't the problem here, but let's log it anyway.

I'm using this function to get the current CPU load stats:
function getCPUStats()
  local f = assert(io.popen("cat /proc/loadavg"))
  -- the output is a single line, e.g. "0.16 0.19 0.15 1/69 16886"
  local line = f:read("*l")
  f:close()
  local loadavg = line:split(' ')
  -- convert every field except the 4th ("running/total" tasks) to a number
  for i = 1, 5 do
    if i ~= 4 then loadavg[i] = tonumber(loadavg[i]) end
  end
  return loadavg
end
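
The "log only when the load is above 1" check could look roughly like this. It is a sketch in plain Lua that parses the line with a pattern, so it does not depend on `split`; `parseLoadAvg` and `loadWarning` are hypothetical names.

```lua
-- Parse one /proc/loadavg line, e.g. "0.16 0.19 0.15 1/69 16886",
-- into the 1-, 5- and 15-minute load averages.
function parseLoadAvg(line)
  local l1, l5, l15 = line:match("^(%S+)%s+(%S+)%s+(%S+)")
  return tonumber(l1), tonumber(l5), tonumber(l15)
end

-- Return a log message when the 1-minute load exceeds the threshold, else nil.
function loadWarning(line, threshold)
  local l1, l5, l15 = parseLoadAvg(line)
  if l1 and l5 and l15 and l1 > threshold then
    return string.format("high load: %.2f %.2f %.2f", l1, l5, l15)
  end
  return nil
end

print(loadWarning("1.42 0.80 0.35 2/71 17002", 1))
print(loadWarning("0.16 0.19 0.15 1/69 16886", 1))
```

On the LM, the returned message could be passed to `log()` in the script that runs every 2 minutes, assuming that function is available there.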

But as I understand it, even if there is a problem with writing to the SD card, the LM shouldn't have any trouble until it restarts.
Done is better than perfect