Cerbo GX continues to reboot - how to determine why? Can software on a RPi help?

Hi to all.

I have been having this issue since February 2024 and it continues to occur. My Cerbo GX is currently running Venus 3.52. First question: should I update to v3.54? Beginning last year, in late January or early February, the Cerbo began to reboot at random times. In July I decided to do a clean install of Venus OS. The clean install improved the reboot interval from every 15 minutes or so to sometimes 6 or more hours. During the heat of this past summer, I moved the Cerbo out of my RV basement into the air conditioned part of my coach. At that time I verified, using a recording Fluke voltmeter, that the voltage supply to the Cerbo was a solid 25-27 volts.

I am running Venus OS Large and using Node-RED to provide load shedding routines that prevent overloading the inverters. Using the top command, I can see that my CPU load varies between 25% and 72%. I do not know what other information from the top command would be helpful.

I know that the Venus OS software can be run on other devices, but I am not sure how this would help. I have two Multiplus II 2x120 units connected in parallel. I also have a smart shunt, a BMV 712, and a 150/100 MPPT solar controller all reporting through the Cerbo.

Based on the strong recommendation of a software specialist, I added a 32 GB memory chip to the Cerbo, but as far as I can tell, the Venus software will not use that extra memory for operating software routines.

The second recommendation was to run the Venus Large on a separate dedicated computer, but I do not see how that would interface with the Cerbo.

I was told that the latest models of the Cerbo GX have a more powerful CPU, but I have not seen any solid mention of this.

I just got a message when trying to log in on my network – “Application exit (RuntimeError: Aborted(both async and sync fetching of the wasm failed). Build with -sASSERTIONS for more info.)” I am not sure what this means or if I should do something else.

I just did a “Power on Reset” by removing power for about 30 seconds and then restoring power. This seems to reduce the number of re-boots for a period of time.

I notice the re-boots most often when I am running the generator. When I am running the generator, the Start signal is the Cerbo GX Relay #1 picking. When the Cerbo re-boots, it drops Relay #1 signaling the generator to turn off. After the re-boot, the Cerbo picks the relay back up and the generator re-starts. During a recent generator run of 4.5 hours, the Cerbo re-booted 4 times that I know of.

I have another post that is active regarding the Warm Up and Cool Down time periods when I run the generator. Basically, these features do not work. I wonder if these are separate issues or stem from the same, unidentified problem.

Any thoughts or suggestions are appreciated. I am well and truly stumped.

You didn’t say if you were running VRM or not. If so, are you using WIFI to connect to the VRM?
IIRC there is a setting in the CERBO to force the Cerbo to reboot if it loses connectivity to the VRM.

@Rob Dover. When I first had my reboot issues, the Cerbo was connected to my coach via Cat 5 Ethernet cable with WiFi as backup. After I moved the Cerbo inside the air conditioned area, I was using WiFi only. I have now connected a Cat 5 Ethernet cable so that it is primary and WiFi is secondary.

In the ‘VRM online portal’ section, the ‘Reboot device when no contact’ option is turned OFF.

I will be changing locations tomorrow, staying 5 days in an RV park, and then out to the desert for 11 days. I have updated to the v3.54 software. I will see if the update helps the system any.

Any other thoughts are welcome.

I am no expert in this, but your CPU load appears to be high, which can cause problems. Do you have third party mods loaded? Poor Node-RED code can cause this. High CPU can result in shutdowns; there is a CPU watchdog. Overheating can also cause drop outs. Go into VRM advanced and set up a graph using the custom widget: the device is Gateway (the Cerbo GX) and the parameter is D-Bus round trip time, which shows how quickly the Cerbo is cycling. It should be a low single digit value in ms.

Suggest searching here and the old forum for phrases like watchdog, CPU load, round trip time. A Google site specific search is best for the old community, using a search phrase like

site:communityarchive.victronenergy.com cpu load

@alexpescaru have you any ideas or pointers that could help here.

Hi @pwfarnell and @johnjaymack

You pretty much summarized it all. :smiley:
The thread for this on the old community is: Cerbo GX random reboots - VictronEnergy

If you can SSH into the Cerbo and type the uptime command at the prompt, we will get a picture of the CPU load.

One solution, for your pretty loaded CPU (25-75 is quite high), is to increase the watchdog trigger values to something like 10.
It’s all in the thread above. The file to be modified is: /etc/watchdog.conf

As a side note, the Cerbo is OK from a HW point of view for the normal image, but with the large image and some intensive (Node-RED or other) scripts it pretty easily becomes “overwhelmed” and underpowered. This is why they (as in Victron) consider the large image “unsupported”.
A solution could be an optimization of the scripts so that they process things only at some interval, e.g. seconds apart, as not all things need frequent checks and processing.
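The interval idea above can be sketched in a few lines. This is illustrative only (plain Python rather than a Node-RED flow, and all names are made up): a gate that lets a flow's heavy processing run at most once per period, with the clock injectable so the behaviour can be tested.

```python
import time

class IntervalGate:
    """Let an expensive check run at most once per `period_s` seconds.

    Instead of reacting to every value update from the system, a flow
    calls ready() first and only does its heavy processing when it
    returns True. Illustrative sketch, not a Victron or Node-RED API.
    """
    def __init__(self, period_s, clock=time.monotonic):
        self.period_s = period_s
        self.clock = clock   # injectable clock, handy for testing
        self._last = None    # time of the last allowed run

    def ready(self):
        now = self.clock()
        if self._last is None or now - self._last >= self.period_s:
            self._last = now
            return True
        return False
```

A flow would then do something like `if gate.ready(): recompute_load_shedding()`, so updates arriving several times a second cost almost nothing between runs.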

LE:
Now I see that we already discussed some of these things in this topic: Cerbo GX still rebooting multiple times a day even after the firmware update - VictronEnergy


@alexpescaru, thank you for the help and pointers. I am tied up for the next several days and won’t be able to dig in deep until this coming weekend.

I ran the uptime command spaced out over a period of time.

root@einstein:/etc# uptime
13:47:23 up 12:17, load average: 0.17, 0.81, 1.30
root@einstein:/etc# uptime
13:53:37 up 12:23, load average: 0.61, 0.98, 1.25
root@einstein:/etc# uptime
14:11:24 up 12:41, load average: 3.71, 2.25, 1.51
14:13:10 up 12:43, load average: 4.67, 2.83, 1.80
14:25:58 up 12:56, load average: 0.61, 1.01, 1.44

I will have to study the reported results and their meaning, but I gather that at 13:47:23 the load average was not that bad. From 14:11:24 onward the load climbed, and the 1-minute average of 4.67 is very high.

You mentioned changing the watchdog timer values to ‘10’. Would this mean changing both “max-load-5 = 6”, and “max-load-15 = 6” to ‘10’? Just want to be sure before I edit anything.

root@einstein:/etc# cat watchdog.conf
log-dir = /var/volatile/log/watchdog
min-memory = 2500
max-load-5 = 6
max-load-15 = 6
repair-binary = /usr/sbin/store_watchdog_error.sh
test-prescaler = 60
retry-timeout = 0

As I type this response out, I heard my heat pumps come on, indicating a heavier load on my system. Although I am connected to 50-amp split phase shore power, so all power is passed through, I know that my Node-RED flows are looking at the actual current flow and determining if any loads on the system should be shed/turned off to reduce the current/power demands. I believe what you suggest about delaying some of the flows is aimed at addressing this loading. I will certainly look at that.

The basic reason I have Node-RED running at all is to provide a load monitoring/load shedding function. These nodes look at the current loading of my system and send HTTP messages to relays to disconnect or reconnect loads automatically as needed. They just need to be able to monitor the system values and so could be moved to an external computer. What I don’t know is how the external computer would be able to monitor what is happening on the devices connected to the Cerbo.
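For what it's worth, the core of a flow like that boils down to a threshold check with hysteresis, so a relay does not chatter on and off near the limit. A minimal Python sketch; the thresholds and the single-load framing are made up for illustration, and the real flows would poll Victron values and send the HTTP requests to relay boards:

```python
def shed_decision(amps, shed_at=45.0, restore_at=38.0, currently_shed=False):
    """Decide whether a deferrable load should be shed.

    Hysteresis: shed above `shed_at`, restore only once current falls
    to `restore_at` or below, so the relay does not chatter when the
    reading hovers near the limit. All numbers are illustrative, not
    taken from the original flows.
    """
    if amps >= shed_at:
        return True           # over the limit: drop the load
    if amps <= restore_at:
        return False          # comfortably under: restore the load
    return currently_shed     # in the dead band: keep the current state
```

Each poll of AC input current would call this, and only on a state change would the flow send the HTTP request to the relay.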

I hope that this next question will make sense. To state my understanding of my system: the Cerbo GX running the stock Venus OS software monitors my connected devices and provides some built in options for displaying what is currently happening via the GUI, transmitting information via VRM to the cloud, and giving access to settings for generator start/run/stop times, input current limits, etc. I assume that the Victron engineers make decisions and optimize the stock software so that it will run on the Cerbo without issues. Of course, as new features are added, bugs crop up and fixes are released.

The Venus Large image, which includes Node-RED and Signal K, adds additional resources but costs additional CPU time and load.

The Venus software can be loaded onto other platforms such as a Raspberry Pi computer.

Now comes the lack of knowledge/lack of understanding question. If I ran the stock Venus software on the Cerbo and Venus Large on a separate computer, would they run in parallel? Would they share information? Or would Venus Large running on a more robust computer replace the Cerbo completely? The Cerbo seems to be set up to connect all of the different Victron systems together easily. How would the different devices, i.e. Multiplus II, MPPT, Smart Shunt, etc., be connected to an external computer? I am sure that there might be more options that I have not thought of.

I appreciate your time and your help.

I apologize to the group, but I am not sure if this is the correct way to reference another post.

Generator Running during cool down period but does not shed loads - #15 by johnjaymack.

I am currently dry camping in Nevada and using the generator every day for battery recharge. None of the suggestions I have tried seem to have made an improvement. On Saturday I tried disabling all of my Node-RED flows to see if that might help; the system was still rebooting, and load shedding was still not working. On Sunday I tried to revert to the standard Venus OS software without Node-RED; the system rebooted 5 times in 1 hour and 7 times before I got a reasonable battery recharge. Today I followed the instructions to do a fresh install of the Venus software. So far, I have gone through a complete recharge with no reboots.
I am including the uptime responses in case those with a better understanding of the software may see something.

21:09:13 up 10 min, load average: 1.32, 1.11, 0.79
root@einstein:~# uptime
21:25:41 up 27 min, load average: 0.59, 0.83, 0.96
root@einstein:~# uptime
23:13:53 up 2:15, load average: 4.05, 2.57, 1.61
root@einstein:~# uptime
23:41:41 up 2:43, load average: 0.51, 0.79, 1.10
23:42:20 up 2:43, load average: 1.10, 0.92, 1.13
root@einstein:~# uptime
01:07:37 up 4:09, load average: 1.27, 0.89, 0.93
root@einstein:~# uptime
03:06:27 up 6:07, load average: 1.08, 1.16, 0.97
root@einstein:~# uptime
06:31:16 up 9:32, load average: 1.23, 0.88, 0.80
root@einstein:~# uptime
06:33:42 up 9:35, load average: 0.43, 0.67, 0.73
root@einstein:~# uptime
06:51:13 up 9:52, load average: 0.46, 0.67, 0.74

I notice that at “23:13:53” the load average hit 4.05. I do not know why and can only guess. Over the course of the last few days I have noticed that use of the new GUI seems to cause a high load average. I cannot support this observation with any hard facts.

The Cerbo GX is such a nice package, I hate to think that I may have to abandon it for a more powerful computer. But it seems to me that the improvements in the Venus OS software may be approaching the limits of what the Cerbo CPU can handle.

I will update this post after a couple more days, reporting whether the rebooting issue is corrected. During that time, I will be looking into my options for running a second computer: maybe a second Cerbo, or a Raspberry Pi dedicated to running Venus OS Large and Node-RED, so that I can get my load shedding routines back online while the main Cerbo handles my system. Alternatively, I may have to investigate a new CPU, such as a Raspberry Pi, that can run Venus OS Large, take care of all of the basic routines, and handle the Node-RED load shedding routines.

All for now.

Short update. After a clean install of Venus v3.54 running stock (NOT the large image, no Node-RED), the Cerbo has been up for 1 day and 49 minutes through two generator run times.

Without Node-RED, I have no load shed protections, but our system is fairly robust and we can reduce our power consumption manually until I can re-establish the load shedding routines.

While I really like our Cerbo, it appears that going to a more powerful CPU may be my only option. And that is a big step.

Could someone point me to any current posts or information about configuring a Raspberry Pi 4 or 5? Most of my searches turn up information that is pre v3.54 and therefore somewhat dated.

A second thought is this: I have a second Cerbo available. Could the main Cerbo running the stock Venus software be the prime interface/control, and the second Cerbo running Venus Large be the loading Cerbo, pulling information off of the system for processing? I believe that all of the information I use for load shed and grid input load control comes from the Multiplus II units. Not being a programmer, I do not know if this is an innovative idea or a nightmare. I was doing some research and saw that the information I need to make decisions and write changes is available over Modbus TCP.
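Since the needed values are published over Modbus TCP (TCP port 502 on the Cerbo), an external box can poll them directly. As a sketch of what is actually on the wire, here is a stdlib-only builder for a Modbus TCP “read holding registers” request (function 0x03); the unit ID and register address in the test below are placeholders and the real values must come from Victron's published Modbus TCP register list:

```python
import struct

def read_holding_registers_request(transaction_id, unit_id, address, count):
    """Build a Modbus TCP ADU for function 3 (read holding registers).

    MBAP header: transaction id, protocol id (always 0), byte length of
    the remainder, unit id; then the PDU: function code, start address,
    register count. All fields are big-endian per the Modbus spec.
    """
    pdu = struct.pack(">BHH", 0x03, address, count)
    mbap = struct.pack(">HHHB", transaction_id, 0x0000, len(pdu) + 1, unit_id)
    return mbap + pdu
```

In practice a library such as pymodbus, or Node-RED's Modbus nodes, handles this framing; the point is that an external computer only needs the Cerbo's IP address, port 502, and the register addresses from Victron's list.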

Once again, thanks for the suggestions and any thoughts are welcome.

@nesswill @JohnC @pwfarnell

I want to thank you all for the suggestions and help. The warm up and cool down times simply do not unload the generator. I have watched the GUI report that the system was in warm up mode or cool down mode while also reporting that the generator was pulling between 50% and 75% load.

I now have a new issue that has appeared in the last two days that I must resolve.


@alexpescaru @pwfarnell @KamFlyer
I want to thank you all for weighing in. As much as I had hoped it would not be true, I believe that the rebooting issue was entirely due to running the Large image and Node-RED on my Cerbo.

Below are the results of the ‘uptime’ command. Before putting in what I believe to be a clean install, the longest I had between reboots was about 12 hours. As can be seen below, I am now over 3 1/2 days without a reboot.

I now have a new problem that has cropped up and I will search to see if it is being discussed and if not I will start a new thread.

I still need the load shedding routines that I used Node-RED for, and will therefore be looking at a Raspberry Pi or similar system.

root@einstein:/etc# uptime
17:46:19 up 20:47, load average: 0.12, 0.55, 0.74
root@einstein:/etc# uptime
20:43:53 up 23:45, load average: 0.97, 0.91, 0.77
root@einstein:/etc# uptime
20:51:35 up 23:53, load average: 1.41, 1.09, 0.88
root@einstein:/etc# uptime
20:52:58 up 23:54, load average: 0.47, 0.87, 0.82
root@einstein:/etc# uptime
21:13:24 up 1 day, 14 min, load average: 0.39, 0.75, 0.89
root@einstein:/etc# uptime
21:21:14 up 1 day, 22 min, load average: 1.93, 1.46, 1.09
root@einstein:/etc# uptime
21:22:51 up 1 day, 24 min, load average: 1.60, 1.46, 1.13
root@einstein:/etc# uptime
21:30:15 up 1 day, 31 min, load average: 0.95, 0.79, 0.89
root@einstein:/etc# uptime
21:47:36 up 1 day, 49 min, load average: 1.04, 0.64, 0.68
root@einstein:/etc# uptime
22:18:37 up 1 day, 1:20, load average: 1.24, 0.88, 0.71
root@einstein:/etc# uptime
22:30:25 up 1 day, 1:31, load average: 0.29, 0.66, 0.69
root@einstein:/etc# uptime
22:33:06 up 1 day, 1:34, load average: 1.36, 1.14, 0.88
root@einstein:/etc# uptime
00:32:53 up 1 day, 3:34, load average: 0.69, 0.83, 0.88
root@einstein:/etc# uptime
05:07:22 up 1 day, 8:08, load average: 2.02, 2.06, 1.52
root@einstein:/etc# uptime
06:16:43 up 1 day, 9:18, load average: 0.80, 0.61, 0.67
root@einstein:/etc# uptime
16:40:12 up 1 day, 19:41, load average: 0.42, 0.46, 0.59
root@einstein:/etc# uptime
01:16:21 up 2 days, 4:17, load average: 1.93, 1.07, 0.82
root@einstein:/etc# uptime
01:52:05 up 2 days, 4:53, load average: 2.49, 2.12, 2.00
root@einstein:/etc# uptime
20:09:12 up 2 days, 23:10, load average: 0.70, 1.13, 0.90
root@einstein:/etc# uptime
23:14:16 up 3 days, 2:15, load average: 1.12, 0.91, 0.78
root@einstein:/etc# uptime
23:27:19 up 3 days, 2:28, load average: 0.28, 0.56, 0.70
root@einstein:/etc# uptime
23:40:13 up 3 days, 2:41, load average: 0.91, 0.68, 0.66
root@einstein:/etc# uptime
23:42:34 up 3 days, 2:43, load average: 0.71, 0.72, 0.68
root@einstein:/etc# uptime
05:05:07 up 3 days, 8:06, load average: 2.62, 1.45, 0.90
root@einstein:/etc# uptime
14:47:45 up 3 days, 17:49, load average: 0.56, 0.79, 0.88
root@einstein:/etc# uptime
15:09:17 up 3 days, 18:10, load average: 0.34, 0.63, 0.77

root@einstein:~# uptime
03:00:23 up 1:53, load average: 0.22, 1.20, 1.37
root@einstein:~# uptime
17:46:31 up 16:39, load average: 1.93, 0.75, 0.63
root@einstein:~# uptime
14:27:36 up 1 day, 13:20, load average: 1.89, 2.13, 1.62
root@einstein:~# uptime
02:35:07 up 2 days, 1:27, load average: 0.56, 0.41, 0.55

As can be seen from the above, the Cerbo made it 3 days and 18:10 hours before it rebooted. It has now been up 2 days and 1:27 hours. This is a huge improvement over rebooting sometimes three to four times in an hour.

I am going to investigate using a Raspberry Pi for Node-RED and see if I can use Modbus TCP to make changes. I like the Cerbo, but it appears to me that recent upgrades in the stock software may have stretched its hardware to the limit.

The new issue was the generator auto start software skipping the periodic run time. It skipped it two days in a row and then started working again. What was most interesting to me is that one part of the GUI reported that the generator was not running, a correct report, while another part of the GUI reported that the generator was running.

Today my system started the generator on an SOC start. The periodic run time came, and the system changed the reported reason for running to periodic run. Since the system had not reached the SOC stop point, when the periodic run time expired the SOC condition kept the generator running, as it should have.

I am going to stop updating this thread unless anyone has an additional comment or suggestion. I believe that the rebooting issue was caused by running the large image and Node-RED. Just turning off Node-RED did not return the unit to good operation; reloading the stock v3.54 did.

Thanks to all.


I thought it might be worthwhile to update the rebooting issue. Since I went back to stock Venus without the large image and Node-RED, the Cerbo GX has been more stable, but it still reboots every 24 to 72 hours. I never looked at this before, so I do not know whether regular reboots are normal or not. So I guess the question would be: how often should the Cerbo need to reboot when running the stock software?

Like yourself, I too had random reboots when running the large OS and some other mods (though mine were spaced days apart, not minutes or hours).

Since I disabled the large OS features and offloaded Node-RED to my Home Assistant, the Cerbo runs without issues or reboots, and CPU load is usually around 20 or lower. So in answer to your question: no, the Cerbo should not reboot unless there is an issue such as high heat or CPU load.

Since you already have the Cerbo, I would just let it do Victron things and consider running Node-RED on a Pi. Or maybe run HA on a Pi or mini PC and install the Node-RED addon.

Hi, I’m having exactly the same problem. The Cerbo reboots with the large version; without the large image, I have no problem. I’ve already tried two Cerbo versions. Always the same. I use serialbattery from Louisdw for the battery, do you too? I think I’m running out of storage space. When I use df -h, the root partition is over 95% full with the large version. Without it, it’s at 36%.
@John Mack Have you already changed the settings in watchdog.conf in any way?
jarek

It shouldn’t.
What I have theorised on one site with random reboots is that there was something happening on the DC side causing it.
We moved the Cerbo supply to the terminals of the inverter and haven’t had an issue since then.

Then again, you can run into the limits of a Cerbo very fast running the OS Large, so it may be a power/processing limit.

Thanks @lx for the response. Early on, I checked and then rewired my DC power to the Cerbo. Last summer, I was in Arizona during the high heat, 117 deg F, and realized that where the Cerbo was mounted might be getting too much heat. I moved the unit inside my coach and made very sure of my DC supply. BTW, my Cerbo is fed off of my 24 VDC house bank.

When I removed the large image and went back to stock Venus OS v3.54, most of the reboots went away. But using the uptime command, the longest uptime I have seen is around 3 days.

This is just an observation, and my memory of exactly when the problems began to come up is hazy, but I had been running the Large Image and Node Red without problems until around January of 2024. As I recall, I did a software update to take advantage of the new warm up and cool down features when running my generator.

In the fall of 2023, I had used the Assistants available on my Multiplus II to “Ignore Incoming AC”.
I used some timer relays to implement the logic. When the Cerbo started my generator, one of my relays would delay the application of power to the generator, and when the Cerbo tried to shut down the generator, my circuit would delay the shutdown, tell the inverters not to use AC input, thereby unloading the generator and allowing it to cool off before being shut down.

In January of 2024, I spent two weeks out in the desert at Quartzsite with no problems. Then I updated Venus OS because I wanted the new features, and my problems began. The unloading of the generator has never worked from the software and I cannot determine why. I will be rewiring my timer relay solution soon. It works, but if you want to adjust it, you have to change the timing at the relays. Changing the warm up and cool down in the GUI would be so much easier and nicer, but it has to work. I had asked elsewhere how the unloading of the generator is accomplished in software, but I have not heard back.

I like the new GUI and have been using it, but I am suspicious that the amount of programming may be stretching the CPU capacity of the Cerbo. In my experience, software improvements that give new and better features tend to cost memory and CPU power. Maybe the Cerbo CPU is reaching its limits.

I thought about using a Raspberry Pi 4 running Venus OS to replace the Cerbo, but the Cerbo does make connecting all of my Victron devices together very easy. So, to get my load shedding routines up and running, I am using an RPi 5 running Node-RED to monitor the Cerbo using the Modbus TCP protocol.

Anyway, thanks for the response. Have a great day.

I had also started doing that. But I have been messing with some ridiculous flows and dashboards, and it was because I ran out of resources on the Cerbo.

Makes sense that heat would limit it even more as that is typical for a computer.

It is supposed to disconnect the inverter primary input relays and ignore AC in for the time specified. It does on all my installs, anyway. But I don’t have parallel ones, so no experience there.

Thanks for the feedback.