xzv asked:

[Ask for Review] Seamless ESS Time-Series Data Collection on External Cloud

I'm looking for feedback on my overall architecture design, and on the MQTT configuration specifically.

I already tested most of the details, but any suggestions for improvement or considerations are welcome:

I own several Cerbo GX devices for ESS installations. My custom logic for controlling the MultiPlus inverter runs in Node-RED on each Cerbo.

In the cloud, I have a time-series database where I want to store long-term, gapless data from all sites, similar to what Victron shows in the VRM portal (charge power, voltage levels and so on). I also want to store time-series data from my own Node-RED logic, and I want to keep the data long-term without losing or aggregating it, so not losing messages is important.

My current implementation for storing data is this: the cloud server runs an MQTT client which subscribes to several specific topics on Victron's MQTT brokers (mqttXX.victronenergy.com). When Venus OS or my Node-RED code publishes a message locally via MQTT, it gets forwarded to Victron's MQTT brokers, which then deliver the message to my data collector.
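To make that concrete, the subscriber side does roughly the following (shown as a plain Python/paho-mqtt sketch; the broker hostname, portal ID, topic and credentials are placeholders, and reconnect handling is left out):

```python
import paho.mqtt.client as mqtt

BROKER = "mqtt101.victronenergy.com"   # placeholder; the actual mqttXX host depends on the portal ID
PORTAL_ID = "abc123def456"             # placeholder VRM portal ID
TOPIC = f"N/{PORTAL_ID}/system/0/Dc/Battery/Power"   # example topic

def on_connect(client, userdata, flags, rc):
    # Subscribing with qos=1 only matters if the messages are also *published*
    # with QoS >= 1; Victron currently publishes at QoS 0, so delivery stays QoS 0.
    client.subscribe(TOPIC, qos=1)

def on_message(client, userdata, msg):
    # hand the payload to the time-series writer (not shown)
    print(msg.topic, msg.payload.decode())

client = mqtt.Client(client_id="ess-collector", clean_session=False)  # paho-mqtt 1.x style constructor
client.username_pw_set("vrm-user@example.com", "vrm-password-or-token")  # placeholder VRM credentials
client.tls_set()
client.on_connect = on_connect
client.on_message = on_message
client.connect(BROKER, 8883, keepalive=60)
# Victron's broker also expects a periodic keepalive publish to keep the N/ topics
# updating; that loop is omitted here for brevity.
client.loop_forever()
```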

My concerns now are whether I can configure QoS levels 1 or 2 end to end, and potential issues with disconnects, so that a disconnect between the Cerbo and Victron, or between Victron and my server, doesn't break the gapless time series.

[Attachment: bildschirmfoto-2024-01-29-um-114122.png]


Any feedback on this approach? I'm also open to suggestions for alternative solutions or best practices that might address my concerns.

Tags: cerbo gx, Node-RED, MQTT

6 Answers
matt1309 answered:

Hi @XZv

Looks good to me. In a company setting you would probably add some data checks in this scenario. They not only help ensure the data is correct, they may also highlight issues in the process that cause data not to match. The obvious cause you mention is network issues, but a data check might also catch other cases where data isn't being received correctly by the cloud.

In theory, if the network goes down the data is stored on the GX device and sent to VRM later. However, in case that fails or there's some other issue, you could write an automated data check that runs at an interval.

Just as an example, say the check runs in Node-RED locally.

Once per interval x (you define x), Node-RED downloads the cloud data for the last interval and compares it to the locally stored data.

I know you mentioned Node-RED will store some data locally; you would just need to make sure it stores at least the data between the current time and the last time it performed this task, i.e. this interval's data. If memory is a concern, you could in theory release the locally stored data after this check, since you've confirmed it matches the cloud data.

In terms of compute power for the data comparison, I would set the interval quite small if you're running on the Cerbo, so the volume of data it compares at once stays relatively small.

If you use longer periods between the data checks, say once per day, you might be better off running the check on another local system or even on the cloud side (which might be a pain on the cloud side for getting at the local data, i.e. managing firewalls etc.).

You could then have different outcomes based on the reconciliation of the two data sources, e.g. if cloud is null at a specific time but local has a value, override cloud with local.

However, if cloud and local are both non-null but different, then maybe have Node-RED send a notification with both data sources for that interval, so a human can check the data and investigate the cause of the difference.
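Roughly, in Python terms, the check could look like the sketch below (the two query helpers are hypothetical stubs you'd point at your local store and your cloud database):

```python
from datetime import datetime, timedelta

def load_local_window(start, end):
    """Hypothetical stub: return locally stored points as {timestamp: value}."""
    return {}

def query_cloud_window(start, end):
    """Hypothetical stub: query the cloud time-series DB as {timestamp: value}."""
    return {}

def reconcile(interval_minutes=15):
    end = datetime.utcnow()
    start = end - timedelta(minutes=interval_minutes)
    local = load_local_window(start, end)
    cloud = query_cloud_window(start, end)

    to_backfill = {}   # cloud is missing the point -> re-send the local value
    mismatches = {}    # both present but different -> notify a human to investigate

    for ts, local_value in local.items():
        cloud_value = cloud.get(ts)
        if cloud_value is None:
            to_backfill[ts] = local_value
        elif cloud_value != local_value:
            mismatches[ts] = (local_value, cloud_value)
    return to_backfill, mismatches
```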





xzv answered:

@matt1309 Many thanks for your answer. I think it's a great idea to perform some data checks already at the site, so I can prevent sending inconsistent data (formats).

What I don't understand is your point about downloading and comparing data on the GX. Where do you see the benefit compared to using MQTT's built-in QoS functionality? If I set it to 1, I can guarantee that the data is delivered at least once, and if I set it to 2, I can rest assured that the receiver confirmed the reception.

Do you have any feedback on using Victron's MQTT infrastructure as a relay? Do you have any experience with using QoS 1/2 on the GX? (Victron's messages are QoS 0).


matt1309 answered:

I was trying to think of an overarching control function to ensure the data in both sources matches if it's extremely critical, but I see your point, QoS should solve it.

My knowledge of QoS is limited, so please correct me if I'm wrong here.

But wouldn't QoS fail once the broker's retention limit is reached, say during a long network outage? And I assume that retention policy is configured server side, i.e. on the VRM MQTT server. To be fair, it's probably a niche scenario, as I imagine the retention period, even if small, is longer than most network outages.

Another niche scenario: if the VRM MQTT broker gets restarted and for whatever reason Telegraf doesn't reconnect properly, the VRM broker won't be storing data to send to Telegraf on reconnect, since the subscription that carried the QoS requirement no longer exists?
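If my understanding is right, the queueing only happens for a persistent session that has already subscribed with QoS >= 1. A rough paho-mqtt sketch of that subscriber side (broker, portal ID and topic are placeholders, TLS left out):

```python
import paho.mqtt.client as mqtt

# The broker only queues messages for a *persistent* session (clean_session=False)
# that has already subscribed with QoS >= 1. If that first subscription never
# happened, or the broker loses/expires the session, nothing is replayed on reconnect.
client = mqtt.Client(client_id="telegraf-bridge", clean_session=False)  # stable client_id matters

def on_connect(client, userdata, flags, rc):
    # flags["session present"] shows whether the broker still remembers the old session
    client.subscribe("N/PORTALID/system/0/Dc/Battery/Power", qos=1)  # placeholder portal ID/topic

client.on_connect = on_connect
client.connect("broker.example.com", 1883, keepalive=60)  # placeholder broker
client.loop_forever()
```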


I can't help much on the QoS implementation at Victron, however their code is available on GitHub. It's probably quite involved to fish through all the Python to ensure the QoS level is applied everywhere, from dbus to the local MQTT broker and on to the VRM MQTT broker, and then also to check that adding it doesn't cause exceptions anywhere else:

https://github.com/victronenergy/dbus-mqtt/tree/master


Would it maybe be easier to capture the data in Node-RED directly from dbus and then send it from Node-RED? At least then you'd have complete control over the data and the way it's sent, including QoS.


You may have to set up your own MQTT server cloud side (or use an HTTPS endpoint and design your own QoS).
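Something roughly like this, shown in Python rather than Node-RED just to illustrate the idea (the dbus service/path, broker hostname, topic and credentials are all placeholders):

```python
import dbus
import paho.mqtt.client as mqtt

# Read one value from the Venus dbus (the same data dbus-mqtt republishes).
# Service name and object path are just examples.
bus = dbus.SystemBus()
item = bus.get_object("com.victronenergy.system", "/Dc/Battery/Power")
power = float(item.GetValue(dbus_interface="com.victronenergy.BusItem"))

# Publish it to your own broker with QoS 2, so delivery is confirmed end to end.
client = mqtt.Client(client_id="site-01", clean_session=False)
client.username_pw_set("site-01", "site-secret")      # placeholder credentials
client.tls_set()                                       # TLS against your own broker
client.connect("mqtt.your-cloud.example.com", 8883)    # placeholder hostname
client.loop_start()
info = client.publish("sites/site-01/battery/power", str(power), qos=2)
info.wait_for_publish()   # blocks until the QoS 2 handshake completes
client.loop_stop()
client.disconnect()
```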



hominidae answered:

IMHO, running Telegraf in the cloud and connecting it to the on-premise broker/GX is a flaw in the concept.

Telegraf should run locally, near the on-premise broker, and push the data to the cloud upon collection.

Reason: less risk of data loss.

Assuming the local connection between broker and Telegraf has higher availability than the internet connection between premises and cloud, running Telegraf locally lets you use Telegraf's buffering capabilities when the internet connection is down.
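Telegraf itself buffers metrics when the output is unreachable, so you get this for free. Purely to illustrate the buffer-locally-then-push idea outside of Telegraf, a hand-rolled forwarder could look roughly like this (InfluxDB URL, token, org, bucket and the buffer path are placeholders):

```python
import json
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

BUFFER_FILE = "/data/tsdb-buffer.jsonl"   # placeholder local buffer path

client = InfluxDBClient(url="https://influx.example.com", token="TOKEN", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

def push(measurement, value, ts_ns):
    point = Point(measurement).field("value", value).time(ts_ns)
    write_api.write(bucket="ess", record=point)

def flush_buffer():
    # Replay any points that were written to disk during an earlier outage.
    try:
        with open(BUFFER_FILE) as f:
            lines = f.readlines()
    except FileNotFoundError:
        return
    for line in lines:
        rec = json.loads(line)
        push(rec["m"], rec["v"], rec["t"])
    open(BUFFER_FILE, "w").close()   # buffer drained

def forward(measurement, value, ts_ns):
    try:
        flush_buffer()
        push(measurement, value, ts_ns)
    except Exception:
        # Cloud unreachable: keep the point on local disk instead of dropping it.
        with open(BUFFER_FILE, "a") as f:
            f.write(json.dumps({"m": measurement, "v": value, "t": ts_ns}) + "\n")
```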


xzv answered:

@Hominidae @matt1309

It's a good idea to consider running the Telegraf plugin locally and sending the data to Influx directly, without the MQTT broker in between. Influx can buffer a lot of messages persistently as well, AFAIK.

The reason I haven't considered that (yet) is: how would you manage the secrets on the Venus devices so they are able to send data to InfluxDB? I don't own all of the devices, so I treat these edge devices as "compromised". I don't want to build my own authentication infrastructure with key handling, so I thought it would be convenient to let Victron handle that off the shelf and just connect my cloud to their cloud, so no secrets leave my controlled space. Do you have any thoughts on that? Maybe I'm missing something. :) Any ideas and thoughts are welcome.


matt1309 answered:

Hi @XZv

Do you have some info about the wider system/use cases? It might help guide suggestions and what's best to compromise on.

It seems like your original solution would work, but if you need QoS it might be a pain to edit the Python drivers on each GX device you want to integrate into your cloud infrastructure, especially if you have limited or no access to the GX devices (i.e. explaining to a client how to install/edit Python drivers could be painful).


If you don't have access to the GX devices or their networks, and only via VRM, then it seems VRM MQTT (likely without QoS) is your best option. If live/real-time data isn't needed on the cloud server, then maybe periodic queries to the VRM API could be a decent route instead? I've never used the VRM cloud, so I'm not sure how granular the data is there.


If you NEED QoS, or similar guarantees for controls, then I think you're going to need access to the GX devices or their network, or at the very least access to Node-RED, to build out something yourself.

The easiest/quickest I can think of is probably to host your own MQTT broker with QoS enabled in the cloud, and have a Node-RED setup capture dbus data and post it to your broker rather than to Victron's internal/VRM one.

Similar to what Victron's Python dbus-mqtt is doing, but in Node-RED.

Then, when you want to onboard a client to your cloud, you just need to talk them through installing your Node-RED node, as I imagine this is slightly easier than talking them through installing custom Python drivers.

You'll need to set up SSL for MQTT and the relevant authentication etc. on the server side, but I feel this is the least taxing solution if you need QoS.

If live/real-time data isn't needed, then you could also periodically send the data to an HTTP endpoint you host in your cloud, encrypted with SSL and protected with basic HTTP auth. However, you'd then need to build infrastructure server side to receive, format and import the data into Influx, rather than relying on Telegraf to handle that for you.
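As a rough sketch of that HTTP route from the site's side (the endpoint URL, credentials and payload shape are placeholders; the server side that validates and writes into Influx is not shown):

```python
import requests

# Periodic push from the site to an HTTPS endpoint you host.
payload = {
    "site": "site-01",
    "points": [
        {"measurement": "battery_power", "value": 1234.5, "time": "2024-01-29T10:41:22Z"},
    ],
}
resp = requests.post(
    "https://collector.your-cloud.example.com/ingest",   # placeholder endpoint
    json=payload,
    auth=("site-01", "site-secret"),   # basic HTTP auth over TLS
    timeout=10,
)
resp.raise_for_status()   # treat anything but 2xx as "not delivered", so the site can retry
```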


