Memory usage slowly climbs to 99%

Hello, I have an AXC F 2152 running 2021.0.3 LTS on which, according to both PLCnext Engineer and htop, the memory usage increases from around 45% on startup to 99% over a period of about 10 minutes. Once it reaches 99%, it wavers a bit between 98% and 99%. According to htop, the top memory users are three instances of /usr/sbin/syslog-ng, each of which is using about 21.6%. The rest is many instances of Arp.System.Application. Starting/stopping the PLC runtime doesn't affect this at all; only a reboot seems to reset the memory usage. Is there any advice on figuring out why this is happening, or a way to turn down the logging level? Thank you, John Crummy

I should have mentioned that this seems to be causing the PLC to reboot periodically, about once every 20 minutes.

After some more testing, the problem doesn't seem to be with syslog-ng. On subsequent reboots the problem remains; however, syslog-ng is no longer the main consumer of memory, Arp.System.Application is. Also, the time until the memory hits 99% full is on the order of 20-30 minutes, not the 10 minutes I indicated above.

Hello John, you can check the contents of the logs to see what is consuming so much memory.

Are you writing large amounts of data to a log? Check the size of your Output.log:

ls -lah /opt/plcnext/logs/Output.log

Did you configure the DataLogger to write to file too infrequently?

Please provide the output of this command:

ps --sort -rss -eo pid,pmem,rss,vsz,comm | head -16

Check the size of the RAM disks with:

df -hal
tmpfs 245M 101M 145M 41% /var
tmpfs 245M 160K 245M 1% /run

If it is the RAM disk that is occupied by too many files, find the large files on it:

du -a /var/ | sort -n -r | head -n 20
du -a /run/ | sort -n -r | head -n 20

If it really is Arp.System.Application that is allocating a lot of RAM, my guess is that there is a memory leak in one of the libraries you are using. Are you using your own C++ code? Which libraries are you using?

kind regards, Oliver
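If you want to watch the climb over time without keeping htop open, you could also run a small watcher like the sketch below. This is only my illustration (not something shipped with the firmware); it assumes nothing beyond a Linux /proc/meminfo and a Go cross-compiler (GOOS=linux GOARCH=arm for the AXC F 2152):

// memwatch.go - print MemAvailable from /proc/meminfo once per minute,
// so you can see how fast the free memory is shrinking.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
	"time"
)

// memAvailable returns the value of the MemAvailable line, e.g. "123456 kB".
func memAvailable() (string, error) {
	f, err := os.Open("/proc/meminfo")
	if err != nil {
		return "", err
	}
	defer f.Close()
	s := bufio.NewScanner(f)
	for s.Scan() {
		if strings.HasPrefix(s.Text(), "MemAvailable:") {
			return strings.TrimSpace(strings.TrimPrefix(s.Text(), "MemAvailable:")), nil
		}
	}
	return "", fmt.Errorf("MemAvailable not found in /proc/meminfo")
}

func main() {
	for {
		v, err := memAvailable()
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("%s MemAvailable: %s\n", time.Now().Format(time.RFC3339), v)
		time.Sleep(time.Minute)
	}
}

Redirect the output to a file on persistent storage (not the RAM disk) and you get a timeline of the leak.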

Hi Oliver,

For Output.log, it is 8.7M at the moment.

Command outputs:

$ ps --sort -rss -eo pid,pmem,rss,vsz,comm | head -16
PID  %MEM  RSS     VSZ     COMMAND
1518 21.3  107040  672052  Arp.System.Appl
1213 10.9  54788   82056   syslog-ng
1558 7.7   38776   219912  Arp.System.Appl
1559 3.3   16964   153312  Arp.System.Appl
1452 1.9   9536    11676   nginx
1525 1.5   7592    40416   rauc
1450 1.3   6748    8928    nginx
1451 1.3   6724    8924    nginx
1449 1.3   6584    8800    nginx
1453 1.2   6432    8684    nginx
5109 1.1   5764    8176    sshd
1207 0.9   4516    8932    ntpd
5113 0.8   4120    5520    bash
1232 0.7   3960    123440  lttng-sessiond
5112 0.7   3924    8176    sshd

$ df -hal
Filesystem      Size  Used  Avail  Use%  Mounted on
tmpfs           245M  245M  0      100%  /var/volatile
/dev/root       365M  275M  67M    81%   /media/rfs/ro
/dev/mmcblk0p6  488M  182M  271M   41%   /media/rfs/rw
overlay         488M  182M  271M   41%   /
/dev/mmcblk0p6  488M  182M  271M   41%   /media/rfs/internalsd
proc            0     0     0      -     /proc
sysfs           0     0     0      -     /sys
debugfs         0     0     0      -     /sys/kernel/debug
configfs        0     0     0      -     /sys/kernel/config
devtmpfs        245M  412K  245M   1%    /dev
tmpfs           245M  120K  245M   1%    /run
/dev/mmcblk0p7  27M   431K  25M    2%    /etc/device_data
devpts          0     0     0      -     /dev/pts
cgroup          245M  0     245M   0%    /sys/fs/cgroup
cgroup          0     0     0      -     /sys/fs/cgroup/cpuset
cgroup          0     0     0      -     /sys/fs/cgroup/cpu
cgroup          0     0     0      -     /sys/fs/cgroup/cpuacct
cgroup          0     0     0      -     /sys/fs/cgroup/blkio
cgroup          0     0     0      -     /sys/fs/cgroup/memory
cgroup          0     0     0      -     /sys/fs/cgroup/devices
cgroup          0     0     0      -     /sys/fs/cgroup/freezer
cgroup          0     0     0      -     /sys/fs/cgroup/net_cls
cgroup          0     0     0      -     /sys/fs/cgroup/net_prio
cgroup          0     0     0      -     /sys/fs/cgroup/pids
cgroup          0     0     0      -     /sys/fs/cgroup/rdma
name=systemd    0     0     0      -     /sys/fs/cgroup/systemd

$ du -a /var/volatile | sort -n -r | head -n 20
250864 /var/volatile
250844 /var/volatile/log
150620 /var/volatile/log/auth.log
99968  /var/volatile/log/auth.log.2
68     /var/volatile/log/syslog
64     /var/volatile/log/messages
44     /var/volatile/log/user.log
24     /var/volatile/log/kern.log
16     /var/volatile/log/dmesg
12     /var/volatile/tmp
8      /var/volatile/tmp/AxlC.config.svc
8      /var/volatile/log/debug
8      /var/volatile/log/boot
4      /var/volatile/tmp/tmpfile
4      /var/volatile/log/wtmp
4      /var/volatile/log/nginx/pxcapi.error.log
4      /var/volatile/log/nginx
4      /var/volatile/log/lastlog
4      /var/volatile/log/error
4      /var/volatile/log/daemon.log

I'm not using the DataLogger at all, and the only library I'm using is the "PLCnext Controller" library. No C++ running. I did have a custom program running in Linux userspace, but I've kept it from running for now, and the problem remains. The PLC did not reboot last night (uptime is around 18 hours now); however, the memory use remains at 99%.

Looking at the contents of the /var/volatile folder, it seems that auth.log (and auth.log.2) are massive, and quite possibly what is causing the problem. The one thing I was doing was using the HTTP REST interface to monitor things, and I see that auth.log seems to log every single request I made to the REST interface. Is that normal? Is there a way to disable that logging?

For the REST interface, I'm using the variables endpoint and polling regularly; should I be doing something different? I see that there is support for sessions, but the documentation doesn't really make clear what the benefits of that are, except that you attach a session ID to everything.

Thank you, John

I will add that by simply deleting auth.log and auth.log.2 my memory usage dropped immediately to 84%.

I've disabled logging in nginx by modifying the access_log line in nginx.conf: I commented out the existing line and added a line that turns logging off:

#access_log syslog:server=unix:/dev/log,facility=auth,tag=nginx,nohostname,severity=info combined;
access_log off;

So far my memory use has remained constant (besides the usual up-and-down), even when running our custom app. The PLC has also remained responsive. I think this has solved the problem, but I will check over the weekend to make sure it remains stable for multiple days.

Thank you, Oliver, for pointing me in the right direction.

Hello John, good to hear that you found a solution. Can you by any chance show me some of the typical entries that caused the issue for you? I hope we can reduce this impact in the future. kind regards, Oliver

Oliver,

First off, I can confirm that the PLC stayed alive over the weekend (uptime is 2 days, 20 hours), and memory usage is steady, so this definitely looks solved.

The log entries are all of the form:

Apr 26 08:20:39 axcf2152 nginx: 10.23.16.10 - - [26/Apr/2021:08:20:39 -0700] "GET /_pxc_api/v1.2/variables?pathPrefix=Arp.Plc.Eclr/&paths=[LONG LIST OF VARIABLE NAMES] HTTP/1.1" 200 9690 "-" "Go-http-client/1.1"

There is one entry for each request to that endpoint from our tool (written in Go, as you can see). Because this tool polls frequently, the entries add up quickly.

John
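For context, the polling loop in our tool boils down to something like the sketch below. The endpoint and query parameters match the log line above, but the address, variable paths, and TLS handling are placeholders, not our real code:

// poll.go - trimmed-down sketch of the Go poller. Every iteration of the
// loop produced one of the nginx auth.log entries shown above.
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

func main() {
	base := "https://192.168.1.10/_pxc_api/v1.2/variables" // placeholder PLC address

	q := url.Values{}
	q.Set("pathPrefix", "Arp.Plc.Eclr/")
	// Illustrative variable list; the real tool builds this from whatever
	// the user is currently viewing.
	q.Set("paths", "[MainInstance.xRunning,MainInstance.iCount]")

	client := &http.Client{
		Timeout: 5 * time.Second,
		// The controller ships with a self-signed certificate; verification
		// is skipped here only to keep the sketch short. Don't do this in
		// production.
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}

	for {
		resp, err := client.Get(base + "?" + q.Encode())
		if err != nil {
			fmt.Println("request failed:", err)
		} else {
			body, _ := io.ReadAll(resp.Body)
			resp.Body.Close()
			fmt.Printf("got %d bytes, status %s\n", len(body), resp.Status)
			// the real tool decodes the JSON body here
		}
		time.Sleep(time.Second) // polling interval; each request = one log line
	}
}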

Hello John, thank you for the details. Okay, so your client is polling a long list of variables; that's why the log file floods so quickly. We will fix this log mechanism in the next version. There are some features in the REST interface that optimize this kind of polling: sessions + variable groups. You register a group of variables and then only poll that specific group (this also circumvents the issue of the limited URL length). That's what the eHMI does.
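Roughly, the flow looks like the sketch below. Note that the routes and payload shapes here are illustrative assumptions only (this thread doesn't spell them out); the PLCnext REST data interface documentation has the real ones:

// groups.go - rough sketch of the session + variable-group flow described
// above. ALL routes and payloads are placeholder assumptions; consult the
// PLCnext REST data interface docs for the actual API. TLS, authentication,
// and response parsing are omitted for brevity (see the polling sketch
// earlier in the thread).
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	base := "https://192.168.1.10/_pxc_api/v1.2" // placeholder address

	// 1. Create a session (hypothetical route); the response would carry an ID.
	resp, err := http.Post(base+"/sessions", "application/json", nil)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	sessionID := "abc123" // in a real client, parsed from the response body

	// 2. Register the variable list once as a group (hypothetical route/payload).
	group := []byte(`{"pathPrefix":"Arp.Plc.Eclr/","paths":["MainInstance.xRunning","MainInstance.iCount"]}`)
	resp, err = http.Post(base+"/sessions/"+sessionID+"/groups", "application/json", bytes.NewReader(group))
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	groupID := "g1" // likewise parsed from the response

	// 3. Poll by group ID. The long variable list travels only once, at
	// registration, so every subsequent request (and its log line) stays short.
	resp, err = http.Get(base + "/sessions/" + sessionID + "/groups/" + groupID)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("poll status:", resp.Status)
}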

Thanks, Oliver. I will have to take a look at using sessions and groups when I get a chance. My list of variables changes depending on what a user is viewing at any one time, which is why I didn't use them initially, but I think with a bit of work I can implement the groups.