As noted in my maintenance summary alog, while FW0 was being restarted this morning, FW1 spontaneously restarted itself at the same time. This is the first time we have seen this, and as far as we can tell it is a complete coincidence. Previous spontaneous FW restarts have happened at random times, typically within 30 minutes of the original restart.
I have written a Python script that reports the individual frame files missing after a FW restart, and reports an error if the missing times coincide between FW0 and FW1.
Output for today:
FW0 Missing Frames [1374951040, 1374951104]
FW1 Missing Frames [1374950784, 1374950848, 1374951040, 1374951104]
ERROR: no frame written for GPS 1374951040!!!
ERROR: no frame written for GPS 1374951104!!!
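The coincidence check in the output above can be sketched as a set intersection. This is a minimal illustration, not the actual script: the function name `report_missing` is hypothetical, and it assumes the missing-frame GPS times for each frame writer have already been collected into lists (how the real script finds them is not described here).

```python
def report_missing(fw0_missing, fw1_missing):
    """Return report lines for the frames missing on each frame writer.

    Hypothetical helper: the inputs are lists of GPS start times of
    frames that each writer failed to write.
    """
    lines = [
        f"FW0 Missing Frames {sorted(fw0_missing)}",
        f"FW1 Missing Frames {sorted(fw1_missing)}",
    ]
    # A GPS time missing from both writers means no frame exists at all.
    missing_on_both = sorted(set(fw0_missing) & set(fw1_missing))
    lines += [f"ERROR: no frame written for GPS {gps}!!!" for gps in missing_on_both]
    return lines


if __name__ == "__main__":
    # GPS times taken from today's output.
    fw0 = [1374951040, 1374951104]
    fw1 = [1374950784, 1374950848, 1374951040, 1374951104]
    print("\n".join(report_missing(fw0, fw1)))
```

A frame missing from only one writer is not an error in this sketch, since the other writer still holds a copy.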
h1digivideo3, the computer serving the flicker cameras, ran out of memory. Camera 15 was taking up the bulk of the memory.
Before any action could be taken, the OS killed the Camera 15 server, which later restarted with a much smaller memory footprint, resolving the immediate issue.
Logs for Camera 15 were filled with queue overflow messages for both udpsink, the streaming portion of the server, and appsink, the centroid calculation. These might point to the cause of the memory leak.
Other camera servers are also using large amounts of memory. We might want to restart these servers every Tuesday.