Displaying report 1-1 of 1.
Reports until 15:34, Wednesday 05 July 2023
H1 CDS
david.barker@LIGO.ORG - posted 15:34, Wednesday 05 July 2023 (71080)
Mysterious restart of CW hardware injection process (psinject) Tue 04 July 10:32 PDT

Tony, Jonathan, Erik, Keith T, Mike T, Dave:

FRS-28,463

Executive Summary:

psinject spontaneously restarted itself due to an out-of-memory problem on h1hwinj1 yesterday morning.

This was a result of the upgrade of the system on Tue 27 June 2023. It takes 7 days to run out of memory.

In the short term (over the next few days) we will monitor the memory usage and restart the psinject process during a lock loss, to make it through to next week.

Next Tuesday we will downgrade to the original version of the LAL pulsar binary.

Details:

On Tuesday 4th July 2023 at 10:32 the psinject process on h1hwinj1 shut down cleanly and was then restarted by monit. The shutdown and restart of this process involves ramping the output of the INJ_CW filter module on h1calinj, which takes H1 out of observation mode.

Looking through the logs we found that h1hwinj1 had slowly run out of memory over the 7 days since the upgrade of the code on Tuesday 27th June 2023.

We did a quick estimate of the memory leak rate between 10am and 2pm today and came up with 1 MB/min. Starting with 10 GB of free memory, at this rate the memory is exhausted in about 7 days, which is what we saw.
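
As a cross-check of the arithmetic above, here is a minimal sketch using only the two figures quoted in this entry (1 MB/min, 10 GB free). The function names and the decimal MB-per-GB conversion are assumptions made for illustration. The least-squares helper shows one way a leak rate could be estimated from periodic samples; this entry does not say how the quick estimate was actually made.

```python
# Figures quoted in this log entry.
LEAK_RATE_MB_PER_MIN = 1.0
FREE_MEMORY_GB = 10.0

# Assumption: decimal units (1 GB = 1000 MB). Binary units change the
# answer by only a few percent.
MB_PER_GB = 1000.0
MINUTES_PER_DAY = 60 * 24


def days_until_exhausted(free_gb, leak_mb_per_min):
    """Return the number of days until free memory is gone at a constant leak rate."""
    minutes = free_gb * MB_PER_GB / leak_mb_per_min
    return minutes / MINUTES_PER_DAY


def leak_rate_mb_per_min(samples):
    """Least-squares slope of used memory (MB) against elapsed time (minutes).

    samples is a list of (elapsed_minutes, used_mb) pairs (hypothetical input
    format) and must contain at least two distinct times.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den


if __name__ == "__main__":
    days = days_until_exhausted(FREE_MEMORY_GB, LEAK_RATE_MB_PER_MIN)
    print(f"about {days:.1f} days to exhaustion")
```

With the quoted figures this gives just under 7 days, consistent with the interval between the upgrade on 27 June and the restart on 4 July.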

Last Tuesday several things were upgraded on h1hwinj1:

1. psinject code was changed to use gpstime instead of tconvert (makes LHO = LLO)

2. python3 version increased from 3.4 to 3.6 (makes LHO = LLO)

3. lalapps was upgraded from 6.25 to 9.2 (not done at LLO)

We think it is most probably the lalapps upgrade which is causing the memory leak. LLO did the first two upgrades two weeks ago on l1hwinj1 and are not seeing any memory issues.

In lalapps 6.25 /usr/bin/lalapps_Makefakedata_v4 is a 70K binary. In lalapps 9.2.1 it is a launcher script which spawns /usr/bin/lalpulsar_Makefakedata_v4, a 53K binary. It also issues the warning

"WARNING: 'lalapps_Makefakedata_v4' has been renamed to 'lalpulsar_Makefakedata_v4'"
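
For anyone repeating this comparison on another machine, here is a small sketch that distinguishes a compiled binary from a launcher script by its leading bytes. It assumes the launcher starts with a `#!` line, which this entry does not confirm; the function name is hypothetical.

```python
def classify_executable(path):
    """Classify a file as 'elf', 'script' or 'unknown' from its first bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head == b"\x7fELF":
        # ELF magic number: a compiled binary.
        return "elf"
    if head[:2] == b"#!":
        # Shebang: an interpreted script (assumed form of the launcher).
        return "script"
    return "unknown"
```

Calling `classify_executable("/usr/bin/lalapps_Makefakedata_v4")` on hosts running each lalapps version would show whether the program is still a native binary or has become a wrapper.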

Actions:

We will restart psinject during an appropriate lock loss, before it runs out of memory and stops itself again next Tuesday.
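
The memory monitoring that feeds this decision could be sketched as below, assuming a Linux host where /proc/meminfo reports a `MemAvailable` line in kB. The 2000 MB threshold is a hypothetical placeholder, not a value agreed in this entry, and the function names are illustrative.

```python
# Hypothetical threshold below which a restart at the next lock loss is advisable.
HYPOTHETICAL_THRESHOLD_MB = 2000.0


def available_memory_mb(meminfo_text):
    """Extract MemAvailable from /proc/meminfo text and return it in MB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line format: "MemAvailable:   12345678 kB"
            return int(line.split()[1]) / 1024
    raise ValueError("MemAvailable not found in meminfo text")


def restart_due(available_mb, threshold_mb=HYPOTHETICAL_THRESHOLD_MB):
    """Return True when available memory has dropped below the threshold."""
    return available_mb < threshold_mb


def read_meminfo(path="/proc/meminfo"):
    """Read the kernel's memory summary as text (Linux only)."""
    with open(path) as f:
        return f.read()
```

Run on h1hwinj1, `restart_due(available_memory_mb(read_meminfo()))` would flag when the next lock loss should be used for the restart.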

Next Tuesday we will downgrade LAL from 9.2.1 to 6.25.1 so LHO and LLO are identical.
