Reports until 17:31, Monday 05 June 2023
H1 CAL (CSWG, ISC)
jeffrey.kissel@LIGO.ORG - posted 17:31, Monday 05 June 2023 - last comment - 18:16, Monday 05 June 2023(70150)
Modeling the Sensing Function During Thermalizing: Chunked into Arm Cavity Power Bins and GPR Fit (Thus Far, Poorly Capturing Variability)
J. Kissel

Picking up where I last left off, in LHO:69898, I had divided up the ten 4-hour segments of IFO thermalization times during ER15 into slices of average X&Y arm cavity power, or into power bins of 1.0 W. This aLOG covers the attempts to create a representative model of the collection of transfer functions during each power bin using methodology developed during O3 -- fitting the data using a Gaussian Process Regression with the same radial basis function kernel we use to fit for other unknown frequency dependence in the sensing and actuation functions.

Executive Summary: Gaussian Process Regression's cons continue outweigh its pros as a transfer function fitter with the kernel we typically use. I'm able to finagle the hyper parameters of the kernel model such that the regression output tracks the median behavior of the transfer function, but it does *not* reflect the large, measured, variability in the transfer function (which we traditionally try to cover with the Gaussian uncertainty). This may have to be "good enough," but I'm going to have to farm this out to other specialists if we need better.

Step 1: Reorganize the *real* transfer function data, and add uncertainty
Attachment 1: sensingFunction_syserror_vs_powerslices_originaldata_vs_binneddata.pdf
    As of May 24 2023, and LHO:69898, I had found *a* way to group the data into power slices and sort the 3D data arrays. However, in doing so, I had NOT saved any of the coherence-based uncertainty on the measurement points. So, as of this aLOG, I have now added the measurement, coherence-based uncertainty to each of the data points in all the collections of data. While doing so, I had some more time with 3-D python array manipulation, so I reorganized things as well. 

This collection of plots, shows all the collection of power bin's transfer functions and their associated power, now with quantified uncertainty. On the left, the bode plot of the transfer function organized in one way, and on the right, the same data is just organized differently -- but serves as a sanity check that my new organization preserves all the right data and uncertainty assignment.

Step 2: Gather some statistical intuition on the data by finding 68th, 95th, 99.7th quantiles as one way to quantify the uncertainty of the parent distribution at each frequency point. 
Attachment 2: sensingFunction_syserror_vs_powerslices_binneddata_freqhist.pdf
    One of the fundamental assumptions of a Gaussian Process regression is that the family of curves that you're given it is sampled from a -- you guessed it -- Gaussian parent distribution of curves. So... is this collection of data Gaussian?

This plot shows histograms of each frequency point's magnitude and phase within a power bin. Overplotted on top of the histogram is 
   - the (red) median, or 50th percentile, then 
   - the (darkest pink) 0.1587th,0.8413th "1-sigma" quantiles, 
   - the (medium pink) 0.9772nd,0.0228th "2-sigma" quantiles, and 
   - the (light pink) 0.9986th,0.0013rd "3-sigma" quantiles.
Of course, each bin's worth of data is diverse in number of transfer function values. Some bins only have 3 traces, and others have 170. As such, I've let python's histogram function determine the number of bins by setting histogram_bin_edges='auto'. But still, this display gives *me* a good impression of the data, and the first impressions that a Guassian Process regression is just not the right tool:
   - the distributions of transfer function values at each frequency are *not* Gaussian, and
   - the "uncertainty" of the distribution, quantified by the 1-, 2-, and 3- sigma quantiles is not symmetric. 

Alas, we proceed.

Step 3: Add "fake" data the the transfer function collections, convert from mag/phase, to complex, then to real/imaginary parts, then flatten in prep for input into GPR.
Attachment 3: sensingFunction_syserror_vs_powerslices_binneddata_vs_gprinputdata.pdf
    As we've done in O3, we want a model of these transfer functions to serve as a systematic error (with associated uncertainty) that can be "stapled" in to the overall calibration's modeled response function systematic error (with associated uncertainty). In doing so, we want this model to add systematic error only where where think the response function is changing as a function of thermalization. However, Guassian Process Regression gives you large uncertainty where there are no measured frequency points to "anchor" the fit. Further, due to the querks of the skikit learn's python implementation of GPR, (or at least the way that Craig Cahillane and then Evan Goetz figured out how to get it to "work"), you must flatten the collection of input data on to a single vector (i.e. the nTFs by mFreq complex 2D array must be reshaped into a (nTFs * mFreq) x 1 vector).

This collection of plots shows the sanity check that the addition of fake data points, and the flattening works.

The fake data points have a transfer function value of 1.0 + 0.0j i.e. a magnitude of 1.0 [(ct/m) / (ct/m)], and a phase of 0, the sensing function residual [meas/model] if model perfectly agrees with measurement. I've assigned these fake data points an uncertainty of 1e-4, or 0.01%. The *number* of fake data points required depends on the radial basis function hyper parameter "length scale"

Step 4: Explore the "standard" O3-era "radial basis function"-inclusive hyper parameter space in order to create a "satisfying" fit.
Attachment 4: sensingFunction_syserror_vs_powerslices_gprfit_lengthscale_0p25_nxtrapnts30.pdf
    The GPR transfer function kernel we continue use to describe unknonwn frequency dependence in "our" transfer function residuals since Evan added additional terms after Craig's O1/O2 work (see "Slides for follow up on action item" from G1901479) is as follows:

    F * RBF(\ell) + C

where
    F = a "ConstantKernel" function that serves to be what I call an "amount of frequency dependence" coefficient that determines how much of the radial basis function is "used"

    RBF(\ell) = a "Radial Basis" function, which is a Gaussian function of the dimensionless hyper-parameter \ell, where the function computes the correlation of neighboring data points with respect to the length scale, \ell. As we've shoved the transfer function data into this kernel function, it's computing the Euclidean distance between the neighboring, logarithmic frequency points of the real and imaginary points of the complex transfer function.
        Working through a bit of algebra, as in G2101319, the physical meaning of the length scale becomes

            delta f = f * (10^\ell - 1)

        i.e. the RBF looks for lower frequency points have tighter (in frequency) correlation.

    C = another "ConstantKernel," but in this case used as an additive constant. This is used as a representative of what an "ideal" residual should look like, i.e. a transfer function of 1.0 + 0.0j at all frequencies. 

As I've been reminded through my exploration, if you start probing for length scales, \ells, that are smaller than the log(f) frequency separation, the GPR will return a large uncertainty on the transfer function in between your data points. Thus, the number of *fake* data points needs to also start to come into play if your length scale gets small.

So, the parameter space to explore is the priors on the constant for the two ConstantKernels as well as the prior on the RBF length scale \ell.
    In the end, I used
        F = value of 0.5, with uniform probability between (0.4, 1.0). 
            I interpret this as allowing for the frequency dependence to be "all the way on" (i.e. 1.0) or a little-bit 
            less than "half" on (0.4).

        \ell = value of 0.25, with uniform probability between (0.125, 0.375). 
            This value I determined entirely empirically, by trying values between [0.125, 0.25, 0.5, 0.75, 1.0, 2.0]

        C = value of 1.0, with uniform probability between (0.9, 1.1).

        nExtraPoints = 30. 
            One can safely run with 15 points on length scales down to 0.5, but below 0.5 the length scales become comparable 
            to the frequency separation in these points, so the frequency space *between* the extra points starts to become 
            prohibitively uncertain, which spoils the "frequencies above the thermalization region shall not impact the greater 
            systematic error budget" rule.

This plot shows the Gaussian Process Regression fit against the Median and 1-, 2-, and 3- sigma quantiles of the 4 *real* data frequency points, as well as the original data.
From here, we arrive at our executive summary: I'm able to finagle the hyper parameters of the kernel model such that the regression output tracks the median behavior of the transfer function, but it does *not* reflect the large, measured, variability in the transfer function (which we traditionally try to cover with the Gaussian uncertainty) -- i.e. some of the way there, and may be good enough. For the fits that are the worst, the lowest power bins, I think this is an OK compromise if we recall that the IFO stays in these lowest power bins for the *least* amount of time given the "exponential" nature of the thermalization, and sometimes the IFO is past these arm powers before we even hit "NOMINAL_LOW_NOISE."

In fact, over my years of asking for One Transfer Function Fitter to Rule Them All, I've found that this exact problem is the hardest part about transfer function fitting in LIGO -- finding a fit or model that appropriately describes the *uncertainty*, or in our application here, the *variability* in the transfer function. I guess it should be a surprise that I wasn't able to figure out an entire field of computational metrology (or loosely, parameter estimation) in 1.5 weeks.

So, at this point, 
    (a) We need to install *something* in the overall systematic error budget, and this is a good start,
    (b) There's talk of reducing the power and reverting the SRCL offset, perhaps soon rendering this data set "no longer representative,"
    (c) I need the help of others to do better in terms of capturing the variability (and maybe that means exploring different kernels, or switching to zpk-based fitting like IIRrational or Ethan's ZPK Fitting), 
    (d) We could also try to do all of this same process for the *response function,* rather than the sensing function (which *may* be quicker than 2 months worth of work because I already have examples of how to manipulate the data which was a huge time suck for me)
    (e) I need to move on with my life and make progress elsewhere

So, I hand it off to my fellow calibrators, transfer function fitters, and the world.
Non-image files attached to this report
Comments related to this report
jeffrey.kissel@LIGO.ORG - 17:52, Monday 05 June 2023 (70155)
The saved results for each GPR of each power level live in 
/ligo/svncommon/CalSVN/aligocalibration/trunk/Runs/O4/H1/Results/FullIFOSensingTFs/Thermalization/
    supplemental_sensing_gpr_405_kW.hdf5
    supplemental_sensing_gpr_406_kW.hdf5
    supplemental_sensing_gpr_407_kW.hdf5
    [... hidden for brevity ...]
    supplemental_sensing_gpr_434_kW.hdf5
    supplemental_sensing_gpr_435_kW.hdf5
    supplemental_sensing_gpr_436_kW.hdf5

where, I note that the power level listed in the filename is the *upper* limit of the power level bin, so 
    - the first bin's file, supplemental_sensing_gpr_405_kW.hdf5 should be used for all powers from 0 kW to 405 kW, then 
    - the next 31 power bins should be used from when the power is 1 kW less than that in the file name up that in the file name, i.e.
        supplemental_sensing_gpr_434_kW.hdf5 should be used when the average arm power is between 433 and 434 kW.

Recall, the algorithm to compute that power against which to compare the file is, for the two minutes surrounding your start_gps of choice, take the two 16 Hz channels worth of X and Y arm powers, average them to create a 16 Hz mean arm power, then take the median of the average across the two minutes. The following python code snippit should get you there.

from gwpy.timeseries import TimeSeriesDict as tsd

armPwrChList = ['H1:ASC-X_PWR_CIRC_OUT16',
                'H1:ASC-Y_PWR_CIRC_OUT16']

pwrbinlowerlim = 404
pwrbinupperlim = 436 
pwrbinstep = 1
pwrbins = [[n,n+pwrbinstep] for n in range(pwrbinlowerlim,pwrbinupperlim,pwrbinstep)]
pwrbins[0] = [0,pwrbinlowerlim+pwrbinstep]

stride = 2*60
start = start_gps
end = start_gps + stride

pwrdata = tsd.get(armPwrChList, start, end, frametype='R',verbose=True)
    
# Compute the average (np.mean) of the two arm powers at each 16 Hz data point,
# then compute the np.median across the stride
pwr = np.median(np.mean(np.array([pwrdata[armPwrChList[0]].value, 
                                  pwrdata[armPwrChList[1]].value]),
                        axis=0))

for thispwrbin in pwrbins:
    if np.logical_and(pwr > thispwrbin[0], pwr <= thispwrbin[1]):
        print('You should use the {} kW file'.format(str(thispwrbin[1])))
jeffrey.kissel@LIGO.ORG - 17:58, Monday 05 June 2023 (70156)
On the choice of length scale and nExtra Points

Just so folks get a feel for how I've made the choice of length scale and nExtra points -- and really, a better demonstration of the empirical process of choosing the hyperparameters in the kernel, see the attached files, which show how the fit result changes as a function of hyper parameter \ell, as it ranges from the values of [0.12, 0.25, 0.5, 0.75, 1.0]. Note, as discussed above, the lower values of \ell need more extra frequency points in order to reduce uncertainty between points. To see this, compare the pdfs from \ell = 0.25 with either 15 points or the final answer, 30 points. For \ell larger than 0.25, I've used 15 points extra points in each.
Non-image files attached to this comment
jeffrey.kissel@LIGO.ORG - 18:16, Monday 05 June 2023 (70157)CSWG
The code that produces this data lives in gpr_sensing_darm_comb_pwrslices.py, committed at version with git hash ba6d60.

Start there if you want to rip out the GPR fitting and plot making (everything below Line 306) to start at the point where the data is grouped into bins around line and below Line 260.

Be forewarned, it takes ~5 minutes to run one round of GPR fitting with \ell = 0.25 and 30 extra points. It takes ~2 minutes to fit it with \ell = 0.5 and higher with 15 extra data points.
Also be warned that there's a lot of plotting code commented out; those collections of plots were to produce the plots from steps 1 through 3 above, and not needed for producing the "final answer" plots of the fits.

I've committed the raw text files that one needs to run the code and process the data to 
    ^/trunk/Runs/O4/H1/Measurements/FullIFOSensingTFs/Thermalization
        sensingFunctionResidual_meas_over_model_wTDCFs_*_freqfreq.txt
        sensingFunctionResidual_meas_over_model_wTDCFs_*_mag.txt
        sensingFunctionResidual_meas_over_model_wTDCFs_*_pha.txt
        sensingFunctionResidual_meas_over_model_wTDCFs_*_pwrpwr.txt
        sensingFunctionResidual_meas_over_model_wTDCFs_*_timetime.txt
        sensingFunctionResidual_meas_over_model_wTDCFs_*_unc.txt
where the * conveys the different GPS start and stop times of the 10 different segments from LHO:69796.