On Tue 16th Jan 2018 I stopped the h1ngn model and removed it from h1oaf0. At this time the h1oaf model started running long (left hand plot in attachment). Note that h1oaf was not restarted at this time.
This morning I started with h1iopoaf0 and h1oaf as the only models running; h1oaf ran at its CPU usage from before the h1ngn removal, with little deviation. After starting the other models on this computer, h1oaf ramped up into the 61 µs range with large deviations, causing TIM errors.
Looking at the model/core distribution, the first physical CPU chip (6 cores, non-hyperthreaded) was fully utilized until h1ngn was stopped, which left a "hole" at core 4. I changed h1pemcs.mdl to move it from specific_cpu=7 to specific_cpu=4. After restarting all the models, this appears to have fixed h1oaf's issues (right hand plot of attachment). It is not immediately clear why.
Here are the core layouts:
cpu0
core | model |
0 | General Linux |
1 | h1iopoaf0 |
2 | h1calcs |
3 | h1oaf |
4 | was h1ngn, empty 1/16-2/27, now h1pemcs |
5 | h1susprocpi |
cpu1
core | model |
6 | empty |
7 | was h1pemcs, now empty |
8 | h1tcscs |
9 | h1odcmaster |
10 | empty |
11 | empty |
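
For reference, the layouts above can be checked for holes with a short script. This is only an illustrative sketch: the dictionaries are transcribed by hand from the tables in this entry, the helper names (chip_of, empty_cores_by_chip) are made up for the example, and it assumes cores 0-5 sit on the first chip and 6-11 on the second, as the tables show. It does not read anything from the front end itself.

```python
# Sketch only: layouts copied by hand from the tables above, not read from the machine.
CORES_PER_CHIP = 6  # per this entry: 6 non-hyperthreaded cores per physical chip

# Layout between 1/16 and 2/27 (h1ngn removed, h1pemcs still on core 7).
layout_before = {
    0: "General Linux", 1: "h1iopoaf0", 2: "h1calcs", 3: "h1oaf",
    4: None, 5: "h1susprocpi",
    6: None, 7: "h1pemcs", 8: "h1tcscs", 9: "h1odcmaster",
    10: None, 11: None,
}

# Layout after moving h1pemcs from specific_cpu=7 to specific_cpu=4.
layout_after = dict(layout_before)
layout_after[4] = "h1pemcs"
layout_after[7] = None


def chip_of(core):
    """Return the physical chip index for a core (assumes contiguous numbering)."""
    return core // CORES_PER_CHIP


def empty_cores_by_chip(layout):
    """Map each chip index to the sorted list of its cores with no model assigned."""
    holes = {}
    for core in sorted(layout):
        chip = chip_of(core)
        holes.setdefault(chip, [])
        if layout[core] is None:
            holes[chip].append(core)
    return holes


if __name__ == "__main__":
    print("before:", empty_cores_by_chip(layout_before))
    print("after: ", empty_cores_by_chip(layout_after))
```

With the tables as transcribed, the first chip has a single hole at core 4 before the change and none after it. This only describes the layout; it does not explain why filling the hole helped h1oaf.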