
Reference Number: AA-02303 // Created: 2024-04-11 06:44:07 // Last Updated: 2024-04-11 07:42:03
Q&A
[EXata] 'Warning: Total Isolation Lost Events (Due To Interrupt / Remote)' Explained

Question:

When I reset my EXata / RT-LAB setup, I see the following warnings:


WARNING: Total isolation lost events: XXX

WARNING: Total isolation lost events due to interrupt: YYY

WARNING: Total isolation lost events due to remote: ZZZ



What does it mean and is it a problem for my simulation?


Answer:

These messages appear when the simulator runs OPAL-RTLinux and an external process (EXata in this case, but it could be a custom library or custom code) makes system calls that 'break' the XHP isolation of the CPUs.

In short, as long as there are no overruns, this is not an issue: the system call was handled within one time step, along with the rest of the tasks.

Note that these system calls are done by EXata during the load and during the mapping of the nodes. Therefore, it is possible to avoid those warnings by following this loading sequence:


1- Load RT-LAB

2- Load EXata

3- Execute EXata

4- Execute RT-LAB


And the following reset sequence:


1- Pause RT-LAB

2- Reset EXata

3- Reset RT-LAB


This way, RT-LAB is not executing (loaded but not yet started, or paused) while EXata makes its system calls, so they are not recorded.

Note: the same happens under the hood with HYPERSIM, but since HYPERSIM has no pause function, these isolation losses cannot be avoided (they may or may not translate into overruns).

As of April 2024, Keysight is working on bypassing this issue, but no release date has been announced for the fix.


Advanced Debugging

WARNING: This is for advanced users only - use at your own risk!

NOTE: The following procedure changes the isolation mode to debug, in which the model runs until the first isolation loss. When one occurs, the isolation breaks and the model stops running (i.e. it is still in execute mode in RT-LAB, but it is one big overrun until you press reset).


To find out which process causes the first isolation loss, proceed as follows:

1- Make sure no models are running on the simulator

2- Connect to the simulator via SSH

3- Change directory by running the command:

cd /sys/kernel/task_isolation

4- It is possible to check the isolation options by typing:

cat available_task_isolation

5- Change the isolation to isolate_debug by running:

echo isolate_debug > current_task_isolation


Note: This change is not permanent. If you reboot the simulator, current_task_isolation reverts to isolate_keep.
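
Steps 3 to 5 can also be wrapped in two small shell helpers, so that the previous mode can be written back after debugging. This is only a sketch: the function names and the TASK_ISO_DIR variable are hypothetical, and it assumes that current_task_isolation contains just the mode name and that writing the previous mode back restores it.

```shell
# Sketch only: these helpers are hypothetical, not OPAL-RTLinux tools.
# TASK_ISO_DIR defaults to the directory from step 3.
TASK_ISO_DIR="${TASK_ISO_DIR:-/sys/kernel/task_isolation}"

# Print the isolation mode currently in effect.
# Assumption: the file holds just the mode name (e.g. isolate_keep).
get_isolation_mode() {
    cat "$TASK_ISO_DIR/current_task_isolation"
}

# Switch to the given isolation mode (same write as the echo in step 5).
set_isolation_mode() {
    echo "$1" > "$TASK_ISO_DIR/current_task_isolation"
}

# Typical use, once the functions above are defined:
#   previous=$(get_isolation_mode)      # remember the mode before debugging
#   set_isolation_mode isolate_debug    # then continue with the next steps
#   set_isolation_mode "$previous"      # write the previous mode back
```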


6- Run the following command and let it run (do not press CTRL + C):

journalctl -kf

7- Start the RT-LAB model, then EXata

8- At some point, you should see the error in the RT-LAB display:

ERROR: [x]: Kernel task isolation lost



9- In the output of journalctl -kf, you should now see which process caused the isolation loss. In this example, it is EXata:

dummy_1_sm_comp/4397 (cpu 1): task_isolation lost due to IPI function by exata/5073 on cpu 15



Alternatively, the same information should also be printed in dmesg. With this information, you can contact the people responsible for the external process to discuss how to work around the problem.
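
If the kernel log is long, the offending process can be filtered out of it with a short shell helper. The sketch below is an illustration only: the function name is hypothetical, and the pattern is inferred from the single example line shown in step 9, so other kernel versions may word the message differently.

```shell
# Hypothetical helper: read kernel log lines on stdin and, for each
# "task_isolation lost" line, print "<process> <pid> <cpu>" of the process
# that caused the loss. The pattern is inferred from the one example line
# in step 9 and is an assumption, not a documented log format.
offending_process() {
    sed -n -E 's|.*task_isolation lost due to .* by ([^ /]+)/([0-9]+) on cpu ([0-9]+).*|\1 \2 \3|p'
}

# Typical use on the simulator:
#   dmesg | offending_process
#   journalctl -k | offending_process
```

For the example line above, this prints the process name exata, its PID 5073 and CPU 15.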

10- Since the model is not running anymore, you can reset the RT-LAB model, and you will see: