Hey GKE folks,
As GKE v1.25 nears its end of life, it's important to understand the impact of the upgrade before upgrading your clusters.
I've come across an issue where some of our Java-based workloads started getting OOMKilled after the upgrade. You can confirm this in Cloud Logging by looking for log entries containing the following message:
Memory cgroup out of memory: Killed process 1982362 (java)
If you see similar entries, consider increasing the memory allocated to the affected workloads, if that's an option for you.
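To check whether the JVM inside an affected container actually sees its memory limit, a quick diagnostic along these lines may help. It is only a sketch: it assumes the usual cgroup mount at /sys/fs/cgroup inside the container (memory.max on cgroupv2, memory/memory.limit_in_bytes on cgroupv1), and the class name HeapVsLimit is hypothetical. If the JVM's maximum heap is at or above the container limit, the JVM has most likely sized its heap from node memory rather than from the container limit.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalLong;

// Hypothetical diagnostic: compares the container memory limit read from the
// cgroup filesystem with the maximum heap size the JVM chose at startup.
final class HeapVsLimit {

    // Assumed default mount points: cgroupv2 exposes memory.max at the root,
    // cgroupv1 exposes memory.limit_in_bytes under the memory controller.
    static final Path V2_LIMIT = Paths.get("/sys/fs/cgroup/memory.max");
    static final Path V1_LIMIT = Paths.get("/sys/fs/cgroup/memory/memory.limit_in_bytes");

    // Parses a cgroup limit value; "max" (cgroupv2) means no limit is set.
    static OptionalLong parseLimit(String raw) {
        String value = raw.trim();
        if (value.equals("max")) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(value));
    }

    // Reads the first readable limit file, or returns empty if none can be read.
    static OptionalLong readLimit() {
        for (Path p : new Path[] {V2_LIMIT, V1_LIMIT}) {
            if (Files.isReadable(p)) {
                try {
                    byte[] bytes = Files.readAllBytes(p);
                    return parseLimit(new String(bytes, StandardCharsets.UTF_8));
                } catch (IOException | NumberFormatException e) {
                    return OptionalLong.empty();
                }
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        OptionalLong limit = readLimit();
        System.out.printf("JVM max heap: %d MiB%n", maxHeap / (1024 * 1024));
        if (limit.isPresent()) {
            System.out.printf("cgroup memory limit: %d MiB%n", limit.getAsLong() / (1024 * 1024));
            if (maxHeap >= limit.getAsLong()) {
                System.out.println("Max heap >= container limit: the JVM is probably not seeing the limit.");
            }
        } else {
            System.out.println("No cgroup memory limit found.");
        }
    }
}
```

Running it with the same java binary and JVM flags as the workload gives the most representative result, since the heap size depends on both.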
According to the GKE release notes here, new node pools created on version 1.26 automatically use the cgroupv2 resource management subsystem, which enables the latest container resource management capabilities. Node system configuration can be used to switch between cgroup modes.
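To confirm which cgroup mode a node pool actually ended up with, a sketch like the following can be run inside a container on that pool. It relies on the fact that a cgroupv2 (unified) hierarchy exposes a cgroup.controllers file at the root of /sys/fs/cgroup, while cgroupv1 mounts each controller, such as memory, as its own subdirectory. CgroupVersion is a hypothetical name, and the check assumes the default mount point.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: guesses the cgroup version from the layout of the cgroup mount.
final class CgroupVersion {

    static String detect(Path cgroupRoot) {
        // cgroupv2 (unified hierarchy): cgroup.controllers sits at the root.
        if (Files.exists(cgroupRoot.resolve("cgroup.controllers"))) {
            return "v2";
        }
        // cgroupv1: each controller is mounted as a separate subdirectory.
        if (Files.isDirectory(cgroupRoot.resolve("memory"))) {
            return "v1";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println("cgroup " + detect(Paths.get("/sys/fs/cgroup")));
    }
}
```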
The current mitigation options are:
- Short term: use node system configuration (the linuxConfig.cgroupMode setting) to switch the Linux cgroup mode on your GKE node pools back to cgroupv1.
- Long term: upgrade the workloads' Java runtime to 11.0.16 or later, as per this, which adds support for cgroupv2.
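For the long-term fix, a small check like the one below could flag runtimes that predate cgroupv2 support. It assumes upstream OpenJDK versioning, where cgroupv2 container awareness landed in JDK 15 and was backported to 11.0.16; vendor builds may differ. It needs Java 10 or later to compile (Runtime.Version.feature() and update() were added in Java 10), so Java 8 runtimes are not covered. JavaCgroupV2Support is a hypothetical name.

```java
// Hypothetical check based on upstream OpenJDK release history (an assumption,
// not an exhaustive compatibility matrix).
final class JavaCgroupV2Support {

    static boolean supportsCgroupV2(int feature, int update) {
        if (feature >= 15) {
            return true;              // cgroupv2 support is in mainline from JDK 15
        }
        if (feature == 11) {
            return update >= 16;      // backported to 11.0.16
        }
        return false;                 // e.g. 10, 12, 13, 14
    }

    public static void main(String[] args) {
        Runtime.Version v = Runtime.version();
        boolean ok = supportsCgroupV2(v.feature(), v.update());
        System.out.println("Java " + v + (ok ? " supports" : " may NOT support") + " cgroupv2 limits");
    }
}
```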
I hope this was helpful.