CapRover unavailable due to high load (?)

This is the second time this has happened to me, so I want to write it down and know what to do the next time it happens (I’m sure it won’t be too far in the future…).

The problems started when I installed another one-click app (Plausible), which, I guess, requested more memory than the droplet had available. As soon as the installation finished, the CapRover UI started to show many 500 errors. I then tried to access another service I host on the same droplet, and it showed a nice 500 page every time.

At first I decided to give it some time to see if things would resolve on their own, but it actually got worse: the nice error page was replaced by the default nginx one.

The next thing I tried was turning the droplet off and on again to see if a restart would make things better. It didn’t help at all: nothing started and I could not access any service.

I was pretty sure the culprit was the newly installed app, and I thought that removing it would help. But since the CapRover UI was not starting, I had to log in to the droplet and start playing with the terminal.

The first thing I did was run docker ps to see which containers were running. Surprisingly, all the containers were up and running. I was not convinced, so I ran docker ps again. This time the command was very slow to print anything, and after a few seconds it simply answered with Segmentation fault (an error that reminded me of my university days). At this point I wanted to kill some of the containers, but running any docker command was impossible: they all failed with the same error.

I thought I was doomed and had lost my nice CapRover setup forever, but I had one last idea: restart the droplet again and try to kill some containers as soon as I could log in to the terminal, hoping that not all the containers would be running yet and I would have some time to unblock the situation. I went to the DigitalOcean portal, restarted the droplet again, and as soon as it showed as ready I opened the terminal and ran:

  1. docker service ls: to list the services configured in Docker Swarm. docker ps only shows the containers that are currently running, and even if you run docker kill on one of them, the swarm will start it again because its service is still registered.
  2. docker service rm NAME: to remove the service from the swarm and stop its running container. I executed this for the 4 services that were installed by the one-click app. After that I ran docker service ls and docker ps again to double-check that they were no longer showing up.
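
For next time, the two steps above could be wrapped in a small helper so there is less typing to do while the droplet is still responsive. This is only a sketch under some assumptions: the function names and the DRY_RUN variable are made up for illustration, and it assumes the services to remove share a recognizable piece of text in their name (for example "plausible"), which you should verify with docker service ls first.

```shell
#!/bin/sh
# Hypothetical helpers, not part of CapRover or Docker.

# Read service names on stdin and print only those containing the text in $1.
filter_services() {
    # -F: match the text literally; "|| true" keeps "no match" from being an error.
    grep -F -- "$1" || true
}

# List the swarm services whose name contains $1.
# By default this only prints them; set DRY_RUN=0 to really remove them.
remove_matching_services() {
    pattern="$1"
    names=$(docker service ls --format '{{.Name}}' | filter_services "$pattern")
    if [ -z "$names" ]; then
        echo "no service matches '$pattern'"
        return 0
    fi
    echo "$names"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        # Word splitting is intentional here: one service name per argument.
        # shellcheck disable=SC2086
        docker service rm $names
    fi
}
```

After sourcing the file, running remove_matching_services plausible should only print the matching names, and DRY_RUN=0 remove_matching_services plausible should remove them. The dry run is the default on purpose, since a removed service cannot be brought back with a single command.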

After this, the system started to behave correctly again and all the services were accessible. My CapRover was saved! Only one thing remained to be done: the apps I had removed with docker service rm were still showing up in the CapRover UI, so I had to delete them from there as well. I believe this is because they were still listed in the CapRover .json configuration file.