migrating uptime kuma to the cloud

...or how can i even watch the watchers?

📝
welcome to the first in a series of incredibly sporadic blogumentation posts, a concept borrowed from Jamie Tanna: i'll do a thing and then write it down in a blog post so i don't forget and can share it more easily. please enjoy.

despite my wide-eyed naivety, this week i realised that i unfortunately can't self-host everything. a problem with my flat's fusebox meant i had not one but two electrical (and therefore service) outages.

i, of course, do expect services to get knocked offline sometimes. i know my homelab doesn't have the availability of a commercial data centre, and i'm certainly not stupid enough to self-host anything that import– fuck's sake, i self-host uptime monitoring.

what's worse is that uptime kuma, my uptime monitoring system, appears to assume an absence of data simply means that a service was up, so i don't even know how long either outage lasted. fuck.

there has to be a better way!

well, there is. introducing... the brand-new, innovative concept of someone else's computer™️ (more commonly known as the "public cloud"). moving uptime kuma to something in the cloud should give me a boost in availability.

why don't you get a UPS?

"get a UPS!", i hear you cry.

well, yes, if i were truly dedicated to homelabbing, i could purchase an uninterruptible power supply and call it a day. however...

  • my homelab is perfectly reliable. when it works, it works damn well. nothing's gone kaput (yet!), but there's no modification i can make to my homelab that will give me the availability i need, especially for monitoring.
  • the only service on the machine that's somewhat mission-critical is uptime monitoring. i can do without most of the stuff on my homelab for the couple of hours it would take for an electrician to come around.
  • i don't want to have to learn to maintain a piece of hardware i'm not familiar with. i'm lazy.
  • i don't want to have to shell out ÂŁ150 or so. i'm cash-strapped.
  • i have no clue what the power draw of my machine is like, so i don't know what kind of UPS i'd need.
  • as i build out my homelab and add more servers, i'd need to supply them with some form of auxiliary power too. figuring that out does not seem like it would be fun.
  • this is my homelab, not yours. go away. you can't tell me what to do.

i'm sure i'll look at a UPS or some other backup power system at some point in the future, but for now... baby steps.

fly(.io) away, little bird

on Willow's recommendation, i tried really hard to use fly.io. however, i've always found it a bit of a pain in the arse to use for many reasons.

  • i don't find its UI or CLI tools particularly intuitive. i've never understood why both fly and flyctl exist. (apparently fly is a symlink to flyctl, but it's ridiculous to have two commands that do the exact same thing.)
  • on the above point, their monitoring page (called "live logs") only streams logs, which isn't immediately apparent. if you want to see anything historical (including from a few seconds ago during the same deployment), you need to go to their Grafana-based log search tool.
  • its lack of support for docker compose makes it unnecessarily difficult to run anything multi-container (admittedly, this comes from fly not using docker under the hood, so i'll give them a pass for this)
  • creating a "golden image" which contains everything you need feels like a bunch of extra implementation and maintenance effort for not much benefit. we have separate docker containers for a reason.

after a few hours of tinkering, including creating my own Dockerfile based on louislam/uptime-kuma:1 and trying to bolt tailscale onto it, i just couldn't get it working. i'm sure fly.io has its uses, but it isn't the right tool for the job in this case.

bear metal

i was not eager to rent a vps, but it seemed like the best option. looking at lowendbox, i managed to find a super cheap deal from racknerd, with 1 vCPU core, 15 GB SSD and 768 MB RAM!!

using docker stats, it looks like the uptime kuma container takes up ~135 MB of RAM, which isn't too bad. apparently, tailscale is also incredibly light memory-wise.
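if you want to check this yourself, a one-shot docker stats does the trick. the container name below is a placeholder; use whatever yours is actually called.

docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.CPUPerc}}"
# or just the one container (substitute your container's name):
docker stats --no-stream uptime-kuma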

migrating data from my homelab to a racknerd vps would require a little bit of legwork.

  1. buy the box
  2. install tailscale and cloudflared
  3. copy over the Dockerfile for uptime kuma
  4. copy over the data/ directory between my homelab and my shiny new vps using my macbook
  5. clean up after myself

a few things to note

  • i need to add this vps to my tailnet, so i can monitor the status of machines and services which aren't accessible from the public internet. i feed this data into prometheus (on my homelab. yes, i know, it's still a single point of failure, but i'm getting around to fixing that, too), and can then visualise it
  • for networking, i'd route my traffic through cloudflared as i do with my homelab. although racknerd would be less grumpy than my isp if i chose to expose ports for a website, i don't want to fuss about with nginx

buying the box

i paid ~US$25 for two years of that vps, which is a pretty damn good deal. i set the os to be ubuntu with docker preinstalled, to make things as painless as possible. after purchase, i received console and ssh credentials via email. i immediately reset these.

i decided to name this machine canary, as this server's for warning me about services that get knocked offline. and they're orange 🧡

installing tailscale and cloudflared

as i purchased a vm, i decided to install tailscale directly on the box.

after installing tailscale using the installation script, i logged in with tailscale up and approved the new server from my admin panel.
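for the record, that whole step is only a couple of commands (the first is tailscale's official install one-liner):

curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
# then approve the new machine from the machines page in the tailscale admin console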

next, i set up cloudflare tunnels. this was as simple as logging into my cloudflare panel, going to zero trust > networks > tunnels > create a tunnel. after setting up the connection, i pointed my status and uptime subdomains to the new connection.
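once the tunnel is created, the dashboard hands you the exact commands for your distro; on debian/ubuntu they look roughly like this (the token is a placeholder; yours comes from the create-a-tunnel screen):

curl -L -o cloudflared.deb https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
sudo dpkg -i cloudflared.deb
# register this box as a connector for the tunnel
sudo cloudflared service install <TUNNEL_TOKEN>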

i also set up the firewall!

sudo apt install ufw
sudo ufw enable
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow in on tailscale0 to any port 22
sudo ufw reload
sudo service ssh restart

after doing all that, i can take a look at my updated rules. 🎉

root@canary:~# sudo ufw status numbered
Status: active

     To                         Action      From
     --                         ------      ----
[ 1] 22 on tailscale0           ALLOW IN    Anywhere                  
[ 2] 22 (v6) on tailscale0      ALLOW IN    Anywhere (v6) 

as i'm in the area, i thought i'd also enable passwordless ssh.

cd ~
mkdir .ssh
touch .ssh/authorized_keys
nano .ssh/authorized_keys
# next, i manually copy-paste my public key from my machine
chmod 700 .ssh
chmod 640 .ssh/authorized_keys

and after tinkering with my mac's ~/.ssh/config file, it works!
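the "tinkering" amounted to a Host block along these lines. the magicdns hostname and key path are placeholders, so swap in your own:

# run on my mac, not on canary
cat >> ~/.ssh/config <<'EOF'
# canary, reached over tailscale (placeholder magicdns name below)
Host canary
    HostName canary.example-tailnet.ts.net
    User root
    IdentityFile ~/.ssh/id_ed25519
EOF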

i decided not to use tailscale's built-in ssh feature, just in case the box runs out of memory and the tailscale daemon gets killed. but then again, the firewall rules also depend on tailscale, so maybe that wasn't the brightest idea.

docker compose

we're almost there. we first need to install docker compose (the plugin, not the standalone binary), or we'll face "docker: 'compose' is not a docker command" errors. this is done with sudo apt-get install docker-compose-plugin.
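in block form, with a quick sanity check that the plugin is actually being picked up:

sudo apt-get update
sudo apt-get install -y docker-compose-plugin
docker compose version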

the uptime kuma docker compose file is incredibly simple. i've chosen to use the one from the repo.
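if you'd rather not curl it, writing it by hand amounts to roughly this (a sketch from memory, so treat the repo's compose.yaml as the source of truth):

# roughly what the compose file boils down to
cat > docker-compose.yml <<'EOF'
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    volumes:
      - ./data:/app/data
    ports:
      - "3001:3001"
    restart: unless-stopped
EOF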

when running docker compose up, it initially errored with "error getting credentials - err: exit status 1, out: Cannot autolaunch D-Bus without X11 $DISPLAY". it appears that docker uses the secretservice executable for credential management, which has a dependency on X11. as this is a server, there quite obviously is no window system installed, so i fixed that by installing pass and using that for credential storage instead.

when i generated a gpg2 key, the server didn't have enough entropy, so i also had to sudo apt-get install rng-tools.
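in full, the credential store setup looked roughly like this. prompts and key ids will differ, and depending on how docker was installed you may also need the docker-credential-pass helper:

sudo apt-get install -y pass gnupg2 rng-tools
gpg2 --full-generate-key                       # answer the prompts and note the key id
gpg2 --list-secret-keys --keyid-format=long    # or find the key id again here
pass init "<YOUR_GPG_KEY_ID>"                  # placeholder: the id from above
# if docker still complains, point it at the pass helper explicitly:
# echo '{ "credsStore": "pass" }' > ~/.docker/config.json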

once pass was all set up, docker compose up worked!

cd ~
mkdir uptime
cd uptime
curl -o docker-compose.yml https://raw.githubusercontent.com/louislam/uptime-kuma/dda40610c7ae307681429274430d48b9049c1933/compose.yaml
docker compose up -d

...and everything's configured correctly first time!

copying over the data

copying over the data was a breeze.

i nuked the data/ folder on canary with rm -rf ~/uptime/data.

next, i copied over the data from my main machine in my homelab, sunset, using scp. i had to find where the data was, though that wasn't too hard.

first, i found the uptime kuma container id with docker container ls --filter "name=uptime-kuma" -q. in this case, my container id was f0bade7cdb62.

next, i got the path for the mount using docker inspect f0bade7cdb62 | jq -r '.[0].Mounts[0].Source'. my mount path was /var/lib/docker/volumes/j04c444ocg8kc0k0wwoow8gk_uptime-kuma/_data.

finally, on canary i copied the data using scp -r root@sunset:/var/lib/docker/volumes/j04c444ocg8kc0k0wwoow8gk_uptime-kuma/_data/. ~/uptime/data/. this copies the contents of the _data/ directory and puts them under data/ on my new machine.

as tailscale manages ssh to sunset, i don't need to fiddle with ssh keys to copy things from sunset to canary. that makes my experience ten times easier.

finally, i restart the container with docker compose restart.
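pulled together, the whole migration looks like this (the container id and volume path are from my setup, plus a belt-and-braces mkdir so scp definitely has somewhere to land):

# on sunset: find the container and its data volume
docker container ls --filter "name=uptime-kuma" -q
docker inspect f0bade7cdb62 | jq -r '.[0].Mounts[0].Source'

# on canary: clear out the fresh data dir, copy the volume contents over, restart
rm -rf ~/uptime/data
mkdir -p ~/uptime/data
scp -r root@sunset:/var/lib/docker/volumes/j04c444ocg8kc0k0wwoow8gk_uptime-kuma/_data/. ~/uptime/data/
cd ~/uptime && docker compose restart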

and it works!

cleaning up

the prometheus config on my homelab already points at the domain uptime.cowsay.io, so i don't need to make any changes there.

the only thing left to do is remove uptime kuma from my homelab, which was a breeze through coolify's admin panel.

wrap-up

this wasn't as much work as i expected, and it only took a couple of hours on a saturday afternoon!

it's nice to have monitoring detached from my main homelab, and i'm now able to use uptime kuma more confidently for downtime notification, too.

the last thing left to do is to shut down my homelab and watch as the bars go red! after switching it back on, everything recovered smoothly.

the downsides of this approach are:

  • if there's a network outage to the vps, i'm screwed. then again, it's more reliable than my power and internet connection
  • storage will eventually become an issue. at some point. this vps only has 15 GB of PURE SSD (yes, that's how it's advertised. no, i don't know why. yes, i'm just as confused as you), so that's a problem for future me to worry about
  • there isn't any built-in redundancy or failover. however, this comes from uptime kuma being a single-instance solution, not from it being cloud-based
  • it relies on cloudflare tunnels for routing traffic. if that goes down, i'm not able to access the webpage. but that's a rather small thing

next, i should probably start backing things up... but nah, i'll leave that for another day