Post-mortem for the server issues on Aug 30th, 2023

At 8:40:00 AM, we experienced an unexpected outage on the machine hosting our BYOND servers. This coincided with a bump to the BYOND version, and initial logs seemed to indicate that the two were related. Naturally, we rolled back the update and restored our services at 8:42:00 AM.

About an hour later (9:44:00 AM), we experienced another outage, which triggered an emergency maintenance period. During this timeframe, several services were upgraded and various checks were run across the system. After completing maintenance, no further issues were observed, so I announced our return to production at 11:21:00 AM.

A little less than an hour later, at 12:07:00 PM, we experienced another similar outage. Upon further investigation, I identified an issue with the Microsoft licensing server failing to rearm our game server's license. This was ultimately the cause of the other two major outages tonight as well, as without a valid license the server will automatically shut down every hour. After identifying this as the central issue, I was able to manually rearm the license.

To try and address these problems in the future, I’ve added some extra logging around the licensing state of the Windows server. A support request to resolve the issues with the upstream licensing server has also been submitted.
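For anyone curious what a licensing check of this kind might look like, here is a minimal sketch. It is not the logging actually deployed on the server. It assumes the stock Windows `slmgr.vbs /dli` command is run through `cscript`, that the script sits in the default `System32` location, and that its English-locale output contains a `License Status:` line. The function names and the logger name are hypothetical.

```python
import logging
import re
import subprocess

# Assumption: default install path of the Windows licensing script.
SLMGR = r"C:\Windows\System32\slmgr.vbs"


def parse_license_status(text):
    """Return the value of the first 'License Status:' line, or None if absent."""
    match = re.search(r"^\s*License Status:\s*(.+?)\s*$", text, re.MULTILINE)
    return match.group(1) if match else None


def read_license_status():
    """Run 'slmgr.vbs /dli' (Windows only) and return the parsed status."""
    result = subprocess.run(
        ["cscript", "//Nologo", SLMGR, "/dli"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_license_status(result.stdout)


def log_license_status(status, logger=logging.getLogger("license")):
    """Log the status; return True only when the server reports 'Licensed'."""
    if status == "Licensed":
        logger.info("Windows license status: %s", status)
        return True
    logger.warning("Windows license is not in a licensed state: %r", status)
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_license_status(read_license_status())
```

Run on a schedule shorter than the hourly shutdown interval, a check like this would put a warning in the logs before the server goes down rather than after.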

As a small consolation prize for dealing with these issues, I was able to install several important upgrades to the server during our emergency maintenance which could potentially provide a few small stability and performance improvements.


As of now, all of the issues should be resolved and the game servers should be stable once again. I’m sorry for any lost rounds that occurred due to these outages.

Out of curiosity, why are we using Windows and not some fork of Linux for our servers? I’m pretty sure there’s a variant of BYOND or Dream-whatever-it-is-called made specifically for hosting on Linux.

BYOND performs substantially better on Windows, and our dependencies (namely Auxmos) do not run very well on Linux and are prone to crashing and other issues.

Lummox is pretty open about the fact that they are not a Linux developer and they provide hosting support as a nicety. Issues that arise from that platform are seldom resolved and performance issues are usually left unchecked.

I was here when it happened… Including my ss13 addiction and withdrawals of the game…
