This week we’ve been quite busy working on some behind-the-scenes improvements to the platform. Although there won’t be any noticeable impact for users, these types of changes are important because they keep Hypernode and its peripheral systems stable and able to keep scaling as our user base grows.

Summary of improvements

The processes that automate the creation of new Hypernodes, execute plan upgrades and downgrades, and oversee our emergency recovery contingencies need to keep operating with minimal failure and manual oversight. To give you some insight into what that entails, here is a short summary of the sort of tasks we’ve been tackling.

Internal tooling

Since the last release we have expanded our internal tooling for manual floating IP operations in case of persistent cloud API failure. We have also decreased the load on one of our Icinga systems, which was handling more than 23,000 checks, and improved the success rate of our automated instance scaling by analyzing our centralized logs to verify that our reversion and retry strategies are still sufficient for any emerging patterns of system failure.

Scans and processes

We’ve scanned our platform for configuration drift and implemented structural fixes to prevent any newly detected types of divergence from recurring, and we’ve pruned a lot of dormant code relating to Precise deployments that no longer served any purpose.

We refactored the logic around the automated blocking of bots in case of country-based brute-force attacks to make it more resilient, and we increased the capacity of our distributed job processing system by optimizing resource-intensive components and by beefing up the firepower of the control servers so that the system can process a larger number of jobs more quickly.

To further minimize the performance impact of monitoring-related periodic tasks, we’ve decreased the footprint of various background processes on all Hypernodes. Lastly, we’re working on expanding the part of our platform responsible for on-call alerting with data from our ELK Stack, so we can act more swiftly and in a more targeted way in case of reduced availability.

Beyond that, there were a couple of changes that are directly relevant for users:

Increased hash bucket size for larger nodes

For nodes with 8GB of memory and up we have increased the NGINX map_hash_bucket_size and server_names_hash_bucket_size from 128 to 256. This can be helpful for larger shops with expansive external whitelists. For more information about the NGINX bucket sizes, see this earlier changelog.
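In nginx configuration these are the directives involved (shown for reference only; on Hypernode the managed configuration sets these values for you, so no action is needed):

```
# http-level context; applied on nodes with 8GB of memory and up
map_hash_bucket_size 256;
server_names_hash_bucket_size 256;
```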

Disabled Debian banner

We have added DebianBanner no to our sshd config. This hides the operating system version from the SSH banner. While we generally don’t believe in security through obscurity, some of our customers have mentioned that displaying Ubuntu in this banner makes some compliance scanners go off.
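The relevant sshd configuration line looks like this (for reference only; this is already applied on all Hypernodes):

```
# /etc/ssh/sshd_config
DebianBanner no
```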

(Before and after: screenshots of the SSH banner with and without the operating system version.)

pnl no longer crashes on Unicode decode errors

In the previous release we made NGINX escape JSON in the log files more robustly, but there were still cases where certain requests would crash the log parser.

In the new version of hypernode-parse-nginx-logs (also known as pnl) these requests will be ignored.
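A minimal sketch of this skip-on-error behaviour, assuming UTF-8 JSON log lines and a hypothetical function name (the actual pnl implementation may differ):

```python
import json

def parse_log_lines(raw_lines):
    """Parse NGINX JSON log lines, silently skipping entries that
    cannot be decoded as UTF-8 or parsed as JSON. This is a sketch
    of the skip-on-error idea, not the actual pnl code."""
    records = []
    for raw in raw_lines:
        try:
            line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
            records.append(json.loads(line))
        except (UnicodeDecodeError, json.JSONDecodeError):
            continue  # ignore the offending request instead of crashing
    return records

lines = [
    b'{"status": 200, "path": "/"}',
    b'\xff\xfe broken bytes',            # would previously crash the parser
    b'{"status": 404, "path": "/missing"}',
]
print(parse_log_lines(lines))
# -> [{'status': 200, 'path': '/'}, {'status': 404, 'path': '/missing'}]
```

Requests with undecodable bytes are simply dropped from the parsed output, so one malformed entry no longer aborts processing of the whole log file.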

Fixed index.php importer issue in hypernode-vagrant

It came to our attention that using the hypernode-importer in the hypernode-vagrant local development environment caused problems: when a shop was imported, the standard index.php was not overwritten. This was caused by incorrect permissions on the default web page that displays information about the services in the environment. This has now been fixed. Thanks to Akif Gümüssu for reporting this issue.
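The class of bug can be illustrated with a small sketch (the path and modes here are hypothetical; the real fix was made inside hypernode-vagrant itself): a default page shipped without write permission can block a plain overwrite, and correcting the mode resolves it.

```python
import os
import stat
import tempfile

# Simulate a default index.php shipped with overly strict permissions.
path = os.path.join(tempfile.mkdtemp(), "index.php")
with open(path, "w") as f:
    f.write("<?php // default info page ?>")

os.chmod(path, 0o444)  # read-only: an importer running as a normal user cannot replace it
print(oct(stat.S_IMODE(os.stat(path).st_mode)))  # -> 0o444

os.chmod(path, 0o644)  # the fix: make the file writable so imports can overwrite it
print(oct(stat.S_IMODE(os.stat(path).st_mode)))  # -> 0o644
```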