To protect your Hypernode from downtime caused by attacks, bots, brute-force attempts and script kiddies, we’ve implemented several layers of rate limiting.

Most of these rate limiting methods only apply to bots. To avoid FPM worker depletion, however, we also implemented a per IP rate limiting mechanism that prevents a single IP from exhausting the available FPM workers.

This article explains the differences between the rate limiting methods used, shows you how to find out which method applies and, if needed, how to override them.

Methods of rate limiting used

Two rate limiting methods

On Hypernodes we currently use two sorts of rate limiting:

  • Rate limiting per user agent, to protect your node from bots and crawlers that exhaust server resources
  • Rate limiting per IP, to protect your node from script kiddies, overenthusiastic SEO analyzers and testing tools running wild and depleting all FPM workers

Both rate limiting methods are defined using a rate limit zone. A zone is an allocation in memory where Nginx stores its connection data to determine whether a user agent or IP should be rate limited.

Currently we use two rate limit zones:

  • bots
  • zoneperip

The zoneperip rate limiting zone applies to concurrent connections for PHP requests, and is only activated once an IP uses the maximum of 20 FPM workers per IP.

The bots rate limiting zone applies to the number of requests per second when the user agent matches that of a bot or crawler.

The bot rate limiting mechanism is implemented using Nginx’s limit_req module.
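As a sketch of how such a request rate limit works in Nginx (this is an illustration of the limit_req module, not Hypernode’s actual system configuration; zone name and rate mirror the bots zone described above):

```nginx
# Illustrative sketch only. In the http context: allocate shared memory
# for a zone keyed on the user agent, allowing 1 request per second per key.
limit_req_zone $http_user_agent zone=bots:10m rate=1r/s;

server {
    listen 80;

    location / {
        # Apply the zone; requests over the rate are rejected
        # with 429 instead of the default 503.
        limit_req zone=bots;
        limit_req_status 429;
    }
}
```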

Find out which rate limiting method applies

When an IP address or user agent is rate limited, an entry mentioning the zone to which the limit applies is written to the Nginx error log, /var/log/nginx/error.log.

To find out which rate limiting method is active, find the corresponding entries in the error log and check which zone the request was rate limited on.

A log entry where the rate limit is applied to the user agent and requests per second (based on the bots zone):
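An illustrative example of such an entry (timestamp, client address and domain are placeholders):

```
2023/05/10 09:41:35 [error] 12345#12345: *67890 limiting requests, excess: 1.520 by zone "bots", client: 203.0.113.5, server: example.com, request: "GET /category.html HTTP/1.1", host: "www.example.com"
```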

A log entry where the rate limit is applied per IP (based on the zoneperip zone):
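An illustrative example of such an entry (timestamp, client address and domain are placeholders):

```
2023/05/10 09:44:02 [error] 12345#12345: *67912 limiting connections by zone "zoneperip", client: 203.0.113.6, server: example.com, request: "GET /index.php HTTP/1.1", host: "www.example.com"
```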

Per IP rate limiting does not apply to static content but only to requests handled by PHP.

Rate limiting against bots and crawlers

Every day, your webshop is visited by many different bots. While some, like Google, are important, many only have a negative impact on your site, especially if they don’t follow your robots.txt.
To protect your Hypernode against performance degradation caused by misbehaving bots, it utilizes an advanced rate limiting mechanism. This slows down the hit rate of unimportant bots, leaving more performance for the bots you do care about and, more importantly, your actual visitors.

Rejecting with 429 Too Many Requests

Since our goal is not to block bots but to rate limit them gracefully, we have to be careful about how we reject them. The best way to reject them is with the 429 Too Many Requests status. This tells the visiting bot that the site exists, but that the server is temporarily unable to handle the request, so it can retry later. It does not negatively influence the ranking in any search engine, as the site is available again when the bot connects at a later time.

Configuring the bot rate limiter

By default, good bots such as Google, Bing and several monitoring systems are exempt from rate limiting. These good bots never get rate limited, while all other bots are limited to 1 request per second; any requests over that rate return a 429 error. If you want, you can override the system-wide configuration that determines who gets rate limited and who does not. To get started, place the following in a config file called /data/web/nginx/http.ratelimit:
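The snippet below is a sketch of such a file; the keyword lists and the $limit_bots variable name are illustrative and may differ from the system-wide defaults:

```nginx
# /data/web/nginx/http.ratelimit
# Sketch: sort visitors into groups based on their user agent.
map $http_user_agent $limit_bots {
    # Anything not matched below (including an empty user agent) is neutral.
    default '';
    # Whitelist: these user agents are never rate limited.
    # Do not remove 'heartbeat'; it is used to monitor your Hypernode.
    ~*(google|bing|heartbeat) '';
    # Blacklist: generic bots and crawlers are placed in the group 'bot'.
    ~*(http|crawler|spider|bot|search) 'bot';
}
```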

Note: do not remove the heartbeat entry, as this will break the monitoring of your Hypernode!

As you can see, this sorts all visitors into two groups: first, bots with ‘google’, ‘bing’ or ‘heartbeat’ in their user agent (or an empty user agent) are marked as neutral; then the generic bots with ‘crawler’, ‘spider’, ‘bot’, etc. in their user agent are placed into the group ‘bot’. The keywords they are matched on are separated with | characters, since each match line is a regular expression.

Whitelisting more bots

To extend the whitelist, first determine which user agent you wish to add. Use the log files to see which bots get blocked and which user agent identification they use. Say the bot we want to add has the User-Agent SpecialSnowflakeCrawler 3.1.4. It contains the word ‘crawler’, so it matches the second regular expression and is labeled as a bot. Since the whitelist line overrules the blacklist line, the best way to allow this bot is to add its user agent to the whitelist, instead of removing ‘crawler’ from the blacklist:
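Continuing the sketch from above, the whitelist line gains an identifying keyword for the (hypothetical) user agent from the example:

```nginx
# /data/web/nginx/http.ratelimit
map $http_user_agent $limit_bots {
    default '';
    # 'specialsnowflake' matches "SpecialSnowflakeCrawler 3.1.4"
    # case-insensitively, before the blacklist line can label it a bot.
    ~*(google|bing|heartbeat|specialsnowflake) '';
    ~*(http|crawler|spider|bot|search) 'bot';
}
```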

While you can add the complete user agent to the regex, it’s often better to limit it to just an identifying piece, as shown above. Because the whole string is evaluated as a Regular Expression, care needs to be taken when adding anything other than alphanumeric characters.

Known issues

Some plugins and service providers tend to trigger this filter and may need to be excluded. Below we keep a list of these and their user agents for your convenience:

  • Adyen – Jakarta Commons-HttpClient/3.0.1
  • Adyen – Apache-HttpClient/4.4.1 (Java/1.8.0_74)
  • Adyen – Adyen HttpClient 1.0
  • MailPlus – Jersey/2.23.1
  • Mollie – HTTP client/1.0
  • Screaming Frog SEO Spider – Screaming Frog SEO Spider

Rate limiting per IP address

To prevent a single IP from using all available FPM workers at the same time, leaving no workers for other visitors, we implemented a per IP rate limit mechanism. This mechanism caps the number of PHP-FPM workers that a single IP can use at 20. This way one IP cannot deplete all the available FPM workers, leaving other visitors with an error page or a non-responding site.

However, we give you the option to manually exclude IP addresses from the per IP rate limit. This way you can easily whitelist all IPs that use the Magento admin without fully disabling the rate limit.

The status code of a rate limited request is 429.
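The mechanism above can be sketched with Nginx’s limit_conn module (an illustration, not the literal Hypernode configuration):

```nginx
# Illustrative sketch: cap concurrent PHP requests per client address.
# In the http context: allocate a zone keyed on the client address.
limit_conn_zone $binary_remote_addr zone=zoneperip:10m;

server {
    location ~ \.php$ {
        # At most 20 simultaneous connections per IP for PHP requests;
        # excess connections get 429 instead of the default 503.
        limit_conn zoneperip 20;
        limit_conn_status 429;
    }
}
```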

Exclude known IPs from the per IP rate limiting mechanism

It’s possible to exclude IPs and IP ranges from the per IP rate limit. To do so, create a file /data/web/nginx/http.ratelimit and add the following snippet, containing the IPs and IP ranges required:
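A sketch of such a file, using the classic Nginx geo + map pattern. The addresses are placeholders, and it is assumed here that the per IP zone is keyed on the $ratelimit_request variable referenced later in this article:

```nginx
# /data/web/nginx/http.ratelimit
# Placeholder addresses: replace with the IPs and ranges to exclude.
geo $ratelimit_exempt {
    default 0;
    203.0.113.12 1;      # e.g. the office IP used for the Magento admin
    198.51.100.0/24 1;   # e.g. a trusted network range
}

# An empty key is never counted, so exempt IPs bypass the limit entirely.
map $ratelimit_exempt $ratelimit_request {
    0 $binary_remote_addr;
    1 '';
}
```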

Exclude specific URLs from the per IP rate limiting mechanism

To exclude specific URLs from being rate limited, create a file /data/web/nginx/server.ratelimit with the following content:
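A sketch of what this file could contain; the regular expressions match the example URLs below, and the flag variable name is illustrative:

```nginx
# /data/web/nginx/server.ratelimit
# Flag requests for URLs that should bypass the per IP rate limit.
set $ratelimit_url_exempt 0;
if ($request_uri ~ ^/rest/V1/example-call/) {
    set $ratelimit_url_exempt 1;
}
if ($request_uri ~ ^/elasticsearch\.php) {
    set $ratelimit_url_exempt 1;
}
```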

In this example you exclude the URLs /rest/V1/example-call/ and /elasticsearch.php.

Then use the $ratelimit_request variable to exclude these URLs from the per IP rate limiter (bots will still be rate limited based on their user agent). You can do this by following the steps above under ‘Exclude known IPs’ and replacing the ‘default’, so that the file /data/web/nginx/http.ratelimit looks as follows:
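Continuing the sketch, the map’s default is now keyed on the flag set in server.ratelimit (here called $ratelimit_url_exempt, an illustrative name) rather than unconditionally on the client address:

```nginx
# /data/web/nginx/http.ratelimit
# Requests flagged in server.ratelimit get an empty key and are not
# counted; all other requests are limited per client address.
map $ratelimit_url_exempt $ratelimit_request {
    default $binary_remote_addr;
    1 '';
}
```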

Serve a custom static error page to rate limited IP addresses

It is possible to serve a custom error page to IPs that get rate limited.

To do this, create a static html file in /data/web/public with a custom error page and a file called /data/web/nginx/server.custom_429 with the following content:
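A sketch of server.custom_429, assuming the custom page was saved as /data/web/public/ratelimited.html:

```nginx
# /data/web/nginx/server.custom_429
# Serve the static error page for rate limited (429) responses.
error_page 429 /ratelimited.html;

location = /ratelimited.html {
    # Only reachable via error_page, never requested directly.
    internal;
    root /data/web/public;
}
```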

This will serve the custom static ratelimited.html file to IPs that use too many PHP workers.

Warning: Only use a static page for this. A PHP script that renders the error page would be rate limited as well, causing an endless loop.

Turn off per IP rate limiting

When your shop’s performance is very poor, it’s possible that all your FPM workers are busy just serving regular traffic: handling a single request takes so much time that all workers are continuously depleted by a small number of visitors. If this situation occurs, we highly recommend optimizing your shop for speed and temporarily upgrading to a bigger node while doing so. Turning off the rate limit will not fix this problem, but only change the error message from a Too Many Requests error to a timeout error.

For debugging purposes, however, it can be useful to turn off the per IP connection limit for all IPs.

With the following snippet in /data/web/nginx/http.ratelimit it is possible to completely turn off IP based rate limiting:
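A sketch of such a snippet, again assuming the per IP zone is keyed on the $ratelimit_request variable:

```nginx
# /data/web/nginx/http.ratelimit
# Every request maps to an empty key; empty keys are never counted,
# so the per IP connection limit is effectively disabled.
map $remote_addr $ratelimit_request {
    default '';
}
```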

Warning: Only use this setting for debugging purposes! It is highly discouraged to use it on production Hypernodes, as your shop can easily be taken offline by a single IP using slow and/or flood attacks.