Resolving 429 Too Many Requests


To protect your Hypernode from all kinds of attacks, bots, brute force attempts and script kiddies causing downtime, we've implemented several layers of rate limiting.

Most of these rate limiting methods only apply to bots, but to avoid PHP-FPM worker depletion, we recently implemented a rate limiting mechanism per IP to prevent a single IP from exhausting the available FPM workers.

This article explains the differences between the rate limiting methods we use, shows you how to find out which method applies and, if needed, how to override them.

Methods of rate limiting used

Two rate limiting methods

On Hypernodes we currently use two kinds of rate limiting:

  • Rate limiting per user agent, to protect your node from bots and crawlers that exhaust server resources
  • Rate limiting per IP, to protect your node from script kiddies, overenthusiastic SEO analyzers and testing tools that run wild and deplete all FPM workers

Both rate limiting methods are defined using a rate limit zone. A zone is an allocation in memory where nginx stores its connection data to determine whether a user agent or IP should be rate limited.

Currently we use two rate limit zones:

  • bots
  • zoneperip

The zoneperip rate limiting zone applies to concurrent connections for PHP requests, and only kicks in when a single IP is using almost all FPM workers and only two workers are left for other visitors.
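For illustration, a minimal sketch of how such a per IP connection limit can be expressed with nginx's limit_conn module is shown below. The zone size, the connection cap and the location block are assumptions for this example, not the actual Hypernode configuration:

# Illustrative sketch only -- not the actual Hypernode configuration.
# $conn_limit_map holds the client IP, or an empty value for whitelisted IPs
# (see the http.conn_ratelimit examples later in this article); requests with
# an empty key are never counted against the zone.
limit_conn_zone $conn_limit_map zone=zoneperip:10m;

server {
    location ~ \.php$ {
        # Example cap for a 2 core node: (2 * 5) - 2 = 8 concurrent PHP requests per IP
        limit_conn zoneperip 8;
        limit_conn_status 429;
    }
}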

The bots rate limiting zone applies to the number of requests per second when the user agent matches a known bot or crawler pattern.

The bot rate limiting mechanism is implemented using nginx's limit_req_module.
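As a rough sketch, a limit_req based setup like this generally looks as follows. The 1 request per second rate matches what is described for bad bots later in this article; the zone size and the location block are assumptions, and the real configuration is managed by the Hypernode platform:

# Illustrative sketch only -- the real configuration is managed by the platform.
# $limit_bots comes from the user agent map in /data/web/nginx/http.ratelimit
# (shown later in this article): an empty value is never limited, 'bot' is.
limit_req_zone $limit_bots zone=bots:10m rate=1r/s;

server {
    location / {
        limit_req zone=bots;
        limit_req_status 429;
    }
}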

Find out which rate limiting method applies

When an IP address or user agent is rate limited, an entry is written to the nginx error log file /var/log/nginx/error.log mentioning the zone the limit applies to.

To find out which rate limiting method was triggered, look up the corresponding entries in the error log and check which zone the request was rate limited on.

A log entry where the rate limit is applied per user agent and requests per second (based on the bots zone):

2016/08/15 18:25:54 [error] 11372#11372: *45252 limiting requests, excess: 0.586 by zone "bots", client: 1.2.3.4, server: , request: "GET /azie/flip.html HTTP/1.1", host: "www.kamelen-online.nl"

A log entry where the rate limit is applied per IP (based on the zoneperip zone):

2016/08/12 10:23:39 [error] 25118#25118: *24362 limiting connections by zone "zoneperip", client: 1.2.3.4, server: , request: "GET /index.php/admin/abcdef/ HTTP/1.1", host: "www.kamelen-online.nl", referrer: "http://kamelen-online.nl/index.php/admin/abcdef/"

Per IP rate limiting does not apply to static content, only to requests handled by PHP.

Rate limiting against bots and crawlers

Every day, your webshop is visited by many different bots. While some, like Google, are important, many only have a negative impact on your site, especially if they don't respect your robots.txt.
To protect your Hypernode against the negative performance impact of misbehaving bots, it uses an advanced rate limiting mechanism. This slows down the hit rate of unimportant bots, leaving more performance for the bots you do care about and, more importantly, your actual visitors.

Rejecting with 429 Too Many Requests

Since our goal is not to block bots, but to rate limit them nicely, we have to be quite careful with how we reject them. The best way to do so is with the 429 Too Many Requests response. This lets the visiting bot know that the site is there, but that the server cannot handle the request right now. This is a temporary state, so the bot can retry at a later time. It does not negatively influence the ranking in any search engine, as the site is available when the bot connects again later.

Configuring the bot rate limiter

By default, good bots such as Google, Bing and several monitoring systems are exempt from rate limiting. These good bots never get rate limited, while bad bots are limited to 1 request per second. Any requests over that return a 429 error. If you want, you can override the system-wide configuration of who gets rate limited and who does not. To get started, place the following in a config file called /data/web/nginx/http.ratelimit:

map $http_user_agent $limit_bots {
    default '';
    ~*(google|bing|pingdom|monitis.com|Zend_Http_Client|SendCloud|magereport.com) '';
    ~*(http|crawler|spider|bot|search|Wget/|Python-urllib|PHPCrawl|bGenius) 'bot';
}

Note: do not remove the Pingdom entry, as this will break the monitoring of your Hypernode.

As you can see, this sorts all visitors into two groups: first, bots with 'google', 'bing', 'pingdom' or 'monitis.com' in their user agent are marked as neutral; then, generic bots with 'crawler', 'spider', 'bot', etc. in their user agent are placed into the group 'bot'. The keywords are separated by | characters, since each pattern is a regular expression.

Whitelisting more bots

To extend the whitelist, first determine which user agent you wish to add. Use the log files to see which bots get rate limited and what user agent identification they use. Say the bot we want to add has the User-Agent SpecialSnowflakeCrawler 3.1.4. It contains the word 'crawler', so it matches the second regular expression and is labeled as a bot. Since the whitelist line overrules the blacklist line, the best way to allow this bot is to add its user agent to the whitelist, instead of removing 'crawler' from the blacklist:

map $http_user_agent $limit_bots {
    default '';
    ~*(google|bing|pingdom|monitis.com|Zend_Http_Client|SendCloud|magereport.com|specialsnowflakecrawler) '';
    ~*(http|crawler|spider|bot|search|Wget/|Python-urllib|PHPCrawl|bGenius) 'bot';
}

While you can add the complete user agent to the regex, it's often better to limit it to just an identifying piece, as shown above. Because the whole string is evaluated as a regular expression, take care when adding anything other than alphanumeric characters, as in the example below.
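For example, to whitelist a bot that identifies itself with the hypothetical token crawler.example.com, escape the dots so they are matched literally:

map $http_user_agent $limit_bots {
    default '';
    ~*(google|bing|pingdom|monitis.com|Zend_Http_Client|SendCloud|magereport.com|crawler\.example\.com) '';
    ~*(http|crawler|spider|bot|search|Wget/|Python-urllib|PHPCrawl|bGenius) 'bot';
}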

Known issues

Some plugins and service providers tend to hit this filter and may need to be whitelisted. Below we keep a list of these and their user agents for your convenience:

  • Adyen – Jakarta Commons-HttpClient/3.0.1
  • Adyen – Apache-HttpClient/4.4.1 (Java/1.8.0_74)
  • Adyen – ‘Adyen HttpClient 1.0’
  • MailPlus – Jersey/2.23.1
  • Screaming Frog SEO Spider – Screaming Frog SEO Spider

Rate limiting per IP address

To prevent a single IP from using all the available FPM workers at the same time, leaving no workers available for other visitors, we implemented a per IP rate limit mechanism. This mechanism starts to throttle requests from an IP when that IP is using almost all FPM workers (more specifically: when there are only two workers left). This way a single IP cannot deplete all the available FPM workers and leave other visitors with an error page or a non-responding site.

On smaller nodes (Start, Grow) with a relatively slow admin panel, we noticed that a single product or page save can sometimes result in being rate limited. As these smaller Hypernodes benefit the most from the per IP rate limit, we decided not to exclude these nodes from it.

We did, however, decide to give you the option to manually exclude IP addresses from the per IP rate limit. This way you can easily whitelist all IPs that use the Magento admin without fully disabling the rate limit.

We rate limit according to this formula: (CPU cores * 5) - 2

This means that on a node with a 2 core CPU there are 10 workers in total, of which a single IP address can use a maximum of 8 at a time.
This way there are always two workers available to serve other visitors.

The status code of a rate limited request is 429 Too Many Requests.

Exclude known IPs from the per IP rate limiting mechanism

If you want to exclude IPs from the per IP rate limit, create a file called /data/web/nginx/http.conn_ratelimit with the following content:

map $remote_addr $conn_limit_map {
    default $remote_addr;
    1.2.3.4 '';
}

To exclude an IP (e.g. 1.2.3.4), add it to the mapping and set an empty value. This excludes the IP from the rate limit.
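For example, a hypothetical mapping that whitelists both an office IP and a payment provider IP (both addresses are placeholders) would look like this:

map $remote_addr $conn_limit_map {
    default $remote_addr;
    # office IP (placeholder)
    1.2.3.4 '';
    # payment provider IP (placeholder)
    5.6.7.8 '';
}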

Turn off per IP rate limiting

When your shop's performance is very poor, it's possible that all your FPM workers are busy just serving regular traffic. Handling a request takes so much time that all workers are continuously depleted by a small number of visitors. If this situation occurs, we highly recommend optimizing your shop for speed and temporarily upgrading to a bigger node while doing so. Turning off the rate limit will not fix this problem; it only changes the error from a 429 Too Many Requests to a timeout.

For debugging purposes, however, it can be useful to turn off the per IP connection limit for all IPs.

With the following snippet in /data/web/nginx/http.conn_ratelimit you can completely turn off IP based rate limiting:

map $remote_addr $conn_limit_map {
    default '';
}

Warning: only use this setting for debugging purposes! Using it on production nodes is highly discouraged, as your shop can easily be taken offline by a single IP using slow request or flood attacks.

Serve a custom static error page to rate limited IP addresses

It is possible to serve a custom error page to IPs that get rate limited.

To do this, create a static HTML file (in this example ratelimitted.html) in /data/web/public containing your custom error page, and a file called /data/web/nginx/server.custom_429 with the following content:

error_page 429 /ratelimitted.html;

location = /ratelimitted.html {
    root /data/web/public;
    internal;
}

This will serve the custom static ratelimitted.html file to IPs that use too many PHP workers.

Warning: only use a static page for this. A PHP script that renders the error would be rate limited as well, causing an endless loop.
