General troubleshooting

in Troubleshooting Tags: MagentoMySQLNginx

Need to debug a problem? Hypernode has many possibilities to analyse a suspicious or unwanted situation. Learn where to look and reach a conclusion fast.

Notes on log files in general

System and logfile times are in UTC (GMT) timezone! Dutch time is UTC+1 in winter and UTC+2 in summer.

Log files are rotated daily. Previous log files are compressed using gzip. They can be viewed with commands such as zcat or zgrep.

Webserver

Most issues will result from specific people or external systems interacting with your shop over HTTP. There are several tools to tell you what happened and what is currently going on.

Live usage

To see live, ongoing requests on your Hypernode, type livefpm and you will see a continuous overview like this:

24769 BUSY   0.2s DE 54.93.108.145   POST www.site.nl/index.php/api/soap/index/   (Picqer Magento Integration (picqer.com))
24227 IDLE   0.1s DE 54.93.108.145   GET  www.site.nl/api/soap?wsdl   (Picqer Magento Integration (picqer.com))
24762 BUSY   0.1s NL 37.139.5.130    GET  www.site.nl/index.php/api/soap/index/?wsdl=1   (None)

Here, there are 3 PHP slots, of which 2 are busy. You see the duration (keep it below 1 sec for happy visitors!), the country, IP, request and user agent.

Past usage

There is a wealth of information on past requests available. Hypernode uses extended request logging using JSON. The live log is in /var/log/nginx/access.log and is rotated daily. Because it can be hard to read the raw data, we have made the hypernode-parse-nginx-log (or short: pnl) tool for you.

See the help for pnl:

$ pnl --help
usage: hypernode-parse-nginx-log [-h] [--fields FIELDS] [--php] [--bots]
                                 [--format FORMAT] [--list-fields]
                                 [--filter FILTER] [--verbose]
                                 [--filename FILENAME | --today | --yesterday]

Parses, filters and styles Hypernode Nginx logs. Our Nginx logs are JSON
encoded for easy machine parsing. This utility will convert to human-
digestible output.

optional arguments:
  -h, --help           show this help message and exit
  --fields FIELDS      Comma separated list of fields to display
  --php                Short for --filter handler phpfpm
  --bots               Short for --filter
                       user_agent~(http|bot|crawl|spider|search)
  --format FORMAT      Format string to display fields
  --list-fields        Display a list of available fields
  --filter FILTER      Filter to apply. Format: <field>=<str> or
                       <field>~<regex> or <field>!~<regex>
  --verbose, -v        Display debug output
  --filename FILENAME  Path of nginx logfile to parse (default STDIN)
  --today              Short for --filename /var/log/nginx/access.log
  --yesterday          Short for --filename /var/log/nginx/access.log.1.gz

You can use the following fields for filtering or output:

  • status (http status code)
  • ua (user agent)
  • req (request)
  • ip
  • cc (country code)
  • referer
  • duration
  • handler (php-fpm, varnish, empty)

Examples

See live hits as they happen, but only those who are handled by PHP (tal is short for tail -f /var/log/nginx/access.log):

$ tal | pnl --php

Parse multiple log files, possibly gzipped:

$ zcat -f /var/log/nginx/access.log* | pnl

See all bots today with their user agents:

$ pnl --today --bots --fields status,duration,ip,req,ua

See all HTTP 500 hits.

$ pnl --today --filter status=500

See how many times a particular HTTP status code was returned today (formatted in a nice table).

$ pnl --today --fields status | sort | uniq -c | sort -nk2 | \
    awk 'BEGIN { print "Status Amount";print "------ ------" };{ printf "%-6s %s\n", $2, $1 }'

See a distribution of all 502 (bad gateway) hits over all logfiles.

$ for log in `ls -t /var/log/nginx/access.log*`; \
    do echo -n "$log: "; \
    zcat -f $log | pnl --filter status=502 | wc -l; \
    done

See whether bot traffic has increased over the last few days

$ for log in `ls -t /var/log/nginx/access.log*`; \
    do echo -n "$log: "; \
    zcat -f $log | pnl --php --bots | wc -l; \
    done

See all 404 hits that were handled by Magento in yesterday’s log (so you can make static placeholders)

$ pnl --yesterday --php \
    --filter status=404 --fields request | egrep ^GET | cut -f2 -d' ' |\
     cut -d'?' -f1 | sort | uniq -c | sort -n

See all pageviews from China

$ pnl --php --yesterday --filter country=CN 

Or all pageviews not from Holland:

$ pnl --php --today --filter 'country!=NL'

Check all bot pageviews (and user agents) and sort on frequency

$ pnl --yesterday --bots --php --fields ua | sort | uniq -c | sort -n

See all Android hits

$ pnl --yesterday --filter ua~Android \
    --fields status,country,ua

How many PHP requests were there yesterday?

$ pnl --yesterday --php | wc -l

How many unique IPs were served yesterday?

$ pnl --yesterday --fields remote_addr | sort | uniq | wc -l

Find all IPs that used multiple user agents today

$ pnl --today --fields remote_addr,user_agent | sort | uniq |\
    cut -d' ' -f1 | sort | uniq -c | sort -n | grep -v ' 1 '

Calculate total outbound traffic for yesterday (in this case, 20GB):

$ pnl --yesterday --fields body_bytes_sent | awk '{s+=$1} END {printf "%.0f\n", s}'
20726556545

See hypernode-parse-nginx-log --help for more options on filtering and output.

Errors

Basic operational webserver errors (about blocked requests) are logged to /var/log/nginx/error.log.

If you have modified your Nginx configuration (by updating files in /data/web/nginx) and it contains an error, it will put the error message in /data/web/nginx/nginx_error_output. After you resolve the issue, the file will disappear. If no such file exist, it means that your configuration is ok!

PHP

For debugging PHP issues, these files are useful:

/var/log/php-fpm/php-fpm.log (for PHP-FPM)
/var/log/php-fpm/php-slow.log (for FPM that took longer than 10 seconds)

You can check whether PHP failed, because the webserver will give a HTTP 502 Bad Gateway error. Check whether these errors have occurred:

$ pnl --yesterday --php --filter status=502

MySQL

Debuggin MySQL issues starts with running mytop. It will give you an overview of the active queries. Use the o and f keys to sort on duration and to see the full query.

To check on past queries, see the slow log at /var/log/mysql/mysql-slow.log. It contains all the queries that took longer than 2 seconds.

If your queries are stopped before completion, check in /var/log/mysql/error.log whether MySQL has crashed and is restarted (happens seldomly).

FTP usage

Logs of logins and transfer can be found at /var/log/proftpd/proftpd.log.

E-mail

Check on the status of outgoing mail in /var/log/mail.log. Also, check if any outgoing mail is still in the queue, by running mailq.

Cron tasks

Verify starting of your cron tasks:

$ grep CRON /var/log/syslog | tail

If you have installed the excellent Aoe_Scheduler module, it will also keep a Magento specific cron log in public/var/log/cron.log.

Magento errors

Magento is generally abundant in logging issues. It will write these to public/var/report and public/var/log.

Generic solutions

Flush your caches

Most of the times when something goes wrong in Magento, it can be solved by simply flushing your cache. There are a couple of ways of flushing your cache:

  1. Use the Cache Management in Magento. Log in to the backend of your Magento shop. Go to System –> Cache Management and use the flush buttons.
  2. Use Magerun on the commandline: magerun cache:flush
  3. Flush Redis-specific cache on the commandline: redis-cli flushall

Restart PHP-FPM

If you want to reload php-fpm or kill all currently running processes that you would like to abort, you can run the SSH shell: $ pkill -9 -u app php5-fpm

This will kill all workers (Which will be restarted) but keeps the master process alive, leaving php-fpm in a state it can easily start new threads to continue.

Troubleshooting slow MySQL queries

First, check whether any slow queries are running right now. $ mytop If there are many queries, type ‘o’ to reverse the order and show longest running queries on top. If there’s a particular long running query, type ‘f’ and then the id to view the full query. If you want to abort it, type ‘k’ and then the query id to kill it. Warning: generally only do this for SELECT queries. Killing other queries might corrupt your data. Second, examine recent slow queries $ less /var/log/mysql/mysql-slow.log. Once you have find a suspect (for example, a query that takes longer than 2 seconds), you can prefix your queri with the EXPLAIN keyword to find where the bottleneck is. For example:

mysql> explain SELECT `main_table`.* FROM `catalogsearch_query` AS `main_table` ORDER BY updated_at DESC LIMIT 5;

+----+-------------+------------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+------+----------------+
| 1 | SIMPLE | main_table | ALL | NULL | NULL | NULL | NULL | 2159 | Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+------+----------------+
1 row in set (0.00 sec)

This will not run your query, but tell you how many rows were examined, whether an index was used, and what kinds of sorting algorithms were used. In this case, there was a relatively low number of rows involved (2159). However, the filesort algorithm is extremely slow and should be avoided. See more information about avoiding filesort here and here. The solution in this case is to add a proper index on the updated_at column.

0