#10 closed defect (fixed)
Install Munin server and clients
Reported by: | chris | Owned by: | chris |
---|---|---|---|
Priority: | major | Milestone: | Install and configure crin1 |
Component: | crin1 | Version: | |
Keywords: | Cc: | jenny, gillian | |
Estimated Number of Hours: | 1 | Add Hours to Ticket: | 0 |
Billable?: | yes | Total Hours: | 4.27 |
Description
In order to monitor and adjust memory allocations, process numbers etc. for MySQL, Memcache and Nginx (see ticket:9) we will need a Munin server and clients set up so we have some data to use as a basis for making decisions.
Change History (12)
comment:1 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0 to 0.86
- Total Hours set to 0.86
comment:2 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0 to 0.52
- Total Hours changed from 0.86 to 1.38
Checking the Munin install on Crin1:
    sudo -i
    su - munin -s /bin/bash
    cd /etc/munin/plugins
    munin-check *
    check /var/cache/munin/www
    check /var/lib/munin/cgi-tmp
    ls: cannot access /var/lib/munin/cgi-tmp/*: No such file or directory
    # /var/lib/munin/cgi-tmp/* : Wrong owner ( != munin)
    check /var/lib/munin/crin.org
    check /var/lib/munin/datafile
    check /var/lib/munin/datafile.storable
    check /var/lib/munin/graphs
    check /var/lib/munin/htmlconf.storable
    check /var/lib/munin/limits
    check /var/lib/munin/limits.storable
    check /var/lib/munin/localdomain
    check /var/lib/munin/munin-graph.stats
    check /var/lib/munin/munin-update.stats
    check /var/lib/munin/state-crin.org-crin1.crin.org.storable
    check /var/lib/munin/state-crin.org-crin2.crin.org.storable
    check /var/lib/munin/state-localdomain-localhost.localdomain.storable
    check miscellaneous
    # /var/lib/munin-node/plugin-state : Wrong owner (root != nobody)
    # /var/lib/munin-node/plugin-state : Wrong permissions (755 != 775)
    # /etc/munin/plugin-conf.d : Wrong permissions (750 != 755)
    Check done.
    Please note that this script only checks most things, not all things.
    Please also note that this script may be buggy.
So change a few things:
    chown -R munin:munin /var/lib/munin/cgi-tmp/
    chown munin:munin /var/lib/munin-node/plugin-state
    chmod 755 /var/lib/munin-node/plugin-state
Still no joy, things work on the command line:
    cd /etc/munin/plugins
    munin-run cpu
    user.value 334992
    nice.value 3496
    system.value 103674
    idle.value 1201065630
    iowait.value 169340
    irq.value 6
    softirq.value 7112
    steal.value 6535
    guest.value 0
But no graph: https://munin.crin.org/munin/crin.org/crin1.crin.org/cpu.html
Testing with telnet, as per this suggestion:
    telnet localhost 4949
    Trying ::1...
    Connected to localhost.
    Escape character is '^]'.
    # munin node at crin1
    nodes
    crin1
    .
    list crin1
    apache_accesses apache_processes apache_volume cpu df df_inode entropy exim_mailqueue exim_mailstats fail2ban forks fw_conntrack fw_forwarded_local fw_packets http_loadtime if_err_eth0 if_eth0 interrupts ip_93.95.228.180 irqstats load memory munin_stats netstat nfs4_client nfs_client nfsd nfsd4 ntp_kernel_err ntp_kernel_pll_freq ntp_kernel_pll_off ntp_offset open_files open_inodes proc_pri processes swap threads uptime users vmstat
    fetch df
    _dev_dm_0.value 5.65480745635344
    _run.value 6.09743476010941
    _dev_shm.value 0
    _run_lock.value 0
    _sys_fs_cgroup.value 0
    _dev_sda1.value 14.5380714213827
    .
    quit
    Connection closed by foreign host.

    telnet 93.95.228.180 4949
    Trying 93.95.228.180...
    Connected to 93.95.228.180.
    Escape character is '^]'.
    # munin node at crin2
    nodes
    crin2
    .
    list crin2
    cpu df df_inode entropy exim_mailqueue exim_mailstats fail2ban forks fw_conntrack fw_forwarded_local fw_packets http_loadtime if_err_eth0 if_eth0 interrupts irqstats load memory netstat nfs4_client nfs_client nfsd nfsd4 ntp_kernel_err ntp_kernel_pll_freq ntp_kernel_pll_off ntp_offset open_files open_inodes proc_pri processes swap threads uptime users vmstat
    fetch df
    _dev_dm_0.value 18.1259598359727
    _run.value 4.07777038515561
    _dev_shm.value 0
    _run_lock.value 0
    _sys_fs_cgroup.value 0
    _dev_sda1.value 14.5363211117966
    .
    quit
    Connection closed by foreign host.
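The munin-node exchange above is line based: `fetch <plugin>` returns `<field>.value <number>` lines and the reply ends with a line containing only `.`. As a rough sketch, a captured reply could be turned into tab-separated pairs like this; `parse_fetch` is a made-up name for illustration, and the sample values are copied from the Crin1 session:

```shell
# Hypothetical helper (not part of Munin): read a munin-node "fetch" reply
# on stdin and print "field<TAB>value" for each "<field>.value <number>"
# line, stopping at the "." line that terminates the reply.
parse_fetch() {
    awk '
        $1 == "." { exit }
        $1 ~ /\.value$/ {
            field = $1
            sub(/\.value$/, "", field)
            printf "%s\t%s\n", field, $2
        }
    '
}

# Two lines from the Crin1 "fetch df" reply above.
parse_fetch <<'EOF'
_dev_dm_0.value 5.65480745635344
_dev_sda1.value 14.5380714213827
.
EOF
```

Against a live node the function could probably be fed from something like `printf 'fetch df\nquit\n' | nc localhost 4949`, assuming `nc` is installed; the banner line is ignored because it does not end in `.value`.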
Trying this suggestion:
    su - munin -s /bin/bash
    /usr/share/munin/munin-update --nofork --debug
That generates lots of output but nothing that helps.
Checking the crontab at /etc/cron.d/munin, it contains:
    #
    # cron-jobs for munin
    #
    MAILTO=root
    */5 * * * * munin if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi
    14 10 * * * munin if [ -x /usr/share/munin/munin-limits ]; then /usr/share/munin/munin-limits --force --contact nagios --contact old-nagios; fi
I'm at a bit of a loss, I'll continue trying to track down the problem(s) tomorrow.
comment:3 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0 to 1
- Total Hours changed from 1.38 to 2.38
I removed, purged and reinstalled the munin clients and servers on both Crin1 and Crin2 and now static graphs are being generated but not dynamic ones.
Apache mod fast cgi was installed:
aptitude install libapache2-mod-fcgid
Still don't have dynamic graphs, very sorry that this is taking a lot longer than expected to configure.
Next thing to try is reading the documentation:
comment:4 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0 to 0.5
- Total Hours changed from 2.38 to 2.88
So after enabling Munin to be served over HTTP and changing the Apache log level to debug I found the issue:
[Wed May 13 10:00:19.865357 2015] [fcgid:debug] [pid 8342] fcgid_proc_unix.c(542): (13)Permission denied: [client XX.XX.XX.XX:53224] mod_fcgid: can't connect unix domain socket: /var/lib/apache2/fcgid/sock/8309.17, referer: http://munin.crin.org/
So the VirtualHost was changed to run as www-data rather than munin; this is the final Apache configuration:
    <VirtualHost *:80>
      <IfModule mpm_itk_module>
        AssignUserID www-data www-data
        MaxClientsVHost 60
      </IfModule>
      ServerName munin.crin.org
      ServerAlias munin.crin1.crin.org
      ServerAdmin chris@webarchitects.co.uk
      <If "%{HTTP_HOST} == 'munin.crin1.crin.org'">
        Redirect / https://munin.crin1.crin.org/
      </If>
      Redirect / https://munin.crin.org/
      LogLevel error
      ErrorLog ${APACHE_LOG_DIR}/munin_error.log
      CustomLog ${APACHE_LOG_DIR}/munin_access.log combined
    </VirtualHost>
    <IfModule mod_ssl.c>
      <VirtualHost *:443>
        <IfModule mpm_itk_module>
          AssignUserID www-data www-data
          MaxClientsVHost 60
        </IfModule>
        ServerName munin.crin.org
        ServerAlias munin.crin1.crin.org
        ServerAdmin chris@webarchitects.co.uk
        SSLEngine on
        SSLCertificateFile /etc/ssl/cacert/crin1_cert.pem
        SSLCertificateKeyFile /etc/ssl/cacert/crin1_privatekey.pem
        SSLCACertificateFile /etc/ssl/cacert/cacert.pem
        RedirectMatch ^/$ https://munin.crin.org/munin
        Alias /munin/static /etc/munin/static
        <Directory /etc/munin/static/>
          AllowOverride None
          Options -Indexes -ExecCGI -MultiViews
          Require all granted
        </Directory>
        ScriptAlias /munin /usr/lib/munin/cgi/munin-cgi-html
        ScriptAlias /munin-cgi-graph /usr/lib/munin/cgi/munin-cgi-graph
        <Directory /usr/lib/munin/cgi>
          AllowOverride None
          Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
          SSLOptions +StdEnvVars
          Require all granted
          <IfModule mod_fcgid.c>
            SetHandler fcgid-script
          </IfModule>
        </Directory>
        <Directory /var/cache/munin/www/static>
          AllowOverride None
          Options -Indexes -ExecCGI -MultiViews
          Require all granted
          <IfModule mod_expires.c>
            ExpiresActive On
            ExpiresDefault M310
          </IfModule>
        </Directory>
        LogLevel error
        ErrorLog ${APACHE_LOG_DIR}/munin_ssl_error.log
        CustomLog ${APACHE_LOG_DIR}/munin_ssl_access.log combined
      </VirtualHost>
    </IfModule>
And for munin.conf:
    graph_strategy cgi
    cgiurl_graph /munin-cgi-graph
    html_strategy cgi
The remaining task on this ticket is to disable plugins we don't need and enable ones we need but are not yet using.
comment:5 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0 to 0.5
- Total Hours changed from 2.88 to 3.38
Looking at the Munin graphs we have, these plugins were automatically configured; some we don't want, and some extra ones are wanted:
- http://munin.crin.org/munin/crin.org/crin1.crin.org/index.html
- http://munin.crin.org/munin/crin.org/crin2.crin.org/index.html
We don't need NFS (NFS graphs are being generated), so on both servers:
    aptitude remove nfs-common rpcbind
    rm /etc/munin/plugins/nfs*
On Crin1 we want MySQL graphs, so using the new mysql_ plugin:
    cd /etc/munin/plugins
    /usr/share/munin/plugins/mysql_ suggest
    Missing dependency Cache::Cache at /usr/share/munin/plugins/mysql_ line 728.
    aptitude install libcache-cache-perl
    /usr/share/munin/plugins/mysql_ suggest
    bin_relay_log
    commands
    connections
    files_tables
    innodb_bpool
    innodb_bpool_act
    innodb_insert_buf
    innodb_io
    innodb_io_pend
    innodb_log
    innodb_rows
    innodb_semaphores
    innodb_tnx
    myisam_indexes
    network_traffic
    qcache
    qcache_mem
    replication
    select_types
    slow
    sorts
    table_locks
    tmp_tables
    ln -s /usr/share/munin/plugins/mysql_ mysql_bin_relay_log
    ln -s /usr/share/munin/plugins/mysql_ mysql_commands
    ln -s /usr/share/munin/plugins/mysql_ mysql_connections
    ln -s /usr/share/munin/plugins/mysql_ mysql_files_tables
    ln -s /usr/share/munin/plugins/mysql_ mysql_innodb_bpool
    ln -s /usr/share/munin/plugins/mysql_ mysql_innodb_bpool_act
    ln -s /usr/share/munin/plugins/mysql_ mysql_innodb_insert_buf
    ln -s /usr/share/munin/plugins/mysql_ mysql_innodb_io
    ln -s /usr/share/munin/plugins/mysql_ mysql_innodb_io_pend
    ln -s /usr/share/munin/plugins/mysql_ mysql_innodb_log
    ln -s /usr/share/munin/plugins/mysql_ mysql_innodb_rows
    ln -s /usr/share/munin/plugins/mysql_ mysql_innodb_semaphores
    ln -s /usr/share/munin/plugins/mysql_ mysql_innodb_tnx
    ln -s /usr/share/munin/plugins/mysql_ mysql_myisam_indexes
    ln -s /usr/share/munin/plugins/mysql_ mysql_network_traffic
    ln -s /usr/share/munin/plugins/mysql_ mysql_qcache
    ln -s /usr/share/munin/plugins/mysql_ mysql_qcache_mem
    ln -s /usr/share/munin/plugins/mysql_ mysql_replication
    ln -s /usr/share/munin/plugins/mysql_ mysql_select_types
    ln -s /usr/share/munin/plugins/mysql_ mysql_slow
    ln -s /usr/share/munin/plugins/mysql_ mysql_sorts
    ln -s /usr/share/munin/plugins/mysql_ mysql_table_locks
    ln -s /usr/share/munin/plugins/mysql_ mysql_tmp_tables
    service munin-node restart
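The run of `ln -s` commands above could also be written as a loop over whatever graph names `suggest` prints. This is a sketch only: `link_wildcard_plugin` is a hypothetical helper, and its argument order (plugin path, target directory, then graph names) is my own convention rather than anything Munin provides:

```shell
# Hypothetical helper: symlink a wildcard plugin once per graph name.
#   $1 = path to the wildcard plugin, e.g. /usr/share/munin/plugins/mysql_
#   $2 = directory to create the links in, e.g. /etc/munin/plugins
#   remaining arguments = graph names, as printed by "<plugin> suggest"
link_wildcard_plugin() {
    plugin=$1
    dest=$2
    shift 2
    prefix=$(basename "$plugin")
    for name in "$@"; do
        # e.g. /etc/munin/plugins/mysql_commands -> .../mysql_
        ln -s "$plugin" "$dest/$prefix$name"
    done
}
```

It would be called along the lines of `link_wildcard_plugin /usr/share/munin/plugins/mysql_ /etc/munin/plugins $(/usr/share/munin/plugins/mysql_ suggest)`, followed by `service munin-node restart` as before. Linking every suggested graph matches what was done on this ticket; trimming the list first is equally reasonable.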
The MySQL graphs:
We want to track the number and memory use of key processes, such as apache2 and mysqld, so enable multips and multips_memory on Crin1:
    cd /etc/munin/plugins
    ln -s /usr/share/munin/plugins/multips
    ln -s /usr/share/munin/plugins/multips_memory
Add the following to /etc/munin/plugin-conf.d/munin-node:
    [multips]
    env.names apache2 mysqld

    [multips_memory]
    env.names apache2 mysqld
On Crin2:
    [multips]
    env.names php5-fpm java nginx

    [multips_memory]
    env.names php5-fpm java nginx
And the graphs:
- https://munin.crin.org/munin/crin.org/crin1.crin.org/multips.html
- https://munin.crin.org/munin/crin.org/crin1.crin.org/multips_memory.html
- https://munin.crin.org/munin/crin.org/crin2.crin.org/multips.html
- https://munin.crin.org/munin/crin.org/crin2.crin.org/multips_memory.html
That is probably enough plugin configuration for now.
comment:6 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0 to 0.14
- Resolution set to fixed
- Status changed from new to closed
- Total Hours changed from 3.38 to 3.52
comment:7 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0 to 0.25
- Total Hours changed from 3.52 to 3.77
Forgot to add Nginx graphs for Crin2, so on that machine:
    cd /etc/munin/plugins
    ln -s /usr/share/munin/plugins/nginx_status
    ln -s /usr/share/munin/plugins/nginx_request
    munin-run nginx_status
    total.value U
    reading.value U
    writing.value U
    waiting.value U
    munin-run nginx_request
    request.value U
Reading the scripts we have:
    This shows the default configuration of this plugin. You can override
    the status URL.

      [nginx*]
          env.url http://localhost/nginx_status

    Nginx must also be configured. Firstly the stub-status module must be
    compiled, and secondly it must be configured like this:

      server {
            listen 127.0.0.1;
            server_name localhost;
            location /nginx_status {
                    stub_status on;
                    access_log off;
                    allow 127.0.0.1;
                    deny all;
            }
      }
So /etc/nginx/sites-available/localhost was created containing the above.
It was symlinked and tested:
    cd /etc/nginx/sites-enabled/
    ln -s ../sites-available/localhost 30-localhost
    service nginx configtest
    [ ok ] Testing nginx configuration:.
    service nginx restart
    lynx -dump http://localhost/nginx_status
    Active connections: 1
    server accepts handled requests
     1 1 1
    Reading: 0 Writing: 1 Waiting: 0
    cd /etc/munin/plugins/
    munin-run nginx_request
    request.value 3
    munin-run nginx_status
    total.value 1
    reading.value 0
    writing.value 1
    waiting.value 0
    service munin-node restart
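To make the link between the `stub_status` page and the `munin-run nginx_status` values above explicit, here is a small sketch of the mapping. It is not the stock plugin (that is a separate script shipped with Munin); `stub_status_to_munin` is a hypothetical name, and the sample input is the `lynx -dump` output from this comment:

```shell
# Sketch only: map nginx stub_status text (stdin) onto the four values
# that "munin-run nginx_status" printed above. Matching on fields rather
# than line starts tolerates the leading spaces lynx may add.
stub_status_to_munin() {
    awk '
        $1 == "Active" && $2 == "connections:" { print "total.value " $3 }
        $1 == "Reading:" {
            print "reading.value " $2
            print "writing.value " $4
            print "waiting.value " $6
        }
    '
}

stub_status_to_munin <<'EOF'
Active connections: 1
server accepts handled requests
 1 1 1
Reading: 0 Writing: 1 Waiting: 0
EOF
```

On Crin2 the same function could be fed from `lynx -dump http://localhost/nginx_status`, which only answers on 127.0.0.1 given the `allow`/`deny` rules in the server block.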
It's working so we should soon have Nginx stats here:
comment:8 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0 to 0.4
- Total Hours changed from 3.77 to 4.17
Install the memcache plugin, on Crin2:
    cd /etc/munin/plugins
    ln -s /usr/share/munin/plugins/memcached_ memcached_rates
    ln -s /usr/share/munin/plugins/memcached_ memcached_bytes
    ln -s /usr/share/munin/plugins/memcached_ memcached_counters
    munin-run memcached_rates
    no (Cache::Memcached not found)
    aptitude install libcache-memcached-perl
    munin-run memcached_rates
    memcache_cache_hits.value 41588
    memcache_cache_misses.value 3746
    memcache_cmd_get.value 45334
    memcache_cmd_set.value 11077
    memcache_total_connections.value 197
    memcache_total_items.value 11077
    memcached_bytes
    memcache_bytes_read.value 15691427
    memcache_bytes_written.value 49907613
    memcached_counters
    memcache_bytes_allocated.value 1931023
    memcache_curr_connections.value 5
    memcache_curr_items.value 947
    service munin-node restart
So we should soon have memcache stats.
Next, php-fpm: there seem to be several options available; trying tjstein's:
    mkdir -p /usr/local/share/munin/plugins
    cd /usr/local/share/munin/plugins
    git clone git://github.com/tjstein/php5-fpm-munin-plugins.git
    Cloning into 'php5-fpm-munin-plugins'...
    remote: Counting objects: 191, done.
    remote: Total 191 (delta 0), reused 0 (delta 0), pack-reused 191
    Receiving objects: 100% (191/191), 25.50 KiB | 0 bytes/s, done.
    Resolving deltas: 100% (105/105), done.
    Checking connectivity... done.
    chmod +x php5-fpm-munin-plugins/phpfpm_*
    chown -R root:root php5-fpm-munin-plugins/
    cd /etc/munin/plugins/
    ln -s /usr/local/share/munin/plugins/php5-fpm-munin-plugins/phpfpm_average
    ln -s /usr/local/share/munin/plugins/php5-fpm-munin-plugins/phpfpm_connections
    ln -s /usr/local/share/munin/plugins/php5-fpm-munin-plugins/phpfpm_memory
    ln -s /usr/local/share/munin/plugins/php5-fpm-munin-plugins/phpfpm_status
    ln -s /usr/local/share/munin/plugins/php5-fpm-munin-plugins/phpfpm_processes
Edit /etc/php5/fpm/pool.d/www.conf changing:
    ;pm.status_path = /status
    pm.status_path = /status
Edit /etc/nginx/sites-available/localhost adding:
    location ~ ^/(status|ping)$ {
            fastcgi_pass unix:/var/run/php5-fpm.sock;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            fastcgi_intercept_errors on;
            include fastcgi_params;
            access_log off;
            allow 127.0.0.1;
            deny all;
    }
Test and restart services:
    service nginx configtest
    [ ok ] Testing nginx configuration:.
    service php5-fpm restart
    service munin-node restart
Test:
    lynx -dump http://localhost/status
    pool: www
    process manager: dynamic
    start time: 13/May/2015:22:46:56 +0000
    start since: 375
    accepted conn: 1
    listen queue: 0
    max listen queue: 0
    listen queue len: 0
    idle processes: 1
    active processes: 1
    total processes: 2
    max active processes: 1
    max children reached: 0
    slow requests: 0
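The status page above is a list of `name: value` lines, which is easy to query from a script when checking by hand what the plugins ought to be graphing. A minimal sketch, assuming that text format; `fpm_status_field` is a hypothetical helper and is not part of tjstein's plugins:

```shell
# Hypothetical helper: print the value of one field from php-fpm status
# page text read on stdin. Splits on the FIRST colon only, so values that
# contain colons themselves (such as "start time") stay intact.
fpm_status_field() {
    awk -v key="$1" '
        {
            pos = index($0, ":")
            if (pos == 0) next
            name = substr($0, 1, pos - 1)
            value = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == key) { print value; exit }
        }
    '
}

# A few lines from the status output above; prints the total process count.
fpm_status_field "total processes" <<'EOF'
pool: www
idle processes: 1
active processes: 1
total processes: 2
EOF
```

The field name is matched exactly, so `listen queue` does not also pick up `max listen queue` or `listen queue len`.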
So, the new stats:
- https://munin.crin.org/munin/crin.org/crin2.crin.org/index.html#memcache
- https://munin.crin.org/munin/crin.org/crin2.crin.org/index.html#nginx
- https://munin.crin.org/munin/crin.org/crin2.crin.org/index.html#php
I'm sorry this has taken longer to set up than estimated, but it should pay off in the long run: being able to monitor the state of the MySQL, Nginx, php-fpm and Memcache processes will let us tune the applications so that they use as little RAM as possible while still delivering the site as fast as required. The new servers currently have around 50% of the RAM of the current, live GreenQloud servers.
comment:9 Changed 3 years ago by chris
The php-fpm munin plugins needed editing like this:
    #PHP_BIN=${phpbin-"php5-fpm"}
    PHP_BIN=${phpbin-"php-fpm"}
comment:10 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0 to 0.05
- Total Hours changed from 4.17 to 4.22
Munin email alerts were enabled by adding the following to /etc/munin/munin.conf:
    contact.me.command mail -s "${var:host} Munin Alert" root@localhost
    contact.me.always_send warning critical
comment:11 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0 to 0.05
- Total Hours changed from 4.22 to 4.27
PHP-APC Munin plugins were installed on ticket:6#comment:19 and the php_apc_purge graph was generating an email every 5 mins:
    Date: Thu, 14 May 2015 21:20:16 +0000
    From: munin application user <munin@crin1.crin.org>
    To: root@localhost
    Subject: crin2.crin.org Munin Alert

    crin.org :: crin2.crin.org :: Purge rate
        WARNINGs: Optcode Cache is 100.00 (outside range [:10]).
        OKs: User Cache is 0.00.
So it was disabled:
    cd /etc/munin/plugins
    rm php_apc_purge
    service munin-node restart
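For context on the alert text: `[:10]` reads as a range with no lower bound and an upper bound of 10, so a value of 100.00 falls outside it while 0.00 does not. A sketch of that comparison, covering only the `min:max` form where either side may be empty; `in_munin_range` is a hypothetical name and this is not Munin's own code:

```shell
# Sketch of the range test implied by the alert text. $1 = value,
# $2 = range written as "min:max" with either bound optionally empty.
# Prints "ok" when the value is inside the range, otherwise "outside".
in_munin_range() {
    awk -v v="$1" -v range="$2" 'BEGIN {
        n = split(range, b, ":")
        lo = b[1]
        hi = (n > 1) ? b[2] : ""
        if ((lo != "" && v + 0 < lo + 0) || (hi != "" && v + 0 > hi + 0))
            print "outside"
        else
            print "ok"
    }'
}

in_munin_range 100.00 ":10"   # the Optcode Cache value from the email
in_munin_range 0.00 ":10"     # the User Cache value from the email
```

Removing the plugin, as was done here, stops the emails altogether; adjusting the plugin's warning threshold would be another way to quieten the alert while keeping the graph.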
Install the client and server on Crin1:
On Crin2:
A sub-domain, munin.crin.org was added to the DNS servers.
An Apache config file was created at /etc/apache2/sites-available/munin.conf containing:
Enabled, tested and restarted:
And at https://munin.crin1.crin.org/munin we have:
So:
Next to configure the Munin server and clients, on Crin1 edit /etc/munin/munin.conf and add:
And we now have some stats at https://munin.crin.org/munin/crin.org/crin1.crin.org/index.html
On Crin2 open the firewall for Munin, add the following to /etc/iptables/rules.v4:
Reload firewall:
On Crin2 edit /etc/munin/munin-node.conf and add:
Restart the client:
However Crin2 isn't showing up at https://munin.crin.org/munin/ and also the graphs that are generated for Crin1 have numbers but no graphical representation so some more work is needed on this, in addition to the enabling of modules for things like MySQL.