Because I use Splunk to track the logs on my home server, I have set up some reports that show the level of errors relative to the total log lines, which lets me notice trends. One file that has cropped up a lot is /var/munin/munin-update.log, the file that my Munin master logs to. The particular error that keeps cropping up is:
Jan 03 20:20:44 [3622] – Client reported timeout in fetching of cpu_tmp_sensors
In Munin this shows up as a broken graph.
What is happening is that the Munin node times out while waiting for the plugin to respond, and then passes that timeout on to the Munin master. I couldn’t find any documentation on this timeout; the only way to see what the default was turned out to be looking in the source code itself.
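To get a feel for which plugins are affected and how often, the log can be tallied with a short script. This is only a rough sketch in Python: the pattern is an assumption based on the single log line quoted above, and the function names are my own, not anything Munin provides.

```python
import re
from collections import Counter

# Assumed message shape, taken from the one log line quoted above:
#   "... Client reported timeout in fetching of <plugin>"
TIMEOUT_RE = re.compile(r"Client reported timeout in fetching of (\S+)")


def count_timeouts(lines):
    """Return a Counter mapping plugin name to its number of timeout lines."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_timeouts_in_file(path):
    """Tally timeouts in a log file, tolerating odd bytes in the log."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_timeouts(handle)


# Example use (path as in this post; adjust for your own setup):
# for plugin, n in count_timeouts_in_file("/var/munin/munin-update.log").most_common():
#     print(n, plugin)
```

If one plugin dominates the tally, the problem is more likely that plugin than the node as a whole.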
Some Googling did reveal that I’m not the only person to have run into these timeouts, though. It is reported that if you add the keyword “timeout 60” (or whatever value you want, in seconds), Munin will use this as a global default timeout for the plugins. It is also reported that if you place this in the scope of your plugin configuration in /etc/munin/plugin-conf.d/<your plugin config file>, like this:
[myplugin]
timeout 60
user root
then it will apply the timeout value only to that plugin. It makes sense. It didn’t help me solve the problem with my CPU temp sensor, but it’s still useful to know what is going on behind the scenes.
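Before picking a timeout value, it helps to know how long the plugin actually takes to run. The sketch below times an arbitrary command a few times and suggests a value with some headroom. The helper names and the 2x safety factor are my own assumptions, not anything from Munin, and the commented example assumes munin-run is installed and that the plugin is named as in the log line above.

```python
import math
import subprocess
import time


def time_command(cmd, runs=3):
    """Run cmd `runs` times; return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.monotonic()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.monotonic() - start)
    return durations


def suggest_timeout(durations, headroom=2.0):
    """Slowest run times a safety factor (assumed 2x), rounded up, at least 1."""
    return max(1, math.ceil(max(durations) * headroom))


# Example use (may need to be run as root):
# runs = time_command(["munin-run", "cpu_tmp_sensors"])
# print(runs, suggest_timeout(runs))
```

If the plugin hangs rather than just running slowly, the timings will show that quickly, and no timeout value is going to fix it.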
