Hyperic Agent / Server Time Issues
I’ve been playing with a variety of network management systems for a month or so now, and I’ve decided that Hyperic is the best for my use. It might not be the most detailed or configurable, but it’s “good enough”, and once the agent is installed it automatically detects most of the monitoring metrics I want. Anyway, once I’d figured out the configuration I was going to use for everything, I added my 5th server to it, only to find that this 5th server wasn’t appearing in my Dashboard metrics. About 4 hours later, it also stopped reporting metrics in the indicator panel of the platform. This was weird, given that the host was still up, I could still see everything on it, and the Hyperic agent reported (through the status command) that it was working just fine.
I went to the live metrics and ran top, which worked, but then I noticed that the time it showed was 30 minutes out. My first thought was that the server was being stupid and caching the data before putting it into the database ready for extraction and display. But when I ran top directly on the host that the agent was installed on, the time was the same: the host actually had the wrong time. I realised that I hadn’t set NTP to run on that machine. As soon as I ran it and updated the time, the platform appeared in the dashboard metrics and the indicators started going green again. There were, however, 6 blank periods in the indicators where the time jumped. So, the lesson is that Hyperic uses the time that the agent reports from the host it is installed on; it does not do any time conversion. This also affects what is displayed in the dashboard metrics: it seems to only display what is happening “now”, so if a platform that a metric is on reports a different time, its data is not counted as “now” and thus is not shown. Good to know, even if I did find out by being frustrated (what’s new in system administration).
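To make the “now” behaviour concrete, here is a minimal Python sketch of the kind of check I’m guessing at. It is not Hyperic’s code: the function name `is_current`, the 5-minute `NOW_WINDOW` and the example timestamps are all made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tolerance: how far a sample's timestamp may differ from the
# server's clock and still count as "now". This is NOT a real Hyperic setting.
NOW_WINDOW = timedelta(minutes=5)


def is_current(sample_time: datetime, server_time: datetime,
               window: timedelta = NOW_WINDOW) -> bool:
    """Return True if the sample's timestamp is within `window` of the server clock.

    abs() is used so that a host running fast or slow is treated the same way.
    """
    return abs(server_time - sample_time) <= window


# Arbitrary example time standing in for the server's clock.
server_time = datetime(2010, 1, 1, 12, 0, tzinfo=timezone.utc)
synced_host = server_time - timedelta(seconds=20)   # clock roughly in sync
skewed_host = server_time - timedelta(minutes=30)   # clock 30 minutes out

print(is_current(synced_host, server_time))   # True
print(is_current(skewed_host, server_time))   # False
```

With a check like this, a host whose clock is 30 minutes out would never have its samples counted as current, which matches what I saw until NTP corrected the time.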