I recently came into possession of a brand new Dell 56k Modem. These are the little external USB modems they ship with newer laptops when you request a modem, because the laptops themselves don’t come with one built-in. On the back of the modem the model specifics are labeled as “Conexant RD02-D400, REV A03, NW147”. Conexant are a chip maker for modems.

When I first plugged the modem into my Mac, it was detected as a modem, but the Mac didn’t know what it was (by default it was set to an Apple modem). It did power the modem (one of the lights on the modem is for power, the other is for a connection). I dug around on Google for some information on configuring it with Mac OS X. I found a post from 2005 describing a manufacturer called Zoom that uses Conexant chipsets in their modems. Zoom *is* on the Apple modem list. Zoom even have an installation instruction manual for Apple OS X Leopard on their support site! I wonder if it’s coincidental that the Dell modem looks very similar to the Zoom 3095 modem? Hmm… At least it works now though!

Written on January 27th, 2009 , Informative

Monit is used for tracking service availability and taking action when services are detected to have failed. One action it takes is sending an email. Of course, you can configure the options of the message sent, the subject, and the apparent sender. Now, it seems the manual for monit isn’t quite correct on this. I copied the manual section for mail-format syntax into my monitrc file, only to have monit fail and tell me there was a syntax error on the mail-format!

It seems the problem is that mail-format may not have a line break unless a command follows on the same line. The manual has a format similar to the following, where there is a break between the opening bracket and the first command:

set mail-format {
    from: monit@my.host.com
    subject: $SERVICE $EVENT at $DATE
    message: Monit $ACTION $SERVICE at $DATE on $HOST,
        $DESCRIPTION

    Yours sincerely,
     Monit @ $HOST
}

This fails on a syntax error. Removing the line break like this:

set mail-format { from: monit@my.host.com
    subject: $SERVICE $EVENT at $DATE
    message: Monit $ACTION $SERVICE at $DATE on $HOST,
        $DESCRIPTION

    Yours sincerely,
     Monit @ $HOST
}

Makes it work (tah-dah!).

On another note, I think this blog has had too many technical articles of late. I’ll try to post something not related to computers soon!

Written on January 21st, 2009 , Informative

I broke my Splunk server by accidentally deleting the Linux libc6 libraries, and it ended up being easier to install FreeBSD than to fix it. I wasn’t going to reinstall Linux…with Splunk available for FreeBSD, why would I do that? ;)

The install went fairly well after installing the compat6x port. Splunk say that the software works with 6.0 “or higher”. That’s a white lie. It doesn’t work natively on 7.0 (yet).

Anyway I got it running, started configuring it, and all seemed sweet. I had it index a bit of data and whatnot, all good. When I started adding in the sources from my other servers, things went weird. OK, there was about 600M of logs, but I blacklisted well over half of that. I figured I might go over the 500M limit of the free licence while importing everything, but oh well. Anyway the server kept churning at 100% CPU usage for about 5 hours. This took me past midnight, which allowed me to see how much data had been indexed. Apparently it indexed 8G, which is really weird when there is not 8G of log files.

I tried fine tuning the blacklist to remove more files and limiting the time stamp information that was getting collected, but I couldn’t make Splunk finish indexing. I knew it was indexing because that’s the only thing other than searching that really ramps up the CPU usage in Splunk…it’s a fairly single-minded application. I watched the splunkd log file for a while and couldn’t see anything too wrong. I wound up editing splunk/etc/system/log.cfg and changing category.FileInputTracker from WARN to INFO like I’d done before to see what phantom files it was indexing. It turns out there were 2 files it was getting stuck on. One was the original Debian installer log, which is about 2300 lines or so. The other was the syslog.0 file, which was about 5000 lines. What it looks like in the splunkd.log is this (the change in CRC on the splunkd.log file is because of all the info being pumped out to it):

01-19-2009 06:26:17.906 INFO  FileInputTracker – Computing CRC for seekPtr=5d188000 filename=/mnt/spanky_log/syslog.0
01-19-2009 06:26:17.911 INFO  FileInputTracker – Computing CRC for seekPtr=939298 filename=/usr/local/splunk/splunk/var/log/splunk/splunkd.log
01-19-2009 06:26:17.982 INFO  FileInputTracker – Computing CRC for seekPtr=5d190000 filename=/mnt/spanky_log/syslog.0
01-19-2009 06:26:17.988 INFO  FileInputTracker – Computing CRC for seekPtr=93939d filename=/usr/local/splunk/splunk/var/log/splunk/splunkd.log
01-19-2009 06:26:18.060 INFO  FileInputTracker – Computing CRC for seekPtr=5d198000 filename=/mnt/spanky_log/syslog.0
01-19-2009 06:26:18.068 INFO  FileInputTracker – Computing CRC for seekPtr=9394a2 filename=/usr/local/splunk/splunk/var/log/splunk/splunkd.log
01-19-2009 06:26:18.139 INFO  FileInputTracker – Computing CRC for seekPtr=5d1a0000 filename=/mnt/spanky_log/syslog.0
01-19-2009 06:26:18.145 INFO  FileInputTracker – Computing CRC for seekPtr=9395a7 filename=/usr/local/splunk/splunk/var/log/splunk/splunkd.log
01-19-2009 06:26:18.221 INFO  FileInputTracker – Computing CRC for seekPtr=5d1a8000 filename=/mnt/spanky_log/syslog.0
01-19-2009 06:26:18.226 INFO  FileInputTracker – Computing CRC for seekPtr=9396ac filename=/usr/local/splunk/splunk/var/log/splunk/splunkd.log
01-19-2009 06:26:18.295 INFO  FileInputTracker – Computing CRC for seekPtr=5d1b0000 filename=/mnt/spanky_log/syslog.0
01-19-2009 06:26:18.300 INFO  FileInputTracker – Computing CRC for seekPtr=9397b1 filename=/usr/local/splunk/splunk/var/log/splunk/splunkd.log
01-19-2009 06:26:18.371 INFO  FileInputTracker – Computing CRC for seekPtr=5d1b8000 filename=/mnt/spanky_log/syslog.0
01-19-2009 06:26:18.376 INFO  FileInputTracker – Computing CRC for seekPtr=9398b6 filename=/usr/local/splunk/splunk/var/log/splunk/splunkd.log
01-19-2009 06:26:18.446 INFO  FileInputTracker – Computing CRC for seekPtr=5d1c0000 filename=/mnt/spanky_log/syslog.0
01-19-2009 06:26:18.451 INFO  FileInputTracker – Computing CRC for seekPtr=9399bb filename=/usr/local/splunk/splunk/var/log/splunk/splunkd.log
01-19-2009 06:26:18.523 INFO  FileInputTracker – Computing CRC for seekPtr=5d1c8000 filename=/mnt/spanky_log/syslog.0
01-19-2009 06:26:18.529 INFO  FileInputTracker – Computing CRC for seekPtr=939ac0 filename=/usr/local/splunk/splunk/var/log/splunk/splunkd.log
01-19-2009 06:26:18.599 INFO  FileInputTracker – Computing CRC for seekPtr=5d1d0000 filename=/mnt/spanky_log/syslog.0
01-19-2009 06:26:18.604 INFO  FileInputTracker – Computing CRC for seekPtr=939bc5 filename=/usr/local/splunk/splunk/var/log/splunk/splunkd.log
01-19-2009 06:26:18.686 INFO  FileInputTracker – Computing CRC for seekPtr=5d1d8000 filename=/mnt/spanky_log/syslog.0
01-19-2009 06:26:18.696 INFO  FileInputTracker – Computing CRC for seekPtr=939cca filename=/usr/local/splunk/splunk/var/log/splunk/splunkd.log
01-19-2009 06:26:18.784 INFO  FileInputTracker – Computing CRC for seekPtr=5d1e0000 filename=/mnt/spanky_log/syslog.0
01-19-2009 06:26:18.790 INFO  FileInputTracker – Computing CRC for seekPtr=939dcf filename=/usr/local/splunk/splunk/var/log/splunk/splunkd.log

And it did that pretty much ad infinitum. The seekPtr DID go up, but it never completed the file. How do I know it wasn’t just taking its time? Because it did a whole bunch of files larger than 5000 lines each in about 4 seconds. An hour for the syslog file didn’t make sense. I deleted the syslog.0 file, so I don’t know what the deal was there. I did keep the Debian installer log file that did the same thing, but I’m not in touch with any of the developers so it’ll probably just sit on my HDD. At least on this blog Google will pick it up for other people to know! By the way, don’t forget to change log.cfg and set category.FileInputTracker back to WARN, or else you will fill up your splunkd log file with self-replicating entries!

Oh yeah, another note: in FreeBSD 7.0, splunkd will not show up with the correct CPU usage. It will say 1 to 9% on mine, but the system says there is 99% in user. If you run:
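To see at a glance which files Splunk keeps re-reading, the FileInputTracker lines can be counted per file. The following is only a rough shell sketch, not anything that ships with Splunk: the function name crc_counts is made up for illustration, and it assumes the log lines look like the excerpt above and that the file names contain no spaces.

```shell
# Count "Computing CRC" entries per file name in splunkd.log output.
# crc_counts is a hypothetical helper name; it reads the files given as
# arguments, or standard input when no file is given.
crc_counts() {
    awk '/FileInputTracker/ && /Computing CRC/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^filename=/) {
                sub(/^filename=/, "", $i)   # keep only the path
                count[$i]++
            }
        }
    }
    END {
        for (f in count) print count[f], f
    }' "$@" | sort -rn                      # busiest files first
}
```

Run against the log path shown in the excerpt, for example crc_counts /usr/local/splunk/splunk/var/log/splunk/splunkd.log, a small file with a very large count would be a candidate for the loop described here. Expect splunkd.log itself to rank highly while INFO logging is on.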

top -IS

You will see only the processes using CPU. I know splunkd isn’t displayed properly because it’s the only process displayed as running when there is 99% user!

EDIT 26th-Jan: After running Splunk over the weekend and watching the issues, I’ve discovered that it DOES come out of the loop eventually, it just re-reads the file many times. Case in point, I had a test Icecast server turned on, but it wasn’t doing anything, so the /var/log/icecast/stats.log file was 606 bytes…Splunk had indexed 202 MEGABYTES of it (found by looking at the index stats dashboard plugin). I’m also exceeding my quota many times because of these problems that crop up…the easiest way I can see to avoid them is still just to keep an eye on the index and just delete them as you see them.

Written on January 19th, 2009 , Informative

I use SSHFS which uses FUSE. On FreeBSD, I generally compile core components with ports. On attempting to compile fusefs-sshfs, I got an error with fusefs-kmod:

fusefs-kmod  requires the userland sources to be installed. Set SRC_BASE if it is not in /usr/src

Actually the error is fusefs-kmod-xxxxxx for the version. It’s a pretty well known error. I figured it needed the kernel sources. I was wrong (it actually says kernel source if you’re missing that). What it wants is some of the source for the FreeBSD userland software. The next question was: which software does it want the source for? The answer is “mount”, which is in “sbin”. So if you’re getting this error, run sysinstall and install the source packages for base, sys and sbin and you’ll be right. No need to install ALL userland source like the message implies!

Written on January 18th, 2009 , Informative

I’ve recently been playing with Linux LiveCDs for a project I have. I was originally using Morphix, but it seems that is so outdated that no resource is ‘complete’. The morphtools kind of worked, but it was incredibly difficult to specify temporary directories, to the point where I would have had to hack the scripts up directly.

Anyway, long story short, I did a survey of what else was out there to build Linux LiveCDs and specifically Debian LiveCDs. I found that Debian have an official project for this. It’s not as user-friendly as some of the other LiveCD tools perhaps, but I know it will stand the test of time because that’s what Debian is good at.

So I’ve been playing around with the live-helper applications, and I stumbled upon an error. Since Google didn’t help me, I thought I’d post it.

If you perform an lh_build in a particular directory, it creates a .stage directory. Because this is a dot (.) file, it doesn’t show up in the listing. There are a variety of files in this directory that store the current status of the build. If something happens in the build, the status may not be cleared properly, and it ‘locks’ the build. This means if you try to perform the build again, you get something like this happening:

$ sudo lh_build
P: Begin caching bootstrap stage…
P: Begin bootstrapping system…
W: skipping bootstrap
P: Begin caching bootstrap stage…
W: skipping bootstrap_cache.save
P: Begin caching chroot stage…
P: Begin mounting /dev/pts…
E: system locked

Note that it says the system is locked. The solution is to delete the .stage directory. I didn’t bother examining the contents of the directory because all I was interested in was keeping the cache. Now Google will have something on this error if anyone else has the problem though.
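As a small sketch of that fix in shell: the helper name clear_stage is invented for this example, and it assumes the build directory is passed as the first argument (defaulting to the current directory). It removes only .stage and leaves everything else in the build directory alone.

```shell
# Remove the hidden .stage directory that keeps lh_build "locked".
# clear_stage is a hypothetical helper; nothing else in the build
# directory is touched.
clear_stage() {
    build_dir="${1:-.}"
    if [ -d "$build_dir/.stage" ]; then
        rm -rf "$build_dir/.stage"
        echo "removed $build_dir/.stage"
    else
        echo "no .stage directory in $build_dir"
    fi
}
```

Because lh_build was run with sudo above, the .stage directory is probably owned by root, so the removal would need root as well. Running ls -a in the build directory first shows whether .stage is there at all.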

Written on January 12th, 2009 , Informative

In my professional capacity I have a client who is hosting a site on a VPS. The site is basically structured as a large blog / forum and it receives many (in the millions) page views each day. The database is approximately 200MB. I originally had the site hosted on a 256MB VMWare VI3 VPS. Needless to say it thrashed the virtual memory because there was not enough RAM. RAM on a VMWare VPS (or any VPS) in Australia is expensive, so I shifted it off site and trebled the RAM: 768MB with Linode. The site has continued to run into problems with the load scaling way above what it should given that there is so much CPU and memory free: the load consistently reaches up to 2. It’s a quad core server so in theory anything up to 4 is “fine”, but if every site this size put this level of hardware up to that load then I think server providers would be making a lot more money than they do!

After working on some home-server projects and re-analysing the performance metrics I have of the server / site, I’ve come to the conclusion that the server is I/O bound on the database side. MyTop shows lots of sleeping processes: they can’t be waiting on CPU because the CPU is 80% idle, and the memory has 200M free. The I/O is operating at about 6MB/s read and 2MB/s write. That’s seriously low for even a consumer SATA drive. I really expected better from such a large Linode plan. It seems I’m not the only person to have had this problem on Linode.

I have (again in the professional sense) some sites on a server through Slicehost and I’ve not run into I/O problems. That said, I don’t have any sites on the Slicehost servers of this magnitude. I’m wondering if it will be just as bad. If it is, then I/O may become a deterrent for me to use VPS. I wonder how many other people have had to make the shift from virtual to dedicated because of I/O bottlenecks? I’m hoping not, judging by my own internal tests and what some other people are saying.

While I’m on the topic of Slicehost, I have to say, I’m REALLY impressed with their customer service. I’ve only had a couple of problems since I’ve been dealing with them (a couple of years now) and they have always gone way beyond what I would consider their duty of care in order to help me. Compare that to Linode (about a year of service): Linode are helpful, but they don’t give any support out of the ordinary. If you’re asking questions about something they don’t officially support, then they don’t try to help. Slicehost does :D

Written on January 6th, 2009 , Informative

TorrentFlux (actually TorrentFlux B4rt is what I’m using) is a PHP / AJAX BitTorrent front end that uses the Java Azureus BT client to actually connect into the network. I use it because this way I can run BT on one of my little home servers that is on 24/7 rather than having to leave my huge desktop computer on 24/7 or using my laptop for BT.

One problem I’ve always had with TorrentFlux / Azureus is the amount of memory it consumes. I have a feeling that is a side effect of using a Java BT client: there is probably a lot of connection table caching going on. I don’t have a solution to that one (yet). Another problem I’ve had with it is that it’s /slow/ compared to normal desktop BT clients. Sure, I expect it to be a little slower but I mean it’s slllllloooooowww. When adding a new file it takes 2 – 3 minutes to download it and another 90 seconds to start it (it says “processing” for that time). My original solution to this was to improve the performance of the database: I shifted from using SQLite to using MySQL. I found that bogged down the server I run it on too much, so when I rolled out a dedicated MySQL box, I moved the database over to that. Well, now it didn’t bog down the server with the BT client; it bogged down the MySQL server. The weird thing is, with the performance monitoring data I had, I couldn’t figure out where the slowdown was coming from. The CPU wasn’t tapping out, nor the memory. What I found was that the I/O was thrashing.

I ran MyTop for a little while to examine the queries on the TorrentFlux database and discovered that the database gets hammered with status updates about the files that are in the queue. The specific queries are always of the form:

SELECT xxxx FROM tf_log WHERE file='xxxx'

So obviously I checked the details of the tf_log table. It’s a huge table holding hundreds of thousands of records and almost a hundred meg. I checked the indexes and surprisingly found that file was not one of them. No wonder it’s so slow and the I/O is thrashing; every time a WHERE query is performed it has to do a table scan! What I did was create an INDEX for the file column in tf_log, and now my TorrentFlux is about 5x faster when adding / removing / pausing / resuming downloads. The exact query you would need to issue to MySQL would be:

-- MySQL's CREATE INDEX needs an index name; file_idx is an arbitrary choice
CREATE INDEX file_idx ON tf_log (file(20));

I’m limiting the index to 20 characters of the file name because I don’t generally have many files with similar names in my list. This keeps the index relatively small compared to indexing the full file name, while each prefix still has a high probability of being unique, i.e. the index stays efficient.
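Whether 20 characters is enough depends on the file names in your own list. One rough way to check, sketched here in shell rather than against the database, is to count how many distinct prefixes a list of names produces. The function name prefix_selectivity is hypothetical, and the sketch assumes one file name per line on standard input and plain ASCII names (cut counts bytes in some implementations).

```shell
# Compare the number of distinct N-character prefixes with the number
# of distinct full names. prefix_selectivity is a made-up helper name.
prefix_selectivity() {
    len="$1"
    tmp=$(mktemp) || return 1
    sort -u > "$tmp"                                   # distinct full names
    full=$(wc -l < "$tmp" | tr -d ' ')
    prefixes=$(cut -c1-"$len" "$tmp" | sort -u | wc -l | tr -d ' ')
    rm -f "$tmp"
    echo "$prefixes distinct ${len}-character prefixes for $full distinct names"
}
```

If the two numbers are close, the short prefix distinguishes almost every name and the smaller index loses very little. If they are far apart, a longer prefix is probably worth the extra space.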

Written on January 4th, 2009 , Informative

Because I use Splunk to track the logs on my home server, I have set up some reports that show me the level of errors relative to the total log lines, which lets me notice trends. One file that has cropped up a lot is the /var/munin/munin-update.log file. This is the file that I have my Munin master logging to. The particular error that keeps cropping up is:

Jan 03 20:20:44 [3622] – Client reported timeout in fetching of cpu_tmp_sensors

In Munin this shows up as a broken graph.

So what is happening is that the munin node is timing out the response from the plugin, and then passing the timeout response on to the munin master. I couldn’t actually find any documentation on this timeout amount to even see what the default was except by looking in the source code itself.

Some Googling did reveal that I’m not the only person to have noticed this though. It is reported that if you add the keyword “timeout 60” (or whatever value you want in seconds) then Munin will use this as a global default timeout for the plugins. It is also reported that if you place this in the scope of your plugin configuration in /etc/munin/plugin-conf.d/<your plugin config file> like this:

[myplugin]
timeout 60
user root

That it will then only apply the timeout value to that plugin. It makes sense. It didn’t help me solve the problem with my CPU temp sensor, but it’s still useful to know what is going on behind the scenes.

Written on January 4th, 2009 , Informative

My Splunk version 3.3.1 seems to have been having some issues with my SSHFS mounts – actually it was an underlying file system problem – so I decided to update to 3.4. Version 3.4.3 is out now, so I figure that the bugs in the first 3.4 release should have been fixed.

The major new features touted in 3.4 are the Windows compatibility and the addition of a lightweight forwarding application. If you’re using the free license version of Splunk then the forwarding application is rather meaningless anyway, because the free license does not permit ingress of Splunk data from other servers. So in theory it’s a relatively meaningless update feature-wise; I was simply hoping they had matured and optimized the code base.

The verdict? Well it looks like they have. I’ve only been running the updated version for about 12 hours, but it’s sitting on 20% less virtual memory and about 10% more real memory. What does that mean? Well the amount of memory that was committed to Splunk used to start off at about 600M and rise until plateauing at just over 1G. Committed memory is memory allocation requested by the application that is not necessarily used by the application. If it isn’t actually used by the application then the memory system can allocate more memory elsewhere. The general idea (hope) is that if the committed memory is actually called upon, then the kernel will be able to free up real memory elsewhere in order to respond to that real request.

In my personal opinion I think such large commitment of memory is silly. The Splunk application (3.3) was requesting 1G of memory, but only using 200M. Wha-? Committing 5x the amount of actual memory consumed?? It’s not the first time I’ve seen it, but it is the first time I’ve seen it in a (on my configuration) single threaded server application.

So now I have 220M of real memory in use by Splunk, which is fine, I guess it needs it and is doing something useful with it. It also has requested 800M of memory, so it’s still requesting just under 4x what it is using, but hey, it’s better than before! I wonder if they tuned the memory usage or just tuned something else that incidentally resulted in better memory usage…

Note that I haven’t changed anything in my configuration, I’ve just upgraded the Debian package and restarted Splunk and left it running for a while. The memory usage isn’t rising after the initial start so it seems to have stabilized.

Oh by the way the file system problem was a result of SSHFS failing the SSH connection and not reconnecting correctly. Actually I already knew about that and I had crontab remounting the file systems every half an hour. Of course, with Splunk reading off the file system, they weren’t unmounting properly, which was also causing the remount to fail (-o remount does not seem to work with SSHFS). The solution was just to do a lazy unmount, which allows the remount to work correctly (umount -l /mnt/xxxx).
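A cron job along those lines might look something like the sketch below. This is an illustration only, not the script actually in use here: the function name remount_sshfs, the DRY_RUN switch and the example remote are all made up, and it assumes Linux’s umount (where -l is the lazy unmount) and an sshfs binary on the PATH.

```shell
# Lazily unmount an SSHFS mount point and mount it again.
# remount_sshfs and DRY_RUN are hypothetical names for this sketch.
#   $1  remote spec, e.g. user@host:/var/log (made-up example)
#   $2  local mount point
remount_sshfs() {
    remote="$1"
    mnt="$2"
    run=""
    if [ -n "${DRY_RUN:-}" ]; then
        run="echo"          # print the commands instead of running them
    fi
    # Lazy unmount: detach now, clean up once readers let go of the files.
    # Errors (for example "not mounted") are ignored on purpose.
    $run umount -l "$mnt" 2>/dev/null || true
    $run sshfs "$remote" "$mnt"
}
```

Setting DRY_RUN=1 before calling the function prints the two commands without touching any mounts, which is a cheap way to check the arguments before putting the call into crontab.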

Written on January 3rd, 2009 , Informative

Munin is a really nice lightweight record keeper for pretty much anything. It stores the values of anything and generates graphs for those values and leaves interpretation up to us (the viewer). It’s primarily designed for server monitoring and it’s great because the abstractness of the program allows us to monitor hardware, software, and user activity as well as anything else we can think of!

The trick is that we have to write a plugin for Munin to understand what we want it to do. I haven’t actually written any myself, but I have modified others to tweak what they do. I installed an iostat plugin for Munin on FreeBSD (Munin doesn’t come with an iostat plugin on FBSD) but was having a hard time getting it to work. It worked on the command line and all but it wouldn’t show the graph. Yes I had restarted the node program but it still wasn’t showing up. I even went to the point of upgrading my munin-node on the server.

The problem? The iostat plugin was not owned by root:wheel and set to 755. I.e. the permissions were wrong and it was not executable. Munin appears to run ./plugin rather than $PERL ./plugin. A little annoying as long as you know about this; a lot annoying if you don’t! Changing the permissions resulted in it showing up on the graph after 2 polling cycles.
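In shell terms the check and the fix come down to something like this. Both function names are invented for the example, and the plugin path is whatever your munin-node uses, passed in as an argument.

```shell
# Succeeds only if the plugin file exists and has an execute bit set,
# which is what munin-node needs in order to run it directly.
plugin_is_executable() {
    [ -f "$1" ] && [ -x "$1" ]
}

# Apply the ownership and mode described above (needs root, and a
# "wheel" group as on FreeBSD).
fix_plugin() {
    chown root:wheel "$1" && chmod 755 "$1"
}
```

Something like plugin_is_executable /path/to/plugin || fix_plugin /path/to/plugin before restarting the node would have saved the detour through upgrading munin-node.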

Written on December 22nd, 2008 , Informative

SirSpanky.com – The Secret Diary of James Pearce Aged 20-Something

Personal journal of a professional geek – James Pearce in Perth, Australia