
February 19, 2005

Some things you learn or discover

During the last 48 hours a lot of things have happened and I dealt with a lot of hardware. I took some notes for future reference.

First, always take a camera with you. Take pictures of everything, and then catalog them. You can always go back and see how things were inside your servers or around them. Don't remember if an Ethernet cable is connected to the onboard Ethernet or the external one? Just go and look.

Second, some vendors have some strange notions about accessibility, so be prepared.

Third, forget about boot floppies and CDs, just install a recent Linux distribution on a brand new IDE disk, and use that as a rescue system.

Fourth, make sure your backups work, and make sure they are comprehensive.

Fifth, if your monitoring system is hosed, don't wait a month or two to put the new one in place.

That's all folks,

Recover day

Yesterday, Feb 18 2005, I had a special day.

I arrived at work in the morning, and I was catching up on email, RSS, and reviewing my task list when I got a call from my wife.

We have an e-learning site in Portugal, where we sell online courses. We have several different sites selling those courses, some of them affiliated with portals or job-offer sites.

So this site was down, had been for a couple of hours.

First mental note of the day: when your previous monitoring solution is totally trashed, don't take more than a week to set up something else. The previous spong-based monitoring system had been dead for quite some time, and the new Nagios-based one didn't monitor this site yet: I had finished installing it the previous day. So Murphy was clearly awake and at work.

The data-center where we have the server has all the redundant stuff one would like (power, network, the works), and I could log on to the back-end server, where the database was located. Only the front-end server, where all the sites were, was down. So I could rule out external factors.

The servers are connected to an IP KVM from Dell, so I started up the Java console and selected the front-end server. The message was clear: blah blah blah kernel panic blah blah scsi blah error blah blah you are toasted blah blah.

Yep, this is going to be a fun day.

I go down to the data-center and reboot it. The system is able to fsck the boot disk (IDE disk), but the SCSI data disk is a no go.

So now, I switch to the back-end server and see if my daily backups (rsync with snapshots) are OK. The file count seems right, the logs are ok, no errors reported, so we are in good shape. Or maybe not.

Second mental note of the day: whenever you add a new software server to your solution, make sure your backups cover it.

Some months ago, I added a new web server to the mix, and the config file took some work to fine-tune. And no, it was not in the backups.
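A rough sketch of the kind of check that would have caught this, in Python. It assumes the backup snapshot mirrors the original directory tree; the function name and the /etc/webserver.conf path are made up for illustration, not what I actually run.

```python
import tempfile
from pathlib import Path


def missing_from_backup(required_paths, snapshot_root):
    """Return the required paths that have no copy under snapshot_root."""
    root = Path(snapshot_root)
    missing = []
    for path in required_paths:
        # Map an absolute path such as /etc/fstab to <snapshot_root>/etc/fstab.
        candidate = root / str(path).lstrip("/")
        if not candidate.exists():
            missing.append(path)
    return missing


# Demo against a throwaway directory standing in for a real snapshot.
with tempfile.TemporaryDirectory() as snapshot:
    (Path(snapshot) / "etc").mkdir()
    (Path(snapshot) / "etc" / "fstab").write_text("placeholder\n")
    # /etc/webserver.conf is a made-up name for the forgotten config file.
    print(missing_from_backup(["/etc/fstab", "/etc/webserver.conf"], snapshot))
```

The demo lists only the made-up web server config as missing. The point is to keep the list of required paths next to the list of installed services, so adding a service means adding a line.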

So recovering the disk is now very important to me.

First, get fresh media. I went out and bought two 80 Gig IDE disk drives. I could use one right now, and then add the second one in a RAID-1 setup.

I went to the data-center with the two disks, powered down, connected one of them to the server and booted up. Time to do some brain surgery. I decided to move all the data from the boot disk and the data disk to this new one. An old 8Gb IDE and a 9Gb SCSI should fit into an 80Gb drive, right?

Well, wrong.

Although the motherboard and BIOS could see all 80Gb, Linux could not, only seeing 10Gb worth. This is a RedHat 7.2, and there is probably some parameter to turn on LBA support or something, but I didn't have that kind of time, so I just used the jumpers to limit the disk to 32Gb.

Reboot again, and now I could see the 32Gb.

Time to get the tools. A [friend][4] had used [dd_rescue][1] with great success so I went with that.

I also needed a rescue disk. The server did not have a CD drive, so it had to be a floppy. I decided to use [Toms rescue disk][3]. I copied dd_rescue to another floppy.

So boot from the rescue floppy, partition the new disk in a similar way to the others, mount it, mount the old IDE disk, and tar the OS over to the new disk. Edit fstab and make sure everything is ok. So far so good.

Now start dd_rescue, copying /dev/sda1 to a file in the new disk. Later we will run fsck on it, and mount it with a loop device.

Invalid instruction.

Ok, this is a P3 server, and I compiled dd_rescue on a P4, so my bad. Recompile on a P3. Copy to floppy, copy from floppy. Sneakernet at its best.

Run it again, and it starts to do its job. It should take little time...

... and it's over. My 9Gb disk drive is now in a file on the new drive, with only 6Kb missing. Wow! That's great. Let me just confirm all the files are ok...

ls -la

hmms... that sda1.img is smaller than I thought... 2147483648 bytes? That number looks familiar... Doh! 2Gb limit! Arrgs. The rescue floppy is a 2.0 kernel...

So now I need a decent rescue alternative. Well, the OS is already on the new disk, so let's boot from that one and finish the job.
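For the record, the number looks familiar because it is exactly 2 GiB, that is 2 to the 31st power bytes. A tiny Python sanity check, nothing more than arithmetic; the helper name is just for illustration.

```python
# The size ls reported for sda1.img, taken from the text above.
reported_size = 2147483648

# 2 GiB expressed in bytes: 2 * 1024 * 1024 * 1024, which is also 2**31.
two_gib = 2 * 1024**3


def is_exactly_two_gib(size_in_bytes: int) -> bool:
    """Return True when a size sits exactly on the 2 GiB boundary."""
    return size_in_bytes == two_gib


print(two_gib == 2**31)
print(is_exactly_two_gib(reported_size))
```

An image that stops dead on a round power of two like this is a strong hint that something in the toolchain hit a size ceiling, rather than the copy finishing on its own.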

Disconnect old drive, connect new drive with OS as primary, keep SCSI connected. Boot from rescue disk, mount root filesystem on new drive, chroot to it, run lilo, remove floppy, reboot.

The system starts cleanly, cool. So the first tar part went very well, and my fstab editing was also good. Well, thanks for that.

By now, I had found a faster alternative to dd_rescue, a shell-based front-end named [dd_rhelp][2].

So I compiled dd_rescue. Went well. Then dd_rhelp. Nooo, you need a newer version of autoconf! Download autoconf, configure, make, make install. Recompile. Success.

Now try to run dd_rhelp. Syntax errors all over... So, like, your shell, you know, it's too old... wget bash, configure, yada yada yada...

I run it again and it works! And it should be faster. Also, it has prettier graphics... :)

ls -la

Something about a value too big for something else... du cannot see the file either.

So this glibc can create the file, but you cannot stat it, and you cannot mount it...

Let me recap: we now have a 9Gb file on the new disk with almost all of the content from the old disk, but I cannot work with it.

And I have an extra 80Gb disk drive next to me, free.

Get the Fedora network install CDs, get another computer with a CD drive, plug in the second drive, insert the CD, answer a couple of questions, and now we have a bootable Fedora Core 3 server install, our new rescue disk.

Connect new Fedora as boot device, move the new "32Gb" disk to other IDE channel, boot.

Ahhs, there it is, my 9Gb image is finally visible... losetup and fsck /dev/loop0. It goes and does its thing, and reports 6 (6!) files missing, and those files are generated files, rsync'ed from the back-office. So no data loss at all. First good news (after 8 hours of work).

Mount loopback device, and rsync the data over. Power down, remove Fedora, reconnect main disk to primary channel.

Reboot.

All is well, the sites come back alive. It's now 11pm, some 10 hours later. I'll clean up the war zone tomorrow (today).

Found it!

In case you're wondering why this server was off the network from time to time, well, the poor thing was crashing. On a regular basis. Like a religion.

I made all the upgrades, I tried some tips from friends, but it kept crashing.

But now I know. Bad memory.

Well, I've removed the bad memory (at least I hope I did remove the right one...) so it should be more stable now.

BTW, memtest86 is available directly from any Fedora boot CD. Just type memtest86 at the prompt.

February 16, 2005

St. Gadget Day

Saturday was St. Gadget Day. I went to Porto to rent my old apartment. I was done earlier than I thought, and by that time I got an SMS from Rui saying that he saw a Mac mini at a local store in Lisboa.

So I went to the local FNAC to see if I could spot one (first excuse). I also needed a new phone (second excuse) because my T68i had died that morning. The On/Off/No key did not work anymore. I had already talked to Rui about this, and settled on a K700i (also from Sony Ericsson).

So there I am, at FNAC. No Mac mini was found, but I saw the new (by Portuguese standards) DSC-L1, a 4.1 megapixel Cybershot. My previous, older P1 was stolen some months ago and I had been shopping around for a new one for quite some time.

The K700i was not available at that particular store, but the L1 was very nice. So I'm now the proud owner of a DSC-L1 Cybershot.

It's very small, that was my first impression. It fits neatly in the palm of my hand. Picture quality is very good for my average consumer standards. My wife has a 5 megapixel Cybershot, I don't remember the model, and I never liked it because it had too many controls and menus. This one is much simpler to use, which is nice for me. I read the manual from cover to cover, and I can memorize some things, but I don't use it every day, so I tend to forget a lot of them. It becomes very important to have a simple interface.

Anyway, I was a couple hundred Euros lighter (DSC, plus a Memory Stick Duo Pro 256Mb; I also bought a Griffin-Technology iSqueeze), and still no phone.

So I went to another FNAC, in Gaia. And yep, they had it. And yep, I have it too now.

It's a very nice phone. Nice and bright display, much improved over the T630 that I got at work (that I passed on to my wife). The camera (not that I use it much) is VGA quality (you can check the usual "Dead or Alive" pic), and it's good enough. Sending MMS is very nice, because it sends them in the background, freeing the phone for other stuff.

The phone is not tied to any operator so I'll be able to use it in the future (I'm changing networks in a month or two, more on that later), and that also means that I don't get any stupid locked icons and backgrounds. I was able to configure it with my home-page in no time, and I was able to sync with my Powerbook with no problems. 10 minutes later I had all my contacts inside the phone. iSync totally rocks.

One thing new for me: I can use Bluetooth File Exchange (an Apple app) to browse the content of the phone and get the pictures. Very nice. I was not able to do this with my T68i, not that I needed to, though. But it's nice to browse all the stuff inside the phone.

I haven't tried GPRS with the Mac yet, but there is no rush (famous last words).

The games that come with the phone (not that important) are nice. I was impressed with the quality of a tennis game that was preloaded. It's much better than Virtua Tennis that I got in my Nintendo SP. Impressive stuff.

Pairing the bluetooth HBH-600 headset was also trivial, but I haven't got the setup exactly as I wanted. I can't get the headset to be recognized all the time.

So, at the end of the day, I had a new camera and a new phone. Plus minor gadgets... It was a good day.

As I was leaving the store, I remembered that this particular FNAC has the Apple stuff on the first floor. Well, I still had a couple of minutes available, so I climbed the stairs.

And there it was, a 1.2GHz Mac mini. What can I say? It's a really, really good-looking piece of hardware. I took it in my hand (singular, really); it's really light.

So yeah, Nuno lost the bet (I can't find a reference to it on his site, though...) and he owes me lunch.

February 05, 2005

Using mairix with Mail.app

I was reading Sam Tregar's post on use.perl regarding mairix and I thought: "that would be cool with Mail.app and my bazillion folders...". At least until Spotlight gets here...

So after 15 minutes trying to understand how Mail.app stores the IMAP messages locally (it stores them in an MH folder, or at least something similar enough for mairix), I wrote a small .mairixrc, set the output as mbox into a newly created folder, and pointed mairix to an incoming IMAP folder.

Run it once without parameters to index all the stuff, and run it again with a string to query the database.

Matched 82 messages

Whoa! Already? :) Jump back into Mail.app, look at the folder created and lo and behold, they were there.

Very cool!

So, the steps you need to take (this is a beta how-to, ok? the steps will get better):

  1. Download and install mairix: it compiles cleanly on my Mac, but the install failed for some reason, so just copy the binary to someplace in your $PATH;
  2. Create a Results folder on your Local Mac. It must be created in the Local Mac;
  3. Use my modified .mairixrc and adjust the following variables:
    1. base: the base where all your Mail.app info is stored, usually /Users/**your_short_name**/Library/Mail;
    2. Comment all the lines starting with maildir and mbox;
    3. mh should be the path of one of your folders in an IMAP account, something like IMAP-**name_of_account**/INBOX/Sent.imapmbox/CachedMessages. One way to find this out is to run this command (find ~/Library/Mail -type d -name CachedMessages) from the terminal. You can put as many lines as you want;
    4. mfolder should be changed to point to your results folder that you created above. If the name is Results, mfolder should point to Mailboxes/Results.mbox/mbox;
    5. Remove the comment in the mformat=mbox line: this specifies the result folder as an mbox-style folder;
    6. Point database to a place in your home directory: mine is /Users/melo/Library/Mail/mairix_database;
  4. run mairix once. It indexes all the folders you specified in step 3;
  5. now query the created database: mairix some_string - it should display the number of messages found. Jump back into Mail.app, check the Results folder and they should be in there.

Don't forget to run mairix from time to time to update the database. Use Cronix or edit a crontab maybe.

Next steps: I only indexed one folder as you can see from the example .mairixrc. I need to write a small Perl script to generate my .mairixrc file with all my mailboxes, and write an AppleScript to call the mairix command with the query.
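A rough sketch of that generator, in Python rather than Perl. It does the same thing as the find command in the how-to, walking the Mail.app base directory for CachedMessages folders, and writes one mh line per folder. The function names, the default Results folder and the mairix_database file name are my assumptions for illustration; adjust them to match your own setup.

```python
import tempfile
from pathlib import Path


def find_cached_folders(base):
    """Return every CachedMessages directory under base, relative to base."""
    base = Path(base)
    found = [p.relative_to(base) for p in base.rglob("CachedMessages") if p.is_dir()]
    return sorted(str(p) for p in found)


def build_mairixrc(base, results_folder="Results", database="mairix_database"):
    """Build the text of a .mairixrc that indexes every cached IMAP folder."""
    base = Path(base)
    lines = [f"base={base}"]
    # One mh line per folder, each path relative to base.
    lines += [f"mh={folder}" for folder in find_cached_folders(base)]
    lines.append(f"mfolder=Mailboxes/{results_folder}.mbox/mbox")
    lines.append("mformat=mbox")
    lines.append(f"database={base / database}")
    return "\n".join(lines) + "\n"


# Demo on a throwaway tree shaped like ~/Library/Mail (account name is made up).
with tempfile.TemporaryDirectory() as mail_dir:
    cached = Path(mail_dir) / "IMAP-example" / "INBOX.imapmbox" / "CachedMessages"
    cached.mkdir(parents=True)
    print(build_mairixrc(mail_dir))
```

To use it for real, call build_mairixrc with the path to your own Library/Mail directory and write the result to ~/.mairixrc, then run mairix once to rebuild the index.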

February 03, 2005

Still room to grow

I love distributed systems, and I love reading about them.

Thanks to posts like this, I get plenty of extra reading material. I would like to work there; the kinds of problems they are solving are really what makes my brain wake up in the morning with a smile.

Seattle is a long way from Portugal, though.