
October 24, 2009

A faster configuration for CPAN::Reporter

The idea of CPAN::Reporter is great: take advantage of all those daily uses of the cpan shell to collect reports from a large network of users.

I tried several times to enable CPAN::Reporter, but it always slowed down my workflow just enough to become a nuisance. After each test phase, it would open an SMTP connection and send the report. Those 3 or 4 seconds were a bit too much for me.

After a bit of reading I found a good compromise to report my test runs without affecting the performance. The setup is simple: make CPAN::Reporter write the test results to a directory and create a command to send them later.

To set this up, first you install CPAN::Reporter as usual and then you tweak the configuration to store the reports in a directory. My ~/.cpanreporter/config.ini looks like this:

email_from = "Pedro Melo" 
edit_report = no
send_report=unknown:yes fail:yes pass:yes na:no

transport=File /Users/melo/.cpan/reports

The trick is to use the File transport. You configure it with the directory where the test reports will be stored. In my case I chose /Users/melo/.cpan/reports. You need to make sure that the directory exists.
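
If you would rather create that directory from Perl than by hand, a minimal sketch with the core File::Path module could look like this. The helper name ensure_report_dir is made up for illustration and is not part of CPAN::Reporter.

```perl
use strict;
use warnings;
use File::Path qw(make_path);

# Hypothetical helper: create the report directory (and any missing
# parent directories) if it is not there yet, and return its name.
sub ensure_report_dir {
    my ($dir) = @_;
    make_path($dir) unless -d $dir;
    die "Could not create $dir\n" unless -d $dir;
    return $dir;
}
```

Calling ensure_report_dir("$ENV{HOME}/.cpan/reports") once before the first cpan run should be enough, and calling it again is harmless.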

From now on, every time you use the cpan shell to install a module, the test reports will be stored in your test report directory.

The final step is sending them. I wrote a simple script to take care of that, which you can find in my scripts repository: x-perl-send-test-reports.

How and when you run it is up to you. You can run it manually from time to time, run it from cron, or use something like folder actions to monitor the directory and start the script whenever a new file is placed there. I run it manually for now.

My command line to use the script is this:

x-perl-send-test-reports         \
    --from melo@simplicidade.org \
    --server smtp.gmail.com      \
    --transport 'Net::SMTP::TLS
         User melo@simplicidade.org
         Password my_password
         Port 587'               \
    ~/.cpan/reports

So one less excuse not to report your test results. If you are not using CPAN::Reporter, start now.

October 23, 2009

An update on PGP WDE

Since May I've been using PGP Whole Disk Encryption on my laptop and its Time Machine external drive.

Almost 6 months later I can report that it works great: you don't notice it at all. Strongly recommended, if you need this sort of thing.

But there are no completely secure software-only solutions, and it's good to know the limitations, like the "Evil Maid" Attacks on Encrypted Hard Drives.

The comments on the article are also worth a read. There are some proposals in there that might work and defend against this kind of attack.

October 18, 2009

Just another data point

(Update: I've pushed my code, including three new scripts, to the nfsd_report_bench/ directory on my examples repository. See below for some clarifications based on comments I received).

A former colleague of mine at PT had a small reporting problem, and he ended up comparing several languages for the job: C, Perl, PHP, and Python.

I was curious about the results, so I took the latest version of the Perl script that he was using and set off to work.

The first thing that you should be aware of is where your bottleneck is. Take a look at this small script:

#!/usr/bin/env perl

use strict;
use warnings;

my $lines;
while (<>) {
#  my @fields = split / /;
  $lines++;
}
print "$lines\n";

A basic line counter. Compare it to the system wc:

$ gzcat nfsd.gz | time wc -l
 12236390
       16.86 real        16.24 user         0.43 sys
$ gzcat nfsd.gz | time ./wc_simple.pl 
12236390
       10.13 real         8.51 user         0.84 sys

So, a bit faster than the C version, doing 1.2M lines per second on my laptop.

But if you remove the comment on the split(), we have:

$ gzcat nfsd.gz | time ./wc_simple_with_split.pl
12236390
       228.78 real       224.39 user         2.34 sys

A lot less: 53k lines per second.

So the first bottleneck is the split(). Let's improve on that. After some attempts I came up with this:

my $lines;
while (<>) {
  my ($ts)    = /^(\d+)/gc;
  my ($type)  = /(\w)\sV/gc;
  my ($op)    = /\s(\d\d?)\s\w/gc;
  my ($bytes) = /\w\s(\d+)\s/gc;
  $lines++;
}

I make use of the gc flags to start the next match where the previous one ended. I also take advantage of patterns in the lines that I need to match, like the V in the NFS version.
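
To make the mechanics of the gc flags easier to see in isolation, here is a small sketch that chains scalar-context matches with \G. The line layout ("timestamp type Vversion op bytes") and the sub name parse_fields are assumptions made up for this example; the real nfsd log format is not reproduced in this post.

```perl
use strict;
use warnings;

# Hypothetical line layout: "<timestamp> <type> V<version> <op> <bytes>".
# Each scalar-context match starts where the previous one ended (\G
# anchors at pos), and /c keeps pos intact when a match fails.
sub parse_fields {
    my ($line) = @_;

    $line =~ /\G(\d+)/gc          or return;
    my $ts = $1;
    $line =~ /\G\s+(\w)\s+V\d+/gc or return;
    my $type = $1;
    $line =~ /\G\s+(\d\d?)\b/gc   or return;
    my $op = $1;
    $line =~ /\G\s+(\d+)/gc       or return;
    my $bytes = $1;

    return { ts => $ts, type => $type, op => $op, bytes => $bytes };
}

my $f = parse_fields("1255824000 R V3 7 4096");
print join(" ", @{$f}{qw(ts type op bytes)}), "\n" if $f;
# prints: 1255824000 R 7 4096
```

Because every pattern only has to look at the text right after the previous match, the regexp engine never rescans the start of the line.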

With this version we get:

$ gzcat nfsd.gz | time ./wc_simple_with_regexps.pl
12236390
      109.62 real       107.42 user         1.55 sys

A bit better: 111k lines per second, a bit over 2x the previous result.

If we apply this gain to the reported times for the Perl script (419m11.267s), we get 200m49.97 which places Perl 1st, even above the C version by a minute or so.

My adjusted version of the stats_basic_optimized.pl script is melo_stats_basic.pl.

This was good enough I think, but I wanted to study the I/O gains that could be made. Stop reading here if you are already bored, it will only get worse. We did manage to get a few more minutes back, though... ;)

So I went after the I/O performance. First I wanted to rule out the pipe as the bottleneck.

$ cat /dev/zero | pv | cat > /dev/null
4.61GB 0:00:10 [ 498MB/s]

$ gzcat nfsd.gz | pv | cat > /dev/null 
1GB 0:00:04 [ 242MB/s]

So the pipe is not the bottleneck, but we will never reach its full speed: gzcat will be our limitation.

I did try to read the gzip file directly in the Perl script and uncompress it there, but it was very slow.

So assuming a limit of 242MB/s on the input side, how fast are we chopping lines? The size of the input is 1073743224 bytes, and our simple wc_simple.pl (no split, no regexp) took 10.13 seconds above, so we are chopping the input at a rate of 101MB/s.
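
For anyone who wants to redo this arithmetic with their own numbers, a tiny sketch follows. The sub name mb_per_second is made up, and it counts 1MB as 2**20 bytes, which is the convention that yields the 101MB/s figure.

```perl
use strict;
use warnings;

# Throughput in MB/s, counting 1MB as 2**20 bytes.
sub mb_per_second {
    my ($bytes, $seconds) = @_;
    return $bytes / $seconds / 2**20;
}

# Input size and wc_simple.pl run time taken from the text above.
printf "%.0fMB/s\n", mb_per_second(1073743224, 10.13);
# prints: 101MB/s
```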

So there is room to grow there (101MB/s to 242MB/s). I did some experiments:

$ gunzip nfsd.gz
$ time wc_simple.pl < nfsd
12236390

real    1m23.212s
user    0m7.434s
sys 0m2.166s

Yeah, the file doesn't fit in the cache like the gzipped version, so there is real I/O, and the times go through the roof.

There is no point doing experiments with the big nfsd file. All of them will result in real I/O, and that is always slower than memory.

Let's try to do bigger reads and parse the results:

use strict;
use warnings;

my $size = shift || 2 ** 20; ## 1Mb default
my $offset = 0;
my $buf = '';
my $lines = 0;

while () {
  my $n = sysread(\*STDIN, $buf, $size, length($buf));

  while ($buf =~ /.+\n/gc) {
    $lines++;
  }
  last unless $n > 0;

  print "$lines $n\n" unless $lines & 0x1ffff;
  $buf = substr($buf, pos($buf));
}
print "$lines\n";

I tried several block sizes, but with my OS the most I could read in one call was 64k. So asking for 1MB and getting 64k reads we get:

$ gzcat nfsd.gz | time  ./wc_batch.pl
12236390
    8.37 real         7.52 user         0.58 sys

We get 1.46M lines per second, a 17% improvement. Let's adjust to retrieve our fields. The inner loop becomes:

while ($buf =~ /(.+)\n/gc) {
  $_ = $1;
  my ($ts)    = /^(\d+)/gco;
  my ($type)  = /(\w)\sV/gco;
  my ($op)    = /\s(\d\d?)\s\w/gco;
  my ($bytes) = /\w\s(\d+)\s/gco;

  $lines++;
}

and the runtime:

$ gzcat nfsd.gz | time  ./wc_batch_with_regexps.pl
12236390
   95.58 real        93.21 user         1.14 sys

So from 109.62s to 95.58s, 12% better (or comparing with our baseline wc_simple_with_split.pl at 228.78s, 58% better). Adjusting this to the reported results we would go from 419m11.267s down to 175m5.6842s.

I don't think I can improve on this unless we can get bigger pipe reads. For example, forcing the reads to 8k:

$ gzcat nfsd.gz | time  ./wc_batch.pl 8192
12236390
       12.08 real        10.93 user         0.70 sys

A lot worse compared with the 8.37s we got with 64k reads. So the size of the pipe is the next factor we could explore, if that is even an option with your kernel.

But I'm happy now.

The next day

Or so I thought. First, there were doubts about whether split() or the regexps were faster. I wrote bench_splitters.pl (output on my laptop, download link) to compare split with my regexps. The regexps are a bit over twice as fast, but I found big differences between the Mac OS system perl (5.8.8 on my Leopard OS) and the 5.10.1 that I compiled: system perl was between 20 and 30% faster.

The same bench_splitters.pl gives you the max rate of extraction that you can expect from the global script. I also included timing of the bookkeeping parts of the original script. The only noteworthy detail is the fact that when you hit the second level condition, you pay the price of the modulus operator big time. I also think that something is wrong with the input. Those time stamps don't look like normal second-precision time stamps. They are too big. So I don't know if $ts % 3600 is the proper way to group performance by hour.

Second I wrote a max_line_rate.pl (output on my laptop, download link) that gives you the upper bound on the max rate that you can expect while parsing the required fields. You can run this script, and stop it at any point in time with ctrl-c, and it will print a performance report up to that point. Every 128k lines, a single line performance report is also printed.
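
The Ctrl-C behaviour is easy to reproduce. The sketch below is not the actual max_line_rate.pl, just an illustration of the idea under my own assumptions: the sub names rate_report and count_lines are invented, and it only counts lines instead of parsing fields.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Format a one-line performance report.
sub rate_report {
    my ($lines, $elapsed) = @_;
    my $rate = $elapsed > 0 ? $lines / $elapsed : 0;
    return sprintf '%d lines in %.2fs (%.0f lines/s)', $lines, $elapsed, $rate;
}

# Count lines from a filehandle. Prints a report to STDERR every
# $every lines (128k by default) and when interrupted with Ctrl-C.
sub count_lines {
    my ($fh, $every) = @_;
    $every ||= 128 * 1024;
    my $lines = 0;
    my $start = time();

    local $SIG{INT} = sub {
        print STDERR rate_report($lines, time() - $start), "\n";
        exit 1;
    };

    while (defined(my $line = <$fh>)) {
        $lines++;
        print STDERR rate_report($lines, time() - $start), "\n"
            unless $lines % $every;
    }
    return $lines;
}
```

To use it in a pipe, end the script with print count_lines(\*STDIN), "\n"; and run it as gzcat nfsd.gz | ./your_script.pl.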

You can use this max_line_rate.pl to compare your system perl with the 5.10.1 you compiled. I had much better performance with 5.8.8 in this particular application.

Finally I rewrote the statistics script. I did that to deal with the report that my previous version was consuming 7.5GB of RAM. The reason is simple enough: I don't have access to the original input, only to a six-line excerpt that was posted. Therefore the regexps I use to extract the required fields might fail.

The new script, fast_stats.pl (output on my laptop, download link), is more robust, and should deal with lines that cannot be parsed: it will print the line that couldn't be matched and ignore it. Also: I've included the output of the ps command at the start and end of both fast_stats.pl and max_line_rate.pl to show that the RSS doesn't change that much.

To compare the original stats_basic_optimized.pl with my fast_stats.pl I wrote a small shell script bench.sh (output on my laptop). The nfsd.gz input file was generated with the build_source_file.pl script with the command:

build_source_file.pl 1073741824 | gzip --best > nfsd.gz

The new fast_stats.pl is almost twice as fast as the old one on my laptop.

On a final note (I wasted too much time already on this...), I'm not out to compare Perl with Python or Ruby or even PHP. But I would like to know how we measure up against C. The reason is simple: when Perl programmers feel that something is slow, they turn to C, not another scripting language.

This experiment is mostly to show that writing fast Perl will sometimes take you down unexpected paths (like regexps beating a split), and that you should benchmark carefully if performance is critical to you.

October 17, 2009

CPAN::Shell 's' command

I'm playing with a new command for the CPAN::Shell: 's' for search on http://search.cpan.org.

It takes a single argument (can be a module, distribution, bundle or author name), checks the CPAN indexes to see which type it is, creates the proper URL for it at search.cpan.org and opens your browser with it.

The last bit, opening a browser with it, is very immature. Right now it only works on Mac OS X. I'm hoping to get the experience to do it right from the Browser::Open distribution.

If no object is found, it sends the user to the generic search interface.
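
The mapping from object type to URL can be sketched as a small function. This is not the code from my topic branch: the sub name search_url, the type names and the exact URL layouts are assumptions based on how search.cpan.org addresses modules, distributions and authors, and the lookup in the CPAN indexes is left out.

```perl
use strict;
use warnings;

# Hypothetical mapping from a CPAN object type to a search.cpan.org URL.
# Unknown types fall back to the generic search interface.
sub search_url {
    my ($type, $name) = @_;
    $type = '' unless defined $type;
    my $base = 'http://search.cpan.org';

    # Minimal URL escaping, to avoid depending on non-core modules
    (my $q = $name) =~ s/([^A-Za-z0-9_.:~-])/sprintf('%%%02X', ord $1)/ge;

    return "$base/perldoc?$q"       if $type eq 'module' || $type eq 'bundle';
    return "$base/dist/$q/"         if $type eq 'distribution';
    return "$base/~" . lc($q) . "/" if $type eq 'author';
    return "$base/search?query=$q";
}

print search_url('module', 'CPAN::Reporter'), "\n";
# prints: http://search.cpan.org/perldoc?CPAN::Reporter
```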

The current hackish implementation can be found on my s_command topic branch (it's a topic branch, so I will rebase it onto master on occasion).

Browser::Open

I've uploaded a small module to CPAN, Browser::Open (give it a couple of minutes to show up).

It does one simple thing: given a $url, it opens the default browser with it.

The difficult part is deciding how to open the "default browser". On Mac OS X, this is easy: just execute the open command.

On Windows, there is a start command that should do the trick, but I'm not a Windows user so I cannot test this. If any Windows users out there can point me to the relevant information on how to open a URL with a simple command, I would appreciate it.

For Linux, you have too many choices it seems: you could use gnome-open but your user might be using KDE. There is an xdg-open command described at the FreeDesktop site that seems to do what I want. We can always fall back to firefox though. Fragmentation++!
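
The decision described above can be sketched in a few lines. This is not the Browser::Open implementation, only an illustration of the approach: the sub names are invented, the order of the Linux candidates is my own guess, and the Windows branch is untested.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of $cmd if it is an executable file in one of
# @dirs (defaults to the directories in PATH), or nothing otherwise.
sub find_in_path {
    my ($cmd, @dirs) = @_;
    @dirs = File::Spec->path unless @dirs;
    for my $dir (@dirs) {
        my $full = File::Spec->catfile($dir, $cmd);
        return $full if -f $full && -x _;
    }
    return;
}

# Hypothetical helper: pick a command to open a URL on the given OS
# (pass $^O for the current one).
sub browser_command {
    my ($os, @dirs) = @_;
    return 'open'  if $os eq 'darwin';
    return 'start' if $os eq 'MSWin32';    # cmd.exe built-in, untested
    for my $cmd (qw(xdg-open gnome-open firefox)) {
        my $full = find_in_path($cmd, @dirs);
        return $full if $full;
    }
    return;
}
```

On Mac OS X, system(browser_command($^O), $url) would then be enough to open the page.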

October 12, 2009

Java 1.6 on Leopard

I did the research on this a month ago and I forgot to write it down, so I just spent another hour doing it again. I should know better by now.

Anyway, you can download the Java 1.6 update for Mac OS X Leopard from the Apple Software site, but it's only 64-bit.

I do have a desktop that is 64-bit. Unfortunately, my MacBook Pro laptop is only a 32-bit Core Duo. Hence, I cannot run the new version of Java.

If you do have a 64-bit CPU, you might want to switch the default version of Java to 1.6 after you install the update. The proper way to do that is to run /Applications/Utilities/Java\ Preferences.app and drag the 1.6 version to the top of the list.

I just wanted to try out Jake... Guess I won't be able to, at least not until I can get my hands on the wife's laptop... hmms... she's sleeping already...

October 08, 2009

Attaching jobs in Gearman

I've used Gearman on and off in the past but for a new project, I've decided to explore some features I rarely made use of previously. Most notably, the unique ID that clients can submit with each job.

Let me just clarify some behaviors for non-background jobs:

  • the worker will execute the job until it finishes even if the client that submitted it dies;
  • if the client dies before the job is passed on to a worker, it will be removed and will never execute.

For background jobs, the behavior is different: a client submits a background job, the gearmand daemon queues this (on stable storage if you use the persistent queues of the latest versions of the C implementation), and sends it to a worker as soon as one is available.

This is basic stuff, I'm just making sure we are all on the same page regarding Gearman behavior.

Back to the unique parameter. You can define a unique key for your jobs. Those keys should be unique per function, so you can have the same key on different functions and they will be two different jobs.

The fun part with unique is that you can have multiple clients listening to status and completion events of the same job if you share the key between them. Say that you start a job with a key alpha. If another client submits a job with the same key, and (this is important) the job is still running, this second client will attach itself to the same job.

To test this I wrote a small shell script worker (save it as slow.sh and make it executable with chmod 755 slow.sh):

#!/bin/sh

echo "Starting a slow worker..." >>/dev/fd/2
for i in 5 4 3 2 1 ; do
  echo $i >>/dev/fd/2
  sleep 1
done
echo done $$

This worker does nothing except print a countdown to STDERR, and then send the client a done string with the worker process ID. But it takes 5 seconds to run, so it allows us some time to switch between windows.

After you start your gearmand server (I used gearmand -vv just to have some debug) you can start a worker process like this:

gearman -w -f slow ./slow.sh

This worker registers the slow function and will execute slow.sh once per job.

Now open two new terminal windows and type on each one:

gearman -f slow -u alpha -s

This will submit a non-background job for the slow function, with the unique key alpha. The -s means that we won't be sending any data.

You'll see one execution of the worker, and then both clients will output something like this:

done 57998

You can experiment, and attach the second client only half-way through the worker run. Or you can stop the worker, start both clients, and then start the worker. The result will be the same: both clients receive the output for the same job.

This is a very very cool feature, and can be used easily for slow processing inside a web request.

First, if a web request requires a slow processing phase, we store all the relevant data and submit a background job to do the processing with some random key. The user receives a "Processing page" on his browser, that includes this key.

A small JavaScript program, using AJAX, connects to our long-polling server and submits a non-background job to the same function, using the same key. This second client request will now wait until the processing is done, and can even receive status updates from the worker and send them back to the browser.

This attach-to-job-using-key is very reliable. I've tested several combinations with and without workers running, strange order of job submission, adding multiple clients to the same job, and all of them work as expected.

The only real problem with this is that the clients need to use the same API that is used to submit a job. Instead of a new API, like ATTACH_JOB for example, you use the same SUBMIT_JOB API that you use for new jobs. This works fine, until your second client attaches with a key for a job that has already ended. The gearmand server will fail to find it, and will dutifully create a new job.

My current workaround for this is to send a specific payload on the attach requests, to signal the worker that this is an attach request and not a new job. The worker, if it detects this signal, just ends the processing. For example, if your jobs require payload data, you can use an empty data field as the flag.

This is not optimal, of course: you would be waking up workers just to return immediately. But it would work.
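
On the worker side, the check is tiny. Here is a minimal sketch, assuming that an empty payload is the attach flag and that real jobs always carry data; handle_job and the $process callback are names invented for this example, not part of any Gearman API.

```perl
use strict;
use warnings;

# Hypothetical worker body: an empty payload means "I only wanted to
# attach, but the job is already gone", so return right away.
sub handle_job {
    my ($payload, $process) = @_;
    return '' if !defined $payload || $payload eq '';
    return $process->($payload);
}

print handle_job('some data', sub { uc shift }), "\n";
# prints: SOME DATA
```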

If you have multiple gearmand servers, you need to make sure that the clients that will be attaching to a job use the same server. A solution would be to pass the server identification (IP or even better hash-of-IP) along with the key.
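
One way to carry the server identification along with the key is to prefix the key with a short hash of the server address. This is only a sketch of that idea: the sub names, the token layout and the choice of the first 8 hex digits of an MD5 are all my own assumptions.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Build a token of the form "<hash-of-server>:<key>".
sub attach_token {
    my ($server, $key) = @_;
    return substr(md5_hex($server), 0, 8) . ":$key";
}

# Find which of the known servers a token refers to.
# Returns (server, key), or an empty list if no server matches.
sub server_for_token {
    my ($token, @servers) = @_;
    my ($hash, $key) = split /:/, $token, 2;
    for my $server (@servers) {
        return ($server, $key) if substr(md5_hex($server), 0, 8) eq $hash;
    }
    return;
}
```

The client that submits the background job hands the token to the browser, and the attaching client uses server_for_token() with its list of gearmand servers to decide where to connect.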

The traditional way of doing this is to have the workers store the completed result in a database somewhere. The solution presented here does not invalidate that; it only provides a very light notification of completion for background jobs. It beats polling the database to see if the background job is completed.

October 07, 2009

Dist::Zilla::Plugin::LatestPrereqs

This all started with an article by Marcel Gruenauer "hanekomu", "Repeatedly installing Task::* distributions".

What he wants is a way to tell CPAN this: "install the latest versions of my dependencies".

His solution won't work, unfortunately. The code that he gives us will prevent the Task:: module from being installed, but it will not guarantee that the latest version of the prereqs will be installed in the following runs.

The reason is simple: if you don't ask for a specific version of your prereqs, CPAN will accept any version, so it will only install each prereq once, the first time.

The perfect solution would be to create a marker on each prereq that would tell the CPAN tool chain that you want the latest version. This does not exist yet. You could probably standardize on version -1 meaning the last one (on a twisted parallel with the last index of Perl lists), but it's all speculation. It's just not supported yet.

The next best thing would be to include the version required on each of your prereqs, and keep those values up-to-date to the latest available on CPAN whenever you rebuild your package.

This is a half-way solution. It won't guarantee the latest version at the time your package is installed, but it will make sure you get the latest version at the time your package was built on your system before uploading to PAUSE.

This is actually a good compromise given that you are probably listing the versions of the prereqs that you tested your package with on your system.

And, better yet, this half-solution can be automated. I wrote a Dist::Zilla plugin to do just that. The code is very simple and you can easily adapt it into a Module::Install plugin or whatever you use to build your packages.
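
Stripped of the Dist::Zilla plumbing, the core of the idea fits in one small function. The sketch below is not the plugin code: pin_to_latest is an invented name, and the version lookup is passed in as a callback so the example does not need a configured CPAN.

```perl
use strict;
use warnings;

# Hypothetical core of the idea: replace each declared prereq version
# with the latest one known to the index. $latest_version_of is a code
# ref that returns the latest version of a module, or undef if unknown.
# With CPAN.pm, such a lookup could be written along these lines:
#   sub { my ($m) = CPAN::Shell->expand('Module', $_[0]);
#         $m ? $m->cpan_version : undef }
sub pin_to_latest {
    my ($prereqs, $latest_version_of) = @_;
    my %pinned;
    for my $module (sort keys %$prereqs) {
        my $latest = $latest_version_of->($module);
        # Keep the declared version if the index does not know the module
        $pinned{$module} = defined $latest ? $latest : $prereqs->{$module};
    }
    return \%pinned;
}
```

You would call this at build time with the prereqs hash of your distribution, and write the result into the generated Makefile.PL or META file.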

You can find the code for Dist::Zilla::Plugin::LatestPrereqs at my Github repository for Dist::Zilla tools. It's not on CPAN yet, and for now there are no plans to publish it. The reason is simple: it requires a small patch to the core Dist::Zilla. The patch is a single commit that you can find on my Dist::Zilla fork. I've asked Ricardo Signes to accept the patch. If he likes the code, I'll release my plugin after the next Dist::Zilla release.

If you look at the LatestPrereqs code, you'll notice that it is very simple, but it does load the CPAN package and that is a big one. You could write this code directly in your Makefile.PL and have the very latest versions of your prereqs at install time, but this would assume that the system has a properly configured CPAN.

That is a risk that I'm not willing to take on my distributions. If you do it that way, ping me. I would like to follow your module's CPAN Testers feed for a while.

Back to Marcel's post: if you do need to prevent the install phase for your distribution, then I've also uploaded a Dist::Zilla plugin to do just that: Dist::Zilla::Plugin::MakeMaker::SkipInstall. It might be handy sometimes.

October 04, 2009

AnyEvent::Mojo 0.8

I've uploaded to PAUSE release 0.8 of AnyEvent::Mojo. It should be on your local CPAN mirror in a little while.

This was a long time coming, unfortunately, and I accumulated FAIL test reports on CPANTS, but it's here now.

Given that it uses the latest Mojo release, it supports HTTP keep-alive and pipelining, chunked-encoding and 100-Continue requests.

Although the test suite passes, I'm not fully confident on the pipelining code. My next step is to write a client with a slow network reader to exercise some corner cases of that part of the code. In particular, I'm concerned about the interaction of Mojo pipeline code with my request pause functionality that I use to implement long-polling servers.

It could be argued that long-polling and pipeline don't mix, but I think that the pause functionality could also be used on regular requests. For example, if one request needs an answer from a memcached server, the handler can start the memcached GET, pause the Mojo transaction, and resume it when the memcached response arrives. While that is going on, the server should be able to keep writing out previous requests, and reading the next ones.

Maybe that's a bit extreme, but I do hope to have this working for complex pipelining situations.

October 01, 2009

bash completion for Github gem

For some time now, I've had the Github gem installed (if you want to know more, I suggest an old blog post about the Github gem). This gives you a small gh script that interfaces with the Github APIs and makes common operations like creating repositories, cloning, and fetching other repositories in the project network easy and fast.

But I'm a lazy bastard, and the lack of a bash completion script was getting on my nerves.

So here: you can get the new gh-completion.bash script from my repository. I've sent a pull request to Chris Wanstrath (defunkt).

Please note that this is still work-in-progress.

Right now, most of the commands and options have completion, but I'm still trying to understand certain operations (like pull-request and track) that I never use. I don't really understand what they do, so I have no way to create a decent completion rule for them.

But using this bash completion is totally safe... Until you hit the ENTER key, that is.

Contacts

melo@simplicidade.org (XMPP/email)
+351 302 029 050 (voice)
melopt (Skype)

This weblog is licensed under a Creative Commons License.