<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" 
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
  xmlns:content="http://purl.org/rss/1.0/modules/content/"
  xmlns:admin="http://webns.net/mvcb/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:atom="http://www.w3.org/2005/Atom">

<channel>
<title>Notes</title>
<link>http://www.simplicidade.org/notes/</link>
<atom:link href="http://www.simplicidade.org/notes/42.xml" rel="self" type="application/rss+xml" />
<description>Building simplicidade.org: notes, projects, and occasional rants</description>
<dc:language>en-us</dc:language>
<dc:creator>melo@simplicidade.org</dc:creator>
<dc:date>2010-01-29T11:54:30+00:00</dc:date>
<admin:generatorAgent rdf:resource="http://www.movabletype.org/?v=3.2" />
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<sy:updateBase>2000-01-01T12:00+00:00</sy:updateBase>

<item>
<title>Clarifications about AnyEvent::Mojo</title>
<link>http://www.simplicidade.org/notes/archives/2010/01/clarifications.html</link>

<description>It seems that I need to clarify some stuff, based on Sebastian&apos;s comments in my last post and via Twitter. I really hope this is the last time I have to talk about it. I&apos;d rather spend my time coding. First, you can look at the code of AnyEvent::Mojo::Server::Connection: no private Mojo APIs are being used. Private APIs might have been used in early releases but, as Sebastian says, I did work with him to improve Mojo so that I didn&apos;t have to depend on them. Depending on private APIs would be wrong. In fact, being able to write an AnyEvent-based...</description>

<content:encoded><![CDATA[
<p>It seems that I need to clarify some stuff, based on Sebastian's comments in <a href="http://www.simplicidade.org/notes/archives/2010/01/anyeventmojo_up_1.html">my last post</a> and via Twitter. I really hope this is the last time I have to talk about it. I'd rather spend my time coding.</p>

<p>First, you can look at the code of <a href="http://github.com/melo/anyevent--mojo/blob/master/AnyEvent-Mojo/lib/AnyEvent/Mojo/Server/Connection.pm"><code>AnyEvent::Mojo::Server::Connection</code></a>: no private <code>Mojo</code> APIs are being used.</p>

<p>Private APIs might have been used in early releases but, as Sebastian says, I did work with him to improve <code>Mojo</code> so that I didn't have to depend on them. Depending on private APIs would be wrong.</p>

<p>In fact, being able to write an <code>AnyEvent</code>-based <code>Mojo</code> server using only public APIs proves that the code was properly structured.</p>

<p>Second, I have no problem with changing state machines. I like state machines and expect them to change. I was eager to see Sebastian add a "pause" state, which would greatly simplify my own work on long-polling. And the first break included changes in the state machine that I fixed on my side.</p>

<p>But the second break was caused not by the introduction of new states, but by the abstraction of some code into the new <code>Mojo::IOLoop</code>, and it came after the announcement of no more changes until 1.0.</p>

<p>Now, I totally accept that Sebastian needs to evolve his code base to support the new stuff he wants to support, like WebSockets. But it was nonetheless new code, and a refactoring of the entire Server interface.</p>

<p>I was pissed at having my code broken that way after the "no changes" declaration, but only for a couple of days. <strong>But</strong>, and far more importantly, that has nothing to do with my deprecation of <code>AnyEvent::Mojo</code>.</p>

<p>I'm deprecating my module because there is a better way to do it now. I no longer need to keep this module up to date; I just have to work with the larger community around PSGI/Plack, the Perl Web Server. It just makes sense to drop a one-off module and switch to a stack that is being reused by so many projects.</p>

<p>In a nutshell, I'm deprecating <code>AnyEvent::Mojo</code> because you have better solutions now. Solutions that I will move to and recommend to others. Solutions based on the <code>PSGI</code>/<code>Plack</code> stack, the <a href="http://plackperl.org/">Perl Web Server</a>.</p>

<p>I am glad to see a clarification about the deprecation policy in the <a href="http://github.com/kraih/mojo/commit/3c1d3f78a240b67c01a4511d018862b4c217efa8">Changes</a> file and in <a href="http://github.com/kraih/mojo/commit/89fc6baf390d0065ad319abc540dc6a47e0bc812">this commit</a> to the Mojo project. I think it is a big step in the right direction: it can only reassure potential users of <code>Mojolicious</code> and help the project.</p>

<p>I did learn a lot about HTTP, the tiny details of the protocol that I hadn't paid attention to before. I do like the <code>Mojolicious</code> MVC stuff; I just think that the <code>Plack</code> HTTP stack is better than the <code>Mojo</code> HTTP stack.</p>

<p>And I might have my cake and eat it too. I'll wait for <code>Mojo</code> 1.0, but it should be possible to use <code>Mojolicious</code>, the best part of the project, with <code>Plack</code>.</p>

]]></content:encoded>

<guid isPermaLink="false">1009@http://www.simplicidade.org/notes/</guid>
<dc:subject>Perl</dc:subject>
<dc:date>2010-01-29T11:54:30+00:00</dc:date>


</item>
<item>
<title>AnyEvent::Mojo update</title>
<link>http://www.simplicidade.org/notes/archives/2010/01/anyeventmojo_up_1.html</link>

<description>The current version of AnyEvent::Mojo is failing some tests. The Mojo API that I was using changed yet again and I don&apos;t have the tuits to fix it right now. I&apos;ll explain how we got here, what the next steps are, and finally I&apos;ll comment on lessons learned. A long, long time ago... When I wrote AE::M, I was looking for a way to do long-polling in Perl, with decent performance and cool stuff like epoll/kqueue support. At the time, the only solution might have been POE, but I have other problems with that module, so I decided I wanted...</description>

<content:encoded><![CDATA[
<p>The current version of <a href="http://search.cpan.org/dist/AnyEvent-Mojo/"><code>AnyEvent::Mojo</code></a> is <a href="http://www.cpantesters.org/distro/A/AnyEvent-Mojo.html?grade=3&amp;perlmat=2&amp;patches=2&amp;oncpan=1&amp;distmat=2&amp;perlver=ALL&amp;osname=ALL&amp;version=0.8003">failing some tests</a>. The <a href="http://search.cpan.org/dist/Mojo/"><code>Mojo</code></a> API that I was using changed yet again and I don't have the <a href="http://www.perlfoundation.org/perl5/index.cgi?tuit">tuits</a> to fix it right now. I'll explain how we got here, what the next steps are, and finally I'll comment on lessons learned.</p>

<h3>A long, long time ago...</h3>

<p>When I wrote AE::M, I was looking for a way to do long-polling in Perl, with decent performance and cool stuff like epoll/kqueue support. At the time, the only solution might have been <a href="http://search.cpan.org/dist/POE/"><code>POE</code></a>, but I have other problems with that module, so I decided I wanted an <a href="http://search.cpan.org/dist/AnyEvent/"><code>AnyEvent</code></a>-based solution.</p>

<p>I went looking for an HTTP stack that I could extend and found <a href="http://mojolicious.org/">Mojo</a>. The code was working and ready in an afternoon, and I was happy.</p>

<p>But then we had the <a href="http://labs.kraih.com/blog/2009/08/version-numbers-and-backwards-compatibility.html">famous</a> <a href="http://github.com/kraih/mojo/blob/33eb290df3825510eb7b88cb9eef581d9a7fec59/Changes">0.991250 release</a>. A lot of backwards incompatible changes, and AE::M broke. I was caught at a bad time and could not update my own module for some weeks. In the Changes file, Sebastian explained why he broke backwards compatibility and that it would be the last time before 1.0; so I finally found the time and updated my code.</p>

<p>CPAN Testers was giving me green lights once more, and I was happy.</p>

<p>Until two months later, that is, when we got the new <a href="http://search.cpan.org/dist/Mojo/lib/Mojo/IOLoop.pm"><code>Mojo::IOLoop</code></a> code. AE::M broke again, and it has been broken ever since. The project where I was using AE::M needed some fixes, and adjusting my code to the new <code>Mojo</code> is not as simple as it was the first time around.</p>

<p>So I took the dirty way out: included the last pre-<code>Mojo::IOLoop</code> code with my project, implemented the new features, and shipped. Not pretty. At all.</p>

<p>Meanwhile, on a distant planet, <a href="http://bulknews.typepad.com/">miyagawa</a> was busy changing the Perl landscape with regard to HTTP integration. He wrote the <a href="http://search.cpan.org/dist/PSGI/"><code>PSGI</code></a> spec and the <a href="http://search.cpan.org/dist/Plack/"><code>Plack</code></a> toolkit. It is a <a href="http://bulknews.typepad.com/blog/2009/12/plackpsgi-ecosystem.html">wonderful piece of work</a>, and the <a href="http://plackperl.org/">Perl Web Server</a> of choice.</p>

<p>And <a href="http://bulknews.typepad.com/blog/2009/10/psgiplack-streaming-is-now-complete.html">after some API work</a>, with the help of <a href="http://nothingmuch.woobling.org/">nothingmuch</a>, it <a href="http://blog.woobling.org/2009/10/event-driven-psgi.html">gained very decent support for asynchronous HTTP servers</a>, including all the bits needed to make long-polling work right.</p>

<p>Then in December, we got <a href="http://github.com/kraih/mojo/blob/master/lib/Mojo/Server/PSGI.pm"><code>Mojo::Server::PSGI</code></a>, native support for <code>PSGI</code> 
inside <code>Mojo</code>.</p>

<h3>Next steps</h3>

<p>Right now, I'm waiting for the next release of Mojo, and then I'll immediately release a new version of AnyEvent::Mojo, deprecating it.</p>

<p>With the native support for <code>PSGI</code>, there is no reason to maintain the AE::M interface: just use <a href="http://search.cpan.org/dist/Plack-Server-AnyEvent/"><code>Plack::Server::AnyEvent</code></a> (or <code>AnyEvent::HTTPD::PSGI</code> when <a href="http://bulknews.typepad.com/blog/2010/01/is-plack-middleware-is-it-library-or-is-it-a-server.html">miyagawa finishes the cleanup he talked about</a>).</p>
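<p>If you haven't looked at PSGI yet, here is a minimal sketch of what a plain PSGI application looks like (this is not Mojo's PSGI adapter, whose API I won't guess at here). It uses the delayed-response form described in the PSGI spec: the application returns a callback, and the server hands it a "responder" that can be called later. That is what makes long-polling possible on an event-driven server; the sketch answers immediately to stay self-contained.</p>

<pre class="brush: perl;">
use strict;
use warnings;

## A PSGI application is just a code reference that receives the
## request environment (a hash reference) and returns a response.
my $app = sub {
  my ($env) = @_;

  ## Delayed response: instead of an array reference, return a callback.
  ## The server calls it with a responder; on an event-driven server the
  ## responder could be called later, e.g. from an AnyEvent timer.
  return sub {
    my ($respond) = @_;
    $respond->([
      200,
      ['Content-Type' => 'text/plain'],
      ["Hello from $env->{PATH_INFO}\n"],
    ]);
  };
};

$app;   ## a .psgi file must return the application
</pre>

<p>Saved as <code>app.psgi</code>, something like <code>plackup -s AnyEvent app.psgi</code> should serve it with <code>Plack::Server::AnyEvent</code>; check the <code>plackup</code> documentation of your Plack version for the exact server name.</p>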

<p>But I will update AE::M one last time to make it compatible with <code>Mojo::IOLoop</code>, as soon as I have the time. I don't want to leave it broken.</p>

<p>In the meantime, if you are using AE::M, I suggest that you bundle an earlier version of <code>Mojo</code>. I'll run a <code>git-bisect</code> to figure out which version is best and update this post.</p>

<h3>Lessons learned</h3>

<p>Not sure. On one hand, it is a version below 1.0, so we can expect some breakage.</p>

<p>On the other hand, the Changes file mentions that no more backwards incompatible changes would be made to <code>Mojo</code> before 1.0.</p>

<p>I still like the <code>Mojolicious</code> MVC framework and the dispatcher behind it, but now I fear the <code>Mojo</code> stack and the project's deprecation policy.</p>

<p>For now, all my async work is moving to <code>Plack</code>. I still need an asynchronous MVC framework, but I can wait a bit, or even write a very simple one if need be. Maybe steal code that I like from the others.</p>

<p>But I'm back to <a href="http://www.catalystframework.org/"><code>Catalyst</code></a> (and I plan to talk about my feelings about the attribute-based dispatcher soon) because I feel safer. A lot safer. I know that there are other people out there who depend on <code>Catalyst</code> every day, with big sites, much bigger than mine. And I trust the Cat developers not to break my code.</p>

<p>I do hope to see <code>Mojo</code> 1.0 soon, and I hope it goes well for all the gang at <code>#mojo</code>, but I will also look for some form of backwards compatibility policy before I mention <code>Mojo</code> ever again in my presentations about Perl.</p>

]]></content:encoded>

<guid isPermaLink="false">1008@http://www.simplicidade.org/notes/</guid>
<dc:subject>Perl</dc:subject>
<dc:date>2010-01-29T02:49:42+00:00</dc:date>


</item>
<item>
<title>Muito obrigado</title>
<link>http://www.simplicidade.org/notes/archives/2009/11/muito_obrigado.html</link>

<description>I&apos;m not American so officially I don&apos;t get a Thanksgiving holiday. Still, we are reaching the end of 2009 and, looking back, I think we have a lot to be grateful for. Giving thanks, like Christmas, doesn&apos;t need a special day, and for us Perl hackers, giving thanks to our favorite team, the perl5-porters, couldn&apos;t be easier. So let&apos;s all, no matter where we are, run the perlthanks command today, celebrate our new schedule for Perl 5 releases, and wish happy days and long lives to our pumpkins. On a more individual note, I can&apos;t forget three little...</description>

<content:encoded><![CDATA[
<p>I'm not American so officially I don't get a Thanksgiving holiday. Still, we are reaching the end of 2009 and, looking back, I think we have a lot to be grateful for.</p>

<p>Giving thanks, like Christmas, doesn't need a special day, and for us Perl hackers, giving thanks to our favorite team, the perl5-porters, couldn't be easier.</p>

<p>So let's all, no matter where we are, run the <code>perlthanks</code> command today, celebrate our new schedule for Perl 5 releases, and wish happy days and long lives to our pumpkins.</p>

<p>On a more individual note, I can't forget three little big people: cog, ambs and Magda. YAPC::EU raised the bar on so many levels.</p>

<p>But we can't live without CPAN, so we also thank the PAUSE and CPAN teams, and everyone who keeps the mirror network up and running.</p>

<p>Finally, to the teams that make my day-to-day simpler and fun: the Moose cabal and the Plack gang (with a note of envy for miyagawa's productivity...).</p>

<p>To all, a very big Muito Obrigado.</p>

]]></content:encoded>

<guid isPermaLink="false">1005@http://www.simplicidade.org/notes/</guid>
<dc:subject>Perl</dc:subject>
<dc:date>2009-11-27T07:40:43+00:00</dc:date>


</item>
<item>
<title>Feed generator</title>
<link>http://www.simplicidade.org/notes/archives/2009/11/feed_generator.html</link>

<description>I was mostly away from Perl these last few days, and I&apos;m eager to get back to it next week. The small work I did was to start a script that converts a directory of files into an RSS or Atom feed. For now it sits at the app-files2feed repository. I need to fix the last bugs with enclosure support and add documentation before releasing it to CPAN. Non-perl work sucks....</description>

<content:encoded><![CDATA[
<p>I was mostly away from Perl these last few days, and I'm eager to get back to it next week.</p>

<p>The small work I did was to start a script that converts a directory of files into an RSS or Atom feed.</p>

<p>For now it sits at the <a href="http://github.com/melo/app-files2feed">app-files2feed repository</a>. I need to fix the last bugs with enclosure support and add documentation before releasing it to CPAN.</p>

<p>Non-perl work sucks.</p>

]]></content:encoded>

<guid isPermaLink="false">1004@http://www.simplicidade.org/notes/</guid>
<dc:subject>Perl</dc:subject>
<dc:date>2009-11-19T08:40:00+00:00</dc:date>


</item>
<item>
<title>More Browser::Open</title>
<link>http://www.simplicidade.org/notes/archives/2009/11/more_browserope.html</link>

<description><![CDATA[I pushed to CPAN a new release of Browser::Open. I've added more commands to test (courtesy of code "borrowed" from SD, and the Launchy gem), and made the test suite more robust in case we don't find a suitable command. &lt;rant&gt; I'm amazed that something as simple as opening a URL is such a complicated task on most UNIX-based systems. I have a single command to use on Mac OS X and on Windows based systems, but there seems to be no standard way of doing this simplest of things on UNIX systems. &lt;/rant&gt;...]]></description>

<content:encoded><![CDATA[
<p>I pushed to <a href="http://search.cpan.org/">CPAN</a> a new release of <a href="http://search.cpan.org/dist/Browser-Open/">Browser::Open</a>.</p>

<p>I've added more commands to test (courtesy of code "borrowed" from <a href="http://syncwith.us/sd/">SD</a>, and the <a href="http://copiousfreetime.rubyforge.org/launchy/">Launchy gem</a>), and made the test suite more robust in case we don't find a suitable command.</p>

<p>&lt;rant&gt; <br />
I'm amazed that something as simple as opening a URL is such a complicated task on most UNIX-based systems.</p>

<p>I have a single command to use on Mac OS X and on Windows-based systems, but there seems to be no standard way of doing this simplest of things on UNIX systems. <br />
&lt;/rant&gt;</p>
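<p>To make the rant concrete, here is a rough sketch of the kind of probing involved. It is not Browser::Open's code, and the list of Unix candidates is an assumption for illustration only; on CPAN, just use the module.</p>

<pre class="brush: perl;">
use strict;
use warnings;
use File::Spec;

## HYPOTHETICAL list of Unix candidates, tried in order
## (an illustration, not Browser::Open's real list).
my @unix_candidates = qw(xdg-open x-www-browser sensible-browser www-browser firefox);

## Full path of an executable found in $PATH, or nothing.
sub find_in_path {
  my ($cmd) = @_;
  for my $dir (File::Spec->path) {
    my $full = File::Spec->catfile($dir, $cmd);
    return $full if -x $full and not -d $full;
  }
  return;
}

## Pick a command to open URLs on the given OS (defaults to $^O).
sub browser_cmd {
  my ($os) = @_;
  $os = $^O unless defined $os;
  return 'open'  if $os eq 'darwin';    ## Mac OS X: one command
  return 'start' if $os eq 'MSWin32';   ## cmd.exe builtin, run via cmd /c
  for my $cmd (@unix_candidates) {      ## Unix: go hunting
    my $full = find_in_path($cmd);
    return $full if $full;
  }
  return;                               ## nothing suitable found
}
</pre>

<p>Two of the three branches are a single line; the Unix branch is a guessing game that depends on what happens to be installed.</p>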

]]></content:encoded>

<guid isPermaLink="false">1003@http://www.simplicidade.org/notes/</guid>
<dc:subject>Perl</dc:subject>
<dc:date>2009-11-10T00:37:14+00:00</dc:date>


</item>
<item>
<title>Perl testing and git pre-commit hooks</title>
<link>http://www.simplicidade.org/notes/archives/2009/11/perl_testing_an.html</link>

<description>An article on Colin&apos;s blog mentioned a technique that I also use, using your test suite as a git pre-commit hook. I&apos;ll expand on some of his ideas, showing my own setup. Before each commit you should run the full test-suite of your code. If your test-suite has grown so much that it takes a long time to run, it makes more sense to run a smaller part of it, and let the continuous integration system (you do have one, right?) report back any problems with your tree later. At the very least you should make sure that all your...</description>

<content:encoded><![CDATA[
<p><a href="http://www.perldreamer.com/blog/webgui-testing-skipping-the-skip-block">An article</a> on <a href="http://www.perldreamer.com/blog">Colin's blog</a> mentioned a technique that I also use, using your test suite as a <a href="http://git-scm.com/">git</a> <a href="http://book.git-scm.com/5_git_hooks.html">pre-commit hook</a>. I'll expand on some of his ideas, showing my own setup.</p>

<p>Before each commit you should run the full test-suite of your code. If your test-suite has grown so much that it takes a long time to run, it makes more sense to run a smaller part of it, and let the continuous integration system (you do have one, right?) report back any problems with your tree later.</p>

<p>At the very least, you should make sure that all your modules compile correctly, and that is what the <code>00-compile.t</code> test file checks.</p>

<p>To make the whole process automatic, I use the following <code>pre-commit</code> hook:</p>

<pre class="brush: bash;">
#!/bin/sh

## Set any ENV vars that your test suite requires here

exec prove -l -Q t/00*
</pre>

<p>I usually write a script called <code>my-git-pre-commit-hook</code> and <code>exec</code> it from the <code>.git/hooks/pre-commit</code> file (which must be executable); this way I can also run it by hand if I need to.</p>

<p>I also assume that all tests named <code>00-*</code> are important enough to run before each commit, which lets me add more specific tests at any point.</p>

<p>This solves half of the problem. I also don't want to keep editing the <code>00-compile.t</code> file each time I add another module to my project. So I wrote it like this:</p>

<pre class="brush: perl;">
#!perl

use strict;
use warnings;
use Test::More;
use Path::Class;
use File::Find;

my $lib = dir('lib')->absolute->resolve;
find({
  bydepth => 1,
  no_chdir => 1,
  wanted => sub {
    my $m = $_;
    return unless $m =~ s/[.]pm$//;

    $m =~ s{^.*/lib/}{};
    $m =~ s{/}{::}g;
    use_ok($m) || BAIL_OUT("***** PROBLEMS LOADING FILE '$m'");
  },
}, $lib);

done_testing();
</pre>

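<p>This code uses <a href="http://search.cpan.org/dist/perl/lib/File/Find.pm"><code>File::Find</code></a> to list all the <code>.pm</code> files in your <code>lib/</code> directory, and then tries to use them. It also tries to use submodules before the main module, courtesy of the <code>bydepth</code> switch. Note that <code>bydepth</code> only guarantees that a directory's contents are processed before the directory itself, so whether <code>lib/Foo/Bar.pm</code> is loaded before <code>lib/Foo.pm</code> still depends on the order in which the entries of <code>lib/</code> are read.</p>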

<p>If a problem is found, it immediately bails out of the entire test suite, naming the module that failed to load. Previous versions tested all modules and only bailed out at the end if any had problems. I found that less productive, because the same problem would be reported multiple times.</p>
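<p>If you want the submodules-first order to be guaranteed instead of depending on directory order, one option is to collect the file names first and sort them by depth. A sketch of that variation (not what my test above does); <code>path_to_module</code>, <code>load_order</code> and <code>find_module_paths</code> are names made up for this example:</p>

<pre class="brush: perl;">
use strict;
use warnings;
use File::Find;
use File::Spec;

## "Foo/Bar.pm" (relative to lib/) becomes "Foo::Bar".
sub path_to_module {
  my ($path) = @_;
  (my $m = $path) =~ s/[.]pm$//;
  $m =~ s{[/\\]}{::}g;
  return $m;
}

## Number of "::" separators: deeper modules have more.
sub depth {
  my ($m) = @_;
  my $n = () = $m =~ /::/g;
  return $n;
}

## Deepest modules first, then alphabetical for a stable order.
sub load_order {
  return sort { depth($b) - depth($a) || $a cmp $b } map { path_to_module($_) } @_;
}

## All .pm files under $lib, as paths relative to $lib.
sub find_module_paths {
  my ($lib) = @_;
  my @paths;
  find({ no_chdir => 1, wanted => sub {
    push @paths, File::Spec->abs2rel($File::Find::name, $lib) if /[.]pm$/;
  } }, $lib);
  return @paths;
}
</pre>

<p>In <code>00-compile.t</code> the loop would then become something like <code>use_ok($_) or BAIL_OUT("***** PROBLEMS LOADING '$_'") for load_order(find_module_paths('lib'));</code>.</p>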

<p>This works out fine for my projects. It adds a few seconds to each commit (less than 4 seconds in my current project), which I find acceptable.</p>

<p>It would be interesting to write a <code>00-test-commited-files.pl</code> file that would look at the files being updated by the commit and run the tests that cover them. This could be done with the help of <a href="http://search.cpan.org/dist/Devel-CoverX-Covered/"><code>Devel::CoverX::Covered</code></a>, but I haven't done it yet.</p>
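<p>A rough sketch of the first half of that idea, with a big assumption: instead of real coverage data from Devel::CoverX::Covered (whose API I won't guess at here), it maps each staged <code>lib/Foo/Bar.pm</code> to a hypothetical <code>t/foo-bar.t</code> by naming convention. <code>git diff --cached --name-only</code> lists the files staged for the commit.</p>

<pre class="brush: perl;">
use strict;
use warnings;

## Files added, copied or modified in the index, i.e. what is about
## to be committed.
sub staged_files {
  my @files = `git diff --cached --name-only --diff-filter=ACM`;
  chomp @files;
  return @files;
}

## Map changed files to test files. Test files map to themselves;
## lib/Foo/Bar.pm maps to t/foo-bar.t, a HYPOTHETICAL naming convention
## standing in for real coverage data.
sub tests_for {
  my (@tests, %seen);
  for my $file (@_) {
    if ($file =~ m{^t/.+[.]t$}) {
      push @tests, $file;
    }
    elsif ($file =~ m{^lib/(.+)[.]pm$}) {
      (my $name = lc $1) =~ s{/}{-}g;
      push @tests, "t/$name.t";
    }
  }
  return grep { !$seen{$_}++ } @tests;   ## drop duplicates, keep order
}
</pre>

<p>The hook script could then run something like <code>my @tests = grep { -f } tests_for(staged_files()); exec('prove', '-l', @tests) if @tests;</code>, where the <code>-f</code> filter drops test files that don't exist.</p>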

]]></content:encoded>

<guid isPermaLink="false">1001@http://www.simplicidade.org/notes/</guid>
<dc:subject>Perl</dc:subject>
<dc:date>2009-11-01T09:28:35+00:00</dc:date>


</item>
<item>
<title>A faster configuration for CPAN::Reporter</title>
<link>http://www.simplicidade.org/notes/archives/2009/10/a_faster_config.html</link>

<description>The idea of CPAN::Reporter is great: take advantage of all those daily uses of the cpan shell to collect reports from a large network of users. I tried several times to enable CPAN::Reporter, but I always found that it slowed down my workflow just enough to be a nuisance. After each test phase, it would start an SMTP connection and send the report. Those 3 or 4 seconds were a bit too much for me. After a bit of reading I found a good compromise to report my test runs without affecting the performance. The setup is simple:...</description>

<content:encoded><![CDATA[
<p>The idea of <a href="http://search.cpan.org/dist/CPAN-Reporter/"><code>CPAN::Reporter</code></a> is great: take advantage of all those daily uses of the <code>cpan</code> shell to collect reports from a large network of users.</p>

<p>I tried several times to enable <code>CPAN::Reporter</code>, but I always found that it slowed down my workflow just enough to be a nuisance. After each test phase, it would start an SMTP connection and send the report. Those 3 or 4 seconds were a bit too much for me.</p>

<p>After a bit of reading I found a good compromise to report my test runs without affecting the performance. The setup is simple: make <code>CPAN::Reporter</code> write the test results to a directory and create a command to send them later.</p>

<p>To set this up, first you <a href="http://wiki.cpantesters.org/wiki/CPANInstall">install CPAN::Reporter as usual</a> and then you tweak the configuration to store the reports in a directory. My <code>~/.cpanreporter/config.ini</code> looks like this:</p>

<pre class="brush: plain;">
email_from = "Pedro Melo" &lt;melo@simplicidade.org&gt;
edit_report = no
send_report = unknown:yes fail:yes pass:yes na:no

transport = File /Users/melo/.cpan/reports
</pre>

<p>The trick is to use the <code>File</code> transport. You configure it with the directory where the test reports will be stored. In my case I chose <code>/Users/melo/.cpan/reports</code>. You need to make sure that directory exists.</p>

<p>From now on, every time you use the <code>cpan</code> shell to install a module, the test reports will be stored in your test report directory.</p>

<p>The final step is sending them. I wrote a simple script to take care of that, that you can find on <a href="http://github.com/melo/scripts">my scripts repository</a>: <a href="http://github.com/melo/scripts/blob/master/bin/x-perl-send-test-reports"><code>x-perl-send-test-reports</code></a> (<a href="http://github.com/melo/scripts/raw/master/bin/x-perl-send-test-reports">download</a>).</p>

<p>How and when you run it is up to you. You can run it manually from time to time, or from cron, or use something like Folder Actions to monitor the directory and start the script whenever a new file is placed there. I run it manually for now.</p>
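<p>If you go the cron route, a small guard avoids starting the sender when there is nothing to send. A sketch, assuming every plain file in the reports directory is a pending report (it assumes nothing about the file names the <code>File</code> transport uses); <code>pending_reports</code> is a made-up name:</p>

<pre class="brush: perl;">
use strict;
use warnings;
use File::Spec;

## Every plain file in the reports directory is treated as a pending
## report (assumption: nothing else is stored there).
sub pending_reports {
  my ($dir) = @_;
  opendir(my $dh, $dir) or die "cannot open $dir: $!\n";
  my @files = grep { -f $_ } map { File::Spec->catfile($dir, $_) } readdir($dh);
  closedir($dh);
  return sort @files;
}
</pre>

<p>A cron job could then <code>exec</code> <code>x-perl-send-test-reports</code>, with the options shown in the command line below, only when <code>pending_reports("$ENV{HOME}/.cpan/reports")</code> returns something.</p>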

<p>My command line to use the script is this:</p>

<pre class="brush: bash;">
x-perl-send-test-reports         \
    --from melo@simplicidade.org \
    --server smtp.gmail.com      \
    --transport 'Net::SMTP::TLS
         User melo@simplicidade.org
         Password my_password
         Port 587'               \
    ~/.cpan/reports
</pre>

<p>So one less excuse not to report your test results. If you are not using CPAN::Reporter, start now.</p>

]]></content:encoded>

<guid isPermaLink="false">1000@http://www.simplicidade.org/notes/</guid>
<dc:subject>Perl</dc:subject>
<dc:date>2009-10-24T11:08:28+00:00</dc:date>


</item>
<item>
<title>Just another data point</title>
<link>http://www.simplicidade.org/notes/archives/2009/10/just_another_da.html</link>

<description>(Update: I&apos;ve pushed my code, including three new scripts, to the nfsd_report_bench/ directory on my examples repository. See below for some clarifications based on comments I received). A former colleague of mine at PT had a small reporting problem, and he ended up comparing several languages for the job: C, Perl, PHP, and Python. I was curious about the results, so I took the latest version of the Perl script that he was using and set off to work. The first thing you should be aware of is where your bottleneck is. Take a look at this small script: #!/usr/bin/env...</description>

<content:encoded><![CDATA[
<p>(<strong>Update</strong>: I've pushed my code, including three new scripts, to the <a href="http://github.com/melo/perl-examples/tree/master/nfsd_report_bench/"><code>nfsd_report_bench/</code> directory on my examples repository</a>. See <a href="#the_next_day">below</a> for some clarifications based on comments I received).</p>

<p>A former colleague of mine at PT had a <a href="http://blog.sig9.net/2009/10/17/php-vs-perl/">small reporting problem</a>, and he ended up comparing several languages for the job: C, Perl, PHP, and Python.</p>

<p>I was curious about the results, so I took the <a href="http://dev.sig9.net/PHPvsPerl/stats_basic_optimized_pl.txt">latest version of the Perl</a> script that he was using and set off to work.</p>

<p>The first thing you should be aware of is where your bottleneck is. Take a look at this small script:</p>

<pre class="brush: perl;">
#!/usr/bin/env perl

use strict;
use warnings;

my $lines;
while (&lt;STDIN&gt;) {
#  my @fields = split / /;
  $lines++;
}
print "$lines\n";
</pre>

<p>A basic line counter. Compare it to the system <code>wc</code>:</p>

<pre><code>$ gzcat nfsd.gz | time wc -l
 12236390
       16.86 real        16.24 user         0.43 sys
$ gzcat nfsd.gz | time ./wc_simple.pl 
12236390
       10.13 real         8.51 user         0.84 sys
</code></pre>

<p>So it is a bit faster than the system <code>wc</code> (written in C), doing 1.2M lines per second on my laptop.</p>

<p>But if you remove the comment on the <code>split()</code>, we have:</p>

<pre><code>$ gzcat nfsd.gz | time ./wc_simple_with_split.pl
12236390
       228.78 real       224.39 user         2.34 sys
</code></pre>

<p>A lot less: 53k lines per second.</p>

<p>So the first bottleneck is <code>split()</code>. Let's improve on that. After some attempts I came up with this:</p>

<pre class="brush: perl;">
my $lines;
while (&lt;STDIN&gt;) {
  my ($ts)    = /^(\d+)/gc;
  my ($type)  = /(\w)\sV/gc;
  my ($op)    = /\s(\d\d?)\s\w/gc;
  my ($bytes) = /\w\s(\d+)\s/gc;
  $lines++;
}
</pre>

<p>I make use of the <code>gc</code> flags to start the next match where the previous one ended. I also take advantage of patterns in the lines that I need to match, like the <code>V</code> in the NFS version.</p>

<p>With this version we get:</p>

<pre><code>$ gzcat nfsd.gz | time ./wc_simple_with_regexps.pl
12236390
      109.62 real       107.42 user         1.55 sys
</code></pre>

<p>A bit better: 111k lines per second, a bit over 2x the previous result.</p>

<p>If we apply this gain to the <a href="http://dev.sig9.net/PHPvsPerl/">reported times for the Perl script</a> (419m11.267s), we get about 200m51s, which places Perl first, even ahead of the C version by a minute or so.</p>
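<p>The projection is a plain proportional rescale: reported time * (new local time / old local time). A small helper makes it easy to redo with other numbers; it is a sketch that assumes durations in the <code>XmY.Zs</code> format printed by bash's <code>time</code>, and the function names are made up:</p>

<pre class="brush: perl;">
use strict;
use warnings;

## Parse a "419m11.267s" style duration into seconds.
sub to_seconds {
  my ($d) = @_;
  my ($m, $s) = $d =~ /^(\d+)m([\d.]+)s$/ or die "bad duration: $d\n";
  return $m * 60 + $s;
}

## Format seconds back as "XmY.YYs".
sub to_duration {
  my ($secs) = @_;
  my $m = int($secs / 60);
  return sprintf("%dm%.2fs", $m, $secs - $m * 60);
}

## Scale a reported runtime by the ratio between two local benchmarks.
sub project {
  my ($reported, $old_bench, $new_bench) = @_;
  return to_duration(to_seconds($reported) * $new_bench / $old_bench);
}
</pre>

<p>For instance, <code>project('419m11.267s', 228.78, 109.62)</code> rescales the reported Perl time by the <code>split()</code> and regexp timings above.</p>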

<p>My adjusted version of the <code>stats_basic_optimized.pl</code> script is <a href="http://www.simplicidade.org/share/melo_stats_basic_pl.txt"><code>melo_stats_basic.pl</code></a>.</p>

<p>This was good enough I think, but I wanted to study the I/O gains that could be made. Stop reading here if you are already bored, it will only get worse. We did manage to get a few more minutes back, though... ;)</p>

<p>So I went after the I/O performance. First, I wanted to rule out the pipe as the bottleneck.</p>

<pre><code>$ cat /dev/zero | pv | cat &gt; /dev/null
4.61GB 0:00:10 [ 498MB/s]

$ gzcat nfsd.gz | pv | cat &gt; /dev/null 
1GB 0:00:04 [ 242MB/s]
</code></pre>

<p>So the pipe is not the bottleneck, but we will never reach its full speed: <code>gzcat</code> will be our limiting factor.</p>

<p>I did try reading the gzip file directly in the Perl script and uncompressing it there, but it was <em>very</em> slow.</p>

<p>So assuming a limit of 242MB/s on the input side, how fast are we chopping lines? The size of the input is 1073743224 bytes, and our simple <code>wc_simple.pl</code> (no split, no regexp) took 10.13 seconds above, so we are chopping the input at a rate of 101MB/s.</p>

<p>So there is room to grow there (101MB/s to 242MB/s). I did some experiments:</p>

<pre><code>$ gunzip nfsd.gz
$ time wc_simple.pl &lt; nfsd
12236390

real    1m23.212s
user    0m7.434s
sys 0m2.166s
</code></pre>

<p>Yeah, the uncompressed file doesn't fit in the cache like the gzipped version does, so there is real I/O, and the times go through the roof.</p>

<p>There is no point doing experiments with the big <code>nfsd</code> file. All of them will result in real I/O, and that is always slower than memory.</p>

<p>Let's try doing bigger reads and splitting the lines ourselves:</p>

<pre class="brush: perl;">
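use strict;
use warnings;

my $size  = shift || 2 ** 20; ## ask for 1MB per read by default
my $buf   = '';
my $lines = 0;

while () {    ## an empty while () loops forever
  ## append the new data after the partial line left from the last read
  my $n = sysread(\*STDIN, $buf, $size, length($buf));
  die "read error: $!\n" unless defined $n;

  ## count complete (non-empty) lines; /gc keeps pos($buf) after the last one
  while ($buf =~ /.+\n/gc) {
    $lines++;
  }
  last unless $n > 0;

  print "$lines $n\n" unless $lines & 0x1ffff;

  ## keep only the trailing partial line for the next read
  $buf = substr($buf, pos($buf) || 0);
}
print "$lines\n";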
</pre>

<p>I tried several block sizes, but on my OS the most I could get from a single read on the pipe was 64KB. So, asking for 1MB and getting 64KB reads, we get:</p>

<pre><code>$ gzcat nfsd.gz | time  ./wc_batch.pl
12236390
    8.37 real         7.52 user         0.58 sys
</code></pre>

<p>We get 1.46M lines per second, a 17% improvement. Let's adjust it to retrieve our fields. The inner loop becomes:</p>

<pre class="brush: perl;">
while ($buf =~ /(.+)\n/gc) {
  $_ = $1;
  my ($ts)    = /^(\d+)/gco;
  my ($type)  = /(\w)\sV/gco;
  my ($op)    = /\s(\d\d?)\s\w/gco;
  my ($bytes) = /\w\s(\d+)\s/gco;

  $lines++;
}
</pre>

<p>and the runtime:</p>

<pre><code>$ gzcat nfsd.gz | time  ./wc_batch_with_regexps.pl
12236390
   95.58 real        93.21 user         1.14 sys
</code></pre>

<p>So from 109.62s to 95.58s, almost 13% better (or, compared with our baseline <code>wc_simple_with_split.pl</code> at 228.78s, 58% better). Adjusting this to the reported results, we would go from 419m11.267s down to about 175m8s.</p>

<p>I don't think I can improve on this unless we can get bigger reads from the pipe. For example, forcing the reads to 8KB:</p>

<pre><code>$ gzcat nfsd.gz | time  ./wc_batch.pl 8192
12236390
       12.08 real        10.93 user         0.70 sys
</code></pre>

<p>A lot worse compared with the 8.37s we got with 64k reads. So the size of the pipe is the next factor we could explore, if that is even an option with your kernel.</p>

<p>But I'm happy now.</p>

<p><a name="the_next_day"></a></p>

<h2>The next day</h2>

<p>Or so I thought. First, there were doubts that my regexps were faster than <code>split()</code>. I wrote <a href="http://github.com/melo/perl-examples/blob/master/nfsd_report_bench/bench_splitters.pl"><code>bench_splitters.pl</code></a> (<a href="http://github.com/melo/perl-examples/blob/master/nfsd_report_bench/bench_splitters_report.txt">output on my laptop</a>, <a href="http://github.com/melo/perl-examples/raw/master/nfsd_report_bench/bench_splitters.pl">download link</a>) to compare <code>split()</code> with my regexps. The regexps are a bit over twice as fast, <strong>but</strong> I found big differences between the Mac OS system perl (5.8.8 on my Leopard install) and the 5.10.1 that I compiled: the system perl was between 20 and 30% faster.</p>

<p>The same <code>bench_splitters.pl</code> also gives you the maximum extraction rate you can expect from the whole script, and it times the bookkeeping parts of the original script. The only noteworthy detail is that when you hit the second-level condition, you pay dearly for the modulus operator. I also think something is wrong with the input: those timestamps are too big to be normal second-precision timestamps. So I don't know if <code>$ts % 3600</code> is the proper way to group performance by hour.</p>
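<p>On the grouping question: <code>$ts % 3600</code> gives the offset <em>within</em> an hour (0 to 3599). Records from different hours that share the same offset therefore land in the same bucket. If the goal is one bucket per hour, integer division is what selects the hour. The sketch below also takes a hypothetical <code>$ticks_per_second</code> argument, in case the oversized timestamps turn out to be in milli- or microseconds. That unit is a guess; the excerpt does not confirm it.</p>

```perl
use strict;
use warnings;

# Map a timestamp to an hour bucket (hours since the epoch).
# $ticks_per_second is an assumption about the timestamp unit:
# 1 for seconds, 1_000 for milliseconds, 1_000_000 for microseconds.
sub hour_bucket {
  my ($ts, $ticks_per_second) = @_;
  $ticks_per_second ||= 1;
  return int($ts / ($ticks_per_second * 3600));
}

# Compare with the modulus, which only gives the second within the hour.
for my $ts (3599, 3600, 7260) {
  printf "%5d: hour %d, %% 3600 = %d\n", $ts, hour_bucket($ts), $ts % 3600;
}
```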

<p>Second, I wrote <a href="http://github.com/melo/perl-examples/blob/master/nfsd_report_bench/max_line_rate.pl"><code>max_line_rate.pl</code></a> (<a href="http://github.com/melo/perl-examples/blob/master/nfsd_report_bench/max_line_rate_report.txt">output on my laptop</a>, <a href="http://github.com/melo/perl-examples/raw/master/nfsd_report_bench/max_line_rate.pl">download link</a>), which gives you an upper bound on the line rate you can expect while parsing the required fields. You can stop it at any point with Ctrl-C and it will print a performance report up to that point. It also prints a one-line performance report every 128k lines.</p>
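<p>The "report on Ctrl-C" behaviour can be sketched with a <code>%SIG</code> handler that only sets a flag. The read loop then stops cleanly and still prints the final numbers. This is a hypothetical helper, not the code of <code>max_line_rate.pl</code>. It uses <code>Time::HiRes</code> for sub-second timing and prints a progress line every <code>$every</code> lines (128k by default, as above).</p>

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: read lines from $fh, print a progress line every
# $every lines, and stop early (still reporting) when Ctrl-C is pressed.
sub rate_report {
  my ($fh, $every) = @_;
  $every ||= 128 * 1024;

  my $stop = 0;
  local $SIG{INT} = sub { $stop = 1 };    # only set a flag; report below

  my $start = time();
  my $lines = 0;
  while (!$stop && defined(my $line = <$fh>)) {
    $lines++;
    next if $lines % $every;
    my $elapsed = time() - $start || 1e-9;    # avoid dividing by zero
    printf STDERR "%d lines, %.0f lines/s\n", $lines, $lines / $elapsed;
  }

  my $elapsed = time() - $start || 1e-9;
  printf STDERR "%d lines in %.3fs (%.0f lines/s)%s\n",
    $lines, $elapsed, $lines / $elapsed, $stop ? ' [interrupted]' : '';
  return $lines;
}
```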

<p>You can use this <code>max_line_rate.pl</code> to compare your system perl with the 5.10.1 you compiled. I had much better performance with 5.8.8 in this particular application.</p>

<p>Finally, I rewrote the statistics script to deal with the report that <a href="http://twitter.com/nunoloureiro/statuses/4969723345">my previous version was consuming 7.5GB of RAM</a>. The reason is simple enough: I don't have access to the original input, only to the six-line excerpt that was posted. As a result, the regexps I use to extract the required fields might fail on lines that don't look like that excerpt.</p>

<p>The new script, <a href="http://github.com/melo/perl-examples/blob/master/nfsd_report_bench/fast_stats.pl"><code>fast_stats.pl</code></a> (<a href="http://github.com/melo/perl-examples/blob/master/nfsd_report_bench/fast_stats_report.txt">output on my laptop</a>, <a href="http://github.com/melo/perl-examples/raw/master/nfsd_report_bench/fast_stats.pl">download link</a>), is more robust, and should deal with lines that cannot be parsed: it will print the line that couldn't be matched and ignore it. Also: I've included the output of the <code>ps</code> command at the start and end of both <code>fast_stats.pl</code> and <code>max_line_rate.pl</code> to show that the RSS doesn't change that much.</p>
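<p>The "print the line and skip it" approach comes down to checking the match result before using the captured fields. Here is a minimal sketch that sums bytes per operation. It assumes a made-up, space-separated <code>timestamp type op bytes</code> layout as a stand-in for the real log format, and <code>bytes_per_op</code> is a hypothetical name.</p>

```perl
use strict;
use warnings;

# Sum bytes per op code. Lines that do not match the (assumed) layout
# are printed to STDERR and skipped instead of producing undef fields.
sub bytes_per_op {
  my ($fh) = @_;
  my %total;
  while (my $line = <$fh>) {
    my ($op, $bytes) = $line =~ /^\d+ \w (\d\d?) (\d+)/;
    unless (defined $bytes) {
      print STDERR "skipping unparsable line: $line";
      next;
    }
    $total{$op} += $bytes;
  }
  return \%total;
}
```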

<p>To compare the original <code>stats_basic_optimized.pl</code> with my <code>fast_stats.pl</code> I wrote a small shell script <a href="http://github.com/melo/perl-examples/blob/master/nfsd_report_bench/bench.sh"><code>bench.sh</code></a> (<a href="http://github.com/melo/perl-examples/blob/master/nfsd_report_bench/report.txt">output on my laptop</a>). The <code>nfsd.gz</code> input file was generated with the <a href="http://github.com/melo/perl-examples/blob/master/nfsd_report_bench/build_source_file.pl"><code>build_source_file.pl</code></a> script with the command:</p>

<pre><code>build_source_file.pl 1073741824 | gzip --best &gt; nfsd.gz
</code></pre>

<p>The new <code>fast_stats.pl</code> is almost twice as fast as the old one on my laptop.</p>

<p>On a final note (I've already wasted too much time on this...), I'm not out to compare Perl with Python or Ruby or even PHP. But I would like to know how we measure up against C. The reason is simple: when Perl programmers feel that something is slow, they turn to C, not another scripting language.</p>

<p>This experiment is mostly to show that writing fast Perl will sometimes take you down unexpected paths (like regexps beating a split), and that you should benchmark carefully if performance is critical to you.</p>

]]></content:encoded>

<guid isPermaLink="false">998@http://www.simplicidade.org/notes/</guid>
<dc:subject>Perl</dc:subject>
<dc:date>2009-10-18T10:52:19+00:00</dc:date>


</item>
<item>
<title>CPAN::Shell &apos;s&apos; command</title>
<link>http://www.simplicidade.org/notes/archives/2009/10/cpanshell_s_com.html</link>

<description>I&apos;m playing with a new command for the CPAN::Shell: &apos;s&apos; for search on http://search.cpan.org. It takes a single argument (can be a module, distribution, bundle or author name), checks the CPAN indexes to see which type it is, creates the proper URL for it at search.cpan.org and opens your browser with it. The last bit, opening a browser with it, is very very immature. Right now it only works on Mac OS X. I&apos;m hoping the experience from the Browser::Open distribution will help me get it right. If no object is found, it sends the user to the generic search interface....</description>

<content:encoded><![CDATA[
<p>I'm playing with a new command for the <code>CPAN::Shell</code>: '<code>s</code>' for <em>search on http://search.cpan.org</em>.</p>

<p>It takes a single argument (can be a module, distribution, bundle or author name), checks the CPAN indexes to see which type it is, creates the proper URL for it at <a href="http://search.cpan.org/">search.cpan.org</a> and opens your browser with it.</p>

<p>The last bit, opening a browser, is very, very immature. Right now it only works on Mac OS X. I'm hoping the experience from the <a href="http://search.cpan.org/dist/Browser-Open/"><code>Browser::Open</code> distribution</a> will help me get it right.</p>

<p>If no object is found, it sends the user to the generic search interface.</p>

<p>The current hackish implementation can be found on <a href="http://github.com/melo/cpanpm/tree/s_command">my <code>s_command</code> topic branch</a> (it's a <em>topic branch</em>: I will occasionally rebase it onto master).</p>
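<p>The URL-building part of the command is easy to sketch on its own. In this hypothetical helper, the caller passes in the object type instead of the helper looking it up in the CPAN indexes. The URL patterns are the <code>search.cpan.org</code> ones of the time (<code>/dist/Name/</code>, <code>/~author/</code>, <code>/perldoc?Module</code>, <code>/search?query=</code>). Treat them as assumptions and check them against the site before relying on them.</p>

```perl
use strict;
use warnings;

my $base = 'http://search.cpan.org';

# Percent-encode everything except unreserved URI characters.
sub uri_escape_simple {
  my ($s) = @_;
  $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
  return $s;
}

# Build a search.cpan.org URL for $name, given its $type:
# 'module' (bundles included), 'dist', 'author', or undef if unknown.
sub search_url {
  my ($type, $name) = @_;
  $type = '' unless defined $type;
  return "$base/perldoc?$name"          if $type eq 'module';
  return "$base/dist/$name/"            if $type eq 'dist';
  return "$base/~" . lc($name) . '/'    if $type eq 'author';
  # nothing found: fall back to the generic search interface
  return "$base/search?query=" . uri_escape_simple($name);
}
```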

]]></content:encoded>

<guid isPermaLink="false">997@http://www.simplicidade.org/notes/</guid>
<dc:subject>Perl</dc:subject>
<dc:date>2009-10-17T12:36:29+00:00</dc:date>


</item>
<item>
<title>Browser::Open</title>
<link>http://www.simplicidade.org/notes/archives/2009/10/browseropen.html</link>

<description>I&apos;ve uploaded a small module to CPAN, Browser::Open (give it a couple of minutes to show up). It does one simple thing: given a $url, it opens the default browser with it. The difficult part is deciding how to open the &quot;default browser&quot;. On Mac OS X, this is easy: just execute the open command. On Windows, there is a start command that should do the trick, but I&apos;m not a Windows user so I cannot test this. Any Windows users out there that can point me to the relevant information on how to open a URL with a simple...</description>

<content:encoded><![CDATA[
<p>I've uploaded a small module to CPAN, <a href="http://search.cpan.org/dist/Browser-Open/">Browser::Open</a> (give it a couple of minutes to show up).</p>

<p>It does one simple thing: given a <code>$url</code>, it opens the default browser with it.</p>

<p>The difficult part is deciding how to open the "default browser". On Mac OS X, this is easy: just execute the <code>open</code> command.</p>

<p>On Windows, there is a <code>start</code> command that should do the trick, but I'm not a Windows user, so I cannot test this. If any Windows users out there can point me to the relevant information on how to open a URL with a simple command, I would appreciate it.</p>

<p>For Linux, it seems you have too many choices: you could use <code>gnome-open</code>, but your user might be using KDE. There is an <a href="http://portland.freedesktop.org/xdg-utils-1.0/xdg-open.html"><code>xdg-open</code> command</a> described on the <a href="http://freedesktop.org/">FreeDesktop</a> site that seems to do what I want. We can always fall back to <code>firefox</code>, though. Fragmentation++!</p>
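<p>The platform logic above can be sketched as a helper that picks a command to run. This is not <code>Browser::Open</code>'s actual code. The list of Unix helpers (<code>xdg-open</code>, <code>gnome-open</code>, <code>firefox</code>) follows the discussion above. The Windows branch goes through <code>cmd /c start</code> and is untested. Both function names are made up.</p>

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first executable called $prog in $PATH.
sub find_in_path {
  my ($prog) = @_;
  for my $dir (File::Spec->path) {
    my $file = File::Spec->catfile($dir, $prog);
    return $file if -x $file && !-d _;
  }
  return;
}

# Hypothetical helper: return the command (as a list) that should open
# $url in the default browser, or an empty list if none is found.
# $os defaults to $^O; $find lets tests replace the PATH lookup.
sub browser_command {
  my ($url, $os, $find) = @_;
  $os   ||= $^O;
  $find ||= \&find_in_path;

  return ('open', $url) if $os eq 'darwin';                      # Mac OS X
  return ('cmd', '/c', 'start', '', $url) if $os eq 'MSWin32';   # untested

  # everything else: the freedesktop.org helper first, then fallbacks
  for my $prog (qw(xdg-open gnome-open firefox)) {
    my $path = $find->($prog) or next;
    return ($path, $url);
  }
  return;
}
```

<p>A caller would then do something like <code>my @cmd = browser_command($url); system(@cmd) if @cmd;</code>. On Unix, the list form of <code>system</code> with more than one element runs the command directly, so the URL never goes through a shell.</p>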

]]></content:encoded>

<guid isPermaLink="false">996@http://www.simplicidade.org/notes/</guid>
<dc:subject>Perl</dc:subject>
<dc:date>2009-10-17T12:13:30+00:00</dc:date>


</item>
<item>
<title>Dist::Zilla::Plugin::LatestPrereqs</title>
<link>http://www.simplicidade.org/notes/archives/2009/10/distzillaplugin.html</link>

<description>This all started with an article by Marcel Gruenauer &quot;hanekomu&quot;, &quot;Repeatedly installing Task::* distributions&quot;. What he wants is a way to tell CPAN this: &quot;install the latest versions of my dependencies&quot;. Unfortunately, his solution won&apos;t work. The code that he gives us will prevent the Task:: module from being installed but it will not guarantee that the latest version of the prereqs will be installed in the following runs. The reason is simple: if you don&apos;t ask for a specific version of your prereqs, CPAN will accept any version, so it will only install each prereq once, the first time....</description>

<content:encoded><![CDATA[
<p>This all started with an article by <a href="http://hanekomu.at/">Marcel Gruenauer "hanekomu"</a>, "<a href="http://hanekomu.at/blog/dev/20091005-1227-repeatedly_installing_task_distributions.html">Repeatedly installing Task::* distributions</a>".</p>

<p>What he wants is a way to tell <code>CPAN</code> this: "install the latest versions of my dependencies".</p>

<p>Unfortunately, his solution won't work. The code he gives us will prevent the <code>Task::</code> module from being installed, but it will not guarantee that the latest versions of the prereqs are installed on subsequent runs.</p>

<p>The reason is simple: if you don't ask for a specific version of your prereqs, <code>CPAN</code> will accept any version, so it will only install each prereq once, the first time.</p>

<p>The perfect solution would be a marker on each prereq telling the <code>CPAN</code> toolchain that you want the latest version. This does not exist yet. You could probably standardize on version -1 meaning the latest one (a twisted parallel with index -1 being the last element of a Perl list), but it's all speculation. It's just not supported yet.</p>

<p>The next best thing would be to include the version required on each of your prereqs, and keep those values up-to-date to the latest available on <code>CPAN</code> whenever you rebuild your package.</p>

<p>This is a halfway solution. It won't guarantee the latest version at the time your package is installed, but it does make sure you get at least the latest version available when your package was built on your system, before uploading to <a href="http://pause.perl.org/">PAUSE</a>.</p>

<p>This is actually a good compromise given that you are probably listing the versions of the prereqs that you tested your package with on your system.</p>

<p>And, better yet, this half-solution can be automated. I wrote a <a href="http://search.cpan.org/dist/Dist-Zilla/"><code>Dist::Zilla</code></a> plugin to do just that. The code is very simple, and you can easily adapt it into a <a href="http://search.cpan.org/dist/Module-Install/"><code>Module::Install</code></a> plugin or whatever you use to build your packages.</p>
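<p>The core idea fits in a few lines of plain Perl. The sketch below is not the plugin's code. It takes a hash of prereqs and raises each version to the newest one a lookup function reports. The default lookup uses <code>CPAN.pm</code>'s <code>CPAN::Shell-&gt;expand</code> and <code>cpan_version</code>, which require a properly configured <code>CPAN.pm</code>. The function names are hypothetical.</p>

```perl
use strict;
use warnings;

# Default lookup: ask CPAN.pm for the newest indexed version of $module.
# Needs a configured CPAN.pm, and may download fresh index files.
sub cpan_latest_version {
  my ($module) = @_;
  require CPAN;
  my $mod = CPAN::Shell->expand('Module', $module) or return;
  return $mod->cpan_version;
}

# Hypothetical helper: given { Module => min_version, ... }, return a new
# hashref where each minimum is replaced by the latest known version.
sub latest_prereqs {
  my ($prereqs, $lookup) = @_;
  $lookup ||= \&cpan_latest_version;

  my %latest;
  for my $module (sort keys %$prereqs) {
    my $version = $lookup->($module);
    # keep the original requirement when the lookup has no usable answer
    # (unknown module, or an index entry listing the version as 'undef')
    $latest{$module} =
      (defined $version && $version ne 'undef') ? $version : $prereqs->{$module};
  }
  return \%latest;
}
```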

<p>You can find the <a href="http://github.com/melo/dist-zilla-tools/tree/master/Dist-Zilla-Plugin-LatestPrereqs/">code for <code>Dist::Zilla::Plugin::LatestPrereqs</code> in my GitHub repository</a> for <a href="http://github.com/melo/dist-zilla-tools">Dist::Zilla tools</a>. It's not on <a href="http://search.cpan.org/">CPAN</a> yet, and for now there are no plans to publish it. The reason is simple: it requires a small patch to the core <code>Dist::Zilla</code>. <a href="http://github.com/melo/dist-zilla/commit/a616ea6c3aafaf4e1771bab0d0cffb2e8c12ac8a">The patch is a single commit that you can find on my Dist::Zilla fork</a>. I've asked <a href="http://search.cpan.org/~rjbs/">Ricardo Signes</a> to accept the patch. If he likes the code, I'll release my plugin after the next <code>Dist::Zilla</code> release.</p>

<p>If you look at the <a href="http://github.com/melo/dist-zilla-tools/blob/master/Dist-Zilla-Plugin-LatestPrereqs/lib/Dist/Zilla/Plugin/LatestPrereqs.pm">LatestPrereqs code</a>, you'll notice that it is very simple, but it does load the <a href="http://search.cpan.org/dist/CPAN/"><code>CPAN</code></a> package, and that is a big one. You could write this code directly in your <code>Makefile.PL</code> and get the very latest versions of your prereqs at install time, but that assumes the system has a properly configured <code>CPAN</code>.</p>

<p>That is a risk I'm not willing to take with my distributions. If you do it that way, ping me: I would like to follow your module's <a href="http://static.cpantesters.org/">CPAN Testers</a> feed for a while.</p>

<p>Back to Marcel's post: if you do need to skip the install phase for your distribution, I've also uploaded a <code>Dist::Zilla</code> plugin to do just that: <a href="http://search.cpan.org/dist/Dist-Zilla-Plugin-MakeMaker-SkipInstall/"><code>Dist::Zilla::Plugin::MakeMaker::SkipInstall</code></a>. It might be handy sometimes.</p>

]]></content:encoded>

<guid isPermaLink="false">993@http://www.simplicidade.org/notes/</guid>
<dc:subject>Perl</dc:subject>
<dc:date>2009-10-07T16:41:02+00:00</dc:date>


</item>
<item>
<title>AnyEvent::Mojo 0.8</title>
<link>http://www.simplicidade.org/notes/archives/2009/10/anyeventmojo_08.html</link>

<description>I&apos;ve uploaded to PAUSE release 0.8 of AnyEvent::Mojo. It should be on your local CPAN mirror in a little while. This was a long time coming, unfortunately, and I accumulated FAIL test reports on CPAN Testers, but it&apos;s here now. Given that it uses the latest Mojo release, it supports HTTP keep-alive and pipelining, chunked encoding and 100-Continue requests. Although the test suite passes, I&apos;m not fully confident in the pipelining code. My next step is to write a client with a slow network reader to exercise some corner cases of that part of the code. In particular, I&apos;m concerned about the...</description>

<content:encoded><![CDATA[
<p>I've uploaded to <a href="https://pause.perl.org">PAUSE</a> <a href="http://github.com/melo/anyevent--mojo/tree/v0.8">release 0.8 of AnyEvent::Mojo</a>. It should be on your <a href="http://mirrors.cpan.org/">local CPAN mirror</a> in a little while.</p>

<p>This was a long time coming, unfortunately, and I <a href="http://www.cpantesters.org/distro/A/AnyEvent-Mojo.html">accumulated FAIL test reports on CPAN Testers</a>, but it's here now.</p>

<p>Given that it uses the <a href="http://search.cpan.org/dist/Mojo/">latest Mojo release</a>, it supports HTTP keep-alive and pipelining, chunked encoding and 100-Continue requests.</p>

<p>Although the test suite passes, I'm not fully confident in the pipelining code. My next step is to write a client with a slow network reader to exercise some corner cases of that part of the code. In particular, I'm concerned about the interaction between Mojo's pipelining code and the request pause functionality I use to implement long-polling servers.</p>

<p>It could be argued that long-polling and pipelining don't mix, but I think the pause functionality could also be used on regular requests. For example, if one request needs an answer from a memcached server, the handler can start the memcached GET, pause the Mojo transaction, and resume it when the memcached response arrives. While that is going on, the server should be able to keep writing out the responses to previous requests and reading the next ones.</p>

<p>Maybe that's a bit extreme, but I do hope to have this working, even in complex pipelining situations.</p>

]]></content:encoded>

<guid isPermaLink="false">992@http://www.simplicidade.org/notes/</guid>
<dc:subject>Perl</dc:subject>
<dc:date>2009-10-04T08:28:35+00:00</dc:date>


</item>
<item>
<title>Generating charset_table maps for Sphinx</title>
<link>http://www.simplicidade.org/notes/archives/2009/09/generating_char.html</link>

<description>At work, I wanted to improve the back-office search engine and installed Sphinx. On a regular basis, it indexes all our users, courses, teachers and other important tables. It works great, low install barrier, low maintenance, and it is very very fast. Perfect. One of the problems that we found, that limited the usefulness of the full-text search engine, is that a lot of our text has accents, and it would be better to ignore those. Also we don&apos;t need case-sensitive-ness. So I needed to generate a charset_table map, what Sphinx uses to normalize the text that you give him...</description>

<content:encoded><![CDATA[
<p>At work, I wanted to improve the back-office search engine and installed <a href="http://www.sphinxsearch.com/">Sphinx</a>.</p>

<p>On a regular basis, it indexes all our users, courses, teachers and other important tables. It works great, low install barrier, low maintenance, and it is very very fast. Perfect.</p>

<p>One of the problems we found, which limited the usefulness of the full-text search engine, is that a lot of our text has accents, and it would be better to ignore them. We also don't need case sensitivity.</p>

<p>So I needed to generate a <a href="http://www.sphinxsearch.com/docs/current.html#conf-charset-table"><code>charset_table</code></a> map, which Sphinx uses to normalize the text you give it to index.</p>

<p>And being a (very) lazy person, I preferred to write a Perl script to do it. The result is the <a href="http://github.com/melo/scripts/blob/master/bin/x-sphinx-charset-generator"><code>x-sphinx-charset-generator</code></a>, now part of <a href="http://github.com/melo/scripts/">my script stash</a>.</p>

<p>It takes an optional parameter, the charset of your text (defaulting to 'utf8', the <a href="http://jeremy.zawodny.com/blog/archives/010546.html">loose version of UTF-8</a>), and generates a <code>charset_table</code> for the most common accented characters, mapping each of them to the lower-case version of the same letter without the accent.</p>

<p>I've only included the common Portuguese characters. Patches are welcome for other characters you might need.</p>

<p>The only part I don't really like is that I need to apply the same logic to clean up the strings users search for. I would prefer a module that takes the characters I want to allow as valid and provides both the <code>charset_table</code> and a function to clean search inputs. Interesting, but for now this solves the important 80% of the problem.</p>
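<p>Here is a minimal sketch of the folding idea. It uses the core <code>Unicode::Normalize</code> module instead of a hand-written table. This is not how <code>x-sphinx-charset-generator</code> works, just one way to do it: decompose each character, drop the combining accent marks, and lowercase the rest. The same <code>fold()</code> can also clean search input, which keeps the index and the queries in sync and addresses the duplication mentioned above. The base ranges (<code>0..9, A..Z-&gt;a..z, _, a..z</code>) are an assumption, modelled on the default table in the Sphinx documentation.</p>

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD);

# Remove accents and case: decompose (NFD), drop the combining marks,
# lowercase. Use this both when generating the map and on search input.
sub fold {
  my ($str) = @_;
  (my $plain = NFD($str)) =~ s/\p{Mn}+//g;
  return lc $plain;
}

# Build a Sphinx charset_table value mapping each character to its
# folded form, e.g. U+E1->a for "á".
sub charset_table {
  my @map = ('0..9', 'A..Z->a..z', '_', 'a..z');
  push @map, sprintf('U+%X->%s', ord $_, fold($_)) for @_;
  return join ', ', @map;
}

# Common Portuguese accented letters, upper and lower case.
my @pt = split //, 'áàâãéêíóôõúçÁÀÂÃÉÊÍÓÔÕÚÇ';
print 'charset_table = ', charset_table(@pt), "\n";
```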

]]></content:encoded>

<guid isPermaLink="false">990@http://www.simplicidade.org/notes/</guid>
<dc:subject>Perl</dc:subject>
<dc:date>2009-09-26T17:16:26+00:00</dc:date>


</item>
<item>
<title>A new look at Mason</title>
<link>http://www.simplicidade.org/notes/archives/2009/09/a_new_look_at_m.html</link>

<description>On my way to E5, I have to deal with all the legacy sites that came before it, and the vast majority of them are written in Mason. At the time, we used HTML::Mason 1.05, and only after 1.30-ish (when the internal buffering changes introduced in the 1.10 release were reverted) did we upgrade to something more recent. To get an idea of the size of this Mason project, the current sites have a little over 450 different components across 7 sites (different layouts but same content) and 1 management site. A lot of those components are no longer in...</description>

<content:encoded><![CDATA[
<p>On my way to <a href="http://www.simplicidade.org/notes/archives/2009/08/start.html">E5</a>, I have to deal with all the legacy sites that came before it, and the vast majority of them are written in <a href="http://www.masonhq.com/">Mason</a>.</p>

<p>At the time, we used <a href="http://search.cpan.org/dist/HTML-Mason/">HTML::Mason</a> 1.05, and only after 1.30-ish (when the internal buffering changes introduced in the 1.10 release were reverted) did we upgrade to something more recent. To give you an idea of the size of this Mason project, the current sites have a little over 450 different components across 7 sites (different layouts but same content) and 1 management site. A lot of those components are no longer in use and were left there due to bad VCS practices. I would estimate about 200 actually useful components. The project also makes heavy use of <code>autohandlers</code>, <code>dhandlers</code>, and multiple component roots to implement site inheritance.</p>

<p>The <a href="http://www.simplicidade.org/notes/archives/2009/08/template_system.html">E5 template discussion</a> is also still on my mind, without a clear winner yet.</p>

<p>And finally, last week I read <a href="http://www.openswartz.com/">Jonathan Swartz</a>'s article about what <a href="http://www.openswartz.com/2009/09/01/what-mason-2-0-would-look-like/">Mason 2.0 would look like</a>.</p>

<p>So I took the time last night to re-evaluate Mason. As with most powerful tools (and make no mistake, it is a very powerful tool), the problem with Mason is that you can easily make a big mess of things. Much more than with other solutions like Catalyst and Mojo, which provide a clear separation of Controller, Model and View, Mason makes it very easy to mix the three. You could end up with a lot of logic that belongs in your models living inside your templates.</p>

<p>But it has several advantages:</p>

<ul>
<li>it's easy to start and add a new page: just create the file and start typing, with no need to jump between controller and template, or to restart (this alone makes for speedy development);</li>
<li>it has decent wrapper functionality for skinning: <code>autohandlers</code> are great;</li>
<li>the multiple component roots logic is very powerful, and it's used both during the dispatch phase and in component calls;</li>
<li>the view logic is Perl: no need to learn a new language and be exasperated by its limitations, as with TT.</li>
</ul>

<p>There are several downsides, of course: for one, the Controller/View split of modern frameworks allows you to reuse controller logic with multiple views. For example, you could output HTML, JSON and XML with the same controller code.</p>

<p>But the biggest downside is this: deployment is a bitch.</p>

<p>For production environments, deployment usually means mod_perl, but I find <a href="http://www.fastcgi.com/">FastCGI</a> easier to deploy nowadays. Yet this option is <a href="http://www.masonhq.com/?FastCGI">only briefly mentioned on the MasonHQ site</a>.</p>

<p>So I created a small experimental project (you can find all the files at the <a href="http://github.com/melo/exp-mason-fcgi">exp-mason-fcgi project on GitHub</a>). It has a <a href="http://github.com/melo/exp-mason-fcgi/blob/master/mason_fcgi.pl">FastCGI startup script</a> to power two virtual hosts. Each one shares a <a href="http://github.com/melo/exp-mason-fcgi/tree/master/master/">master component root</a>, and has a local per-site component root to override the master site behavior when needed.</p>

<p>The setup works just fine under <a href="http://nginx.net/">nginx</a>+FastCGI (<a href="http://github.com/melo/exp-mason-fcgi/blob/master/nginx.conf">partial nginx.conf included</a>), but I did get into some trouble. Mason usually delegates some stuff to <a href="http://httpd.apache.org/">Apache</a> and without his big daddy around, he can get lost.</p>

<p>The first problem is <a href="http://httpd.apache.org/docs/2.2/mod/mod_dir.html#directoryindex">directory index files</a>. When you request a directory, Apache helps Mason out and points it to the proper <code>index.html</code> file. Without Apache, a request to <code>http://your-fastcgi-mason-site/</code> will just fail because Mason cannot find the component for <code>/</code>: it has no logic to map <code>/</code> to <code>/index.html</code>.</p>

<p>You could implement this with a global <code>dhandler</code>, and that is probably the best solution, because it can also deal with 404 situations (another mess Apache would usually clean up after).</p>

<p>I have a <a href="http://github.com/melo/exp-mason-fcgi/commit/2360faa2c94f44b2be4f4cce6dbdcbfdbb788107">proof of concept hack in the repo that mimics the Apache DirectoryIndex directive</a>. It is a hack: the logic belongs in Interp.pm, not in the Request. I'll clean it up later, but it does work, and it might be useful in some scenarios.</p>

<p>This patch makes a <code>/</code> request work just fine.</p>
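<p>The same idea also fits in a small standalone helper that a FastCGI dispatcher could call before handing the path to Mason. It is a sketch, not the patch from the repository (which lives inside <code>HTML::Mason::Request</code>). It only handles paths ending in <code>/</code>, and the function name and the default <code>index.html</code> are assumptions.</p>

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper mimicking Apache's DirectoryIndex: if $path names
# a directory (ends in "/"), return the path of the first index file
# that exists in any component root; otherwise return $path unchanged.
sub apply_directory_index {
  my ($path, $roots, @index_files) = @_;
  @index_files = ('index.html') unless @index_files;

  return $path unless $path =~ m{/\z};    # "/docs" without a slash is not handled

  for my $index (@index_files) {
    for my $root (@$roots) {              # same order as the component roots
      return "$path$index" if -f File::Spec->catfile($root, $path, $index);
    }
  }
  return $path;    # no index file: let Mason (or a dhandler) deal with it
}
```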

<p>The second problem I have with Mason is the order of evaluation of templates. The current order for a request to <code>/index.html</code> is <code>/autohandler</code> which calls <code>$m-&gt;call_next</code> and that calls the <code>/index.html</code> component.</p>

<p>This makes it hard to influence the wrapper with content generated by the <code>/index.html</code> component.</p>

<p>The solution was to create <a href="http://github.com/melo/exp-mason-fcgi/commit/814aa71a24b7e8b5d730446bca76b51876057f84">a new <code>HTML::Mason::Request</code> method called <code>scall_next()</code></a>. It merges the <a href="http://www.masonhq.com/docs/manual/Request.html#call_next"><code>$m-&gt;call_next()</code></a> and the <a href="http://www.masonhq.com/docs/manual/Request.html#scomp"><code>$m-&gt;scomp()</code></a> calls into one, and allows me to <a href="http://github.com/melo/exp-mason-fcgi/commit/54c68fa73d5bea05903e30419c33990e735161bb">use it in an <code>autohandler</code> like this</a>. With <code>$m-&gt;scall_next()</code>, the autohandler calls the next component, gets its generated HTML, and only then generates the HTML wrapper.</p>

<p>The last piece of the puzzle is a way for the <code>/index.html</code> and the parent <code>autohandler</code> to communicate, for example, to pass along the title for the page.</p>

<p>The Mason-recommended way is to use the <a href="http://www.masonhq.com/docs/manual/Request.html#notes"><code>$m-&gt;notes()</code></a> API, similar to the Catalyst stash concept. It works very well, but I prefer to take advantage of the fact that all components live inside the <code>HTML::Mason::Commands</code> namespace and just <a href="http://github.com/melo/exp-mason-fcgi/commit/f3ef0e0cef66ecf7db7e6671db08086f11b566fd">declare a shared <code>%stash</code> there and clean it up per request</a>. With this, it's <a href="http://github.com/melo/exp-mason-fcgi/commit/bfe815aa90f686dc3a613629707af1bf4e128ab4">easy to implement dynamic page titles</a>.</p>

<hr />

<p>All in all, I keep coming back to Mason. I do like most of what it provides, and for quick sites, it beats all the other alternatives in Perl-land.</p>

<p>A couple of months ago there was some discussion about offering web developers an FTP-friendly option, where you could upload your pages to a server and they would just work. Mason is the closest Perl has to that goal.</p>

<p>But deployment must be made simpler, and that is one of the things Mason 2.0 should focus on.</p>

<p>So my laundry list for Mason 2.0 (and most of them can be implemented on 1.x):</p>

<ul>
<li>FastCGI support out-of-the-box;</li>
<li>support for directory index files (without using <code>dhandlers</code>);</li>
<li><code>$m-&gt;scall_next</code>;</li>
<li>better hooks for debugging and 404 errors.</li>
</ul>

<p>I don't know. I'm strongly considering going Mason all the way for E5, and if that goes forward, I guess I'll have to write these four pieces myself.</p>

]]></content:encoded>

<guid isPermaLink="false">988@http://www.simplicidade.org/notes/</guid>
<dc:subject>Perl</dc:subject>
<dc:date>2009-09-19T12:38:21+00:00</dc:date>


</item>
<item>
<title>Bitten by prototypes</title>
<link>http://www.simplicidade.org/notes/archives/2009/09/bitten_by_proto.html</link>

<description><![CDATA[I just spent the best part of an hour around a problem caused by the behavior of Perl prototypes. I used the following test case to figure it out: use Test::More tests =&gt; 1; use Encode qw( encode decode ); sub u8l1 { return encode('iso-8859-1', @_); } my $ola_u8 = decode('utf8', 'Olá'); my $ola_l1 = encode('iso-8859-1', $ola_u8); is(u8l1($ola_u8), $ola_l1); The output of prove x.t is this: t/x.t .. 1/1 # Failed test at t/x.t line 12. # got: '1' # expected: 'Ol?' # Looks like you failed 1 test of 1. t/x.t .. Dubious, test returned 1 (wstat 256, 0x100)...]]></description>

<content:encoded><![CDATA[
<p>I just spent the best part of an hour on a problem caused by the behavior of Perl prototypes.</p>

<p>I used the following test case to figure it out:</p>

<pre><code>use Test::More tests =&gt; 1;
use Encode qw( encode decode );

sub u8l1 {
  return encode('iso-8859-1', @_);
}

my $ola_u8 = decode('utf8', 'Olá');
my $ola_l1 = encode('iso-8859-1', $ola_u8);
is(u8l1($ola_u8), $ola_l1);
</code></pre>

<p>The output of <code>prove x.t</code> is this:</p>

<pre><code>t/x.t .. 1/1 
#   Failed test at t/x.t line 12.
#          got: '1'
#     expected: 'Ol?'
# Looks like you failed 1 test of 1.
t/x.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/1 subtests
</code></pre>

<p>The <code>got: '1'</code> had me puzzled for quite some time, until I changed the <code>u8l1()</code> helper to this:</p>

<pre><code>sub u8l1 {
  return encode('iso-8859-1', $_[0]);
}
</code></pre>

<p>And it just works.</p>

<p>The problem is the definition of the <code>Encode::encode()</code> function. It has a prototype like this:</p>

<pre><code>sub encode($$;$)
</code></pre>

<p>So our <code>@_</code> is evaluated in scalar context and yields the number of elements it holds: 1.</p>

<p>I don't like it at all because it changes the standard Perl behavior of flattening lists into argument lists. It's action at a distance. The fact that you cannot pass the arguments as a list, not even a single-element one, is also not mentioned in the documentation.</p>

<p>The only really useful application of Perl prototypes is a leading <code>&amp;</code>, which lets you write functions that look like built-ins such as <code>sort</code> or <code>map</code>, taking an anonymous sub (or a bare block) as the first parameter.</p>
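<p>A small script shows both the surprise and the ways around it. <code>proto_demo()</code> is a made-up stand-in with the same <code>($$;$)</code> prototype as <code>Encode::encode()</code>. Calling a sub with a leading <code>&amp;</code> ignores its prototype, which is the basis of the third fix below.</p>

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Made-up sub with the same ($$;$) prototype as Encode::encode().
sub proto_demo($$;$) { return join ',', @_ }

my @list = ('a', 'b', 'c');
my $forced   = proto_demo('x', @list);    # @list in scalar context: "x,3"
my $bypassed = &proto_demo('x', @list);   # & ignores the prototype: "x,a,b,c"

# Three ways to write the helper from the post so it works:
sub u8l1_index { return encode('iso-8859-1', $_[0]) }    # pass the element
sub u8l1_shift { return encode('iso-8859-1', shift) }    # shift it off @_
sub u8l1_amp   { return &encode('iso-8859-1', @_) }      # bypass the prototype

print "$forced\n$bypassed\n";
```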

]]></content:encoded>

<guid isPermaLink="false">987@http://www.simplicidade.org/notes/</guid>
<dc:subject>Perl</dc:subject>
<dc:date>2009-09-16T22:42:13+00:00</dc:date>


</item>
<item>
<title>Log::Log4perl tip</title>
<link>http://www.simplicidade.org/notes/archives/2009/09/loglog4perl_tip.html</link>

<description>I use Log::Log4perl for all my logging needs. Ok, I lie. I use a wrapper that deals with some stuff that I just don&apos;t like with Log::Log4perl, but that is a story for another day. One thing that we inherited from log4j was the notion that a message can match multiple loggers in your logging hierarchy. The logic is simple and explained in detail on a Log::Log4perl FAQ entry. If you write something like this in your logger configuration file: log4perl.logger.Cat = ERROR, Screen log4perl.logger.Cat.Subcat = WARN, Screen which define two loggers. Cat and Cat.Subcat, the second a subcategory of...</description>

<content:encoded><![CDATA[
<p>I use <a href="http://search.cpan.org/dist/Log-Log4perl/"><code>Log::Log4perl</code></a> for all my logging needs. Ok, I lie. I use a <a href="http://github.com/melo/mytk/blob/master/lib/MyTK/Logger.pm">wrapper that deals with some stuff that I just don't like with <code>Log::Log4perl</code></a>, but that is a story for another day.</p>

<p>One thing that we inherited from <a href="http://logging.apache.org/log4j/">log4j</a> was the notion that a message can match multiple loggers in your logging hierarchy.</p>

<p>The logic is simple and explained in detail on a <a href="http://search.cpan.org/dist/Log-Log4perl/lib/Log/Log4perl/FAQ.pm#I_keep_getting_duplicate_log_messages!_What's_wrong?"><code>Log::Log4perl</code> FAQ entry</a>. If you write something like this in your logger configuration file:</p>

<pre><code>log4perl.logger.Cat        = ERROR, Screen
log4perl.logger.Cat.Subcat = WARN, Screen
</code></pre>

<p>which defines two loggers, <code>Cat</code> and <code>Cat.Subcat</code> (the second a subcategory of the first), and then use:</p>

<pre><code>my $logger = get_logger("Cat.Subcat");
$logger-&gt;warn("Warning!");
</code></pre>

<p>you'll get a duplicate message in your log file because it matches both loggers.</p>

<p>I knew that and I always added a line saying:</p>

<pre><code>log4perl.additivity.Cat.Subcat = 0
</code></pre>

<p>which prevents this behavior, but requires one such line per logger. Pain. Not lazy at all.</p>

<p>But for some reason (stupidity comes to mind) I never read the FAQ entry to the end, because the solution is right there. Just put this in your logger configuration file:</p>

<pre><code>log4perl.oneMessagePerAppender = 1
</code></pre>

<p>Bliss, pure bliss.</p>

<p>Mind you that <code>oneMessagePerAppender</code> is not part of log4j, which <code>Log::Log4perl</code> tries very hard to stay compatible with, so this feature is not documented anywhere except in this FAQ entry.</p>
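<p>For reference, a self-contained sketch putting the pieces together. The configuration snippets in this post never define the <code>Screen</code> appender, so this assumes <code>Log::Log4perl::Appender::Screen</code> writing to STDOUT with <code>SimpleLayout</code>; adapt it to your own appenders.</p>

```perl
use strict;
use warnings;
use Log::Log4perl qw(get_logger);

# The two loggers from this post, plus a definition for the Screen
# appender (assumed here: STDOUT with SimpleLayout).
my $conf = q(
    log4perl.logger.Cat            = ERROR, Screen
    log4perl.logger.Cat.Subcat     = WARN, Screen
    log4perl.oneMessagePerAppender = 1

    log4perl.appender.Screen        = Log::Log4perl::Appender::Screen
    log4perl.appender.Screen.stderr = 0
    log4perl.appender.Screen.layout = Log::Log4perl::Layout::SimpleLayout
);
Log::Log4perl->init(\$conf);

# Logged once; drop the oneMessagePerAppender line and it shows up twice.
get_logger("Cat.Subcat")->warn("Warning!");
```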

]]></content:encoded>

<guid isPermaLink="false">985@http://www.simplicidade.org/notes/</guid>
<dc:subject>Perl</dc:subject>
<dc:date>2009-09-10T11:51:35+00:00</dc:date>


</item>
<item>
<title>Bootstrap Perl</title>
<link>http://www.simplicidade.org/notes/archives/2009/09/bootstrap_perl.html</link>

<description>Whenever a new version of Perl is released, I install it in a separate directory and re-install all my modules into a new local::lib-powered directory. This takes a lot of time, but I already had most of the process on auto-pilot. But it was still a hack, so I took the opportunity of the 5.10.1 release to make something prettier and more reliable. The result is my Perl bootstrap repo. There are two scripts. The first, bootstrap.sh, installs the local::lib module and prepares the environment. It&apos;s not finished yet, it doesn&apos;t alter the .bashrc file, but it...</description>

<content:encoded><![CDATA[
<p>Whenever a new version of Perl is released, I install it in a separate directory and re-install all my modules into a new <a href="http://search.cpan.org/dist/local-lib/"><code>local::lib</code></a>-powered directory.</p>

<p>This takes a lot of time, but I already had most of the process on auto-pilot.</p>

<p>But it was still a hack, so I took the opportunity of the 5.10.1 release to make something prettier and more reliable.</p>

<p>The result is <a href="http://github.com/melo/My-Perl-bootstrap/">my Perl bootstrap repo</a>.</p>

<p>There are two scripts. The first, <code>bootstrap.sh</code>, installs the <code>local::lib</code> module and prepares the environment. It's not finished yet (it doesn't alter the <code>.bashrc</code> file), but it will get there.</p>

<p>The second, <code>install_deps.sh</code>, uses the cpan shell to install a local <code>Task::Bootstrap</code> module. This <code>Task</code> lists all the modules that I want installed.</p>

<p>There are still some problems. I still lack distroprefs for a couple of modules that pause the process to ask for user input. And some modules won't install without force (<code>Mac::Carbon</code> is the one that fails most often).</p>

<p>Other modules just don't install correctly on Mac OS X. <code>Danga::Socket</code>, for example, requires <code>Sys::Syscall</code>, which fails its tests because Mac OS X lies about <code>sendfile</code> support: <code>sys/syscall.ph</code> includes the <code>SYS_sendfile</code> constant, but when you actually call it, you get a <code>Function not implemented</code> error. I'm sure I could work around it, and probably fix it, but I no longer use <code>Danga::Socket</code>, so I'll probably just remove that dependency.</p>

<p>The other was <code>Mac::AppleEvents::Simple</code>: Finder.app has a different naming scheme for its windows, so <code>t/simple.t</code> was failing. I've sent a patch to the module's RT queue.</p>

<p>I still have small failures, but right now I can mostly use these two scripts to set up a Perl environment from bare metal.</p>
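<p>One way to spot those leftover failures is to try loading each module after a bootstrap run. This is a hypothetical helper, not part of the bootstrap repo: <code>missing_modules</code> is a made-up name, and the module list is only an example, to be replaced with whatever your own <code>Task::Bootstrap</code> lists.</p>

```perl
use strict;
use warnings;

# Hypothetical helper: return the modules from a wish list that
# cannot be loaded by the current perl / local::lib setup.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

# Example list only.
print "missing: $_\n" for missing_modules(qw(local::lib Sys::Syscall Mac::Carbon));
```

<p>Keep in mind that loading a module runs its code, so this is a quick smoke check, not a replacement for the test suites.</p>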

<p><em>Update</em>: I removed my <code>~/.perl5/5.10.1/</code> directory and ran <code>time ./bootstrap.sh</code>. The results:</p>

<pre><code>real 56m4.598s
user 37m17.724s
sys  7m51.923s
</code></pre>

<p>So, about an hour on a MacBook Pro 2.16GHz Core Duo, running Leo.</p>

]]></content:encoded>

<guid isPermaLink="false">982@http://www.simplicidade.org/notes/</guid>
<dc:subject>Perl</dc:subject>
<dc:date>2009-09-06T14:12:16+00:00</dc:date>


</item>
<item>
<title>Template systems</title>
<link>http://www.simplicidade.org/notes/archives/2009/08/template_system.html</link>

<description>I have had a love/hate relationship with template systems for quite some time. The following is a brain-dump about where I stand right now regarding which system to use in the E5 project. When I started using them at work (circa 1998 or 1999), the goal was simple: separate the programmers&apos; logic from the layout, so that designers could tweak the templates without breaking the code. It was a simple and very worthwhile goal. And for the most part, it worked. You still need to coordinate which templates will be created, and how the page will be broken up into pieces to...</description>

<content:encoded><![CDATA[
<p>I have had a love/hate relationship with template systems for quite some time. The following is a brain-dump about where I stand right now regarding which system to use in the <a href="http://www.simplicidade.org/notes/archives/2009/08/start.html">E5</a> project.</p>

<p>When I started using them at work (circa 1998 or 1999), the goal was simple: separate the programmers' logic from the layout, so that designers could tweak the templates without breaking the code.</p>

<p>It was a simple and very worthwhile goal. And for the most part, it worked. You still need to coordinate which templates will be created, and how the page will be broken up into pieces to enable reuse of common parts, but it mostly worked.</p>

<p>Template systems were designed to be used by web designers, and given that designers were accustomed to HTML tags, the more powerful template systems, which needed loops and decision constructs, implemented them with a simple syntax. Template systems became domain specific languages (DSL).</p>

<p>But with the improvements in HTML, CSS and JS capabilities, I'm not sure designers are still the target audience of template systems. As I split a set of HTML pages into a set of reusable templates, I try to create blocks that make sense in terms of caching, data-store life cycles, and other subtle factors that a web designer just doesn't care about. The page breakout is becoming an engineering exercise.</p>

<p>As programmers/engineers, we don't need the comfy world of a DSL. We want to work in the same language that we use to code. Templates are, at most, a small macro language that gets compiled into source code, in the same language as the rest of the system.</p>
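<p>To make the "macro language compiled into source code" idea concrete, here is a toy template compiler. The syntax is made up for this sketch (it is not Mojo::Template's or Tenjin's): lines starting with <code>%</code> are copied verbatim as Perl, and <code>{= expr =}</code> in any other line is replaced by the value of the Perl expression. The whole template becomes one Perl sub, so loops and conditionals are plain Perl.</p>

```perl
use strict;
use warnings;

# Toy compiler for a made-up syntax: '%' lines are Perl code,
# '{= expr =}' in text lines interpolates a Perl expression.
# Template variables are available in the %v hash.
sub compile_template {
    my ($template) = @_;
    my @lit;    # literal text chunks, referenced by index from the code
    my $code = 'sub { my %v = @_; my $out = "";' . "\n";
    for my $line (split /\n/, $template) {
        if ($line =~ /^\s*%(.*)$/) {
            $code .= "$1\n";    # control line: plain Perl
            next;
        }
        # Odd indices hold captured expressions, even ones literal text.
        my @parts = split /\{=(.*?)=\}/, $line, -1;
        for my $i (0 .. $#parts) {
            if ($i % 2) {
                $code .= "\$out .= ($parts[$i]);\n";
            }
            else {
                push @lit, $parts[$i];
                my $idx = $#lit;
                $code .= "\$out .= \$lit[$idx];\n";
            }
        }
        $code .= "\$out .= \"\\n\";\n";
    }
    $code .= 'return $out; }';
    my $sub = eval $code or die "template compile error: $@";
    return $sub;
}

my $render = compile_template(join "\n",
    'Title: {= $v{title} =}',
    '% for my $item (@{ $v{items} }) {',
    '- {= $item =}',
    '% }',
);
print $render->(title => 'News', items => ['Wikipedia', 'BBC']);
```

<p>Storing the literal text in <code>@lit</code>, instead of pasting it into the generated source, avoids having to quote it safely.</p>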

<p>So I'm starting to move away from <a href="http://search.cpan.org/dist/Template-Toolkit/">Template Toolkit</a>, my preferred template system, towards something that uses native Perl as the DSL. I don't know where I'll end up yet. I'm looking at <a href="http://search.cpan.org/dist/Tenjin/">Tenjin</a>, <a href="http://search.cpan.org/dist/Text-MicroMason/">Text::MicroMason</a>, and <a href="http://search.cpan.org/dist/Mojo/">Mojo::Template</a>. There aren't many template systems that use Perl as the control language, actually.</p>

<p>The first two are the older ones, and should be the most stable. Tenjin is also the fastest template system I've tested, although speed wouldn't be a deal breaker.</p>

<h2>Controller-centric or View-centric render process</h2>

<p>But we can't judge template systems without considering where and how they will be used. In my case they will be the last step of a web request, handled by a web framework.</p>

<p>I'll use a BBC site page to illustrate my current dilemma, an article about <a href="http://news.bbc.co.uk/2/hi/technology/8220220.stm" title="BBC NEWS | Technology | Wikipedia to launch page controls">Wikipedia page controls</a>. Things of interest to me:</p>

<ul>
<li>the central part of the page is the main content, directly related to the URL. This is the content that you came here to get. It has a lot of HTML before and after it that is there to add value to the site (we hope) and to provide exit destinations (the navigation and related stories);</li>
<li>the title of the page, early in the HTML, is also directly related to the URL and the story: this is important because the template system must support inside-out rendering (first the central part, and then the wrappers), or have all the information in memory before starting the render process;</li>
<li>some parts (the header, footer, and left-hand sidebar) are mostly static, and with low or no relation to the main content: in this case only the category of the article influences the navigation. It might also influence the related BBC sites;</li>
<li>the right-hand side shows related stories. The "related" part might be highly dynamic, static per category, or something in between. The related external links are probably part of the story.</li>
</ul>

<p>The path for a typical request (like <code>/2/hi/technology/8220220.stm</code>) looks like this:</p>

<ol>
<li>collect all the important data from the URL: in this case the site seems to react to the <code>hi</code>/<code>low</code> part to pick the version of the master template, the category (which controls the right-hand side column) and the article ID;</li>
<li>fetch the correct article and related information (author, publish date, picture URL, caption...);</li>
<li>fetch the related articles for the right-hand sidebar;</li>
<li>call the view renderer with all of this.</li>
</ol>
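<p>A minimal sketch of those four steps, written controller-centric style. Everything here is hypothetical: <code>fetch_article</code> and <code>fetch_related</code> are stubs standing in for the real data store, the <code>master_hi</code>/<code>master_low</code> template names are invented, and the leading <code>/2/</code> segment is kept but not interpreted, since its meaning isn't discussed here.</p>

```perl
use strict;
use warnings;

# Hypothetical data-access stubs standing in for the real data store.
sub fetch_article { my ($id) = @_; return { id => $id, title => "Article $id" } }
sub fetch_related { my ($category) = @_; return ["More $category stories"] }

# Step 1: split a path like /2/hi/technology/8220220.stm into its parts.
# The leading number is kept but not interpreted.
sub parse_path {
    my ($path) = @_;
    my ($prefix, $version, $category, $id) =
        $path =~ m{^/(\d+)/(hi|low)/(\w+)/(\d+)\.stm$} or return;
    return { prefix => $prefix, version => $version,
             category => $category, article_id => $id };
}

# Steps 1 to 3 happen in the controller; the returned hash is what
# step 4 would pass to the view renderer.
sub handle_request {
    my ($path) = @_;
    my $req = parse_path($path) or return;
    return {
        template => "master_$req->{version}",    # hi/low picks the master template
        article  => fetch_article($req->{article_id}),
        related  => fetch_related($req->{category}),
    };
}

my $data = handle_request('/2/hi/technology/8220220.stm');
print "$data->{template}: $data->{article}{title}\n";
```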

<p>Pretty basic and straightforward. Steps 1, 2 and 4 don't raise many questions: they are required in any pipeline I can think of.</p>

<p>But the proper way to do step 3 is a matter of discussion. On one hand, you can put enough information in the URL to decide what the right-hand sidebar needs, and fetch it with some calls inside the Controller (I'm assuming an MVC-style framework here). In this model, the controller prepares everything that we need to display the page.</p>

<p>But this makes the view a bit inflexible: if we need to add another box at the top or bottom of the page, we need to tweak the controller code.</p>

<p>On the other hand, if our template system is powerful enough, we can have a small API that calls back into Perl land and generates these little reusable boxes for us. The view controls the entire display process.</p>
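<p>A sketch of the view-centric alternative, assuming templates are plain Perl subs. The <code>%boxes</code> registry, the <code>box</code> helper and the box names are hypothetical; the point is that the template decides which boxes appear, so adding one is a view change, not a controller change.</p>

```perl
use strict;
use warnings;

# Hypothetical registry of reusable boxes, looked up by name.
my %boxes = (
    navigation => sub { my (%arg) = @_; return "Navigation: $arg{category}" },
    related    => sub { my (%arg) = @_; return "Related: more $arg{category} stories" },
);

# The small API that templates call back into.
sub box {
    my ($name, %arg) = @_;
    my $render = $boxes{$name} or die "unknown box '$name'\n";
    return $render->(%arg);
}

# A template written as a Perl sub: the view, not the controller,
# decides which boxes surround the main content.
sub article_page {
    my (%v) = @_;
    return join "\n",
        box('navigation', category => $v{category}),
        $v{article}{title},
        box('related', category => $v{category});
}

print article_page(category => 'technology',
                   article  => { title => 'Wikipedia to launch page controls' }), "\n";
```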

<p>The advantage of having a decent view system, even an object-oriented view framework, is the extensibility that you get. Imagine a Moose-based template system: views are classes that you can extend and override, adding before and after processing. Rendering is just creating an instance and calling a small method, <code>as_string()</code> or even <code>stream_to_socket()</code>.</p>
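<p>The idea doesn't require Moose, so this sketch swaps in plain Perl OO to stay dependency-free; with Moose, the footer override below could become an <code>around</code> modifier. <code>View::Page</code> and <code>View::Article</code> are hypothetical names, and only <code>as_string()</code> from the paragraph above is implemented.</p>

```perl
use strict;
use warnings;

# Hypothetical base view: rendering is just building an object and
# calling as_string; each part is a method subclasses can override.
package View::Page;

sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}
sub header { my ($self) = @_; return "== $self->{title} ==" }
sub body   { return '' }
sub footer { return '-- footer --' }
sub as_string {
    my ($self) = @_;
    return join "\n", $self->header, $self->body, $self->footer;
}

# Hypothetical article view: extends the page, fills in the body and
# appends to the parent's footer.
package View::Article;
our @ISA = ('View::Page');

sub body { my ($self) = @_; return $self->{text} }
sub footer {
    my ($self) = @_;
    return $self->SUPER::footer() . "\nRelated: $self->{category}";
}

package main;

my $view = View::Article->new(
    title    => 'Wikipedia to launch page controls',
    text     => 'Story text...',
    category => 'technology',
);
print $view->as_string, "\n";
```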

<p>A system that brings the full power of Perl to the template world is <a href="http://search.cpan.org/dist/Template-Declare/">Template::Declare</a>. It treats templates as classes that you can subclass and extend. It seems very similar to what I like but I have zero experience with it so far.</p>

<p>No decisions have been made yet. I need to evaluate Template::Declare and see if it really fits my brain.</p>

]]></content:encoded>

<guid isPermaLink="false">979@http://www.simplicidade.org/notes/</guid>
<dc:subject>Perl</dc:subject>
<dc:date>2009-08-25T17:22:22+00:00</dc:date>


</item>
<item>
<title>Start</title>
<link>http://www.simplicidade.org/notes/archives/2009/08/start.html</link>

<description> A beginning is the time for taking the most delicate care that the balances are correct. from Manual of Muad&apos;Dib by the Princess Irulan extracted from Dune, by Frank Herbert In the coming weeks I&apos;ll be publishing a series of articles about a new project that I&apos;m starting. Actually, I started this project once already, but I didn&apos;t like the path it was taking, so I&apos;m restarting it. The new system will replace the current EVOLUI.COM code base. Why rewrite a system that has been in production for 8 or 9 years and that works reasonably well? There are several reasons actually....</description>

<content:encoded><![CDATA[
<blockquote>
  <p>A beginning is the time for taking the most delicate care that the balances are correct.</p>
  
  <blockquote>
    <p>from <em>Manual of Muad'Dib</em> by the Princess Irulan</p>
    
    <p>extracted from <em>Dune</em>, by Frank Herbert</p>
  </blockquote>
</blockquote>

<p>In the coming weeks I'll be publishing a series of articles about a new project that I'm starting. Actually, I started this project once already, but I didn't like the path it was taking, so I'm restarting it.</p>

<p>The new system will replace the current <a href="http://evolui.com/">EVOLUI.COM</a> code base. Why rewrite a system that has been in production for 8 or 9 years and that works reasonably well? There are several reasons, actually.</p>

<p>The first reason is code rot: the current system is spread across 3 code bases (called internally E1 through E3). E1 was the first stab (who would guess, right?), and grew organically inside the company, without much planning, and was coded by several programmers. E2 added a better management interface, and E3 added a new LMS and forum system.</p>

<p>A lot has changed in the last 8 years. And my inexperience in certain topics shows in the lack of proper tests and documentation. Also, it has grown to 850 Perl packages. Yes, eight hundred and fifty packages. I expect that more than half of those are no longer in active use.</p>

<p>The second, and more important, reason is that the new business requirements demand a deep restructuring of the code, but given that we don't have a decent test suite, it would be madness to change stuff in one of those 850 packages.</p>

<p>The way I see it, I can write the new system properly, using a test-driven methodology, making sure that all the business logic is properly tested.</p>

<p>I'm not starting from scratch: I plan to reuse certain small but critical parts of the old system, going over them to add tests and documentation.</p>

<p>The failed attempt was called E4, so this new version will be called E5.</p>

<p>I plan to cover the following topics (not sure of the order yet):</p>

<ul>
<li>source control, and ticketing systems;</li>
<li>project directory layout;</li>
<li>documentation techniques;</li>
<li>business layer techniques;</li>
<li>databases used and ORM layers;</li>
<li>web frameworks;</li>
<li>template systems;</li>
<li>logging;</li>
<li>smoke testing;</li>
<li>deployment.</li>
</ul>

<p>Hope you enjoy the ride.</p>

]]></content:encoded>

<guid isPermaLink="false">978@http://www.simplicidade.org/notes/</guid>
<dc:subject>Perl</dc:subject>
<dc:date>2009-08-21T17:04:50+00:00</dc:date>


</item>
<item>
<title>Where to put your validation</title>
<link>http://www.simplicidade.org/notes/archives/2009/06/where_to_put_yo.html</link>

<description>Yesterday I saw a question by fREW Schmidt regarding where to place the validation code in your apps. I made a mental note to answer it, but like most of my mental notes, it was quickly forgotten. Today, a new article describes the answer he got. In the end he decided to put the validation inside the model. I can only say: good for you. But one of the reasons against the validation-inside-the-model that he received baffled me: Models don’t know about the current user (or other higher level information). If your models don&apos;t know who the authenticated person is...</description>

<content:encoded><![CDATA[
<p>Yesterday I saw a question by <a href="http://blog.afoolishmanifesto.com/">fREW Schmidt</a> regarding <a href="http://blog.afoolishmanifesto.com/archives/819">where to place the validation code in your apps</a>. I made a mental note to answer it, but like most of my mental notes, it was quickly forgotten.</p>

<p>Today, a <a href="http://blog.afoolishmanifesto.com/archives/828">new article describes the answer he got</a>. In the end he decided to put the validation inside the model. I can only say: good for you.</p>

<p>But one of the reasons against the validation-inside-the-model that he received baffled me: Models don’t know about the current user (or other higher level information).</p>

<p>If your models don't know who the authenticated person is (and please read <a href="http://www.blogger.com/profile/03975438115490089158">Yuval</a>'s <a href="http://blog.woobling.org/2009/06/users-accounts-identities-and-roles.html">article about modelling identity</a> to understand how complex it can become), then you are doing it wrong.</p>

<p>Models must know who the authenticated user is. They must know it because that is the only sane way to implement authorization inside your methods, and to generate an audit log of operations.</p>

<p>What I usually do is create a Session class. For each request (be it a Catalyst request, an incoming email message, or an XMPP request), I create a new Session instance initialized with the current user information plus some tidbits like the channel (Web, email, XMPP) and the IP address (if available).</p>

<p>Then all access to my API goes either through the session object (create methods to access the most important parts of your API), or by passing this session object to the API you are calling.</p>

<p>You should also use your Session as the access point to your logging and auditing capabilities.</p>
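<p>A minimal sketch of that setup, not the author's actual classes: <code>Session</code>, <code>Model::Article</code>, <code>update_title</code> and the in-memory audit log are all hypothetical. The model method receives the session, so it can validate, authorize and audit on its own, whatever channel the request came from.</p>

```perl
use strict;
use warnings;

# Hypothetical Session class: one instance per request, whatever the
# channel (web, email, XMPP...).
package Session;

sub new {
    my ($class, %args) = @_;
    my $self = { %args, audit => [] };
    return bless $self, $class;
}
sub user    { return $_[0]{user} }
sub channel { return $_[0]{channel} }

# Single entry point for auditing (kept in memory for this sketch).
sub audit {
    my ($self, $action) = @_;
    push @{ $self->{audit} },
        sprintf '%s via %s: %s', $self->user->{login}, $self->channel, $action;
    return;
}
sub audit_log { return @{ $_[0]{audit} } }

# Hypothetical model method: validation, authorization and auditing
# all live here, not in the controller.
package Model::Article;

sub update_title {
    my ($class, $session, $article, $title) = @_;
    die "title must not be empty\n" unless defined $title and length $title;
    die "not allowed\n" unless $session->user->{login} eq $article->{owner};
    $article->{title} = $title;
    $session->audit("renamed article $article->{id}");
    return $article;
}

package main;

my $session = Session->new(user => { login => 'melo' }, channel => 'web', ip => '127.0.0.1');
my $article = { id => 42, owner => 'melo', title => 'Old title' };
Model::Article->update_title($session, $article, 'New title');
print "$_\n" for $session->audit_log;    # melo via web: renamed article 42
```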

<p>The article mentions other issues, like error messages. I haven't had to develop an application with multiple language support yet, but I think that the current throw-exception-objects strategy, with the exception carrying all the information (including the authenticated user, to figure out the preferred language), will be able to deal with the problems that I can think of now.</p>
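<p>A sketch of how exception objects could cover the language problem, under stated assumptions: <code>Error::Validation</code>, the <code>%messages</code> catalog and the <code>lang</code> user field are all hypothetical. The model only throws a message key plus the user; whoever catches the exception picks the text in the user's language, falling back to English.</p>

```perl
use strict;
use warnings;
use utf8;

# Hypothetical exception class: carries a message key and the user,
# so whoever catches it can choose the user's language.
package Error::Validation;

sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}
sub throw { my $class = shift; die $class->new(@_) }
sub key   { return $_[0]{key} }
sub user  { return $_[0]{user} }

package main;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical message catalog, keyed by language and message key.
my %messages = (
    en => { title_empty => 'The title cannot be empty' },
    pt => { title_empty => 'O título não pode estar vazio' },
);

sub localized_message {
    my ($error) = @_;
    my $lang = $error->user->{lang};
    $lang = 'en' unless $lang and $messages{$lang};    # fall back to English
    return $messages{$lang}{ $error->key };
}

# Model code only throws; it never formats text for humans.
sub update_title {
    my ($user, $article, $title) = @_;
    Error::Validation->throw(key => 'title_empty', user => $user)
        unless defined $title and length $title;
    $article->{title} = $title;
    return $article;
}

my $user = { login => 'melo', lang => 'pt' };
eval { update_title($user, {}, ''); 1 } or do {
    my $err = $@;
    die $err unless ref $err and $err->isa('Error::Validation');
    print localized_message($err), "\n";
};
```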

<p>In the end, I'm actually really curious about why people would think that putting your validation code in the controllers is a good idea.</p>

<p><em>Update:</em> a lot of stuff in the comments if you care about this topic.</p>

]]></content:encoded>

<guid isPermaLink="false">974@http://www.simplicidade.org/notes/</guid>
<dc:subject>Perl</dc:subject>
<dc:date>2009-06-18T14:20:32+00:00</dc:date>


</item>


</channel>
</rss>