Linguistics Miscellany: 2013

Tuesday, August 27, 2013

Packaging XML::LibXML with PAR Packer on Windows

PAR Packer is an excellent utility for delivering your Perl scripts as standalone executables. A standalone executable is highly desired in, for example, a corporate environment where everyone needs a program you wrote but you can't expect anyone to learn how to run Perl programs.

A recent requirement at $work was for a standalone executable. Originally, I was supposed to let my coworker work his magic (and his ActiveState PerlPacker license), but the client required an all-open-source solution. Thus I turned to PAR Packer and its pp utility.

So far, the most difficult aspect of using pp is that it doesn't detect all dependencies. It requires the user to explicitly list many required DLLs. I needed to list DLLs for two libraries: Wx and XML::LibXML.

Creating Wx apps with pp is a solved problem: wxpar, bundled with Wx::Perl::Packager, is a pp wrapper and adds all of the required Wx DLLs.

Getting it to work with XML::LibXML required some trial and error. I would create the executable, move it to another computer without Perl or C, run it from the command line (clicking the file hid certain error messages), and write down the name of the library that was missing. It turned out that three DLLs needed to be explicitly added: libxml2-2__.dll, libiconv-2__.dll and libz__.dll. On my computer these were located in C:\strawberry\c\bin. So, the final command I used to build my application was:

wxpar -o MyApp.exe -I lib -l C:/strawberry/c/bin/libxml2-2__.dll -l C:/strawberry/c/bin/libiconv-2__.dll -l C:/strawberry/c/bin/libz__.dll MyApp.pl
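If the list of DLLs keeps growing, it may be easier to assemble the command in a small Perl script than to retype it. Here's a minimal sketch of that idea; the wxpar_args helper is a name I made up for illustration, and it only uses the -o, -I and -l options from the command above:

```perl
use strict;
use warnings;

# Hypothetical helper: build the wxpar argument list from an output name,
# a script name, a DLL directory and the DLL file names found in it.
sub wxpar_args {
    my ($output, $script, $dll_dir, @dlls) = @_;
    my @args = ('-o', $output, '-I', 'lib');
    push @args, '-l', "$dll_dir/$_" for @dlls;    # one -l per DLL
    return (@args, $script);
}

my @args = wxpar_args(
    'MyApp.exe', 'MyApp.pl', 'C:/strawberry/c/bin',
    qw(libxml2-2__.dll libiconv-2__.dll libz__.dll),
);

# Print the command instead of running it, so it can be checked first.
print join(' ', 'wxpar', @args), "\n";
```

This prints the same command line shown above. To actually run the build, the print could be swapped for something like system('wxpar', @args), assuming wxpar is on the PATH.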

Is there a simpler way to do this? What's with all the underscores? Comments and questions welcome below.

Monday, April 15, 2013

The Extended Euclidean Algorithm in Perl

This week I learned about the extended Euclidean algorithm for finding a linear combination of two numbers that yields their GCD. For example, the GCD of 213 and 171 is 3, and -4*213 + 5*171 = 3. This algorithm is important in the RSA encryption scheme.

I had quite a difficult time getting myself to fully understand how it works. I jumped between Wikipedia, my data structures textbook (don't buy it), a YouTube video, and this excellent number theory class lecture. The lecture is the best, though I think there may be a typographical error in the recursive formula.

The basic idea uses recursion with an easy base step. We call Euclid(a,b) with a ≥ b:

The base case is when b is 0. The GCD of x and 0 is always x, and the coefficients that produce it are 1 and 0 (the second one can be anything, since it multiplies 0): 1*x + 0*0 = x. So the base case returns (1,0).
Any other step starts by recursively calling Euclid(b, a mod b). We know that the GCD of a and b is the same as the GCD of b and a mod b (lemma 12 in the lecture). This recursive call is guaranteed to eventually get to the base case of b = 0.
After finding the coefficients for producing the GCD from b and a mod b, we can calculate the ones for producing the GCD from a and b, because a mod b can be put in terms of a and b (see the code comments for the formulas).
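As a worked example, here are the three steps applied by hand to the 213 and 171 pair from the start of this post. The divisions run until the remainder is 0, and then the last nonzero remainder (the GCD, 3) is rewritten in terms of the original two numbers:

```latex
\begin{align*}
213 &= 1 \cdot 171 + 42 \\
171 &= 4 \cdot 42 + 3 \\
42  &= 14 \cdot 3 + 0 \\[1ex]
3 &= 171 - 4 \cdot 42 \\
  &= 171 - 4\,(213 - 1 \cdot 171) \\
  &= -4 \cdot 213 + 5 \cdot 171
\end{align*}
```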

To really help myself understand the whole thing, I wrote a Perl script to illustrate it. I put in lots of comments as I worked my way through it.

use strict;
use warnings;
use 5.010;
#start with a >= b
my @nums = sort {$b <=> $a} @ARGV;

gcd(@nums);

#input: two numbers (a,b), a >= b >= 0
#output: the coefficients which yield their GCD
sub gcd {
 my ($a, $b) = @_;
 
 #base case; the GCD of x and 0 is always x,
 #and the coefficients will always be 1 and 0 (or anything) because
 #1*x + 0*0 = x
 if($b == 0){
  say "GCD is $a";
  say "(a,b) = ($a,$b), coefficients = (1,0)";
  say "1x$a + 0x$b = $a";
  return (1, 0);
 }
 
 #otherwise, we evaluate u and v for k = ub + vr, where k is the GCD and r is a mod b
 #(gcd(b, a%b) gives the same value as gcd(a, b))
 my $remainder = $a % $b;
 my ($u, $v) = gcd($b, $remainder);
 #now we can find k in terms of a and b because we know r in terms of a and b
 #r = a - bq, where q = the whole part of a/b
 #k = ub + vr = ub + v(a - bq) = va + b(u-qv)
 #so the coefficient on a is v, and the coefficient on b is u-qv
 my $x = $v;
 my $q = int(($a/$b));
 my $y = $u - $q*$x;
 say "(a,b) = ($a,$b), coefficients are ($x,$y)";
 say "${x}x$a + ${y}x$b = " . ($x*$a + $y*$b);
 return ($x, $y);
}
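For use inside another program, a quieter variant might be handier. This is a minimal sketch rather than a replacement for the script above: ext_gcd is a name I picked for illustration, and it returns the GCD along with the two coefficients instead of printing each step.

```perl
use strict;
use warnings;

# Returns ($g, $x, $y) such that $x*$m + $y*$n == $g, the GCD of $m and $n.
# Assumes non-negative integers, as in the script above.
sub ext_gcd {
    my ($m, $n) = @_;

    # Base case: gcd(m, 0) = m = 1*m + 0*0
    return ($m, 1, 0) if $n == 0;

    # Coefficients for (n, m mod n): g = u*n + v*(m mod n)
    my ($g, $u, $v) = ext_gcd($n, $m % $n);

    # Substitute m mod n = m - q*n to get g = v*m + (u - q*v)*n
    my $q = int($m / $n);
    return ($g, $v, $u - $q * $v);
}

my ($g, $x, $y) = ext_gcd(213, 171);
print "gcd = $g, coefficients = ($x, $y)\n";    # gcd = 3, coefficients = (-4, 5)
```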

Feel free to leave a comment if you think that something could be stated more clearly. I hope it helps anyone else trying to learn how the extended Euclidean algorithm works.

Sunday, April 7, 2013

Running Perl with Sublime Text 2

I've been having fun trying out Sublime Text. It's pretty, fast, and extremely extensible.

The first thing that I wanted was to be able to work well with Perl. I installed Package Control, followed by SublimeLinter, which has the perlcritic command built in. Making this useful requires a little finagling; perlcritic is by no means a quick program (being a really thorough linter for a language which is complex to parse), and the defaults for SublimeLinter cause it to run over and over again as you type. To fix this, I edited Packages/SublimeLinter/SublimeLinter.sublime-settings and changed the "sublimelinter" setting to false. Now, in order to lint the current file, I have to press ctrl+alt+l. (Update: I don't recommend this for Sublime Text 2 because of speed problems. See this issue on GitHub. ST3 should be fine, though.)

Next, I wanted to be able to run my Perl scripts. Sublime has the ctrl+b shortcut for running a build for the current file. What the build actually does is specified in either a build file or the project file. To create a new build file for perl, go to Tools->Build System -> New Build System. The build file I've seen on different sites for Perl looks like this:

{ "cmd": ["perl", "$file"], "file_regex": ".* at (.*) line ([0-9]*)", "selector": "source.perl" }

Save this as perl.sublime-build. With this, whenever you are working on a Perl file and hit ctrl+b, the command "perl your_file.pl" will be run. This, however, was not good enough for me. Most of the time I am working on tests for a Perl module, so I have to run perl -Ilib t/my_test_file.t. I also want to be able to run individual tests as well as prove using shortcuts.

To do this, we need to turn the module directory into a Sublime Text project. This is pretty simple. First, open the module directory in Sublime Text. Select Project->Save Project As, then choose the name of the project and save it in the top directory of the module. Paste the following simple contents into the project file:

{ "folders": [ { "path": "." } ] }

All this does is add the entire directory to the project. Next, we edit the Perl build file to reference the root of the project so we can add the top-level lib directory to our include path:

{ "cmd": ["perl", "-Ilib", "$file"], "working_dir": "$project_path", "file_regex": ".* at (.*) line ([0-9]*)", "selector": "source.perl" }

Great! Now we can run Perl on tests contained in module directories. This still works fine for standalone scripts, too.
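The "file_regex" entry is the part that lets Sublime pick the file name and line number out of an error message. One way to sanity-check the pattern is to try it in Perl against a typical message. This is only a sketch: the error_location helper and the sample message are made up for illustration, and Sublime's own regex engine may differ from Perl's in the details.

```perl
use strict;
use warnings;

# The same pattern as the build file's "file_regex" entry.
my $file_regex = qr/.* at (.*) line ([0-9]*)/;

# Returns (file, line) for a Perl-style error message, or an empty list.
sub error_location {
    my ($message) = @_;
    return $message =~ $file_regex ? ($1, $2) : ();
}

# A made-up message in the "... at FILE line N." format used by die and warn.
my $message = 'Something went wrong at lib/My/Module.pm line 42.';
my ($file, $line) = error_location($message);
print "file: $file, line: $line\n";    # file: lib/My/Module.pm, line: 42
```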

Now I'd like to run my whole test suite using prove. By default, ctrl+shift+b runs a build variant with the name "Run", so we'll just make a prove variant with that name. I'd much rather give it a more descriptive name, but the Sublime shortcut requires this name. You can change the shortcut, but then you wouldn't be able to use the shortcut for other builds (other languages). It's all up to you. Here is the final build file:

{
    "cmd": ["perl", "-Ilib", "$file"],
    "working_dir": "$project_path",
    "file_regex": ".* at (.*) line ([0-9]*)",
    "selector": "source.perl",
    "variants": [
        {
            "cmd": ["prove", "-vlr", "--merge"],
            "working_dir": "$project_path",
            "name": "Run",
            "windows": {
                "cmd": ["prove.bat", "-vlr", "--merge"]
            }
        }
    ]
}

Note that I needed a Windows-specific command for prove because Sublime doesn't launch programs the same way cmd does: on my Windows machine prove is a batch file, so it has to be named in full as prove.bat. You could, alternatively, add "shell": true to use the system's command shell so you don't need a separate command for Windows.

With this build file in place, I can now press ctrl+b to run any Perl script, with its project lib directory in @INC, and ctrl+shift+b to run prove. Voila!

Here are the final files:
project file (put a copy in your project root folders)
Perl build file (only one is needed per ST installation)

Sunday, February 3, 2013

Managing Global State: the Flip-Flop Operator

Today I was faced with another mysterious failing test while writing a test suite for some legacy code. I knew it had to be a problem with persisting state because this particular test only failed when processing a particular data set with the same object which was just used to process another set.

My first step to trying to fix this was to delete all of the values stored in the object during the processing procedure:

delete $self->{stateDatum1};

delete $self->{stateDatum2};

#etc....

Nothing changed. I reduced the problematic code into a small example for this post. First, the module to be tested:

package Demo::Bad::GlobalFlipFlop;
use strict;
use warnings;
use autodie;
use 5.010;

sub new {
 my ($class) = @_;
 my $self = {};
 bless $self, $class;
 return $self;
}

#return true if parsing succeeded, false otherwise.
sub parse {
 my ($self, $file) = @_;
 open my $file_in, '<', $file;
 
 my $started = 0;
 while( <$file_in> ){
  
  #flip-flop 
  next unless /^=startHere/i .. 0;    # start processing
  $started = 1;
  #continue doing something with file contents...
  # say 'hello:)' if(/hello/);
  # say 'goodbye:(' if(/goodbye/);
 }
 if(not $started){
  say "File not processed; missing '=startHere' line.";
  return;
 }
 close $file_in;
 return 1;
}

1;

The main idea here is that we are processing some file and returning a boolean representing its validity. The only requirement of validity of the file is that a certain start token is found within it; everything before the start sequence is ignored. Here are valid and invalid example files:

#good_file.txt
=startHere
hello
goodbye

#bad_file.txt- doesn't contain a start sequence
hello
goodbye

Now, the test file:

use strict;
use warnings;
use autodie;
use Test::More tests => 2;
use File::Slurp;
use Demo::Bad::GlobalFlipFlop;

my $good_name = 'good_file.txt';
my $bad_file = 'bad_file.txt';

my $demo = Demo::Bad::GlobalFlipFlop->new();

ok( $demo->parse($good_name) );
ok( not $demo->parse($bad_file) );

The output of running this file:

>perl test.pl
1..2
hello:)
goodbye:(
ok 1
hello:)
goodbye:(
not ok 2
# Failed test at test.pl line 61.
# Looks like you failed 1 test of 2.

Why did it fail the second test, which involves checking that an invalid file is considered invalid?
The bug is in the line which matches the start token:

next unless /^=startHere/i .. 0;    # start processing

The regex, .. operator and 0 were clearly some sort of idiom that I was unfamiliar with. I had only ever used .. with numbers, such as 1..10, which in list context produces the numbers 1 through 10. In scalar context, though, .. is the flip-flop operator. How does it work? Let's check perlop:

Each ".." operator maintains its own boolean state, even across calls to a subroutine that contains it. It is false as long as its left operand is false. Once the left operand is true, the range operator stays true until the right operand is true, AFTER which the range operator becomes false again.

The mysterious line thus worked like this:

Skip lines of the input file until the left side, a match for the start token, is true
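Don't skip lines again until the right side, 0, is evaluated as true (which never happens: per perlop, a constant operand is compared against the current line number, $., and that is never 0 while reading a file).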
The state of this flip-flop operator is stored between subsequent calls to the subroutine. It's a hidden global variable!
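The persistence is easy to see in isolation. Here's a minimal sketch of it, separate from the module above: count_processed is a name I made up, and it reads from in-memory filehandles so that no files are needed.

```perl
use strict;
use warnings;

# Count the lines that get past the flip-flop guard.
sub count_processed {
    my ($text) = @_;
    open my $in, '<', \$text or die "Cannot open in-memory file: $!";
    my $processed = 0;
    while (<$in>) {
        next unless /^=startHere/i .. 0;    # same idiom as the buggy line
        $processed++;
    }
    close $in;
    return $processed;
}

my $good = "junk\n=startHere\nhello\ngoodbye\n";
my $bad  = "hello\ngoodbye\n";

print count_processed($good), "\n";    # 3: the start line and what follows it
print count_processed($bad),  "\n";    # 2: no start line, yet nothing is skipped
```

The second call never sees a start token, but the flip-flop is still true from the first call, so both of its lines are counted.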

Usually I see .. in list context, where it is the range operator (such as 1..10) and keeps no state between uses. Not so here! I replaced the offending code with some that keeps state for me:

my $started = 0;
while(<$file_in>){
    if(/^=startHere/i){
        $started = 1;
    }
    next unless $started;
    #continue processing...
}

With this, everything works as expected:

>perl test.pl
1..2
ok 1
File not processed; missing '=startHere' line.
ok 2

Note that this bug only presented itself to me because I changed the legacy standalone script to be its own module, creating the possibility of storing state between subroutine calls.

Sunday, January 27, 2013

When not to use Perl's Implicit close; Suffering from Buffering

This post is a quick note on a bug I had difficulty tracking down.

One nice feature of Perl, introduced long before my time, is that of implicit closing. Perl closes filehandles for you when you forget (maybe on purpose). So the following is not a resource leak as a standalone script:

open my $file, '>:encoding(UTF-8)', '/path/to/new/file'
    or die "couldn't open file: $!";
print $file 'Hello!';

When the script finishes, Perl will close $file for you, so you can be nice and lazy. The caveat to this is that the variable $. isn't reset as it would be with a normal close (see docs here). $. holds the current line number from the last file read. So if you were processing a file line-by-line and found an error, you might print an error like 'bad value foo on line XYZ' using the $. variable for XYZ. I raised a question about this on StackOverflow.
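For anyone who hasn't used $. before, here's a minimal sketch of the kind of error reporting I mean. The check_values helper and its "numbers only" rule are made up for illustration, and it reads from an in-memory filehandle to stay self-contained:

```perl
use strict;
use warnings;

# Returns one error message for every line that is not a plain number.
sub check_values {
    my ($text) = @_;
    open my $in, '<', \$text or die "Cannot open in-memory file: $!";
    my @errors;
    while (my $line = <$in>) {
        chomp $line;
        next if $line =~ /^\d+$/;
        # $. holds the number of the line that was just read from $in
        push @errors, sprintf("bad value '%s' on line %d", $line, $.);
    }
    close $in;    # an explicit close resets $.
    return @errors;
}

print "$_\n" for check_values("10\nfoo\n30\n");    # bad value 'foo' on line 2
```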

Today I found another case where not explicitly closing a filehandle means trouble. I was working on testing a modulino-style script with flexible outputs. You can call a method to set the handle that this script prints to. In my test script, I was setting the handle to be some filehandle and then checking the contents of the file against a string. The problem? The file was always empty at run time, but contained what I expected it to when I manually inspected it. Here's some example broken code:

#ImplicitClose.pm
package Demo::Bad::ImplicitClose;
use strict;
use warnings;

sub new {
 my ($class) = @_;
 my $self = {};
 bless $self, $class;
 return $self;
}

sub output_fh {
    my ( $self, $fh ) = @_;
    if ($fh) {
        if ( ref($fh) eq 'GLOB' ) {
            $self->{output_fh} = $fh;
        }
        else {
            open my $fh2, '>', $fh or die "Couldn't open $fh";
            $self->{output_fh} = $fh2;
        }
    }
    $self->{output_fh};
}

sub some_long_method {
 my ($self, $text) = @_;
 print { $self->{output_fh} } $text;
}

1;

#test.pl
use strict;
use warnings;
use autodie;
use Test::More tests => 1;
use File::Slurp;
use Demo::Bad::ImplicitClose;

my $file_name = 'file1.txt';

#make sure we pass the test from outputting something *this* run
unlink $file_name if -e $file_name;

my $print = 'some junk';
my $demo = Demo::Bad::ImplicitClose->new();
$demo->output_fh($file_name);
$demo->some_long_method($print);

my $contents = read_file($file_name);
is($contents, $print);

If you run test.pl, you'll see that its one and only test fails:

>perl -I[folder where you put the Demo directory] test.pl
1..1
not ok 1
# Failed test at test.pl line 68.
# got: ''
# expected: 'some junk'
# Looks like you failed 1 test of 1.

Then, when you inspect the contents of file1.txt, you have:
some junk

What happened here? I was suffering from buffering. Because neither test.pl nor ImplicitClose.pm closed the file, it was still open when I was trying to read it. Nothing had been written to it yet because the amount printed was so small that it had to wait in the buffer either until there was more to write or until the file was closed, which would flush the buffer. Implicit close wouldn't be performed until the filehandle's reference count reached 0, and the $demo object still had a reference to it. So the test would have worked fine if I had assigned undef to $demo, or just closed the filehandle.
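The effect can be reproduced without the module. Below is a minimal sketch, assuming Perl's default buffered I/O; sizes_around_flush is a made-up name, and it uses flush from IO::Handle, which is a third way to push the text to disk besides closing the handle or dropping the last reference to it.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Handle;

# Returns the file's size on disk before and after flushing a small write.
sub sizes_around_flush {
    my ($text) = @_;
    my ($out, $path) = tempfile();    # an open, buffered handle and its path
    print {$out} $text;
    my $before = (-s $path) || 0;     # typically 0: the text is still buffered
    $out->flush;                      # closing the handle would flush it too
    my $after = (-s $path) || 0;
    close $out;
    unlink $path;
    return ($before, $after);
}

my ($before, $after) = sizes_around_flush('some junk');
print "before flush: $before bytes, after flush: $after bytes\n";
```

With default buffering I'd expect the first number to be 0 and the second to be 9, the length of 'some junk'.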

Watch those implicit closes.

Sunday, January 20, 2013

Testing Perl Distributions with Test Subdirectories

Normally I run my test suites with the prove utility:
prove -vl --merge
The v option turns on verbose processing, and the l option adds lib/ to the include path. prove then runs all of the tests in the t/ folder.

Today, I had a new problem. I merged multiple distributions into one (without losing any Git history!), and each had a test suite that I wanted to keep separate. Naturally, I moved the tests from each distribution into its own subdirectory under t/. However, this time when I ran prove -vl, I got this message:

Files=0, Tests=0, 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Result: NOTESTS

Dubious... Well, I needed to know how to test with subdirectories in the t/ folder, so I looked at the prove documentation and found the -r option. The r stands for "recurse", meaning that the test files would be found by recursing into the directories of the distribution (starting at the top in the root of the distribution). That turned out to be exactly what I needed!

prove -vlr
t/parser/01-testParser.t
...
All tests successful.
Files=28, Tests=1815, 211 wallclock secs ( 0.92 usr + 0.28 sys = 1.20 CPU)
Result: PASS

Woohoo!
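To make "recursing" concrete, here is a rough sketch of the idea using File::Find. It is not how prove is actually implemented, just an illustration of collecting .t files at any depth; find_test_files is a made-up name, and every file name except t/parser/01-testParser.t is hypothetical.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Collect every .t file under $dir, however deeply nested.
sub find_test_files {
    my ($dir) = @_;
    my @tests;
    find(sub { push @tests, $File::Find::name if -f $_ && /\.t\z/ }, $dir);
    my @sorted = sort @tests;
    return @sorted;
}

# Build a throwaway layout to search.
my $root = tempdir(CLEANUP => 1);
make_path("$root/t/parser", "$root/t/lexer");
for my $file ('t/parser/01-testParser.t', 't/lexer/01-basic.t', 't/README') {
    open my $fh, '>', "$root/$file" or die "Cannot create $file: $!";
    close $fh;
}

# Lists the two .t files and skips t/README.
print "$_\n" for find_test_files("$root/t");
```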

Also, both MakeMaker and Module::Build recurse in the same way during module testing. If you use Dist::Zilla, then you'll probably have the [MakeMaker] or [ModuleBuild] plugin. With either of these, dzil test will recurse in the same way.