Ruby 1.9 comes with a powerful new regular expression engine called Oniguruma. It offers better handling of UTF-8 encoded content, plus goodies like positive and negative look-behind and named matches. Here’s a good overview of these and some more of Oniguruma's new features.
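To give a feel for look-behind, here is a minimal sketch that assumes Ruby 1.9 or later, where Oniguruma (or its successor) is the built-in engine. The sample string and variable names are made up for illustration.

```ruby
# Hypothetical sample input; any string containing "$<digits>" works.
line = 'cost: $42, qty: 7'

# Positive look-behind: digits only when directly preceded by "$".
prices = line.scan(/(?<=\$)\d+/)

# Negative look-behind: digits NOT preceded by "$" or by another digit.
counts = line.scan(/(?<![\$\d])\d+/)

p prices   # => ["42"]
p counts   # => ["7"]
```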
There are two ways to get Oniguruma into a pre-1.9 Ruby: you can patch the Ruby source tree with Oniguruma and build your own Ruby, or you can use the Oniguruma gem, which makes it fairly easy to use the new-style regular expressions in any Ruby 1.8.x project. Here’s how:
$ wget http://www.geocities.jp/kosako3/oniguruma/archive/onig-4.7.1.tar.gz
$ tar xzf onig-4.7.1.tar.gz
$ cd onig-4.7.1
$ ./configure --prefix=/usr
$ sudo make install
$ sudo gem install oniguruma
Note the --prefix argument in the call to configure: it should point to the location of your current Ruby installation. So if your ruby executable is located in /usr/bin, you’ll have to use /usr here, as shown above.
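If you are unsure which prefix your Ruby uses, you can ask Ruby itself. This is a small sketch that assumes the RbConfig module from the standard library is available under that name (very old 1.8 releases may only expose it as Config), and that the 'prefix' entry matches the directory above your ruby executable's bin directory.

```ruby
require 'rbconfig'

# Installation prefix recorded when this Ruby was built, e.g. "/usr".
prefix = RbConfig::CONFIG['prefix']

# Print the configure call with the matching prefix filled in.
puts "./configure --prefix=#{prefix}"
```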
If everything went well so far, try it out in irb:
require 'rubygems'
require 'oniguruma'
reg = Oniguruma::ORegexp.new '(?<before>.*)(a)(?<after>.*)'
match = reg.match( 'terraforming' )
puts match            # => terraforming
puts match[:before]   # => terr
puts match[:after]    # => forming
The downside of not having Oniguruma patched into a self-compiled version of Ruby is that something like
'terraforming' =~ /(?<before>.*)(a)(?<after>.*)/
won't work because it will be handled by your Ruby version's built-in regexp engine, which doesn't understand the new syntax.
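One way to live with this limitation is to build patterns from strings through a small helper, so the same code runs whether or not the engine is built in. The sketch below is only an illustration: named_regexp is a hypothetical helper, not part of the gem, and the version check is a plain string comparison that assumes version numbers like "1.8.7" or "1.9.1".

```ruby
# Hypothetical helper: returns a regexp object that supports named groups.
# The pattern is passed as a string because a regexp literal with named
# groups would presumably be rejected by a 1.8 interpreter.
def named_regexp(source)
  if RUBY_VERSION >= '1.9'
    # Oniguruma-style syntax is handled by the built-in engine.
    Regexp.new(source)
  else
    # Fall back to the gem on Ruby 1.8.x.
    require 'rubygems'
    require 'oniguruma'
    Oniguruma::ORegexp.new(source)
  end
end

match = named_regexp('(?<before>.*)(a)(?<after>.*)').match('terraforming')
puts match[:before]
puts match[:after]
```

Either branch returns an object that responds to match, so calling code does not need to know which engine is in use.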