Regexps on steroids with Ruby 1.8.x
Ruby 1.9 comes with a powerful new regular expression engine called Oniguruma. It offers better handling of UTF-8 encoded content, plus goodies like positive and negative look-behind and named captures. Here’s a good overview of these and some more of Oniguruma's new features.
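As a quick illustration of look-behind, here is a small sketch that runs on Ruby 1.9 or later, where Oniguruma (Onigmo since Ruby 2.0) is the built-in engine. The stock Ruby 1.8 engine does not understand these constructs. The sample string is made up for this example.

```ruby
# Requires Ruby 1.9+ (built-in Oniguruma/Onigmo engine).
prices = "apple $3, pear 4 EUR, plum $12"

# Positive look-behind (?<=...): digits that are preceded by a dollar sign.
dollars = prices.scan(/(?<=\$)\d+/)
puts dollars.inspect   # => ["3", "12"]

# Negative look-behind (?<!...): numbers NOT preceded by '$' or another digit.
others = prices.scan(/(?<![$\d])\d+/)
puts others.inspect    # => ["4"]
```

The look-behind part only checks what comes before the match; it is not included in the matched text, which is why the dollar signs do not show up in the results.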
There are two ways to get Oniguruma into a pre-1.9 Ruby: you can patch the Ruby source tree with Oniguruma and build your own Ruby, or you can use the Oniguruma gem, which makes it fairly easy to use new-style regular expressions in any Ruby 1.8.x project. The gem wraps the Oniguruma C library, so the library has to be built and installed first. Here’s how:
$ wget http://www.geocities.jp/kosako3/oniguruma/archive/onig-4.7.1.tar.gz
$ tar xzf onig-4.7.1.tar.gz
$ cd onig-4.7.1
$ ./configure --prefix=/usr
$ make
$ sudo make install
$ sudo gem install oniguruma
Note the --prefix argument in the call to configure: it should point to the location of your current Ruby installation. So if your ruby executable is located in /usr/bin, you’ll have to use /usr here, as shown above.
If everything went well so far, try it out in irb:
require 'rubygems'
require 'oniguruma'
reg = Oniguruma::ORegexp.new '(?
The downside of not having Oniguruma patched into a self-compiled version of Ruby is that something like 'terraforming' =~ /(?
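For comparison, on Ruby 1.9 and later named groups work directly in regexp literals, with no gem involved. The pattern and the group names head and tail below are illustrative choices for this sketch, not taken from the gem's documentation.

```ruby
# Ruby 1.9+: named groups in a regexp literal, no gem required.
word = 'terraforming'

if /(?<head>terra)(?<tail>.*)/ =~ word
  # With a literal regexp on the LEFT of =~, Ruby also assigns each
  # named group to a local variable of the same name.
  puts head   # => terra
  puts tail   # => forming
end

# The same groups are available by name on the MatchData object.
m = word.match(/(?<head>terra)(?<tail>.*)/)
puts m[:head]   # => terra
puts m[:tail]   # => forming
```

The automatic local variables only appear when the regexp literal is on the left-hand side; with the string on the left, use $~ or the MatchData returned by match instead.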