Jens Krämer

The Sad State of HTML Sanitization With JRuby and Rails

jruby, ruby, rails | 2 comments

If it were not for the Redmine installation notes, I’d probably never have noticed that, if you care about HTML sanitization and want to run on JRuby, Rails 4.2 or greater might not be your best choice:

Redmine 3.0 does not support JRuby because some gems do not support Rails 4.2.

They go on to mention JDBC problems (the severity of which varies depending on the kind of database you’re using) and problems with Loofah.

Loofah

Loofah is the library that powers the rails-html-sanitizer gem, which, as the name suggests, provides HTML sanitization for Rails. Curious about what might be going on, I had a look at TravisCI, and yes, the most recent Loofah test runs on JRuby do indeed look like a lot of broken windows.

Loofah itself contains no platform-specific code and runs the same code on MRI and JRuby, so this seems to be a Nokogiri problem.

Because Loofah uses Nokogiri not only for the actual HTML/XML processing during sanitization but also in its test cases, it is hard to tell real (and potentially dangerous) sanitization failures apart from pure ‘testing failures’, where the sanitization behaviour might be correct but the test itself fails because of a Nokogiri bug. In fact, I believe most of the Loofah JRuby test failures are of the second kind, but personally I’d rather not base my use of a security-relevant library on that assumption.
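One way to keep these two kinds of failure apart is to assert on the sanitizer’s exact output string instead of re-parsing it with the very library under suspicion. The following is only a minimal sketch of that idea, not Loofah’s actual test suite: `toy_sanitize` and `check_exact` are hypothetical names, and the stand-in sanitizer simply escapes all markup with Ruby’s standard `CGI.escapeHTML` so the example runs without Nokogiri.

```ruby
require 'cgi'

# Hypothetical stand-in for a real sanitizer: it escapes all markup instead
# of filtering it, which is safe but far stricter than Loofah.
def toy_sanitize(html)
  CGI.escapeHTML(html)
end

# Parser-independent check: compares plain strings only, so no HTML parser
# runs inside the test itself and a parser bug cannot cause or hide a failure.
def check_exact(input, expected)
  actual = toy_sanitize(input)
  [actual == expected ? :pass : :fail, actual]
end

p check_exact('<script>alert(1)</script>', '&lt;script&gt;alert(1)&lt;/script&gt;')
# prints [:pass, "&lt;script&gt;alert(1)&lt;/script&gt;"]
```

The trade-off is brittleness: exact string comparison also fails on harmless differences such as attribute order or whitespace, which is one reason a test suite might compare parsed documents instead.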

So What’s Up With Nokogiri?

Nokogiri’s test suite does not show any failures on JRuby, so let’s see whether we can reproduce what Loofah does within Nokogiri’s own test suite and cause some trouble there. It turns out that’s not too hard. Unfortunately, fixing these bugs is much harder, especially if you’re not familiar with writing JRuby Java extensions and/or with Java XML libraries.

Although I managed to bring the number of Loofah test failures and errors down to around 60 with a single change to Nokogiri, after more than a day I didn’t feel like spending any more time wading through Nokogiri’s Java source code. So far there has been no reaction to my pull requests, so progress on this side will most likely be slow anyway.

One Step Back

Looking for an easier, faster way to fix the HTML sanitization problem that had led me down into Nokogiri XML-parsing hell in the first place, I quickly came across the OWASP Java HTML Sanitizer Project. Using the HTML sanitization library of an organization dedicated to application security doesn’t sound like the worst idea, so I built a Ruby wrapper that gives it API compatibility with the stock Rails HTML Sanitizer classes. I also took the Rails HTML Sanitizer test cases, mangled them a bit to account for slight output differences, and they pass. The gem hooks into the default sanitizer, so there’s nothing more to it than adding

gem 'rails-html-sanitizer-jruby', platforms: :jruby

to your Gemfile. The Readme has more information.
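The wrapper idea (keep the Rails-facing API, swap the engine behind it) can be sketched roughly as follows. This is not the gem’s actual code: `SanitizerAdapter` and `ESCAPE_ALL` are hypothetical names, and the sketch only assumes a Rails-style `sanitize(html, options)` call with optional `:tags` and `:attributes` allow lists. On JRuby, a real backend would presumably delegate to the OWASP library instead; that part is only indicated in a comment, and the runnable fallback here just escapes everything.

```ruby
require 'cgi'

# Hypothetical adapter: callers use a Rails-style sanitize(html, options)
# method, while the actual sanitizing engine is an injectable backend.
class SanitizerAdapter
  # Safe fallback backend for this sketch: escapes all markup and ignores
  # the allow lists entirely.
  ESCAPE_ALL = ->(html, _tags, _attributes) { CGI.escapeHTML(html) }

  def initialize(backend = ESCAPE_ALL)
    @backend = backend
  end

  # options may carry :tags and :attributes allow lists, as in Rails.
  def sanitize(html, options = {})
    return html if html.nil? || html.empty?

    @backend.call(html, Array(options[:tags]), Array(options[:attributes]))
  end
end

# On JRuby, a real backend could wrap a policy from the OWASP Java HTML
# Sanitizer built from the :tags and :attributes lists (not shown here).
sanitizer = SanitizerAdapter.new
puts sanitizer.sanitize('<b onclick="x()">hi</b>', tags: %w[b])
```

Injecting the backend keeps the Rails-compatible method signature in one place, so the same callers and tests can run against whichever engine a platform provides.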

Last Words

Don’t blindly trust the shiny green TravisCI badges: always check whether your platform is marked optional (as an allowed failure) in .travis.yml (Loofah, Rails HTML Sanitizer), or whether it is part of the current CI setup at all (Rails, sadly).
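That check is easy to script. The sketch below assumes the usual Travis layout, where optional Ruby versions appear as `rvm:` entries under `matrix:` (or its alias `jobs:`) → `allow_failures:`; `optional_rubies` is a hypothetical helper, and it only reads the YAML you pass in, so it cannot tell whether a platform is missing from CI altogether.

```ruby
require 'yaml'

# Returns the Ruby versions listed under matrix/jobs -> allow_failures,
# i.e. platforms whose failing builds do not turn the badge red.
def optional_rubies(travis_yml)
  config = YAML.safe_load(travis_yml) || {}
  matrix = config['matrix'] || config['jobs'] || {}
  Array(matrix['allow_failures'])
    .map { |entry| entry['rvm'] if entry.is_a?(Hash) }
    .compact
    .map(&:to_s)
end

example = <<~YAML
  language: ruby
  rvm:
    - 2.2
    - jruby
  matrix:
    allow_failures:
      - rvm: jruby
YAML

p optional_rubies(example)
# prints ["jruby"]
```

In a real project you would pass `File.read('.travis.yml')` instead of the inline example.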

Comments

Jeremy

Were you able to use your Ruby wrapper to replace loofah and nokogiri and get Redmine running on JRuby?

Thanks for this great write-up! I've been a Redmine user and admin for several years now, and have really wanted to try JRuby to improve scalability and speed for a while. A couple of years ago, the blockers were that various JRuby implementations don't seem to support native C extensions (which are required by some of my favorite Redmine plug-ins), but some recent developments (http://chrisseaton.com/rubytruffle/cext/) may prove to fix that. Your post gives me hope that Redmine could regain JRuby support in the not-so-distant future.

Jens

Jeremy,

Yes, this worked fine at the time I wrote this. However, I'm not using Redmine on JRuby myself at the moment, so I don't have any more current information on this.
