The Sad State of HTML Sanitization With JRuby and Rails
If it were not for the Redmine installation notes, I would probably never have noticed that Rails 4.2 or greater might not be your best choice if you care about HTML sanitization and want to run on JRuby:
Redmine 3.0 does not support JRuby because some gems do not support Rails 4.2.
They go on to mention JDBC problems (the severity of which varies depending on the kind of database you’re using) and Loofah.
Loofah
Loofah is the library that powers the rails-html-sanitizer gem, which, as the name suggests, provides HTML sanitization for Rails. Curious about what might be going on, I had a look at TravisCI, and yes, the most recent Loofah test runs on JRuby do indeed look like a lot of broken windows.
Loofah itself has no platform-specific code; it runs the same code on MRI and JRuby, so this seems to be a Nokogiri problem.
Loofah uses Nokogiri not only for the actual HTML/XML processing during sanitization but also in its test cases. That makes it hard to tell real (and potentially dangerous) sanitization failures apart from pure ‘testing failures’, where the sanitization behaviour might be correct but the test itself fails due to a Nokogiri bug. In fact, I believe most of the Loofah test failures on JRuby are of the second kind, but personally I’d rather not use a security-relevant library based on this assumption.
So What’s Up With Nokogiri?
Nokogiri’s test suite does not show any failures on JRuby, so let’s see if we can reproduce what Loofah does in Nokogiri’s own test suite and cause some trouble there. It turns out that’s not too hard. Unfortunately, fixing these bugs is much harder, especially if you’re not familiar with writing JRuby Java extensions and/or with Java XML libraries.
Although one change to Nokogiri brought the number of Loofah test failures and errors down to around 60, after more than a day I didn’t feel like spending any more time wading through Nokogiri-J source code. So far there has been no reaction to my pull requests, so things will most probably move slowly on this side anyway.
One Step Back
Looking for an easier, faster way to fix the HTML sanitization problem that led me down into Nokogiri XML parsing hell in the first place, I quickly came across the OWASP Java HTML Sanitizer Project. Using the HTML sanitization library of an organization dedicated to application security doesn’t sound like the worst idea, so I built a Ruby wrapper that gives it API compatibility with the stock Rails HTML Sanitizer classes. I also took the Rails HTML Sanitizer test cases and adjusted them a bit to account for slight output differences, and they pass. The gem hooks into the default Sanitizer, so there’s nothing more to it than adding
gem 'rails-html-sanitizer-jruby', platforms: :jruby
to your Gemfile. The Readme has more information.
Last Words
Don’t blindly trust the shiny green TravisCI badges. Always check whether your platform is marked optional in .travis.yml (Loofah, Rails HTML Sanitizer) or whether it is part of the current CI setup at all (Rails, sadly).
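That check can be scripted. The following is only a minimal sketch: the sample configuration, the constant `SAMPLE_TRAVIS_YML` and the helper `ci_status` are made up for illustration and are not taken from any of the projects mentioned. It assumes the common layout where interpreters are listed under `rvm` and optional jobs under `matrix.allow_failures`; jobs added through `matrix.include` are ignored here.

```ruby
require "yaml"

# Hypothetical .travis.yml content, for illustration only.
SAMPLE_TRAVIS_YML = <<~YAML
  language: ruby
  rvm:
    - 2.2
    - jruby
  matrix:
    allow_failures:
      - rvm: jruby
YAML

# Returns :not_tested, :optional or :required for the given interpreter.
# Assumption: interpreters are listed under "rvm" and optional jobs
# under "matrix" -> "allow_failures".
def ci_status(config, platform)
  # YAML parses 2.2 as a Float, so compare everything as strings.
  tested   = Array(config["rvm"]).map(&:to_s)
  optional = Array(config.dig("matrix", "allow_failures"))
             .map { |entry| entry["rvm"].to_s }

  return :not_tested unless tested.include?(platform)

  optional.include?(platform) ? :optional : :required
end

config = YAML.safe_load(SAMPLE_TRAVIS_YML)
puts ci_status(config, "jruby") # prints "optional"
puts ci_status(config, "2.2")   # prints "required"
puts ci_status(config, "rbx")   # prints "not_tested"
```

A result of `:optional` or `:not_tested` for your platform means a green badge says nothing about it.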