I uploaded a new revision of the Sanitizer and also an updated
HTMLCleaner package to the web site just now. Get both from
This release contains quite a few enhancements to the MIME parser,
allowing Anomy to handle even more of the broken HTML out there.
It also incorporates some of the HTML cleanup suggestions from the
discussions on this list during the past few days, but they're
currently not enabled by default (flip a switch in the config file to
enable them).
Note that the feat_html_unknown switch will enable the sanitizer to
run in "default allow" mode for HTML, which should greatly decrease
the modifications it makes on Outlook-style HTML. In theory this is
less secure, since HTML tags which the parser doesn't recognize and
haven't been evaluated for security impact will be passed through
unmodified, but in practice the risk should be negligible... I hope.
Give it a try and let me know what you think!
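
To make the "default allow" versus "default deny" distinction concrete, here
is a rough sketch in Python. It is not the Sanitizer's actual code: the
function name keep_tag, the allow_unknown parameter and both tag sets are
made up for illustration. The point is only that the switch changes what
happens to tags that were never evaluated.

```python
# Hypothetical illustration only; not Anomy's code or its real tag lists.
KNOWN_SAFE = {"p", "b", "i", "a", "br"}        # evaluated, considered harmless
KNOWN_UNSAFE = {"script", "object", "embed"}   # evaluated, considered risky


def keep_tag(tag, allow_unknown=False):
    """Return True if the tag would be passed through unmodified."""
    name = tag.lower()
    if name in KNOWN_SAFE:
        return True
    if name in KNOWN_UNSAFE:
        return False
    # The tag has never been evaluated, so the policy switch decides:
    # default deny (False) or default allow (True).
    return allow_unknown
```

With allow_unknown left off, an Outlook-style tag such as "o:p" is rejected;
with it on, the tag passes through, while tags already known to be risky are
still caught.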
The interesting bits from the changelog entry are:
Fixed a problem with the mime-type auto-detection code which
would corrupt certain messages when feat_log_after was enabled.
This has probably also caused problems in other cases, but so
far none have been reported.
Tuned the MIME parser to catch more of the exploits illustrated
on http://testvirus.org/. Also fixed a bug in the position
counting. These two changes combined affect almost all of the
test cases (lines containing pos= and MIME info almost all
Added the following options to configure the HTML cleaner (all are off
by default):
   feat_html_noexe      Disallow links to executables
   feat_html_unknown    Allow unknown HTML tags
   feat_html_paranoid   Paranoid HTML Cleaner mode, bans all src= links
                        and enables feat_html_noexe paranoia as well.
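
The way the three switches combine can be sketched roughly as follows, again
in Python and again not taken from the Sanitizer itself. The function
effective_html_options, the dictionary-based config and the result keys are
assumptions for illustration; only the option names come from the list above.

```python
# Hypothetical sketch; only the feat_html_* names come from the changelog.
def effective_html_options(cfg):
    """Resolve the three switches into the behaviour they imply."""
    # A missing option counts as off.
    paranoid = bool(cfg.get("feat_html_paranoid", 0))
    return {
        "allow_unknown_tags": bool(cfg.get("feat_html_unknown", 0)),
        # Paranoid mode turns on the noexe check as well.
        "block_exe_links": paranoid or bool(cfg.get("feat_html_noexe", 0)),
        # Only paranoid mode bans src= links.
        "block_src_links": paranoid,
    }
```

For example, a config that sets only feat_html_paranoid ends up blocking
both src= links and links to executables, while unknown tags stay disallowed.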
Have fun, and please let me know if this release breaks anything!
-- Bjarni Rúnar Einarsson firstname.lastname@example.org http://bre.klaki.net/
PGP: 02764305, B7A3AB89