Re: Sanitizer HTML spew coming out in-line

From: Bjarni R. Einarsson
Date: Tue 05 Feb 2002 - 16:33:54 UTC

    On 2002-02-01, 13:06:36 (-0500), Jim Rosenberg wrote:
    > Last night I just upgraded to 1.48 from a much older version. I got
    > one complaint from a user who got a huge piece of sanitizer log "in-
    > line" -- visually it looks like it's inside a box. This is a heavily
    > forwarded E-mail, and it looks like it was originally edited in Word
    > as HTML. It is a prodigious amount of defanging.

    Yes - in common configurations (feat_log_inline = 1) the sanitizer's
    logging is a "best effort" thing - it basically sticks the log "anywhere
    it'll fit" as it processes the message stream. The rationale for this is
    that by the time it reaches the end of the message there may not be any
    usable/visible places left.

    Enabling "feat_log_inline = 2" tells the sanitizer to rewrite all
    messages to be multipart/mixed, text/plain or text/html. This
    guarantees that the log can always be appended, so the "best effort"
    guesswork is omitted - it will always add the log once to the end of the
    message, either as an attachment (for multipart messages), text or
    inline HTML.
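
    To make the idea concrete, here is a minimal sketch in Python (not the
    sanitizer's own code, which is Perl) of "always append the log once at
    the end". The function name append_log and the filename sanitizer.log
    are hypothetical, chosen only for illustration; the sketch uses the
    standard library's email package and assumes the message is an
    EmailMessage object.

```python
import html
from email.message import EmailMessage


def append_log(msg: EmailMessage, log: str) -> EmailMessage:
    """Append `log` exactly once at the end of `msg` (illustrative only)."""
    ctype = msg.get_content_type()
    if ctype == "text/plain":
        # Plain text: tack the log on after the existing body.
        body = msg.get_content()
        msg.set_content(body.rstrip("\n") + "\n\n" + log + "\n")
    elif ctype == "text/html":
        # HTML: add the log inline, escaped so it cannot inject markup.
        body = msg.get_content()
        msg.set_content(body + "<pre>" + html.escape(log) + "</pre>",
                        subtype="html")
    else:
        # Anything else: add_attachment() converts the message to
        # multipart/mixed if it is not already, then attaches the log
        # as a separate part.
        msg.add_attachment(log, filename="sanitizer.log")
    return msg


if __name__ == "__main__":
    demo = EmailMessage()
    demo.set_content("Hello there.\n")
    print(append_log(demo, "LOG: 3 tags defanged").get_content())
```

    The point of the design is that every message ends up in one of three
    shapes, each of which has an obvious "end" where a log can go, so no
    guessing is needed while the message is being processed.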

    The main drawback of this approach is that it increases the number of
    modifications made to the messages and may be bad for compatibility -
    which is why the Anomy distribution doesn't have it enabled by default.
    But I've been using it for quite some time now and feel confident enough
    to recommend it.

    > This is one of those users who insists he "must" ("as a matter of
    > record") print and file every E-mail on certain matters, and he is
    > not amused at having all this stuff in his E-mail.

    The feat_log_inline = 2 stuff would probably help him then, since it'll
    tend to stick the sanitizer log in a separate file/attachment.

    > It seems to me I recall reading in the change log (or somewhere) that
    > there's now some finer grain control over HTML defanging. Is there
    > somewhere I can read up on this?

    Yes, but only at the source-code level at the moment; I haven't had any
    time to write documentation. Take a look at Anomy/
    There are a bunch of tables there which you can modify to add more
    tags/attributes to the list of acceptable HTML.
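
    For readers unfamiliar with table-driven HTML cleaning, the following
    Python sketch shows the general technique. It is not the Anomy code and
    does not reproduce its tables: the ALLOWED table, the Defanger class
    and the DEFANGED_ prefix used here are all assumptions made for
    illustration. Tags and attributes found in the table pass through;
    everything else is renamed so that a mail client will not act on it.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: tag name -> set of attributes permitted on it.
# Extending what counts as "acceptable HTML" means adding entries here.
ALLOWED = {
    "p": {"align"},
    "b": set(),
    "a": {"href"},
}

PREFIX = "DEFANGED_"  # illustrative prefix for renamed tags/attributes


class Defanger(HTMLParser):
    def __init__(self, allowed):
        super().__init__()
        self.allowed = allowed
        self.out = []

    def handle_starttag(self, tag, attrs):
        permitted = self.allowed.get(tag)
        name = tag if permitted is not None else PREFIX + tag
        pieces = [name]
        for attr, value in attrs:
            # Unknown attributes (and all attributes of unknown tags)
            # are renamed rather than silently dropped.
            if permitted is None or attr not in permitted:
                attr = PREFIX + attr
            if value is None:
                pieces.append(attr)
            else:
                pieces.append('%s="%s"' % (attr, escape(value)))
        self.out.append("<" + " ".join(pieces) + ">")

    def handle_endtag(self, tag):
        name = tag if tag in self.allowed else PREFIX + tag
        self.out.append("</" + name + ">")

    def handle_data(self, data):
        # Text is re-escaped on the way out. Comments and declarations
        # are not handled here, so they are dropped from the output.
        self.out.append(escape(data, quote=False))


def defang(text, allowed=ALLOWED):
    parser = Defanger(allowed)
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(defang('<p align="left" onclick="x()">hi</p>'))
```

    With a table like this, reducing "false positives" is a matter of
    adding a tag or attribute to ALLOWED once you have decided it is safe.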

    > I don't know how other people feel, but as a workaday system
    > administrator, I'm more thankful for Anomy than just about any other
    > tool I can think of. It means the difference between being able to

    I'm glad to hear it. :)

    > Any advice on how to tone down the defanging of Microsoft {X,HT}ML
    > crapola?

    In your case I'd take a look at simply modifying your logging
    configuration to make the defanging less annoying to the recipient.

    Regarding the defanging itself - the biggest single step to improve it
    and decrease "false positives" would be to teach HTMLCleaner about CSS -
    what kind of CSS is safe and what isn't. In second place would be a
    list of Microsoft-specific tags, along with some sort of estimate of how
    much of a security risk they pose.

    Bjarni R. Einarsson                           PGP: 02764305, B7A3AB89                -><-    
