Date: Fri 18 Aug 2000 - 18:38:18 UTC

On 2000-08-18, 12:30:16 (-0500), mark david mcCreary wrote:
> So instead of checking the content-type, I should probably scan the body,
> and only invoke the defang html scan if <html> is found. Or trigger off of
> either the header or body indicating html.

Note that defanging HTML doesn't mean removing it - it just mangles
tags which might be considered dangerous, such as those related to
JavaScript.
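
To make the idea concrete, here is a rough sketch of what "mangling"
a tag could look like. It is not the sanitizer's actual code: the tag
list, the DEFANGED_ prefix and the function name are all assumptions
made up for illustration.

```python
import re

# Hypothetical list of tags treated as dangerous (an assumption, not
# the sanitizer's real configuration).
DANGEROUS_TAGS = ("script", "iframe", "object", "embed")

# Matches the opening "<" or "</" of any listed tag, case-insensitively.
_TAG_RE = re.compile(r"<(/?)(%s)\b" % "|".join(DANGEROUS_TAGS), re.IGNORECASE)


def defang(text):
    """Rename dangerous tags instead of deleting them.

    The tag name gets an obvious prefix, so a mail reader no longer
    recognizes it, while the original content stays in the message.
    """
    return _TAG_RE.sub(
        lambda m: "<%sDEFANGED_%s" % (m.group(1), m.group(2)), text
    )
```

Since the prefix is a fixed, visible string, getting the original
back is a plain search-and-replace in any editor.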

The changes it makes are also pretty obvious, so they are easy to
remove after the fact with an editor if people need the original.

So for the most part, always enabling the HTML defanger should be
relatively safe, and shouldn't keep people from using HTML-mail.

It may interfere if people are using mailing lists to cooperatively
write web sites (exchanging complex HTML files as attachments), but
that can be avoided by compressing the files before sending them or
just disabling the HTML sanitization for those lists.

Currently the only serious problem with the HTML defanger is that
it is a little too sensitive and may defang stuff that isn't
strictly HTML - all text/* parts are scanned for HTML. This is
actually a relatively complex issue, since it's so hard to tell
what mail readers will interpret as HTML and what they won't. For
now the sanitizer is pretty strict, since that is the safest way
to do it, but it is also the most disruptive for normal use.

At the moment I'm considering adding a "feat_html_strict" variable,
which would allow people to choose whether to HTML-sanitize all
text parts (the current behavior) or only parts which either have
the right MIME type or contain recognizable HTML tags near the top
of the file.
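
Roughly, the decision such a variable would control might look like
the sketch below. Only the name feat_html_strict comes from the
proposal above; the function name, the tag list, the size of the
"near the top" window and the assumption that text/html is "the right
MIME type" are all hypothetical.

```python
import re

# Hypothetical set of tags that count as "recognizable HTML".
_HTML_HINT_RE = re.compile(
    r"<(html|head|body|table|img|script|font|div)\b", re.IGNORECASE
)

# Hypothetical size of the "near the top of the file" window, in characters.
SCAN_WINDOW = 1024


def should_sanitize(content_type, body, feat_html_strict=True):
    """Decide whether a MIME part should go through the HTML defanger."""
    # Drop parameters such as "; charset=..." and normalize case.
    ctype = content_type.split(";", 1)[0].strip().lower()
    if not ctype.startswith("text/"):
        return False  # only text/* parts are ever candidates
    if feat_html_strict:
        return True  # current behavior: sanitize every text part
    if ctype == "text/html":
        return True  # assumed to be "the right MIME type"
    # Otherwise look for HTML-looking tags near the top of the part.
    return _HTML_HINT_RE.search(body[:SCAN_WINDOW]) is not None
```

With feat_html_strict turned off, a text/plain part containing only
ordinary prose would be left alone, while one starting with <html>
would still be sanitized.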

A third option would be to HTML-sanitize only inline HTML parts,
leaving attachments alone. This would be pretty insecure, but
still better than nothing for lists where people are actually
swapping complex HTML files.
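
As a sketch of how the inline/attachment split could be made, the
following uses Python's standard email package and treats a part as
an attachment only if it says so in its Content-Disposition header.
That test is an assumption on my part (mail readers may well render
parts that carry no such header, or ignore it), and the function name
is made up.

```python
from email import message_from_string


def parts_to_sanitize(msg):
    """Yield the text parts of a message that are not marked as attachments."""
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skips multipart containers and non-text parts
        if part.get_content_disposition() == "attachment":
            continue  # explicitly marked attachments are left alone
        yield part
```

Anything yielded here would be passed to the defanger; everything
else goes through untouched, which is exactly why this mode is the
least safe of the three.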

Does this sound like a good compromise?

