($Id: CHANGELOG.sanitizer,v 1.94 2006/01/02 03:18:56 bre Exp $)

NOTE: Sanitizer development is for the most part sponsored by FRISK Software International, http://www.f-prot.com/. Please consider buying their anti-virus products to show your appreciation.

Revision 1.75: (January 02, 2006)

Added code to recognize the most common/important file formats based on actual file contents, not just file name and MIME-type.

Added magic to detect WMF files, to allow reliable blacklisting of said files, see http://isc.sans.org/diary.php?storyid=994 for info.

Added generic code to detect when people try to disguise non-JPEG/GIF/PNG as such files and defang such attachments.

Removed the references from the HTML Cleaner's output, the owners of the linked web sites were unhappy because their URLs were being associated with spam as a result of being in Anomy's verbose logs.

Revision 1.74: (August 05, 2005)

Fixed a bug where disinfection wouldn't result in the modification count of a message being incremented. This didn't matter to users of sanitizer.pl, however some 3rd party systems relied on the modification count to determine whether to use the Sanitizer's output or not. This is a critical fix for those systems.

Revision 1.73: (August 05, 2005)

Fixed a bug in MIME parser when encountering junk headers at the very beginning of a new MIME part.

Cleaned up some of the test cases for more recent versions of F-Prot AntiVirus, added a version check to the F-Prot regression test.

Revision 1.72: (July 10, 2005)

Fixed bug in code which detects Date: header buffer overflows, it was false-alarming on Yahoo DomainKeys headers.

Lengthened maximum word-length in headers from 196 to 256 bytes, again to decrease the odds that we'll break DomainKeys.

Added sanity checks to configuration parser, to make sure that settings such as msg_defanged and msg_blacklisted, which get used within message headers contain only valid characters (0-9, A-Z, a-z and -).
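The character check described in the last entry above can be pictured roughly as follows. This is only a Python sketch of the idea, not the Sanitizer's Perl code: the function names, the HEADER_SETTINGS set and the rejection of empty values are assumptions made for illustration. Only the allowed characters and the two setting names come from the entry itself.

```python
import re

# Allowed characters as listed in the changelog entry: 0-9, A-Z, a-z and "-".
_HEADER_SAFE = re.compile(r"[0-9A-Za-z-]+")

# Hypothetical set of settings that end up inside message headers; the
# changelog names these two as examples ("settings such as ...").
HEADER_SETTINGS = {"msg_defanged", "msg_blacklisted"}


def is_header_safe(value: str) -> bool:
    """Return True if value is non-empty and uses only 0-9, A-Z, a-z and '-'.

    Rejecting the empty string is an assumption of this sketch.
    """
    return _HEADER_SAFE.fullmatch(value) is not None


def check_setting(name: str, value: str) -> None:
    """Raise ValueError when a header-bound setting contains other characters."""
    if name in HEADER_SETTINGS and not is_header_safe(value):
        raise ValueError(
            f"{name} may only contain 0-9, A-Z, a-z and '-': {value!r}"
        )
```

A check like this matters because a value containing CR/LF or a colon could otherwise inject extra header lines into every sanitized message.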
Test-cases sanitizer.uu-rfc822 and sanitizer.logging are updated.

Revision 1.71: (May 25, 2005)

Fixed minor bug in quoted-printable encoding, as reported by Michal Weinfurtner.

Fixed crashing behavior when multiple Content-Transfer-Encoding headers were present in the same message part.

Added mailblogger.pl to the distribution. This program has nothing to do with security, but uses the MIMEStream parser to extract images from e-mail and can subsequently generate thumbnails and re-post both text and images to a web-site, to implement e-mail->www gateway functionality. I use it to blog from my cell phone. :-)

Revision 1.70: (January 4, 2005)

Raised limits on max header size from approximately 64k to 256k.

Made error reporting

Added support for new F-Prot Daemon result codes to Sanitizer::FProt.

Revision 1.69: (September 2, 2004)

Added zip_policy.pl from Advosys (http://advosys.ca/) to the contrib/ directory, after being invited to do so by Derrick Webber of Advosys.

Added sanitizer.procmail ruleset to contrib/, illustrating how to implement a quarantine and add custom headers to infected e-mails.

Fixed priority bug in filename detection code, which would in some cases give higher priority to Content-IDs than it gave to the MIME filename attributes.

Made the file-name/MIME-type sanity checks configurable (default on) via the feat_sane_names variable. Set to 0 to disable.

Wrote very simple HTTP client for communication with F-Prot daemon, thus eliminating the dependency on LWP::Simple.

Fixed incorrect changelog entry below (in the entry for rev 1.56 the word "ScanFile" was used where it should have said "FileScan"), and added support for scripts which want to pass the name of a detected infection using a line like "Anomy-FileScan-VirusName: blah".
This makes the following new variables available to the file replacement template:

    %VIRUSNAME   - Propagated from Anomy-FileScan-VirusName
    %SUMMARY     - Propagated from Anomy-FileScan-Summary
    %DESCRIPTION - Propagated from Anomy-FileScan-Description

This corrects problems, implements and expands on suggestions (posted here http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=235352) by Derrick Hudson (dman at dman13.dyndns.org).

Revision 1.68: (May 7, 2004)

Added system_io_file variable to allow plugging in of custom replacements for the IO::File module, to facilitate internal FRISK development.

Fixed a problem with the mime-type auto-detection code which would corrupt certain messages when feat_log_after was enabled. This has probably also caused problems in other cases, but so far none have been reported.

Included the TNEF hooks in Sanitizer in the default distribution and made inclusion of Anomy::TNEFStream "lazy" to save cycles in one-shot modes. Note that the Anomy::TNEFStream module still isn't distributed by default.

Tuned the MIME parser to catch more of the exploits illustrated on http://testvirus.org/. Also fixed a bug in the position counting. These two changes combined affect almost all of the test cases (lines containing pos= and MIME info almost all change).

Added the following options to configure the HTML cleaner (all are off by default):

    feat_html_noexe     Disallow links to executables
    feat_html_unknown   Allow unknown HTML tags
    feat_html_paranoid  Paranoid HTML Cleaner mode, bans all src= links and enables feat_html_noexe paranoia as well.

Revision 1.67: (March 23, 2004)

Added code to decrease the odds that attachments with content-IDs ending in ".com" get mistakenly treated as executables.

Tweaked MIME parsing to catch a few more odd virus-generated messages.

Obfuscated some of the testcase results using simple rot13 encoding, so the Anomy .tar.gz file wouldn't be flagged as "infected" by various virus scanners.
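For readers unfamiliar with rot13, the obfuscation mentioned in the last entry above is a simple letter rotation that is its own inverse. The Python sketch below only illustrates the encoding. The helper name is made up here, and how the test suite actually stores and decodes its expected results is not described in this changelog.

```python
import codecs


def rot13(text: str) -> str:
    """Rotate ASCII letters by 13 places; other characters pass through."""
    return codecs.encode(text, "rot_13")


# Applying rot13 twice restores the original text, so the same helper
# serves for both obfuscating and de-obfuscating a stored test result.
obfuscated = rot13("Hello")
restored = rot13(obfuscated)
```

Because the transformation changes every letter, byte signatures that a virus scanner looks for no longer appear verbatim in the distributed files.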
Revision 1.66: (January 12, 2004)

Fixed an endless loop caused by an error in the uuencode detection routine. Thanks to Paul M. Hirsch for reporting this one.

Revision 1.65: (December 17, 2003)

Fixed a bug with attachment deletion, in some cases things wouldn't actually get removed, but would be appended after the "attachment removed" message.

Revision 1.64: (December 09, 2003)

Modified handling of quoted-printable encoded messages in order to improve security and avoid the often-reported "PDF corruption problem" caused by silly mail clients which QP-encode binary files. Handling of broken MIME created by a couple of in-the-wild viruses has also been fixed.

This is a highly experimental release, use with care!

Revision 1.63: (July 08, 2003)

Fixed a bug in HTML cleaner to do with XML-style tag termination.

Improved the MIME parser to better handle the obfuscated MIME created by the Ronoper worm.

Revision 1.62: (June 19, 2003)

Updated HTMLCleaner to avoid endless loop in s/// on Perl 5.00503.

Updated HTMLCleaner and Sanitizer to allow users to specify in the configuration file certain replacement tags or replacement attribute names to use in place of "DEFANGED_". Added the default replacement tag for defanged tags. So now, defanged tags will be replaced with instead of just

The syntax for configuring the HTML cleaner is rather ugly, and may be changed in the future.

Updated testall.sh script to warn the user if he attempts to use the Sanitizer in a UTF-8 enabled environment (LANG=*.UTF8) and added instructions regarding unicode issues in the file UNICODE.TXT.

Test cases have been updated to work properly on FreeBSD 4.6.2 with Perl 5.00503, and on Unicode enabled RedHat 8 or 9 machines.

Revision 1.61: (unreleased)

Added recognition of the Content-Description and Content-ID headers, made the file name policy code treat them as if they might contain filenames as well in some cases.

Made the MIME parser tolerate spaces in attribute definitions (e.g. Filename = "Foo.exe" instead of Filename="Foo.exe"). Also updated the MIME parser so "type=..." attributes would be interpreted separately from the Content-Type.

Minor fixes to filename policy code.

Minor fixes to the testcases.

Revision 1.60: (May 28, 2003)

Minor update to MIME type checking rules, to allow more legal MIME types.

Made the multipart detection code less aggressive, in small text messages it would mistake common ascii-graphic signatures for message boundaries and mess up the parsing quite badly.

Made the filename policy code check ALL possible file names against each rule, instead of just checking the "default" one. If feat_mime_files is set, then the default file-name for that mime type will be checked as well. This is a major improvement to security, but requires that filename rules are ordered so that all DROP/DEFANG/MANGLE rules precede any ACCEPT rules.

Made the sanitizer read /etc/mime.types (if it exists) to generate a more complete list of default filenames for unnamed parts.

Revision 1.59: (May 9, 2003)

Fixed detection and handling of GPG/PGP messages. This corrects the broken PGP handling introduced by revision 1.58, as reported by Rick Johnson (rjohnson at medata dot com).
Revision 1.58: (May 8, 2003)

Added the "try" prefix for use within configuration files which include other configuration files. This makes loading of the file optional - if it doesn't exist or is unreadable the Sanitizer simply ignores it.

Minor change to the sanitizer.appledouble check, as suggested by Dag Nummedal (nummedal at ed.ntnu.no).

Made text/html parts exempt from feat_force_name behavior, to solve unfortunate display problems with Outlook clients as reported by David Santinoli (david at santinoli . com). Also fixed a minor bug in the feat_force_name behavior.

Made minor improvements to header parsing code, to handle the broken MIME generated by a number of viruses out there.

Changed the header rewriting code to quote attribute values only when they contain non-alphanumeric characters. This is done because some versions of Eudora apparently don't like format="flowed", but do like format=flowed.

Fixed a minor bug in the HTML cleaner, where it was missing some invalid attribute definitions which Explorer will accept. Thanks to Paul Wallingford (paul at cybergestalt dot net) for the report.

Added the Advosys tnef2multipart.pl script to the contrib/ directory.

Added caching to the QP encode/decode routines, to decrease the number of times that a different QP encoding strategy is chosen for a given string. This won't solve QP-related encoding/decoding problems entirely, but it may decrease their frequency.

More changes to MIME parser, to handle broken messages created by the Bridex worm and a few other odd massmailer worms/viruses. This, again, modifies white-space and log offsets in the test-cases.

Revision 1.57: (November 14, 2002)

Fixed a MIME header bug introduced in revision 1.56 by the Bugbear fix. The bug was causing certain MIME headers to be corrupted, primarily affecting recipients of signed or encrypted mail. Thanks to Ton Vandepoel (tom.vandepoel at be.ubizen.com) for the bug report.
Augmented the config file syntax to allow any directive to be prefixed with "before X" or "after X", where X is a valid Unix timestamp.

Revision 1.56: (October 22, 2002)

Modified the MIME attribute parser slightly, so it will detect the entire filename as sent by Bugbear and other viruses, in spite of those names not conforming to the MIME standard.

Added more detection of potential security abuses based on invalid RFC822 comments.

Added experimental support for more detailed communication between scanners and the Sanitizer itself. STDOUT from file scanners will be scanned for the following tokens:

    Anomy-FileScan-Result: CODE
    Anomy-FileScan-Summary: summary
    Anomy-FileScan-Description: Text description of result
    Anomy-FileScan-NewName: newname.bla
    Anomy-FileScan-NewFile: /path/to/new/attachment/data
    Anomy-FileScan-NewType: MIMETYPE
    Anomy-FileScan-NewEnc: ENCODING

The FileScan-Result code overrides the exit code of the scan program, and the summary and description fields override the defaults built into the sanitizer. As a side effect of providing a new name or new data file, the part encoding will be forced to Base64 - unless some other encoding is explicitly requested.

A sample scanner script which uses some of these features is the zip_script in the contrib/ directory (a "scanner" which will encapsulate all "scanned" attachments in a ZIP file).

Revision 1.55: (October 08, 2002)

Fixed dependencies in sanitizer.pl, so it no longer requires LWP::Simple unless people are actually using the FProt daemon scanner.

Modified the HTMLCleaner rules to avoid a rendering bug in Eudora's built-in HTML viewer, which was triggered primarily by defanging of Outlook's HTML messages.

Revision 1.54: (September 18, 2002)

Added the hcp:// protocol to the list of banned href= and src= destinations in HTMLCleaner.pm, for reasons discussed here: http://online.securityfocus.com/archive/1/287482/2002-08-15/2002-08-21/0

Tightened security on href= attributes in general.
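As a rough picture of what banning a protocol in href= and src= destinations involves, the Python sketch below checks a URL's scheme against a deny list. It is not taken from HTMLCleaner.pm. The function name is hypothetical, and the set holds only "hcp" because that is the one scheme this entry names; the module's full list is not given here.

```python
from urllib.parse import urlsplit

# Only "hcp" is taken from the changelog entry. The real list in
# HTMLCleaner.pm is longer, but its contents are not stated here.
BANNED_SCHEMES = {"hcp"}


def is_banned_destination(url: str) -> bool:
    """Return True if the URL's scheme is on the deny list.

    Comparison is case-insensitive; relative URLs have no scheme and
    are therefore never banned by this particular check.
    """
    scheme = urlsplit(url.strip()).scheme.lower()
    return scheme in BANNED_SCHEMES
```

A deny list of this kind only catches schemes someone has thought to add, which is one reason a default-deny approach to HTML is generally the safer design.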
Revision 1.53: (September 17, 2002)

Fixed a minor bug to do with F-Prot daemon support in 1.52.

Revision 1.52: (unreleased)

Added built-in support for F-Prot Antivirus for Linux, both the small business (command line) and enterprise (daemon) versions. The command line client is auto-detected and used if present, use of the daemon can be requested by invoking sanitizer.pl with "-fprotd" as the first argument.

Enterprise customers will be able to enable automatic disinfection of incoming messages, by adding the "-disinf" parameter to the fprotd command line in the file_list_2 scanner definition, and will see the name of the detected threat in the Sanitizer logs.

The default policy for infected content is to mangle the filename and MIME type, but still pass the data on to the user. Upgrading this to a "drop" or "save" policy is recommended.

Note: Use of F-Prot on machines where it is available can be manually disabled in two ways: either invoke sanitizer.pl with "-nofprot" as the first argument or redefine file_list_2.

Revision 1.51: (unreleased)

Created feat_no_partial (enabled by default), which defangs any incoming message/partial messages, to address the problems described in http://www.securiteam.com/securitynews/5YP0A0K8CM.html

Added support for treating the first part of a message/partial message the same way as if it were message/rfc822. This should catch any security risks present in the *first part*, but not any subsequent parts.

Fixed a bunch of other minor things.

Revision 1.50: (unreleased)

Improved boundary guessing routine and header parser to deal with still more non-RFC compliant messages. This still needs work.

Fixed the problem with $m being in the range 0-11 instead of 1-12, when used in filename templates. Thanks to Paulius Bulotas (paulius at kaktusas.org) for the report.

Fixed minor logging bug to do with non-MIME messages and feat_log_inline=2.

Added a few tags and attributes common to email to the HTML cleaner, to lower the noise level.
Added GNU GPL, GNU LGPL and Artistic Licenses to the distribution (in the file COPYING). This bloats the distribution somewhat, but is necessary for strict compliance with the GNU licenses.

Added module for interfacing Anomy with the daemon version of F-Prot Antivirus for Linux, which is significantly faster than the command-line version. The daemon version will most likely be made available to purchasers of enterprise-class licenses for F-Prot Antivirus for Linux.

Fixed a bug when truncating unusually long MIME fields, as reported by Will Day (wd at hpgx.net).

Updated default configuration to recognize blacklisted filenames with trailing dots (which are ignored by Windows).

Added protection against MIME recursion DoS attacks. The Sanitizer itself was vulnerable to such attacks. The default maximum allowed recursion level is 20, which should be more than enough. This value can be tweaked by setting the max_mime_depth variable.

Fixed a problem which occurred when re-encoding Base64 encoded attachments with odd line lengths. Thanks to Joerg Lenneis for the bug report.

Revision 1.49: (February 15, 2002)

Fixed a minor white-space related bug in MIME header parsing.

Made the configuration file parser tolerant of Windows/DOS formatted text.

Created a separate distribution containing only the stuff used by the HTML cleaner, for users of David F. Skoll's MIMEDefang program.

Made minor tweaks to the messages logged by the HTML cleaner, added code to prevent re-defanging of HTML tag attributes.

Implemented a much more elegant fix to the lack-of-trailing-newlines buglet fixed in 1.47. This should make all those extra newlines go away again...

Added "branch" functionality to the file policy mechanism. By appending ^N (where N is a number) to an "unknown" or "warn" policy, the policy matcher can be made to branch to a given rule instead of evaluating the next rule sequentially.
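The branching behaviour described in the entry above can be modelled in a few lines of Python. This is a deliberately simplified sketch and not the Sanitizer's matcher. It assumes that each rule is a single regular expression with a single policy string, that N is a 1-based rule number, and that a step limit guards against branch loops. The "warn" counter follows the Revision 1.47 description of that policy. All names here are hypothetical.

```python
import re


def parse_policy(text):
    """Split 'unknown^3' into ('unknown', 3); plain 'drop' gives ('drop', None)."""
    name, sep, target = text.partition("^")
    if not sep:
        return name, None
    number = int(target)
    if number < 1:
        raise ValueError("branch target must be 1 or greater")
    return name, number


def evaluate(rules, filename, default="defang", max_steps=100):
    """Walk (regex, policy) rules; return (decision, modification_count).

    'unknown' and 'warn' make no final decision: evaluation moves on to
    the next rule, or to rule N when the policy carries a ^N suffix.
    """
    index = 0
    modifications = 0
    for _ in range(max_steps):  # assumed guard against branch cycles
        if index >= len(rules):
            break
        pattern, policy_text = rules[index]
        policy, target = parse_policy(policy_text)
        if not re.search(pattern, filename, re.IGNORECASE):
            index += 1
            continue
        if policy not in ("unknown", "warn"):
            return policy, modifications
        if policy == "warn":
            modifications += 1  # "warn" also bumps the modification counter
        index = target - 1 if target is not None else index + 1
    return default, modifications


rules = [
    (r"\.txt$", "accept"),     # rule 1
    (r"\.zip$", "unknown^4"),  # rule 2: skip rule 3 for ZIP files
    (r".*", "drop"),           # rule 3
    (r"^report", "accept"),    # rule 4
    (r".*", "defang"),         # rule 5
]
```

With these rules, a ZIP file jumps past the catch-all drop in rule 3 and is judged by rules 4 and 5 instead, which is the kind of per-type sub-policy that branching makes possible.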
Improved UUencoded attachment detection to handle null modes and protect users against the silly Outlook "begin blah" annoyance, as described here: http://www.rodos.net/outlook/#begin

Made the content header parser deal better with certain non-RFC-compliant MIME types generated by broken PGP plugins. Added a feat_fixmime check to rewrite the offending headers so they comply with standards.

Fixed broken file_list_7 in default policy - thanks to Tuomas Lukinmaa for bringing this to my attention.

Added code to defang bare CR characters in message headers, due to the following Outlook bug (enabled by feat_fixmime): http://www.openoffice.nl/special_interest/outlookbug.html

Revision 1.48: (January 04, 2002)

Happy new year! Updated copyright notices again.

Improved HTMLCleaner to properly handle STYLE tags which have attributes. Thanks to Andrew (andrew at ledge.co.za) for pointing this out.

Explicitly set all temporary files opened to binary mode, to improve portability.

Improved newline handling code a bit more - it now properly handles differing newline conventions within embedded/encoded parts of the same message.

Revision 1.47: (not released)

Added feat_newlines, to allow people to specify what sort of newlines to use in the sanitizer's output. Default (0) is to use a newline convention "autodetected" from the first chunk of data.

Attempted to address a number of platform-dependent newline issues within the code itself - this still needs some work though.

Added the "warn" policy, which acts just like "unknown" except for the fact that it also increments the modification counter. Added a test for this to the filenames test.

Fixed a buglet to do with a lack of trailing newlines in parts which are re-encoded as 8bit instead of Base64 when feat_log_after is in use. This has the side effect of adding newlines to almost all of the test cases.

Updated copyright notice in a few files to mention the year 2001. Seemed fitting to fix that before we enter 2002...
Thanks to Dave Cridland for pointing that out. :-)

Moved John Hardin's macro scanning code into the module Anomy::Sanitizer::MacroScanner, to facilitate sharing of that code between different implementations of the sanitizer engine.

Revision 1.46: (skipped)

Revision 1.45: (December 12, 2001)

WARNING: Scoring works again - but not like it used to!

WARNING: The default configuration has been updated quite a bit, and does some NEW THINGS. You have been warned.

Most test cases were modified for this release (I've gotta start releasing things more often...).

Almost complete rewrite of HTML sanitization code, to switch from a default-allow to default-deny strategy. Primary benefits:

- Old problem with