anomy-list

Re: Content-ID problems - again!

From: Seth A Robertson (121905@xyz.molar.is)
Date: Tue 06 Apr 2004 - 23:44:34 GMT


    I love it!...Anomy reads Content-IDs as filenames, and the Anomy list reads
    Content-IDs as email addresses (and masks them with "xyz.molar.is")!

    So, anywhere in my previous post that you see "xyz.molar.is" within the
    anomy output, imagine a Content-ID with a domain name at the end, and it
    should begin to make sense...

    Seth Robertson

    First off, kudos to the Guru Einarsson for a great product that helped
    stifle the MyDOOM and SoBig.F problems before we even knew they
    started--I've always been very impressed with the (unique stream-based)
    design and the performance of the product. We were running version 1.55 for
    over a year, but, on more than one occasion, I was almost abused like a
    tackling dummy over the months that our users suffered from PDF mangling,
    so once the fix came, I upgraded (and champagne corks popped in the techie
    closet). However, since v 1.62(/1.61), I've been plagued by the Content-ID
    problem, which occurs much more frequently than the PDF problem did. I did
    (and to some degree I still might) consider rolling back to 1.55 because
    occurrences of the PDF problem seemed to be much less frequent than the
    Content-ID problem. I've been using Rick Johnson's updates as a weathervane
    on this problem, and so I was very excited when he reported how
    comprehensively the fix corrected the problems he experienced with his
    different test cases...

    Here is the output of one of our ".com" test cases before the update
    (running on v1.66):

    ******************************************************

    Part (pos="2748"):

    SanitizeFile (filename="121992@xyz.molar.is, ABM
    SOW Overview 032504.ppt", mimetype="application/vnd.ms-powerpoint"):

    Match (names="121992@xyz.molar.is", rule="1"):

    Enforced policy: save

    Replaced mime type with: text/plain

    Replaced file name with: DEFANGED-0.txt

    ******************************************************

    From monitoring other list members' reports, I would say this is the
    exemplar (anomy uses the ".com" Content-ID as the filename and makes a
    policy decision based on this 'filename'). I applied the 1.67 update to
    upgrade from 1.66. Running this ".com" test case again, notice the correct
    filename, names, and policy decision (running on v1.67):

    ******************************************************

    Part (pos="2748"):

    SanitizeFile (filename="ABM SOW Overview 032504.ppt",

    mimetype="application/vnd.ms-powerpoint"):

    Match (names="ABM SOW Overview 032504.ppt", rule="2"):

    Enforced policy: accept

    ******************************************************

    Apparently, this particular 1.67 change consists of this one line added
    to Sanitizer.pm (line 1291)?:

    ******************************************************

    # Skip @foo.com type names, they shouldn't match anyway.

    next if ($v =~ /\@\S+\.com$/i);

    ******************************************************
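
    To see which Content-IDs a pattern like this actually skips, it can be
    tried on its own. The following is a minimal sketch, not code from
    Sanitizer.pm: the sub name skipped_by_com_rule and the sample Content-IDs
    are made up for illustration, and the pattern is written with the escapes
    (\@, \S, \.) that the intended regex appears to have.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if a 1.67-style check would skip this
# value, 0 if the value would stay in the list of candidate filenames.
sub skipped_by_com_rule {
    my ($v) = @_;
    return $v =~ /\@\S+\.com$/i ? 1 : 0;
}

# Made-up Content-ID values, one per TLD.
for my $cid ('part1.0001@example.com',
             'part1.0001@example.org',
             'part1.0001@EXAMPLE.GOV') {
    printf "%-24s %s\n", $cid,
        skipped_by_com_rule($cid) ? 'skipped' : 'kept as a candidate filename';
}
```

    Only the ".com" value is skipped here; the ".org" and ".gov" values stay
    in the candidate list.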

    My initial fear was that this regex would only be a fix for ".com"
    (commercial) domains. So, I tried re-testing the exact same message, but
    this time doctoring the Content-ID so that it ends in a ".org" TLD (this
    is the ".org" test case)...

    ******************************************************

    Part (pos="2748"):

    SanitizeFile (filename="122058@xyz.molar.is,

    ABM SOW Overview 032504.ppt",

    mimetype="application/vnd.ms-powerpoint"):

    Match (names="ABM SOW Overview 032504.ppt", rule="2"):

    Enforced policy: accept

    ******************************************************

    Don't you love ambiguous results?! While filename is a list of both the
    ContentID and the correct filename (this is the buggy behavior we saw
    before), names takes only the correct filename (compare the names value
    above with the 1.66 results further above), so the correct policy is
    enforced and the actual filename is left unchanged. As for why names
    takes only the correct name (and not also the ContentID), that's the area
    of greatest confusion for me...I think this is the relevant
    decision-making code (line 1332)?:

    ******************************************************

    (my @fn_match = grep { $_ =~ $conf->{"file_list_$i"} } @filenames))
    {
        $ofn = $$fnp = $fn_match[0] if (@fn_match && ($ofn eq $$fnp));
        my %fn_match = ( names => join(', ', @fn_match)) if (@fn_match);
        my $mlog = $log->sublog("Match", SLOG_INFO,
                                { rule => $i, %fn_match });

    ******************************************************
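
    Read in isolation, that fragment suggests that names is simply the subset
    of candidate names matching the rule's pattern, and that the first match
    is what gets written back as the file name. Below is a minimal sketch of
    that reading. The sub apply_rule, the candidate names, and the rule
    pattern are all hypothetical (the real file_list_N values come from the
    Anomy configuration and are not shown in this post), and the
    ($ofn eq $$fnp) guard is left out for brevity.

```perl
use strict;
use warnings;

# Hypothetical stand-in for one rule: returns the names that match the
# pattern and the name that would be written back (the first match),
# mirroring the grep and the $fn_match[0] assignment in the fragment.
sub apply_rule {
    my ($pattern, @filenames) = @_;
    my @fn_match = grep { $_ =~ $pattern } @filenames;
    return { names => join(', ', @fn_match), chosen => $fn_match[0] };
}

# Made-up pattern, deliberately contrived so that both candidates match.
my $rule = qr/\.(?:ppt|txt|gov)$/i;

# Content-ID listed first, real file name second.
my $r = apply_rule($rule, 'abc123@EXAMPLE.GOV', 'report.TXT');
print "names:  $r->{names}\n";
print "chosen: $r->{chosen}\n";
```

    Under these assumptions, a Content-ID that sits first in the list and
    matches the rule becomes the chosen name, which is one way a Content-ID
    could end up replacing the real file name.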

    Anyway...this attachment does pass through just fine...but it suggested
    that it was only a matter of time until a message with a different
    ContentID TLD caused a problem...

    That did eventually happen with several real emails--below you can see the
    filename being replaced with the ContentID (this is the ".gov" test case):

    ******************************************************

    Part (pos="7407"):

    SanitizeFile (filename="122124@xyz.molar.is,
    MT800A67LNBC20040401012427.TXT", mimetype="text/plain"):

    Match (names="122124@xyz.molar.is,
    MT800A67LNBC20040401012427.TXT", rule="2"):

    Enforced policy: accept

    Replaced file name with: 10__=0ABBE4FADFD182C58f9e8a93df938690_HUD.GOV

    ******************************************************

    While the correct policy decision was made, the file has been renamed. At
    this point, I was in the situation where I had to decide between rolling
    back and having PDF problems or staying current with reduced--but still
    very present--ContentID problems, and I was leaning towards rolling back,
    purely because the PDF problems occur less frequently (PDFs just aren't
    that popular I guess).

    Basically, I want the PDF newline-encoding benefits of the newer version
    (which was a great fix--especially in light of client misbehavior being the
    catalyst) and the filename handling benefits of the older version...

    So, armed with a dash of Perl knowledge, I commented out the line that adds
    the Content-ID to the filename list altogether, along with the 1.67 fix
    that conditionally skips that line. I suspect I could have commented out
    the entire block, but I figured surgically removing just these lines would
    preserve any other magic my mortal powers are too inept to detect (listing
    begins at line 1286):

    ******************************************************

    foreach my $h ("_description", "_id")
    {
        foreach my $v (map { ($_->{"data"}, $_->{"raw"}) }
                       $part->GetMIMEAttributes("(?i)^$h$"))
        {
            $v = $1 if ($v =~ /^<+(.*?)>+\s*$/);

            # Skip @foo.com type names, they shouldn't match anyway.
            # next if ($v =~ /\@\S+\.com$/i);
            # push @filenames, $v if ($v =~ /./);
        }
    }

    ******************************************************

    Re-running the ".org" test, we get the good unambiguous output we want
    (notice that filename doesn't include the ContentID):

    ******************************************************

    Part (pos="2748"):

    SanitizeFile (filename="ABM SOW Overview 032504.ppt",

    mimetype="application/vnd.ms-powerpoint"):

    Match (names="ABM SOW Overview 032504.ppt", rule="2"):

    Enforced policy: accept

    ******************************************************

    ...and re-running the ".gov" test (which failed even in 1.67), we also get
    good results (no renaming of the file):

    ******************************************************

    Part (pos="7407"):

    SanitizeFile (filename="MT800A67LNBC20040401012427.TXT",

    mimetype="text/plain"):

    Match (names="MT800A67LNBC20040401012427.TXT",

    rule="2"):

    Enforced policy: accept

    ******************************************************

    I've also run about 10 other tests on mails containing attachments of
    different sizes and flavors (extensions), including messages with multiple
    attachments whose extensions match different policies, and all
    functionality appears to be preserved...but I'm still concerned that this
    change may have negative repercussions. I'm not very clear on the benefit
    that 1.62/1.61 provided by substituting the ContentID for the filename (or
    supplementing it within a list of possible names?), unless it's for the
    condition where there is no filename at all in the first place.
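
    If that no-filename case is the only reason to look at the Content-ID at
    all, one option would be to use it purely as a fallback. The following is
    a sketch of that idea, not a patch against Sanitizer.pm: the sub
    candidate_names and its two arguments are hypothetical, and it has not
    been tried inside Anomy.

```perl
use strict;
use warnings;

# Hypothetical helper: build the list of names to check against policy.
# Real file names win; Content-ID / description values are used only when
# the part carries no file name at all.
sub candidate_names {
    my ($filenames, $id_values) = @_;

    my @names = grep { defined($_) && length($_) } @$filenames;
    return @names if @names;

    my @fallback;
    for my $raw (@$id_values) {
        next unless defined $raw;
        my $v = $raw;                        # copy, leave caller's data alone
        $v = $1 if $v =~ /^<+(.*?)>+\s*$/;   # strip surrounding <...>
        push @fallback, $v if length $v;
    }
    return @fallback;
}

# Part with a real file name: the Content-ID is ignored.
print join(', ', candidate_names(['report.TXT'], ['<abc123@EXAMPLE.GOV>'])), "\n";

# Part with no file name: the Content-ID is the only candidate.
print join(', ', candidate_names([], ['<abc123@EXAMPLE.GOV>'])), "\n";
```

    With this approach a part that has a real file name would never be
    renamed after its Content-ID, while a part with no name at all would
    still have something to match policy against.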

    Any comments, ideas? Or is there anyone else out there still fighting this
    issue brave enough to also test this fix and let me know how it works for
    them? It's running live in my environment as we speak, so I'll post a
    warning if I see any bad side-effects on my end (and I'll keep my fingers
    crossed)...

    Seth Robertson


