Dear all,
After running into strangely corrupted attachments produced by the
Sanitizer, I dug into the source and ended up at the following function
in MIMEStream.pm:

sub DecodeBase64
{
    my $reader = shift;
    my $line = shift;
    # This hacks the decoder to handle mangled Base64 text properly, by
    # properly ignoring white space etc. Note that this will lose the
    # last 1-3 bytes of data if it isn't properly padded. We also record
    # the encoded line-length, so we can re-encode stuff using the same
    # length.
    #
    if (!$reader->{"DecodeBase64llen"})
    {
        $line =~ s/[^A-Za-z0-9\/+\012=]+//gs;
>       my $nlpos = int((3*(index($line, "\012") + 1)) / 4);
        $line =~ s/\012//gs;
>       my $llen = int((3*length($line)) / 4);
        my $t = $llen;
        $t = $nlpos if (($nlpos < $llen) && ($nlpos > 0));
        $reader->{"DecodeBase64llen"} = $t;
    }
    else
    {
        $line =~ s/[^A-Za-z0-9\/+=]+//gs;
    }
    $line = $reader->{"DecodeBase64"} . $line;
>   $line =~ s/^((?:....)*)(.*?)$/$1/s;
    $reader->{"DecodeBase64"} = $2;
    return decode_base64($line);
}

All lines marked with > assume that the line length of a Base64-encoded
attachment is a multiple of four. Unfortunately, the mailer that caused
the corruption uses 70-character lines. Every other mailer I have
encountered so far uses a line length divisible by four. As far as I can
read RFC 2045, 70 characters is legal: it only requires encoded lines
to be no more than 76 characters long. I suspect that the changes needed
in the Sanitizer to accommodate this would be quite substantial. The
code inherently assumes that a line never has anything to do with the
following one.

Any comments?
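For illustration, here is a minimal sketch of a line-by-line Base64 decoder that copes with line lengths that are not a multiple of four. It is not Sanitizer code. It keeps any incomplete 4-character group from the end of one line and prepends it to the next line. The sub name `decode_line` and the `carry` hash key are hypothetical. Only `encode_base64`/`decode_base64` come from the real MIME::Base64 module. The script also prints how 76-, 72- and 70-character lines split into 4-character groups. It then round-trips some test data wrapped at 70 characters, like the offending mailer produces.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical line-by-line decoder (not taken from MIMEStream.pm).
# Base64 decodes in 4-character groups, so any 1-3 characters left at
# the end of a line are kept in $state->{carry} and prepended to the
# next line instead of being decoded (or dropped) on their own.
sub decode_line {
    my ($state, $line) = @_;
    $line =~ s/[^A-Za-z0-9\/+=]+//g;                  # strip newline and junk
    $line = $state->{carry} . $line;
    my $usable = length($line) - length($line) % 4;   # whole groups only
    $state->{carry} = substr($line, $usable);         # 0-3 leftover chars
    return decode_base64(substr($line, 0, $usable));
}

# How common line lengths split into 4-character groups.
for my $chars (76, 72, 70) {
    printf "%d chars/line: %d full groups, %d chars carried over\n",
        $chars, int($chars / 4), $chars % 4;
}

# Round trip: encode some bytes, wrap at 70 characters like the
# offending mailer, then decode one line at a time.
my $data    = join '', map { chr($_ % 256) } 0 .. 499;
my $encoded = encode_base64($data, '');              # '' = no line breaks
my @lines   = $encoded =~ /(.{1,70})/g;
my %state   = (carry => '');
my $decoded = '';
$decoded .= decode_line(\%state, "$_\n") for @lines;
print "70-char lines: ", ($decoded eq $data ? "round trip ok" : "MISMATCH"), "\n";
```

For 70-character lines the script reports 17 full groups with 2 characters carried over, where 76 and 72 leave nothing over. The state passed between calls is what lets one line "have something to do with" the next.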
--Joerg Lenneis 16095@xyz.molar.is