On 2003-11-17, 14:50:05 (-0600), Dustin Puryear wrote:
> Bjarni, we had actually considered this very solution. That is, that if a
> binary file is QP encoded to assume that it was done by a Windows client.
> However, we decided to just try and stop Anomy from making the CRLF/LF
> conversion in the first place. Good idea?
No, because it won't work. Anomy isn't doing the CRLF/LF
conversion (corruption); the QP standard is. There IS some other
CRLF/LF handling going on in Anomy, but it's not the cause of the
attachment corruption. That other code is Anomy trying to
compensate for the fact that some MTAs hand it data using a
CRLF newline convention, while others hand it data using an LF
convention.
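On the MTA side, the compensation amounts to picking one newline convention for the raw message and converting to it. Here is a minimal Python sketch. It assumes, hypothetically, that the filter works internally with LF and that the first line ending of the input is representative. normalize_mta_input() is an illustrative name, not Anomy's actual code.

```python
def normalize_mta_input(raw: bytes) -> bytes:
    """Convert CRLF-terminated input to LF-terminated input.

    Heuristic (an assumption of this sketch): the convention is guessed from
    the first line ending only. If it is CRLF, every CRLF is rewritten to LF;
    otherwise the data is returned unchanged.
    """
    first_lf = raw.find(b"\n")
    if first_lf > 0 and raw[first_lf - 1] == 0x0D:  # 0x0D is CR
        return raw.replace(b"\r\n", b"\n")
    return raw
```

This only touches line endings of the message as the MTA delivered it. As noted above, that step is not what corrupts QP attachments.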
To illustrate why this is all the QP standard's fault, consider
this three-character binary string:
<CR><LF><CR>
It actually has THREE different valid QP encodings, depending on the
source machine's newline convention:
Unix: =0D<CR><LF>=0D  (The CRs get encoded, LF becomes CRLF)
DOS: <CR><LF>=0D (The trailing CR gets encoded)
MAC: <CR><LF>=0A<CR><LF> (The LF gets encoded, CRs become CRLF)
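Here is a minimal Python sketch of a simplified QP encoder that makes the three encodings concrete. It is not Anomy's code or the CPAN module's. It assumes the sender treats its native newline bytes as a line break, sends every line break as CRLF on the wire, and escapes every other non-printable byte (including stray CR and LF) as =XX. Soft line breaks and the 76-character line limit are left out. The NEWLINES table and the qp_encode() name are made up for illustration.

```python
# Bytes each OS uses for end-of-line (assumed mapping for this sketch).
NEWLINES = {"unix": b"\n", "dos": b"\r\n", "mac": b"\r"}


def qp_encode(data: bytes, convention: str) -> bytes:
    """Simplified QP encoding of `data` as produced on a `convention` machine."""
    nl = NEWLINES[convention]
    out = bytearray()
    i = 0
    while i < len(data):
        if data.startswith(nl, i):
            out += b"\r\n"           # native newline -> canonical CRLF line break
            i += len(nl)
            continue
        byte = data[i]
        if 33 <= byte <= 126 and byte != ord("="):
            out.append(byte)         # safe printable character, copied as-is
        else:
            out += b"=%02X" % byte   # everything else, including stray CR/LF
        i += 1
    return bytes(out)


if __name__ == "__main__":
    for convention in NEWLINES:
        print(convention, qp_encode(b"\r\n\r", convention))
```

Running it prints one wire encoding per convention for the same three input bytes.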
Now, if we decode each of those encoded strings using a Unix line-feed
convention, we get these results:
Unix->Unix: <CR><LF><CR>
DOS->Unix: <LF><CR>
MAC->Unix:  <LF><LF><LF>
Fun, isn't it? :-)
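The decoding side can be sketched the same way, so the results can be checked mechanically. This hypothetical qp_decode() is again a simplification and not any real library's API. It treats each CRLF on the wire as a hard line break and replaces it with the receiver's native newline, then turns =XX escapes back into raw bytes. Soft line breaks are not handled.

```python
import re

# Bytes each OS uses for end-of-line (assumed mapping for this sketch).
NEWLINES = {"unix": b"\n", "dos": b"\r\n", "mac": b"\r"}


def _unescape(match):
    # "=0D" -> b"\r", "=0A" -> b"\n", and so on.
    return bytes([int(match.group(1), 16)])


def qp_decode(wire: bytes, convention: str) -> bytes:
    """Simplified QP decoding on a `convention` machine."""
    native_nl = NEWLINES[convention]
    lines = wire.split(b"\r\n")      # CRLF on the wire = hard line break
    return native_nl.join(re.sub(rb"=([0-9A-Fa-f]{2})", _unescape, line)
                          for line in lines)


if __name__ == "__main__":
    for wire in (b"=0D\r\n=0D", b"\r\n=0D", b"\r\n=0A\r\n"):
        print(qp_decode(wire, "unix"))
```

Decoding the Unix, DOS and Mac wire strings with "unix" gives b'\r\n\r', b'\n\r' and b'\n\n\n'. Only the Unix-encoded input survives. The Mac case comes out as three LFs: two line breaks plus the decoded =0A.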
This simple example shows why sending QP encoded binary data is
a broken idea. As you can see, if a Windows or Mac user sends a
Unix guy a PDF binary which has been QP encoded, then the file will
probably not get decoded properly - even without Anomy being
involved. In general, sending binaries from one OS to another using
QP encoding just won't work.
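A small sketch can check the "in general" claim. It encodes a byte string under every sender convention, decodes it under every receiver convention, and records which pairs give back the original bytes. It bundles compact copies of the hypothetical qp_encode()/qp_decode() helpers with the same simplifying assumptions: native newline maps to a CRLF line break, =XX covers everything else, and there are no soft line breaks.

```python
import re
from itertools import product

# Bytes each OS uses for end-of-line (assumed mapping for this sketch).
NEWLINES = {"unix": b"\n", "dos": b"\r\n", "mac": b"\r"}


def qp_encode(data: bytes, convention: str) -> bytes:
    nl, out, i = NEWLINES[convention], bytearray(), 0
    while i < len(data):
        if data.startswith(nl, i):   # native newline -> CRLF line break
            out += b"\r\n"
            i += len(nl)
        else:                        # printable -> as-is, anything else -> =XX
            byte = data[i]
            out += bytes([byte]) if 33 <= byte <= 126 and byte != 0x3D else b"=%02X" % byte
            i += 1
    return bytes(out)


def qp_decode(wire: bytes, convention: str) -> bytes:
    unescape = lambda m: bytes([int(m.group(1), 16)])  # "=0A" -> b"\n"
    lines = wire.split(b"\r\n")      # CRLF line break -> native newline
    return NEWLINES[convention].join(re.sub(rb"=([0-9A-F]{2})", unescape, line)
                                     for line in lines)


def round_trip_matrix(data: bytes) -> dict:
    """(sender, receiver) -> True if the receiver decodes the original bytes."""
    return {(src, dst): qp_decode(qp_encode(data, src), dst) == data
            for src, dst in product(NEWLINES, repeat=2)}


if __name__ == "__main__":
    for pair, ok in round_trip_matrix(b"\r\n\r").items():
        print(pair, ok)
```

For b"\r\n\r", only the three same-OS pairs are True. For bytes containing no CR or LF at all, every pair round-trips. That newline translation is exactly what you want for text, and exactly what corrupts a binary.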
However, Anomy does magnify the problem and cause it to surface
more often. This is because internally, Anomy assumes Unix
newline conventions at the moment (this assumption is inherited
from the CPAN QP module), which means it causes new problems when
Windows users send QP encoded binaries to other Windows users. This
is probably by far the most common case, so breaking it causes
quite a bit of grief. But keep in mind that it will also cause
problems when Mac users send QP encoded binaries to other Mac users.
Since Windows is the most common OS (and quite possibly also the one
with the most brain-damaged mail clients), switching Anomy to Windows
newline conventions internally will decrease breakage quite a bit.
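The Windows-to-Windows breakage can be modelled in the same way. For illustration only, this sketch assumes the filter decodes a QP part with its internal newline convention and then re-encodes it with that same convention before passing it on. pass_through_filter() and windows_to_windows() are hypothetical names, not Anomy functions. A binary containing a lone LF byte, which a Windows client escapes as =0A, comes out the other end with that LF turned into CRLF when the internal convention is Unix. It survives intact with a DOS internal convention.

```python
import re

# Bytes each OS uses for end-of-line (assumed mapping for this sketch).
NEWLINES = {"unix": b"\n", "dos": b"\r\n", "mac": b"\r"}


def qp_encode(data: bytes, convention: str) -> bytes:
    nl, out, i = NEWLINES[convention], bytearray(), 0
    while i < len(data):
        if data.startswith(nl, i):   # native newline -> CRLF line break
            out += b"\r\n"
            i += len(nl)
        else:                        # printable -> as-is, anything else -> =XX
            byte = data[i]
            out += bytes([byte]) if 33 <= byte <= 126 and byte != 0x3D else b"=%02X" % byte
            i += 1
    return bytes(out)


def qp_decode(wire: bytes, convention: str) -> bytes:
    unescape = lambda m: bytes([int(m.group(1), 16)])
    lines = wire.split(b"\r\n")      # CRLF line break -> native newline
    return NEWLINES[convention].join(re.sub(rb"=([0-9A-F]{2})", unescape, line)
                                     for line in lines)


def pass_through_filter(wire: bytes, internal: str) -> bytes:
    # Hypothetical filter model: decode, (scan/sanitize), re-encode.
    return qp_encode(qp_decode(wire, internal), internal)


def windows_to_windows(data: bytes, internal: str) -> bytes:
    wire = qp_encode(data, "dos")                                  # Windows sender
    return qp_decode(pass_through_filter(wire, internal), "dos")   # Windows recipient


if __name__ == "__main__":
    sample = b"\x00\n\xff"           # binary bytes with a lone LF in the middle
    print("unix internal:", windows_to_windows(sample, "unix"))
    print("dos internal: ", windows_to_windows(sample, "dos"))
```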
I may also add logic to try and detect the operating system of the
mail sender from message headers, to automatically guess when to
use Mac or Unix newline conventions instead... I'm not sure. But
this all boils down to guesswork and making the best of a bad
situation - the only real solution is to get people to upgrade to
proper mail clients which don't QP encode binaries.
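A header-based guess could look roughly like this sketch. The choice of the User-Agent and X-Mailer headers, the keyword table and the guess_newline_convention() name are all assumptions made for illustration. Real header values vary widely, so treat the table as placeholder data. Following the reasoning above, the sketch falls back to the Windows (DOS) convention when nothing matches.

```python
from email import message_from_string

# Placeholder keyword table (hypothetical); checked in order, first match wins.
HINTS = [
    ("windows", "dos"),
    ("mac os x", "unix"),    # checked before the classic-Mac keyword
    ("macintosh", "mac"),
    ("linux", "unix"),
]


def guess_newline_convention(raw_headers: str, default: str = "dos") -> str:
    """Guess the sender's newline convention from mail-client headers."""
    msg = message_from_string(raw_headers)
    agent = " ".join(msg.get(h, "") for h in ("User-Agent", "X-Mailer")).lower()
    for needle, convention in HINTS:
        if needle in agent:
            return convention
    return default
```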
> Anyway, I would love to see this code and the release that uses it. Good
> work and thanks!
I'll do my best to make it available as soon as possible. :)
-- 
Bjarni R. Einarsson                       PGP: 02764305, B7A3AB89
102413@xyz.molar.is          -><-         http://bre.klaki.net/

Check out my open-source email sanitizer: http://mailtools.anomy.net/
Spammers, please send lots of mail to: 102535@xyz.molar.is