Re: The QP encoding issue - again!

From: Bjarni R. Einarsson
Date: Tue 18 Nov 2003 - 02:26:54 GMT

    On 2003-11-17, 14:50:05 (-0600), Dustin Puryear wrote:
    > Bjarni, we had actually considered this very solution. That is, that if a
    > binary file is QP encoded to assume that it was done by a Windows client.
    > However, we decided to just try and stop Anomy from making the CRLF/LF
    > conversion in the first place. Good idea?

    No, because it won't work. Anomy isn't doing the CRLF/LF
    conversion (corruption), the QP standard is. There IS some other
    CRLF/LF stuff going on in Anomy, but it's not the cause of the
    attachment corruption. The other stuff is Anomy trying to
    compensate for the fact that some MTAs will give it data using a
    CRLF newline convention, while others will give data using an LF
    convention.

    To illustrate why this is all the QP standard's fault, consider
    this three-character binary string:

      <CR><LF><CR>

    It actually has THREE different valid QP encodings, depending on the
    source machine's newline convention:

      Unix: =0D<CR><LF>=0D       (The CRs get encoded, LF becomes CRLF)
      DOS:  <CR><LF>=0D          (The trailing CR gets encoded)
      MAC:  <CR><LF>=0A<CR><LF>  (The LF gets encoded, CRs become CRLF)

    Now, if we decode each of those encoded strings using a Unix line-feed
    convention, we get these results:

      Unix->Unix: <CR><LF><CR>
      DOS->Unix: <LF><CR>
      MAC->Unix: <LF><LF><LF>

    Fun, isn't it? :-)
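
    The example above can be reproduced with a minimal Python sketch.
    This is a toy model written for illustration, not Anomy's code and
    not the CPAN QP module: the names qp_encode and qp_decode are
    hypothetical, soft line breaks and line-length limits are ignored,
    and the only rule modelled is "a native newline becomes CRLF on the
    wire, every other unsafe byte becomes =XX".

```python
import re

CRLF = b"\r\n"
# Native newline sequence assumed by each platform in the example.
NEWLINES = {"Unix": b"\n", "DOS": b"\r\n", "MAC": b"\r"}


def qp_encode(data: bytes, native_nl: bytes) -> bytes:
    """Toy QP encoder: native newlines become CRLF, unsafe bytes become =XX."""
    def encode_line(line: bytes) -> bytes:
        out = bytearray()
        for byte in line:
            # Printable ASCII other than '=' passes through unchanged.
            # (Spaces are escaped too, which is stricter than needed.)
            if 33 <= byte <= 126 and byte != 0x3D:
                out.append(byte)
            else:
                out += b"=%02X" % byte
        return bytes(out)

    return CRLF.join(encode_line(line) for line in data.split(native_nl))


def qp_decode(encoded: bytes, native_nl: bytes) -> bytes:
    """Toy QP decoder: CRLF becomes the native newline, =XX becomes a byte."""
    def decode_line(line: bytes) -> bytes:
        return re.sub(
            rb"=([0-9A-Fa-f]{2})",
            lambda m: bytes([int(m.group(1), 16)]),
            line,
        )

    return native_nl.join(decode_line(line) for line in encoded.split(CRLF))


original = b"\r\n\r"  # the three-character string <CR><LF><CR>

# One encoding per sender convention, then everything decoded as Unix.
encoded = {name: qp_encode(original, nl) for name, nl in NEWLINES.items()}
decoded_on_unix = {
    name: qp_decode(enc, NEWLINES["Unix"]) for name, enc in encoded.items()
}

for name in NEWLINES:
    print(name, encoded[name], "->", decoded_on_unix[name])
```

    Under these assumptions only the Unix-encoded variant decodes back
    to the original three bytes on a Unix decoder; the other two come
    out as different byte strings.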

    This simple example explains why sending binary data QP encoded is
    a broken idea. As you can see, if a Windows or Mac user sends a
    Unix guy a PDF binary which has been QP encoded, then the file will
    probably not get decoded properly - even without Anomy being
    involved. In general, sending binaries from one OS to another using
    QP encoding just won't work.

    However, Anomy does magnify the problem and cause it to surface
    more often. This is because internally, Anomy assumes Unix
    linefeed conventions at the moment (this assumption is inherited
    from the CPAN QP module), which means it causes new problems when
    Windows users send QP encoded binaries to other Windows users. This
    is probably by far the most common case, so breaking it causes
    quite a bit of grief. But keep in mind that it will also cause
    problems when Mac users send QP encoded binaries to other Mac users.

    Since Windows is the most common OS (and quite possibly also the one
    with the most brain-damaged mail clients), switching Anomy to Windows
    newline conventions internally will decrease breakage quite a bit.
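
    To make the trade-off concrete, here is a small Python sketch that
    decodes the three encodings from the example under each possible
    internal newline convention and reports which senders' data
    survives. It is an illustration under simplifying assumptions (a toy
    decoder with the hypothetical names qp_decode and survivors, no soft
    line breaks), not a model of Anomy's real processing.

```python
import re

ORIGINAL = b"\r\n\r"  # <CR><LF><CR>

# The three encodings from the example, keyed by the sender's convention.
ENCODED = {
    "Unix": b"=0D\r\n=0D",
    "DOS": b"\r\n=0D",
    "MAC": b"\r\n=0A\r\n",
}
NEWLINES = {"Unix": b"\n", "DOS": b"\r\n", "MAC": b"\r"}


def qp_decode(encoded: bytes, native_nl: bytes) -> bytes:
    """Toy decoder: wire CRLF becomes native_nl, =XX becomes a raw byte."""
    def unescape(line: bytes) -> bytes:
        return re.sub(
            rb"=([0-9A-F]{2})",
            lambda m: bytes([int(m.group(1), 16)]),
            line,
        )

    return native_nl.join(unescape(line) for line in encoded.split(b"\r\n"))


def survivors(internal: str) -> list:
    """Sender conventions whose data decodes intact under `internal`."""
    return [
        sender
        for sender, enc in ENCODED.items()
        if qp_decode(enc, NEWLINES[internal]) == ORIGINAL
    ]


for internal in NEWLINES:
    print(f"decoder assumes {internal:4}: intact for senders {survivors(internal)}")
```

    In this model each decoder convention recovers only the data from
    senders that share it, so picking the Windows convention does not
    fix the problem; it just moves the breakage to the less common
    cases.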

    I may also add logic to try and detect the operating system of the
    mail sender from message headers, to automatically guess when to
    use Mac or Unix newline conventions instead... I'm not sure. But
    this all boils down to guesswork and making the best of a bad
    situation - the only real solution is to get people to upgrade to
    proper mail clients which don't QP encode binaries.
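
    As a rough idea of what such header-based guessing could look like,
    here is a Python sketch. Everything in it is an assumption made for
    illustration: the function name guess_newline_convention, the choice
    of the X-Mailer and User-Agent headers, the substring hints, and the
    DOS fallback are all hypothetical and do not come from Anomy.

```python
from email import message_from_string

# Hypothetical substring hints, checked in order. "MAC" comes first so
# that a Macintosh edition of a Windows-branded client is not misfiled.
HINTS = [
    ("MAC", ("macintosh", "mac os")),
    ("DOS", ("outlook", "windows", "win32")),
    ("Unix", ("linux", "unix", "mutt")),
]


def guess_newline_convention(raw_message: str, default: str = "DOS") -> str:
    """Guess the sender's newline convention from client-identifying headers.

    Falls back to `default` when no hint matches, on the assumption that
    Windows clients are the most common source of QP-encoded binaries.
    """
    msg = message_from_string(raw_message)
    agent = " ".join(
        msg.get(header, "") for header in ("X-Mailer", "User-Agent")
    ).lower()
    for convention, tokens in HINTS:
        if any(token in agent for token in tokens):
            return convention
    return default


sample = (
    "From: someone@example.com\n"
    "X-Mailer: Microsoft Outlook Express 6.00.2800.1158\n"
    "Subject: report.pdf\n"
    "\n"
    "body\n"
)
print(guess_newline_convention(sample))
```

    Headers like these are optional and easy to get wrong, so a guess of
    this kind can only ever reduce breakage, not eliminate it.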

    > Anyway, I would love to see this code and the release that uses it. Good
    > work and thanks!

    I'll do my best to make it available as soon as possible. :)

    Bjarni R. Einarsson                           PGP: 02764305, B7A3AB89
