anomy-list

Custom parser configuration in Sanitizer

From: Sterling Hanenkamp (23575@xyz.molar.is)
Date: Mon 08 Oct 2001 - 18:41:42 UTC

    Bjarni,

    I've run into a bit of a snag working on EVPS. EVPS was originally written to use an anonymous pipe to store the email after the initial pass over the input, which determines the configuration file when the configuration depends on the sender. However, this is a problem because a pipe can only hold a few kilobytes of data. My initial tests were successful, but when I ran some long emails through, it blocked forever because the pipe filled up and I wasn't reading from the pipe until I had finished writing. So I added yet another round of file IO by writing to a file in /tmp and then using that file as input. This nearly defeats one of the primary features of MIMEStream, but I had no other quick fix and I'm already 3 weeks behind schedule for launching this service. This seemed to work initially, but by the next morning EVPS was again blocking or otherwise hung on about 30 emails it received that night.
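
    To illustrate the blocking behaviour described above, here is a minimal Python sketch (not EVPS or MIMEStream code). It assumes a Unix-like system. The hypothetical helper pipe_capacity fills a pipe that nobody reads, using a non-blocking write end so the script reports the limit instead of hanging. The hypothetical helper spool_message shows the /tmp-style workaround of spooling the whole message to a temporary file and rewinding it.

```python
import os
import tempfile


def pipe_capacity(chunk=1024):
    """Fill a pipe that nobody reads and return how many bytes it accepted."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # raise instead of hanging when full
    written = 0
    try:
        while True:
            written += os.write(write_fd, b"x" * chunk)
    except BlockingIOError:
        # A blocking writer would stall forever at this point,
        # because no reader is draining the pipe.
        return written
    finally:
        os.close(read_fd)
        os.close(write_fd)


def spool_message(data):
    """Store the whole message in an anonymous temp file and rewind it."""
    spool = tempfile.TemporaryFile()
    spool.write(data)
    spool.seek(0)
    return spool
```

    The exact capacity depends on the operating system, so the sketch measures it rather than assuming a number. Any message larger than that value would stall a writer that finishes writing before it starts reading.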

    I have determined that some aspect of file IO is causing the problem, but I have not been able to determine what. I could probably rack my brain on this for a few more days and find the solution, but at this point I think I'd be better off spending those few days bypassing the problem altogether by performing the post-header configuration you initially suggested. To do this I would need to create a parser routine that performs the configuration after looking at the header and then starts the normal parsers scanning the email.

    Before I get started, I would like to make sure I understand how I should do this so that I don't have to back up and start over again. I also want to make sure this is fully integrated so that when updates come out in the future little or no change is necessary to evps.pl to use the new code.

    Here's what I think should be done. Please let me know where I'm incorrect or where you think I could improve on this design. First, I would create a Sanitizer object. From that object, I would get a reference to the default parsers it's using and store that in a variable. Then, I would replace that with my own parser hash reference pointing to my own parsing routine that will configure the sanitizer. I would then call sanitize. Inside my parser, I would read the headers, determine which configuration to use, and load that configuration. Finally, I would replace the parsers hash reference with the original I stored and call the parser that would have been run had I not replaced them in the first place.
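
    The steps above can be sketched in Python as follows. This is only an illustration of the swap-and-restore idea, not the Anomy Sanitizer API: ToySanitizer, default_parser, install_bootstrap, the "DEFAULT" key, and the CONFIGS table are all hypothetical names made up for the example.

```python
from email import message_from_string


class ToySanitizer:
    """Hypothetical stand-in for a sanitizer; not the Anomy Sanitizer API."""

    def __init__(self):
        self.config = {}
        # Maps a part type to the routine that handles it.
        self.parsers = {"DEFAULT": default_parser}

    def sanitize(self, raw_message):
        return self.parsers["DEFAULT"](self, raw_message)


def default_parser(sanitizer, raw_message):
    """The 'normal' parser; here it only reports the active profile."""
    return "scanned with profile=%s" % sanitizer.config.get("profile", "none")


# Hypothetical per-sender configurations.
CONFIGS = {"alice@example.com": {"profile": "strict"}}
FALLBACK_CONFIG = {"profile": "standard"}


def install_bootstrap(sanitizer):
    """Swap in a one-shot parser that configures, restores, then delegates."""
    original = sanitizer.parsers  # remember the default parsers

    def bootstrap_parser(san, raw_message):
        # Read the header and pick a configuration based on the sender.
        sender = message_from_string(raw_message)["From"]
        san.config = CONFIGS.get(sender, FALLBACK_CONFIG)
        # Put the original parsers back, then run the one that
        # would have run had they never been replaced.
        san.parsers = original
        return original["DEFAULT"](san, raw_message)

    sanitizer.parsers = {"DEFAULT": bootstrap_parser}
```

    In this toy version the whole message is passed around as a string. A streaming implementation would presumably consume only the header inside the bootstrap routine before handing the rest of the stream to the restored parser.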

    This is a very general outline of what I think needs to happen. I'd appreciate your input on this problem.

    Thanks,
    Sterling

    --
    <>< ><> <>< ><> <>< ><> <>< ><> <>< ><> <>< ><>
     Sterling Hanenkamp
     Software Consultant
     Network Resource Group, Inc.
     1105 Hylton Heights
     Manhattan, KS  66502
     (785) 776-5878
     23665@xyz.molar.is
    


