Re: HTML filtering and STYLES

From: Paolo
Date: Thu 15 Dec 2005 - 14:32:46 GMT

    On Thu, Dec 15, 2005 at 10:16:44AM +1100, Noel Clarkson wrote:
    > We have not fiddled with which HTML bits get filtered, just using the
    > default, and I'm really not that sure what might be a good compromise,
    > or if that isn't likely to help that much anyway. If anyone has ideas
    > on what settings might be good to overcome this, I'd be really grateful
    > to know.

    I guess the short-buffer streaming mode of Anomy makes that problematic.
    It is perhaps better done by a file_list_scanner, using either:

    1) lynx -force_html -dump %FILENAME %ATTNAME
       that's what I'm doing, no html at all pls ;)
    2) use some HTML cleaner, e.g. htmlclean:
       HTMLCLEAN(1) User Contributed Perl Documentation HTMLCLEAN(1)

           htmlclean - a small script to clean up existing HTML

           htmlclean [-v] [-V] file1 [file2 file3 ...]

           This program provides a command-line interface to the
           HTML::Clean module, which can help you to provide more

       On Debian, the following packages contain something by that name:
       $ dpkg -S htmlclean
       wml: /usr/lib/wml/exec/wml_aux_htmlclean
       libhtml-clean-perl: /usr/share/man/man1/htmlclean.1p.gz
       libhtml-clean-perl: /usr/bin/htmlclean <-- you may want just this!
       wp2x: /usr/share/doc/wp2x/filters/
       wp2x: /usr/share/doc/wp2x/filters/
       wp2x: /usr/share/doc/wp2x/filters/
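
    If neither lynx nor htmlclean is at hand, the "no html at all" idea from
    option 1 can be roughed out with the Python standard library alone. This
    is only a sketch under assumptions: it is not what lynx or htmlclean do
    internally, the names TextOnly and html_to_text are made up for the
    example, and how the script would be hooked into a file_list_scanner
    line is left open. It drops all tags and everything inside <style> and
    <script>, and keeps only the text.

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for an HTML-to-text scanner (not lynx, not htmlclean)."""
import sys
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text nodes, skipping everything inside <style> and <script>."""

    SKIP = {"style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []       # text fragments seen so far
        self.skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, one non-empty line per row."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: script.py FILE (prints the plain text to stdout).
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        sys.stdout.write(html_to_text(fh.read()) + "\n")
```

    Unlike lynx -dump, this does no layout at all (no link list, no table
    rendering), so treat it as an illustration rather than a replacement.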

    > The second problem was that the log files that are sometimes placed
    > inline rather than as an attachment often became 10 or more times longer

    I've never had that problem. I don't use inline logs at all, so I can't say anything about it.



