anomy-list

Re: HTML filtering and STYLES

From: Paolo (143774@xyz.molar.is)
Date: Thu 15 Dec 2005 - 14:32:46 GMT

    On Thu, Dec 15, 2005 at 10:16:44AM +1100, Noel Clarkson wrote:
    ...
    > We have not fiddled with which HTML bits get filtered, just using the
    > default, and I'm really not that sure what might be a good compromise,
    > or if that isn't likely to help that much anyway. If anyone has ideas
    > on what settings might be good to overcome this, I'd be really grateful
    > to know.

    I guess the short-buffer streaming mode of anomy makes that problematic.
    Perhaps it's better done by a file_list_scanner, using either:

    1) lynx -force_html -dump %FILENAME %ATTNAME
       that's what I'm doing, no html at all pls ;)
    2) use some HTML cleaner, e.g. htmlclean:
       HTMLCLEAN(1) User Contributed Perl Documentation HTMLCLEAN(1)

       NAME
           htmlclean - a small script to clean up existing HTML

       SYNOPSIS
           htmlclean [-v] [-V] file1 [file2 file3 ...]

       DESCRIPTION
           This program provides a command-line interface to the
           HTML::Clean module, which can help you to provide more
           ...

       In Debian, these packages ship something by that name:
       $ dpkg -S htmlclean
       wml: /usr/lib/wml/exec/wml_aux_htmlclean
       libhtml-clean-perl: /usr/share/man/man1/htmlclean.1p.gz
       libhtml-clean-perl: /usr/bin/htmlclean <-- you may want just this!
       wp2x: /usr/share/doc/wp2x/filters/htmlcleanup1.pl
       wp2x: /usr/share/doc/wp2x/filters/htmlcleanup2.pl
       wp2x: /usr/share/doc/wp2x/filters/htmlcleanup3.pl
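
    To illustrate the idea behind option 1 (keep the text, drop the markup
    and the styles), here is a minimal sketch in Python. It is not what lynx
    or htmlclean do internally, and it is not an anomy configuration; it just
    uses the standard-library html.parser as a stand-in. The names TextOnly
    and html_to_text are hypothetical, made up for this example.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text nodes, skipping the contents of <style> and <script>.

    Hypothetical helper for illustration; not part of anomy, lynx or htmlclean.
    """

    SKIP = {"style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Entering an element whose contents should not reach the output.
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every skipped element.
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Join text nodes with a space, then collapse all runs of whitespace.
    return " ".join(" ".join(parser.parts).split())


if __name__ == "__main__":
    sample = ("<html><head><style>p{color:red}</style></head>"
              "<body><p>Hello <b>world</b></p></body></html>")
    print(html_to_text(sample))
```

    Joining text nodes with a space keeps adjacent paragraphs from running
    together, at the cost of splitting a word that has markup in the middle
    of it. For real mail, lynx -dump gives much better layout than this.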

    > The second problem was that the log files that are sometimes placed
    > inline rather than as an attachment often became 10 or more times longer

    I never had that problem; I don't use inline logs at all, so I can't say
    anything about it.

    HTH

    -- 
     paolo
     
    


