On Thu, Dec 15, 2005 at 10:16:44AM +1100, Noel Clarkson wrote:
> We have not fiddled with which HTML bits get filtered, just using the
> default, and I'm really not that sure what might be a good compromise,
> or if that isn't likely to help that much anyway. If anyone has ideas
> on what settings might be good to overcome this, I'd be really grateful
> to know.
I guess the short-buffer streaming mode of Anomy makes that problematic.
Perhaps it's better done by a file_list_scanner, either:
1) lynx -force_html -dump %FILENAME %ATTNAME
That's what I'm doing: no HTML at all, please ;)
2) use an HTML cleaner, e.g. htmlclean:
    HTMLCLEAN(1)    User Contributed Perl Documentation    HTMLCLEAN(1)

    htmlclean - a small script to clean up existing HTML

    htmlclean [-v] [-V] file1 [file2 file3 ...]

    This program provides a command-line interface to the
    HTML::Clean module, which can help you to provide more
In Debian, the package that ships it can be found with:
$ dpkg -S htmlclean
libhtml-clean-perl: /usr/bin/htmlclean <-- you may want just this!
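
For option 1, a minimal sketch of a wrapper that a file_list scanner could call. The function name html2text_inplace is made up for illustration, and it assumes the scanner hook is allowed to rewrite the attachment in place, which I have not verified against the Anomy docs. Only the file path is passed to lynx here, with the same -force_html -dump flags as above.

```sh
#!/bin/sh
# Hypothetical helper: replace an HTML file with its plain-text rendering.
# Assumption: the caller passes the path of the attachment as $1 and
# accepts that the file is overwritten in place.

html2text_inplace() {
    file=$1
    tmp=$(mktemp) || return 1

    # -force_html: treat the file as HTML whatever its name is
    # -dump: write the rendered text to stdout and exit
    if lynx -force_html -dump "$file" > "$tmp"; then
        # Overwrite the contents instead of moving the temp file,
        # so the original file keeps its permissions.
        cat "$tmp" > "$file"
        rm -f "$tmp"
        return 0
    fi

    # lynx failed: leave the original file untouched.
    rm -f "$tmp"
    return 1
}
```

Calling html2text_inplace /path/to/attachment returns 0 when the file was converted and 1 when lynx failed, so the caller can decide what to do with an attachment that could not be converted.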
> The second problem was that the log files that are sometimes placed
> inline rather than as an attachment often became 10 or more times longer
I never had that problem; I don't use inline logs at all, so I can't say anything about it.