anomy-list

Re: HTML filtering and STYLES

From: Noel Clarkson (143822@xyz.molar.is)
Date: Fri 16 Dec 2005 - 06:05:39 GMT


Sorry if this ends up being sent twice. I sent it from a different address the first time and it has been held up in moderation, so I'm sending it again rather than waiting.

I like the ideas, and I certainly wouldn't mind having no HTML in any
email I receive, but I'm guessing I wouldn't get away with doing that to
everyone else. I'm hoping for something less drastic, but if I
don't get anywhere then I guess a Perl script or
something that recognises the style_defang tags and removes whatever
is in between might get me over the problem. I just don't want to delete
any _real_ bits, and I want the config to be as simple as possible and
to use the least resources possible.
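
A minimal sketch of that fallback, in Python rather than Perl, assuming
the sanitizer leaves a matched pair of renamed tags around the style
block. The tag name style_defang is only a placeholder taken from the
wording above, not confirmed Anomy output, so check a sanitized message
for the real name. The function name strip_defanged_blocks is
hypothetical.

```python
import re


def strip_defanged_blocks(html: str, tag: str) -> str:
    """Remove every <tag ...>...</tag> block from html.

    `tag` is the renamed (defanged) tag to look for. The name is an
    assumption supplied by the caller, not something this code detects.
    Matching is case-insensitive and spans newlines. A block with no
    closing tag is left untouched, so real content is never eaten.
    """
    name = re.escape(tag)
    pattern = re.compile(
        rf"<{name}\b[^>]*>.*?</{name}\s*>",
        re.IGNORECASE | re.DOTALL,
    )
    return pattern.sub("", html)


if __name__ == "__main__":
    sample = (
        '<p>Hello</p>'
        '<style_defang type="text/css">p { color: red }</style_defang>'
        '<p>World</p>'
    )
    # Placeholder tag name; substitute whatever the sanitizer really emits.
    print(strip_defanged_blocks(sample, "style_defang"))
```

The match is non-greedy, so two separate blocks in one message are
removed individually and the text between them is kept.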

Thanks for the thoughts,

noel.

On Thu, 2005-12-15 at 15:32 +0100, Paolo wrote:
> On Thu, Dec 15, 2005 at 10:16:44AM +1100, Noel Clarkson wrote:
> ...
> > We have not fiddled with which HTML bits get filtered, just using the
> > default, and I'm really not that sure what might be a good compromise,
> > or if that isn't likely to help that much anyway. If anyone has ideas
> > on what settings might be good to overcome this, I'd be really grateful
> > to know.
>
> I guess the short-buffer streaming mode of anomy makes that problematic.
> Perhaps it's better done by a file_list_scanner, using either:
>
> 1) lynx -force_html -dump %FILENAME %ATTNAME
> that's what I'm doing, no html at all pls ;)
> 2) use some html-cleaner, eg htmlclean:
> HTMLCLEAN(1) User Contributed Perl Documentation HTMLCLEAN(1)
>
> NAME
> htmlclean - a small script to clean up existing HTML
>
> SYNOPSIS
> htmlclean [-v] [-V] file1 [file2 file3 ...]
>
> DESCRIPTION
> This program provides a command-line interface to the
> HTML::Clean module, which can help you to provide more
> ...
>
> In Debian, these packages ship something by that name:
> $ dpkg -S htmlclean
> wml: /usr/lib/wml/exec/wml_aux_htmlclean
> libhtml-clean-perl: /usr/share/man/man1/htmlclean.1p.gz
> libhtml-clean-perl: /usr/bin/htmlclean <-- you may want just this!
> wp2x: /usr/share/doc/wp2x/filters/htmlcleanup1.pl
> wp2x: /usr/share/doc/wp2x/filters/htmlcleanup2.pl
> wp2x: /usr/share/doc/wp2x/filters/htmlcleanup3.pl
>
> > The second problem was that the log files that are sometimes placed
> > inline rather than as an attachment often became 10 or more times longer
>
> I never had that problem. I don't use inline logs at all, so I can't say
> anything here.
>
> HTH
>


