You are here:
Foswiki
>
System Web
>
FilterPlugin
E
dit
A
ttach
Tags:
create new tag
,
view all tags
---+!! %TOPIC% %TOC% ---++ Description This plugin allows to substitute and extract information from content by using regular expressions. There are three different types of new functions: 1 FORMATLIST: maniplulate a list of items; it is highly configurable to define what constitutes a list and how to extract items from it 1 SUBST, STARTSUBST/STOPSUBST: substiture a pattern in a chunk of text 1 EXTRACT, STARTEXTRACT/STOPEXTRACT: extract a pattern from a text While the START-STOP versions of SUBST and EXTRACT work on inline text, the normal versions process a source topic before including it into the current one. ---++ Syntax Rules ---+++ SUBST *Syntax*: =%<nop>SUBST{topic="..." ...}%= insert a topic by processing its content. * =topic="..."=: name of the topic text to be processed * =text="..."=: text to be processed (has got higher precedence than 'topic') * =pattern="..."=: pattern to be extracted or substituted * =format="..."=: format expression or pattern substitute * =header="..."=: header string prepended to output * =footer="..."=: footer string appended to output * =limit="<n>"= maximum number of occurences to extract or substitute counted from the start of the text (defaults to =100000= aka all hits) * =skip="<n>"= skip the first n occurences * =exclude="..."=: skip occurences that match this regular expression * =include="..."=: skip occurences that don't match this regular expression * sort="on,off,alpha,num" order of the formatted items (default "off") * =expand="on,off"=: toggle expansion of markup before filtering (defaults to =on=) ---+++ STARTSUBST, STOPSUBST *Syntax*: <verbatim> %STARTSUBST{...}% ... %STOPSUBST% </verbatim> substitute text given inline. see [[#SUBST][SUBST]]. ---+++ EXTRACT *Syntax*: =%<nop>EXTRACT{topic="..." ...}%= extract text from a topic. see [[#SUBST][SUBST]]. ---+++ STARTEXTRACT, STOPEXTRACT *Syntax*: <verbatim> %STARTEXTRACT{...}% ... %STOPEXTRACT% </verbatim> extract content given inline. see [[#SUBST][SUBST]]. ---+++ FORMATLIST *Syntax*: =%<nop>FORMATLIST{"<list>" ...}%= formats a list of items. The <list> argument is separated into items by using a split expression; each item is matched agains a pattern and then formatted using a format string while being separated by a separator string; the result is prepended with a header and appended with a footer in case the list is not empty. * <list>: the list * tokenize="...": regex to tokenize the list before spliting it up, tokens are inserted back again after the split stage has been passed * split="...": the split expression (default ",") * replace="key1=value1,key2=value2, ...": this allows to preprocess each list item by replacing the given keys with their value * pattern="...": pattern applied to each item (default "\s(.*)\s") * format="...": the format string for each item (default "$1") * header="...": header string * footer="...": footer string * separator="...": string to be inserted between list items * lastseparator="...": string separating the last item from the rest of the list * null="...": the format string to render the empty list * hideempty="on,off": when set to "on" then empty list items will not be added to the result (empty in the sense of ''); set this to "off" to still add them (default "on") * limit="...": max number of items to be taken out of the list (default "-1") * skip="...": number of list items to skip, not adding them to the result * sort="on,off,alpha,num,nocase" order of the formatted items (default "off") * reverse="on,off": reverse the sortion of the list * unique="on,off": remove dupplicates from the list * exclude="...": remove list items that match this regular expression * include="...": remove list items that don't match this regular expression * selection="...": regular expression that a list item must match to be "selected"; if this matches the =$marker= is inserted * marker="...": string to be inserted when the =selection= regex matches; this will be inserted at the position =$marker= as indicated in =format= . * map="key1=value1,key2=value2, ...": this establishes a key-value hash available via the =$map()= variable. (see also the =replace= parameter for means to preprocess list items automatically.) The pattern string shall group matching substrings in the list item to which you can refer to by using $1, $2, ... in the format string. Any format string (=format=, =header=, =footer=) may contain variables =$percnt$=, =$nop=, =$dollar= and =$n=. The variable =$index= referse to the position number within the list being formatted; =$count= refers to the total number of matched list elementsr;=$marker= is set if the =selection= regular expression matches the current item. The =$map(key)= macro returns the value for "key" as specified in the =map= argument. ---+++ MAKEINDEX *Syntax*: =%<nop>MAKEINDEX{"<list>" ...}%= formats a list into a multi-column index like in <nop>MediaWiki's category topcis. MAKEINDEX insert capitals as headlines to groups of sorted items. It will try to balance all columns equally, and keep track of breaks to prevent "schusterkinder", that is avoid isolated headlines at the bottom of a column. parameters: * <list>: the list of items * split="...": the split expression to separate the <list> into items (default ",") * pattern="...": pattern applied to each item (default "(.*)") * cols="...": maximum number of cols to split the list into * format="...": format of each list item (default "$item") * sort="on,off,alpha,num,nocase": sort the list (default "on") * unique="on/off": removed duplicates (default "off") * exclude="...": pattern to check against items in the list to be excluded * include="...": pattern to check against items in the list to be included * reverse="on/off": reverse the list (default "off") * header="...": format string to prepend to the result * footer="..." format string to be appended to the result Like in FORMATLIST the =format= parameter can make use of =$1=, =$2=, ... variables to match the groupings defined in the =pattern= argument (like in =pattern="(.*);(.*);(.*)"=) . The first matched grouping $1 will be used as the $item to sort the list. In addition =header= and =footer= might contain the =$anchors= variable which will expand to a navigation to jump to the groups within the index. ---++ Examples ---+++ EXTRACT Example 1: convert table into text One of the uses of this plugin is to extract data from tables, which is useful for creating "database-like" wiki applications where data is stored in foswiki tables. While it is certainly possible to do that without this plugin the plugin makes these requests easier to create and maintain. Note, however, that best practice is to store database-like information using Projekte/Projekte/System.DataForms, so that you don't need to parse the format of the data to extract its records repeatedly. *The table:* | *Pos* | *Description* | *Hours* | | 1 | onsite troubleshooting | 3 | | 2 | normalizing data to new format | 10 | | 3 | testing server performance | 5 | *You type:* <verbatim class="tml"> %EXTRACT{topic="%TOPIC%" expand="off" pattern="^\|\s\s(.*?)\s*\|\s*(.*?)\s*\|\s*(.*?)\s*\|" format=" * it took $3 hours $2$n" skip="1" }% </verbatim> *Expected result (simulated):* * it took 3 hours onsite troubleshooting * it took 10 hours normalizing data to new format * it took 5 hours testing server performance *Actual result (this site):* %EXTRACT{topic="%TOPIC%" expand="off" pattern="^\|\s\s(.*?)\s*\|\s*(.*?)\s*\|\s*(.*?)\s*\|" format=" * it took $3 hours $2$n" skip="1" }% ---+++ EXTRACT Example 2: convert text into table Use CSS tags to format text comments as a tabular data (e.g., to allow sorting). *The comments:* <div class="text"><div class="comment"> This is the first comment. </div><div class="posted"> -- Michael Daum on 22 Aug 2005 </div></div> <div class="text"><div class="comment"> This is the second comment. </div><div class="posted"> -- Michael Daum on 22 Aug 2005 </div></div> *You type:* <verbatim class="tml"> %EXTRACT{ topic="%TOPIC%" expand="off" pattern=".div class=\"text\">.*?[\r\n]+(.*?)[\r\n]+(?:.*?[\r\n]+)+?-- (.*?) on (.*?)[\r\n]+" format="| $3 | $2 | $1 ... |$n" header="|*Date*|*Author*|*Headline*|$n" }%</verbatim> *Expected result (simulated):* |*Date*|*Author*|*Headline*| |22 Aug 2005 | Michael Daum | This is the first comment. ... | |22 Aug 2005 | Michael Daum | This is the second comment. ... | *Actual result (this site):* %EXTRACT{ topic="%TOPIC%" expand="off" pattern=".div class=\"text\">.*?[\r\n]+(.*?)[\r\n]+(?:.*?[\r\n]+)+?-- (.*?) on (.*?)[\r\n]+" format="| $3 | $2 | $1 ... |$n" header="|*Date*|*Author*|*Headline*|$n" }% ---+++ MAKEINDEX example 1: creating an index from a chunk of text compare with [[http://en.wikipedia.org/wiki/Category:Philosophy_articles_needing_attention][Philosophy articles needing attention]] %MAKEINDEX{ "Absolute (philosophy), Accident (philosophy), Actualism, Talk:Adam Weishaupt, Alphabet of human thought, Alterity, Analytic philosophy, Analytic-synthetic distinction, Apologism, Bundle theory, Categories (Stoic), Causal chain, Causality, Coherentism, Conscience, Context principle, Contextualism, Cosmology, De dicto and de re, Dialectical monism, Difference (philosophy), Direct reference theory, Discourse ethics, Dualism, Emergentism, Essence, Ethical naturalism, Exemplification, Existentialism, Fatalism, French materialism, Futilitarianism, Hermeneutics, Hypokeimenon, Identity and change, Idolon tribus, Immanent evaluation, Indeterminacy (Philosophy), Individual, Inherence, Kennisbank Filosofie Nederland, Lazy Reason, Mike Lesser, Libertarianism (metaphysics), Logicism, Mad pain and Martian pain, Materialism, Meaning of life, Metakosmia, Metaphysical naturalism, Milesian school, Mind, Monism, Moral imperative, Multiplicity (philosophy), Mystical philosophy of antiquity, Nature (philosophy), Neomodernism, New England Transcendentalists, Nominalism, Non-archimedean time, Non-rigid designator, Object (philosophy), Ontic, Ontological reductionism, Phenomenology, Philosophical realism, Philosophical skepticism, Philosophy, Pluralism (philosophy), Post-structuralism, Postmodern philosophy, Preferentialism Present (time), Problem of universals, Process philosophy, Rational Animal, Rationalist movement, Relativism, Self (philosophy), Solipsism, Species (metaphysics), Specters of Marx, Substance theory, Talk:The Art of Being Right, Truth-value link, Universal (metaphysics), Utilitarianism, Value judgment, World riddle" cols="3" format="[[http://en.wikipedia.org/wiki/$item][$item]]" header="$anchors" }% ---+++ MAKEINDEX example 2: creating an index for a search result %MAKEINDEX{ "%SEARCH{".*" web="%USERSWEB%" scope="topic" type="regex" limit="30" nonoise="on" format="$topic;$web;$wikiusername;$date" separator="$n" excludetopic="CGI*,*Plugin" }%" cols="2" split="\n" pattern="(.*);(.*);(.*);(.*)" format="<div class='indexItem'> [[$2.$1][$1]] <div class='foswikiGrayText'>$4 - $3</div> </div>" }% <style type="text/css"> .indexItem { margin:0px 10px 10px 0px; } </style> ---++ Plugin Installation Instructions You do not need to install anything in the browser to use this extension. The following instructions are for the administrator who installs the extension on the server. Open configure, and open the "Extensions" section. Use "Find More Extensions" to get a list of available extensions. Select "Install". If you have any problems, or if the extension isn't available in =configure=, then you can still install manually from the command-line. See http://foswiki.org/Support/ManuallyInstallingExtensions for more help. ---++ Plugin Info <!-- * Set SHORTDESCRIPTION = Substitute and extract information from content by using regular expressions --> | Plugin Author: | Michael Daum | | Copyright ©: | 2005-2012, Michael Daum http://michaeldaumconsulting.com | | License: | GPL ([[http://www.gnu.org/copyleft/gpl.html][GNU General Public License]]) | | Release: | 3.10 | | Version: | 15048 (2012-06-19) | | Change History: | <!-- versions below in reverse order --> | | 19 Jun 2012: | added =lastseparator= (by Foswiki:Main/OliverKrueger);\ fixed paging when using together with =include= and =exclude= parameters | | 15 May 2012: | fixed paging through lists in FORMATLIST | | 05 May 2012: | fixed lists not being processed properly before iterating over them in FORMATLIST and MAKEINDEX | | 19 Apr 2012: | modernized plugin by using a proper OO-core; \ fixed processing of =tokenize= properly;\ added =replace= parameter for FORMATLIST; \ fixed the plugin calling =Foswiki::Func::expandCommonVariables()= itself unnecessarily | | 10 Jan 2012: | fixed filtering zero; fixed counting list items without formating them; added =hideempty= parameter to enable/disable rendering empty list items | | 29 Sep 2011: | fixed SUBST macro =topic= param processing embedded META | | 25 Aug 2011: | fixed perl rookie error initializing defaults | | 14 Jul 2011: | fixed parsing zero values in lists (by Grzegorz Marszalek) | | 06 Apr 2011: | fixed SUBST to removing everything after the last match | | 23 Jul 2010: | fixed wrapper for non-official api call to getAnchorName on foswiki-1.1 | | 07 Jun 2010: | fixed expanding standard escapes ($n, $percent, ...); improved examples in docu | | 12 Feb 2010: | ease =tokenize=; forward compatibility for newer foswikis | | 17 Nov 2009: | added =tokenize= pattern for FORMATLIST; \ fixed potential deep recursion in SUBST/EXTRACT | | 14 Sep 2009: | added =include= counterpart to already existing =exclude= params; \ fixed SUBST not to forget about the non-matching tail of a char sequence | | 17 Apr 2009: | converted to foswiki, added numerical sorting to MAKETEXT | | 08 Oct 2008: | added =$anchors= to MAKEINDEX (by Dirk Zimoch); \ added =nocase= option to FORMATLIST (by Dirk Zimoch); \ fixed null/empty string match in FORMATLIST | | 20 Aug 2008: | added =selection= and =marker= to FORMATLIST, similar in use as %SYSTEMWEB%.VarWEBLIST | | 03 Jul 2008: | sorting a list _before_, not _after_, formatting it in FORMATLIST | | 08 May 2008: | added 'text' parameter to SUBST and EXTRACT; \ fixed SUBST as it was pretty useless before | | 07 Dec 2007: | added MAKEINDEX, added lazy compilation | | 14 Sep 2007: | added sorting for EXTRACT and SUBST | | 02 May 2007: | using registerTagHandler() as far as possible; \ enhanced parameters to EXCTRACT and SUBST | | 05 Feb 2007: | fixed escapes in format strings; \ added better default value for max number of hits to prevent deep recursions \ on bad regexpressions | | 22 Jan 2007: | fixed SUBST, added skip parameter to FORMATLIST | | 18 Dec 2006: | using registerTagHandler for FORMATLIST | | 13 Oct 2006: | fixed =limit= parameter in FORMATLIST | | 31 Aug 2006: | added NO_PREFS_IN_TOPIC | | 15 Aug 2006: | added =use strict;= and fixed revealed errors | | 14 Feb 2006: | moved in FORMATLIST from the Foswiki:Extensions/NatSkinPlugin;\ added escape variables to format strings | | 06 Dec 2005: | fixed SUBST not to cut off the rest of the text | | 09 Nov 2005: | fixed deep recursion using =expand="on"= | | 22 Aug 2005: | Initial version; added =expand= toggle | | Dependency: | $Foswiki::Plugins::VERSION 1.024 | | CPAN Dependencies: | none | | Other Dependencies: | none | | Plugin Home: | Foswiki:Extensions/%TOPIC% | | Support: | Foswiki:Support/%TOPIC% |
E
dit
|
A
ttach
|
P
rint version
|
H
istory
: r3
<
r2
<
r1
|
B
acklinks
|
V
iew topic
|
Edit
w
iki text
|
M
ore topic actions
Topic revision: r3 - 14 Oct 2014,
MatthiasGeorgi
System
Log In
or
Register
Toolbox
Users
Groups
Index
Search
Changes
Notifications
RSS Feed
Statistics
Preferences
User Reference
BeginnersStartHere
TextFormattingRules
Macros
FormattedSearch
QuerySearch
DocumentGraphics
SkinBrowser
InstalledPlugins
Admin Maintenance
Reference Manual
AdminToolsCategory
InterWikis
ManagingWebs
SiteTools
DefaultPreferences
WebPreferences
Categories
Admin Documentation
Admin Tools
Developer Doc
User Documentation
User Tools
Webs
Applications
ClassificationApp
Extensions
Main
System
Copyright © by the contributing authors. All material on this site is the property of the contributing authors.
Ideas, requests, problems regarding Foswiki?
Send feedback