Groovy Documentation

java.gate.yam.convert
Class JSPWikiMarkupParser

java.lang.Object
  java.gate.yam.convert.JSPWikiMarkupParser

class JSPWikiMarkupParser

Parses JSPWiki-style markup into a WikiDocument DOM tree. This class is the heart and soul of JSPWiki: make sure you properly test anything that is added, or else it breaks down horribly.

author:
Janne Jalkanen
since:
2.4


Nested Class Summary
class JSPWikiMarkupParser.CleanTextRenderer

class JSPWikiMarkupParser.Heading

class JSPWikiMarkupParser.StartingComparator

Compares two Strings, and if one starts with the other, then returns 0.

 
Field Summary
private static int ATTACHMENT

static String CLASS_EDITPAGE

The value for anchor element class attributes when used for edit page links.

static String CLASS_INTERWIKI

The value for anchor element class attributes when used for interwiki page links.

static String CLASS_WIKIPAGE

The value for anchor element class attributes when used for wiki page (normal) links.

static String DEFAULT_INLINEPATTERN

The default inlining pattern.

private static int EDIT

private static int EMPTY

private static String[] EMPTY_ELEMENTS

All elements that can be empty by the HTML DTD.

private static int EXTERNAL

private static int IMAGE

private static int IMAGELINK

private static int IMAGEWIKILINK

private static int INTERWIKI

private static int LOCAL

private static int LOCALREF

private static String OUTLINK_IMAGE

Name of the outlink image; relative path to the JSPWiki directory.

static String PROP_ALLOWHTML

If set to "true", allows using raw HTML within Wiki text.

static String PROP_CAMELCASELINKS

If true, consider CamelCase hyperlinks as well.

static String PROP_INLINEIMAGEPTRN

This property defines the inline image pattern.

static String PROP_PLAINURIS

If true, all hyperlinks are translated as well, regardless of whether they are surrounded by brackets.

static String PROP_RUNPLUGINS

If set to "true", enables plugins during parsing.

static String PROP_USEATTACHMENTIMAGE

If true, all outward attachment info links have a small link image appended.

static String PROP_USEOUTLINKIMAGE

If true, all outward links (external links) have a small link image appended.

static String PROP_USERELNOFOLLOW

If set to "true", all external links are tagged with 'rel="nofollow"'.

protected static String PUNCTUATION_CHARS_ALLOWED

Lists all punctuation characters allowed in WikiMarkup.

protected static int PUSHBACK_BUFFER_SIZE

Allow this many characters to be pushed back in the stream.

private static int READ

static String WIKIWORD_REGEX

static String[] c_externalLinks

This list contains all IANA registered URI protocol types as of September 2004 + a few well-known extra types.

private static Comparator c_startingComparator

This Comparator is used to find an external link from c_externalLinks.

private static Logger log

private boolean m_allowHTML

If true, allows raw HTML.

private boolean m_allowPHPWikiStyleLists

protected ArrayList m_attachmentLinkMutatorChain

private boolean m_camelCaseLinks

If true, then considers CamelCase links as well.

private org.apache.oro.text.regex.PatternMatcher m_camelCaseMatcher

private org.apache.oro.text.regex.Pattern m_camelCasePattern

private JSPWikiMarkupParser m_cleanTranslator

private org.apache.oro.text.regex.PatternCompiler m_compiler

private org.jdom.Element m_currentElement

protected ArrayList m_externalLinkMutatorChain

private StringBuffer m_genlistBulletBuffer

private int m_genlistlevel

protected ArrayList m_headingListenerChain

protected PushbackReader m_in

private ArrayList m_inlineImagePatterns

Keeps image regexp Patterns

protected boolean m_inlineImages

private org.apache.oro.text.regex.PatternMatcher m_inlineMatcher

private boolean m_isEscaping

private boolean m_isOpenParagraph

private boolean m_isPre

private boolean m_isPreBlock

private boolean m_isbold

private boolean m_isdefinition

private boolean m_isitalic

private boolean m_istable

protected ArrayList m_linkMutators

protected ArrayList m_localLinkMutatorChain

Optionally stores internal wikilinks

private String m_outlinkImageURL

Holds the image URL for the duration of this parser

protected boolean m_parseAccessRules

private StringBuffer m_plainTextBuf

Keeps track of any plain text that gets put in the Text nodes

private boolean m_plainUris

If true, consider URIs that have no brackets as well.

private int m_pos

private boolean m_restartbold

private boolean m_restartitalic

Controls whether italic is restarted after a paragraph shift

private int m_rowNum

private Stack m_styleStack

Contains style information, in multiple forms.

private boolean m_useAttachmentImage

private boolean m_useOutlinkImage

If true, all outward links use a small link image.

private boolean m_useRelNofollow

 
Constructor Summary
JSPWikiMarkupParser(Reader in)

Creates a markup parser.

 
Method Summary
private org.jdom.Element addElement(org.jdom.Content e)

static String cleanLink(String link)

Cleans a Wiki name.

void disableAccessRules()

Adds a hook for processing link texts.

private void disableOutputEscaping()

Emits a processing instruction that will disable markup escaping.

void enableImageInlining(boolean toggle)

Use this to turn on or off image inlining.

private String escapeHTMLEntities(StringBuffer buf)

Escapes XML entities in a HTML-compatible way (i.e. does not escape entities that are already escaped).

private void fillBuffer(org.jdom.Element startElement)

private String findAttachment(String link)

private int flushPlainText()

static boolean getBooleanProperty(Properties props, String key, boolean defval)

Gets a boolean property from a standard Properties list.

private JSPWikiMarkupParser getCleanTranslator()

Does a lazy init.

static Collection getImagePatterns()

Figure out which image suffixes should be inlined.

private static String getListType(char c)

int getPosition()

Return the current position in the reader stream.

private org.jdom.Element handleApostrophe()

Handles apostrophe-based markup, for example italics.

private org.jdom.Element handleBackslash()

private org.jdom.Element handleBar(boolean newLine)

private org.jdom.Element handleClosebrace()

Handles both }} and }}}

private org.jdom.Element handleDash()

private org.jdom.Element handleDefinitionList()

private org.jdom.Element handleDiv(boolean newLine)

Handles constructs of type %%(style) and %%class

private org.jdom.Element handleGeneralList()

Like the original handleOrderedList() and handleUnorderedList(), but handles ordered ('#') and unordered ('*') lists mixed together.

private org.jdom.Element handleHeading(String pageName)

private org.jdom.Element handleHyperlinks(String link, int pos)

Gobbles up all hyperlinks that are encased in square brackets.

private org.jdom.Element handleImageLink(String reallink, String link, boolean hasLinkText)

Handles image links, which are treated differently from ordinary links.

private org.jdom.Element handleMetadata(String link)

Handles metadata setting [{SET foo=bar}]

private org.jdom.Element handleOpenbrace(boolean isBlock)

private org.jdom.Element handleOpenbracket()

private org.jdom.Element handleSlash(boolean newLine)

private org.jdom.Element handleTilde()

Generic escape of next character or entity.

private org.jdom.Element handleUnderscore()

private void initialize()

The WikiEngine this reader is attached to.

private static boolean isAccessRule(String link)

Returns true, if the link in question is an access rule.

static boolean isExternalLink(String link)

Figures out if a link is an off-site link.

private boolean isImageLink(String link)

Matches the given link to the list of image name patterns to determine whether it should be treated as an inline image or not.

private static boolean isMetadata(String link)

static boolean isNumber(String s)

Returns true, if the argument contains a number, otherwise false.

static boolean isPositive(String val)

Returns true, if the string "val" denotes a positive value.

private String linkExists(String page)

Returns link name, if it exists; otherwise it returns null.

private org.jdom.Element makeCamelCaseLink(String wikiname)

When given a link to a WikiName, we just return a proper HTML link for it.

private org.jdom.Element makeDirectURILink(String url)

Takes a URL and turns it into a regular wiki link.

static org.jdom.Element makeError(String error)

Writes HTML for error message.

org.jdom.Element makeHeading(int level, String pageName, String title, Heading hd)

Returns XHTML for the start of the heading.

private String makeHeadingAnchor(String baseName, String title, Heading hd)

Modifies the "hd" parameter to contain proper values.

private org.jdom.Element makeLink(int type, String link, String text, String section)

private String makeSectionTitle(String title)

protected int nextToken()

private org.jdom.Element outlinkImage()

Returns an element for the external link image (out.png).

org.jdom.Document parse()

private String peekAheadLine()

This method peeks ahead in the stream until EOL and returns the result.

private org.jdom.Element popElement(String s)

protected void pushBack(int c)

Push back any character to the current input.

private void pushBack(String s)

Pushes back any string that has been read.

private org.jdom.Element pushElement(org.jdom.Element e)

private String readBraceContent(char opening, char closing)

Reads the stream until the current brace is closed or stream end.

private String readUntil(String endChars)

Reads the stream until it meets one of the specified ending characters, or stream end.

private StringBuffer readUntilEOL()

Reads the stream until the next EOL or EOF.

private String readWhile(String endChars)

Reads the stream while the specified characters are in the stream, then returns the result as a String.

Reader setInputReader(Reader in)

Replaces the current input character stream with a new one.

private void startBlockLevel()

Starts a block level element, therefore closing a potential open paragraph tag.

private org.jdom.Element unwindGeneralList()

 
Methods inherited from class Object
wait, wait, wait, hashCode, getClass, equals, toString, notify, notifyAll
 

Field Detail

ATTACHMENT

private static final int ATTACHMENT


CLASS_EDITPAGE

static final String CLASS_EDITPAGE
The value for anchor element class attributes when used for edit page links. The value is "editpage".


CLASS_INTERWIKI

static final String CLASS_INTERWIKI
The value for anchor element class attributes when used for interwiki page links. The value is "interwiki".


CLASS_WIKIPAGE

static final String CLASS_WIKIPAGE
The value for anchor element class attributes when used for wiki page (normal) links. The value is "wikipage".


DEFAULT_INLINEPATTERN

static final String DEFAULT_INLINEPATTERN
The default inlining pattern. Currently "*.png".


EDIT

private static final int EDIT


EMPTY

private static final int EMPTY


EMPTY_ELEMENTS

private static final String[] EMPTY_ELEMENTS
All elements that can be empty by the HTML DTD.


EXTERNAL

private static final int EXTERNAL


IMAGE

private static final int IMAGE


IMAGELINK

private static final int IMAGELINK


IMAGEWIKILINK

private static final int IMAGEWIKILINK


INTERWIKI

private static final int INTERWIKI


LOCAL

private static final int LOCAL


LOCALREF

private static final int LOCALREF


OUTLINK_IMAGE

private static final String OUTLINK_IMAGE
Name of the outlink image; relative path to the JSPWiki directory.


PROP_ALLOWHTML

static final String PROP_ALLOWHTML
If set to "true", allows using raw HTML within Wiki text. Be warned, this is a VERY dangerous option to set: never turn this on in a publicly accessible Wiki, unless you are absolutely certain of what you're doing.


PROP_CAMELCASELINKS

static final String PROP_CAMELCASELINKS
If true, consider CamelCase hyperlinks as well.


PROP_INLINEIMAGEPTRN

static final String PROP_INLINEIMAGEPTRN
This property defines the inline image pattern. Its current value is jspwiki.translatorReader.inlinePattern.


PROP_PLAINURIS

static final String PROP_PLAINURIS
If true, all hyperlinks are translated as well, regardless of whether they are surrounded by brackets.


PROP_RUNPLUGINS

static final String PROP_RUNPLUGINS
If set to "true", enables plugins during parsing.


PROP_USEATTACHMENTIMAGE

static final String PROP_USEATTACHMENTIMAGE
If true, all outward attachment info links have a small link image appended.


PROP_USEOUTLINKIMAGE

static final String PROP_USEOUTLINKIMAGE
If true, all outward links (external links) have a small link image appended.


PROP_USERELNOFOLLOW

static final String PROP_USERELNOFOLLOW
If set to "true", all external links are tagged with 'rel="nofollow"'.


PUNCTUATION_CHARS_ALLOWED

protected static final String PUNCTUATION_CHARS_ALLOWED
Lists all punctuation characters allowed in WikiMarkup. These will not be cleaned away.


PUSHBACK_BUFFER_SIZE

protected static final int PUSHBACK_BUFFER_SIZE
Allow this many characters to be pushed back in the stream. In effect, this limits the size of a single line.


READ

private static final int READ


WIKIWORD_REGEX

static final String WIKIWORD_REGEX


c_externalLinks

static final String[] c_externalLinks
This list contains all IANA registered URI protocol types as of September 2004 + a few well-known extra types. JSPWiki recognises all of them as external links. This array is sorted during class load, so you can just dump here whatever you want in whatever order you want.


c_startingComparator

private static Comparator c_startingComparator
This Comparator is used to find an external link from c_externalLinks. It checks whether the link starts with one of the entries in that array.
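
A minimal sketch of how a prefix-aware comparator might be combined with a binary search over a sorted protocol array. The class name, the looksExternal helper, and the short prefix list are assumptions made for illustration; this is not the JSPWiki source.

```java
import java.util.Arrays;
import java.util.Comparator;

final class StartingComparatorSketch {
    // Treats two strings as equal when either one is a prefix of the other;
    // otherwise falls back to the natural String ordering.
    static final Comparator<String> STARTING =
        (a, b) -> (a.startsWith(b) || b.startsWith(a)) ? 0 : a.compareTo(b);

    // Hypothetical helper: true if the link starts with one of the sorted prefixes.
    static boolean looksExternal(String[] sortedPrefixes, String link) {
        return Arrays.binarySearch(sortedPrefixes, link, STARTING) >= 0;
    }

    public static void main(String[] args) {
        // Illustrative subset only; the documented array holds many more protocols.
        String[] prefixes = {"https:", "ftp:", "mailto:", "http:"};
        Arrays.sort(prefixes); // binary search requires a sorted array
        System.out.println(looksExternal(prefixes, "http://example.org/"));
        System.out.println(looksExternal(prefixes, "MainPage"));
    }
}
```

Sorting once and searching with a comparator that reports a prefix match as equality avoids scanning the whole array for every link.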


log

private static Logger log


m_allowHTML

private boolean m_allowHTML
If true, allows raw HTML.


m_allowPHPWikiStyleLists

private boolean m_allowPHPWikiStyleLists


m_attachmentLinkMutatorChain

protected ArrayList m_attachmentLinkMutatorChain


m_camelCaseLinks

private boolean m_camelCaseLinks
If true, then considers CamelCase links as well.


m_camelCaseMatcher

private org.apache.oro.text.regex.PatternMatcher m_camelCaseMatcher


m_camelCasePattern

private org.apache.oro.text.regex.Pattern m_camelCasePattern


m_cleanTranslator

private JSPWikiMarkupParser m_cleanTranslator


m_compiler

private org.apache.oro.text.regex.PatternCompiler m_compiler


m_currentElement

private org.jdom.Element m_currentElement


m_externalLinkMutatorChain

protected ArrayList m_externalLinkMutatorChain


m_genlistBulletBuffer

private StringBuffer m_genlistBulletBuffer


m_genlistlevel

private int m_genlistlevel


m_headingListenerChain

protected ArrayList m_headingListenerChain


m_in

protected PushbackReader m_in


m_inlineImagePatterns

private ArrayList m_inlineImagePatterns
Keeps image regexp Patterns


m_inlineImages

protected boolean m_inlineImages


m_inlineMatcher

private org.apache.oro.text.regex.PatternMatcher m_inlineMatcher


m_isEscaping

private boolean m_isEscaping


m_isOpenParagraph

private boolean m_isOpenParagraph


m_isPre

private boolean m_isPre


m_isPreBlock

private boolean m_isPreBlock


m_isbold

private boolean m_isbold


m_isdefinition

private boolean m_isdefinition


m_isitalic

private boolean m_isitalic


m_istable

private boolean m_istable


m_linkMutators

protected ArrayList m_linkMutators


m_localLinkMutatorChain

protected ArrayList m_localLinkMutatorChain
Optionally stores internal wikilinks


m_outlinkImageURL

private String m_outlinkImageURL
Holds the image URL for the duration of this parser


m_parseAccessRules

protected boolean m_parseAccessRules


m_plainTextBuf

private StringBuffer m_plainTextBuf
Keeps track of any plain text that gets put in the Text nodes


m_plainUris

private boolean m_plainUris
If true, consider URIs that have no brackets as well.


m_pos

private int m_pos


m_restartbold

private boolean m_restartbold


m_restartitalic

private boolean m_restartitalic
Controls whether italic is restarted after a paragraph shift


m_rowNum

private int m_rowNum


m_styleStack

private Stack m_styleStack
Contains style information, in multiple forms.


m_useAttachmentImage

private boolean m_useAttachmentImage


m_useOutlinkImage

private boolean m_useOutlinkImage
If true, all outward links use a small link image.


m_useRelNofollow

private boolean m_useRelNofollow


 
Constructor Detail

JSPWikiMarkupParser

public JSPWikiMarkupParser(Reader in)
Creates a markup parser.


 
Method Detail

addElement

private org.jdom.Element addElement(org.jdom.Content e)


cleanLink

public static String cleanLink(String link)
Cleans a Wiki name.

[ This is a link ] -> ThisIsALink

param:
link Link to be cleaned. Null is safe, and causes this to return null.
return:
A cleaned link.
since:
2.0
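
The transformation in the example above can be approximated as follows. This is a simplified sketch under stated assumptions: it drops every character that is not a letter or digit and capitalizes the character that follows, whereas the real method also consults PUNCTUATION_CHARS_ALLOWED, whose contents are not listed here. The class name is hypothetical.

```java
final class CleanLinkSketch {
    // Simplified: keeps letters and digits only, and upper-cases the first
    // character of each run. Null in, null out, as documented.
    static String cleanLink(String link) {
        if (link == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(link.length());
        boolean upperNext = true;
        for (int i = 0; i < link.length(); i++) {
            char c = link.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                out.append(upperNext ? Character.toUpperCase(c) : c);
                upperNext = false;
            } else {
                upperNext = true; // separator: capitalize whatever comes next
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(cleanLink("[ This is a link ]")); // ThisIsALink
    }
}
```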


disableAccessRules

public void disableAccessRules()
Adds a hook for processing link texts. This hook is called when the link text is written into the output stream, and you may use it to modify the text. It does not affect the actual link, only the user-visible text.
param:
mutator The hook to call. Null is safe.


disableOutputEscaping

private void disableOutputEscaping()
Emits a processing instruction that will disable markup escaping. This is very useful if you want to emit HTML directly into the stream.


enableImageInlining

public void enableImageInlining(boolean toggle)
Use this to turn on or off image inlining.
param:
toggle If true, images are inlined (as set in jspwiki.properties). If false, images won't be inlined; instead, they will be treated as standard hyperlinks.
since:
2.2.9


escapeHTMLEntities

private String escapeHTMLEntities(StringBuffer buf)
Escapes XML entities in a HTML-compatible way (i.e. does not escape entities that are already escaped).
param:
buf
return:


fillBuffer

private void fillBuffer(org.jdom.Element startElement)


findAttachment

private String findAttachment(String link)


flushPlainText

private int flushPlainText()


getBooleanProperty

public static boolean getBooleanProperty(Properties props, String key, boolean defval)
Gets a boolean property from a standard Properties list. Returns the default value, in case the key has not been set.

The possible values for the property are "true"/"false", "yes"/"no", or "on"/"off". Any unrecognized value is treated as "false".

param:
props A list of properties to search.
param:
key The property key.
param:
defval The default value to return.
return:
True, if the property "key" was set to "true", "on", or "yes".
since:
2.0.11
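
The documented rules (default when the key is missing, "true"/"on"/"yes" count as true, everything else as false) can be sketched as below. The class name is hypothetical, and the sketch mirrors only what the description states; it is not the original implementation.

```java
import java.util.Properties;

final class BooleanPropertySketch {
    // "yes", "on" and "true" are positive, compared case-insensitively; null is safe.
    static boolean isPositive(String val) {
        return val != null
            && (val.equalsIgnoreCase("true")
                || val.equalsIgnoreCase("on")
                || val.equalsIgnoreCase("yes"));
    }

    static boolean getBooleanProperty(Properties props, String key, boolean defval) {
        String val = props.getProperty(key);
        if (val == null) {
            return defval; // key not set: fall back to the default
        }
        return isPositive(val); // unrecognized values count as false
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("example.flag", "Yes");   // hypothetical keys
        props.setProperty("example.other", "maybe");
        System.out.println(getBooleanProperty(props, "example.flag", false));
        System.out.println(getBooleanProperty(props, "example.other", true));
        System.out.println(getBooleanProperty(props, "example.missing", true));
    }
}
```

Note that an unrecognized value overrides the default: with "maybe" stored under the key, the result is false even when defval is true.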


getCleanTranslator

private JSPWikiMarkupParser getCleanTranslator()
Does a lazy init. Otherwise, we would get into a situation where HTMLRenderer would try and boot a TranslatorReader before the TranslatorReader it is contained by is up.


getImagePatterns

public static Collection getImagePatterns()
Figure out which image suffixes should be inlined.
return:
Collection of Strings with patterns.


getListType

private static String getListType(char c)


getPosition

public int getPosition()
Return the current position in the reader stream. The value will be -1 prior to reading.
return:
the reader position as an int.


handleApostrophe

private org.jdom.Element handleApostrophe()
Handles apostrophe-based markup, for example italics.


handleBackslash

private org.jdom.Element handleBackslash()


handleBar

private org.jdom.Element handleBar(boolean newLine)


handleClosebrace

private org.jdom.Element handleClosebrace()
Handles both }} and }}}


handleDash

private org.jdom.Element handleDash()


handleDefinitionList

private org.jdom.Element handleDefinitionList()


handleDiv

private org.jdom.Element handleDiv(boolean newLine)
Handles constructs of type %%(style) and %%class
param:
newLine
throws:
IOException


handleGeneralList

private org.jdom.Element handleGeneralList()
Like the original handleOrderedList() and handleUnorderedList(), but handles ordered ('#') and unordered ('*') lists mixed together.


handleHeading

private org.jdom.Element handleHeading(String pageName)


handleHyperlinks

private org.jdom.Element handleHyperlinks(String link, int pos)
Gobbles up all hyperlinks that are encased in square brackets.


handleImageLink

private org.jdom.Element handleImageLink(String reallink, String link, boolean hasLinkText)
Image links are handled differently: 1. If the text is a WikiName of an existing page, it gets linked. 2. If the text is an external link, then it is inlined. 3. Otherwise it becomes an ALT text.
param:
reallink The link to the image.
param:
link Link text portion, may be a link to somewhere else.
param:
hasLinkText If true, then the defined link had a link text available. This means that the link text may be a link to a wiki page, or an external resource.


handleMetadata

private org.jdom.Element handleMetadata(String link)
Handles metadata setting [{SET foo=bar}]
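
As a rough illustration of the [{SET foo=bar}] form, the sketch below splits such a directive into a name and a value. It assumes the method receives the text between the square brackets (for example "{SET foo=bar}"); that assumption, the class name, and the parseSet helper are all hypothetical.

```java
import java.util.AbstractMap;
import java.util.Map;

final class MetadataSketch {
    // Returns the name/value pair, or null if the text is not a SET directive.
    static Map.Entry<String, String> parseSet(String link) {
        if (link == null) {
            return null;
        }
        String body = link.trim();
        if (body.length() < 6 || !body.startsWith("{SET") || !body.endsWith("}")
                || !Character.isWhitespace(body.charAt(4))) {
            return null;
        }
        String args = body.substring(4, body.length() - 1).trim();
        int eq = args.indexOf('=');
        if (eq <= 0) {
            return null; // no '=' or an empty name
        }
        return new AbstractMap.SimpleImmutableEntry<>(
            args.substring(0, eq).trim(), args.substring(eq + 1).trim());
    }

    public static void main(String[] args) {
        Map.Entry<String, String> e = parseSet("{SET foo=bar}");
        System.out.println(e.getKey() + " -> " + e.getValue());
    }
}
```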


handleOpenbrace

private org.jdom.Element handleOpenbrace(boolean isBlock)


handleOpenbracket

private org.jdom.Element handleOpenbracket()


handleSlash

private org.jdom.Element handleSlash(boolean newLine)


handleTilde

private org.jdom.Element handleTilde()
Generic escape of next character or entity.


handleUnderscore

private org.jdom.Element handleUnderscore()


initialize

private void initialize()
param:
engine The WikiEngine this reader is attached to. It is used to figure out if a page exists.


isAccessRule

private static boolean isAccessRule(String link)
Returns true, if the link in question is an access rule.


isExternalLink

public static boolean isExternalLink(String link)
Figures out if a link is an off-site link. This recognizes the most common protocols by checking how it starts.
since:
2.4


isImageLink

private boolean isImageLink(String link)
Matches the given link to the list of image name patterns to determine whether it should be treated as an inline image or not.
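
A minimal sketch of matching a link against glob-style image patterns such as the default "*.png". The field types above suggest the class uses the Jakarta ORO matcher; this sketch substitutes java.util.regex so that it stays self-contained, and its class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class InlineImageSketch {
    // Converts a simple glob ('*' = any run of characters, '?' = one character)
    // into a regular expression; every other character is matched literally.
    static Pattern globToRegex(String glob) {
        StringBuilder sb = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // True if the whole link matches at least one of the patterns.
    static boolean isImageLink(List<Pattern> patterns, String link) {
        for (Pattern p : patterns) {
            if (p.matcher(link).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Pattern> patterns = Arrays.asList(globToRegex("*.png"));
        System.out.println(isImageLink(patterns, "http://example.org/pics/logo.png"));
        System.out.println(isImageLink(patterns, "http://example.org/index.html"));
    }
}
```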


isMetadata

private static boolean isMetadata(String link)


isNumber

public static boolean isNumber(String s)
Returns true, if the argument contains a number, otherwise false. In a quick test this is roughly the same speed as Integer.parseInt() if the argument is a number, and roughly ten times the speed, if the argument is NOT a number.
since:
2.4
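
The speed difference described above is typical of a character scan that rejects non-numeric input without throwing an exception. The following sketch shows one such scan; treating an optional leading minus sign followed by ASCII digits as "a number" is an assumption, and the class name is hypothetical.

```java
final class IsNumberSketch {
    // Scans the characters directly instead of relying on a
    // NumberFormatException from Integer.parseInt().
    static boolean isNumber(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int start = (s.charAt(0) == '-') ? 1 : 0; // assumed: a leading '-' is allowed
        if (start == s.length()) {
            return false; // a lone "-" is not a number
        }
        for (int i = start; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isNumber("42"));
        System.out.println(isNumber("4x2"));
    }
}
```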


isPositive

public static boolean isPositive(String val)
Returns true, if the string "val" denotes a positive value. Allowed values are "yes", "on", and "true". Comparison is case-insensitive. Null values are safe.
param:
val Value to check.
return:
True, if val is "true", "on", or "yes"; otherwise false.
since:
2.0.26


linkExists

private String linkExists(String page)
Returns link name, if it exists; otherwise it returns null.


makeCamelCaseLink

private org.jdom.Element makeCamelCaseLink(String wikiname)
When given a link to a WikiName, we just return a proper HTML link for it. The local link mutator chain is also called.


makeDirectURILink

private org.jdom.Element makeDirectURILink(String url)
Takes a URL and turns it into a regular wiki link. Unfortunately, because of the way that flushPlainText() works, it already encodes all of the XML entities. But so does WikiContext.getURL(), so we have to do a reverse-replace here, so that it can again be replaced in makeLink.

What a crappy problem.

param:
url
return:


makeError

public static org.jdom.Element makeError(String error)
Writes HTML for error message.


makeHeading

public org.jdom.Element makeHeading(int level, String pageName, String title, Heading hd)
Returns XHTML for the start of the heading. Also sets the line-end emitter.
param:
level
param:
title the title for the heading
param:
hd a List to which heading should be added


makeHeadingAnchor

private String makeHeadingAnchor(String baseName, String title, Heading hd)
Modifies the "hd" parameter to contain proper values. Because an "id" tag may only contain [a-zA-Z0-9:_-], we'll replace the % after url encoding with '_'.
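
The percent-replacement step can be illustrated as follows. This sketch uses java.net.URLEncoder as a stand-in for whatever encoder the class actually calls, and joining the base name and title with '-' is an assumption; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class HeadingAnchorSketch {
    // URL-encodes the combined name, then swaps '%' for '_' so that the
    // escape sequences stay within the characters allowed in an id.
    // Note: URLEncoder encodes a space as '+', which this sketch leaves as-is.
    static String anchorFor(String baseName, String title) {
        try {
            String encoded = URLEncoder.encode(baseName + "-" + title, "UTF-8");
            return encoded.replace('%', '_');
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(anchorFor("Main", "Q&A")); // Main-Q_26A
    }
}
```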


makeLink

private org.jdom.Element makeLink(int type, String link, String text, String section)


makeSectionTitle

private String makeSectionTitle(String title)


nextToken

protected int nextToken()


outlinkImage

private org.jdom.Element outlinkImage()
Returns an element for the external link image (out.png). However, this method caches the URL for the lifetime of this MarkupParser, because it's commonly used, and we'll end up with possibly hundreds or thousands of references to it... It's a lot faster, too.
return:
An element containing the HTML for the outlink image.


parse

public org.jdom.Document parse()


peekAheadLine

private String peekAheadLine()
This method peeks ahead in the stream until EOL and returns the result. It will keep the buffers untouched.
return:
The string from the current position to the end of line.
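
Peeking without disturbing the stream can be done on a PushbackReader by reading the line and then unreading it in reverse order. The sketch below illustrates that idea and why the pushback buffer size bounds the length of a line; the class name, the BUFFER_SIZE value, and the choice of '\n' as the only end-of-line character are assumptions.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class PeekAheadSketch {
    // Hypothetical capacity; the documented PUSHBACK_BUFFER_SIZE plays this role.
    static final int BUFFER_SIZE = 256;

    // Returns the rest of the current line and leaves the stream as it was.
    static String peekAheadLine(PushbackReader in) {
        try {
            StringBuilder consumed = new StringBuilder();
            int ch;
            while ((ch = in.read()) != -1) {
                consumed.append((char) ch);
                if (ch == '\n') {
                    break;
                }
            }
            // Unread in reverse so the characters come back in their original order.
            // A line longer than the buffer makes unread() fail.
            for (int i = consumed.length() - 1; i >= 0; i--) {
                in.unread(consumed.charAt(i));
            }
            int end = consumed.length();
            if (end > 0 && consumed.charAt(end - 1) == '\n') {
                end--; // do not report the EOL itself
            }
            return consumed.substring(0, end);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads everything that is left; used only to show the stream is intact.
    static String drain(PushbackReader in) {
        try {
            StringBuilder sb = new StringBuilder();
            int ch;
            while ((ch = in.read()) != -1) {
                sb.append((char) ch);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PushbackReader in =
            new PushbackReader(new StringReader("!!Heading\nbody"), BUFFER_SIZE);
        System.out.println(peekAheadLine(in));
        System.out.println(drain(in).equals("!!Heading\nbody"));
    }
}
```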


popElement

private org.jdom.Element popElement(String s)


pushBack

protected void pushBack(int c)
Push back any character to the current input. Does not push back a read EOF, though.


pushBack

private void pushBack(String s)
Pushes back any string that has been read. It will obviously be pushed back in a reverse order.
since:
2.1.77


pushElement

private org.jdom.Element pushElement(org.jdom.Element e)


readBraceContent

private String readBraceContent(char opening, char closing)
Reads the stream until the current brace is closed or stream end.


readUntil

private String readUntil(String endChars)
Reads the stream until it meets one of the specified ending characters, or stream end. The ending character will be left in the stream.


readUntilEOL

private StringBuffer readUntilEOL()
Reads the stream until the next EOL or EOF. Note that it will also read the EOL from the stream.


readWhile

private String readWhile(String endChars)
Reads the stream while the specified characters are in the stream, then returns the result as a String.
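
The readUntil and readWhile helpers are mirror images of each other, which the sketch below makes explicit with one shared scanning loop over a PushbackReader. The class name is hypothetical, and leaving the stopping character in the stream for readWhile is an assumption (it is documented only for readUntil).

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class ReadUntilSketch {
    // Reads until one of endChars (or end of stream) is met.
    static String readUntil(PushbackReader in, String endChars) {
        return scan(in, endChars, false);
    }

    // Reads for as long as the characters belong to the given set.
    static String readWhile(PushbackReader in, String chars) {
        return scan(in, chars, true);
    }

    private static String scan(PushbackReader in, String set, boolean keepWhileInSet) {
        try {
            StringBuilder sb = new StringBuilder();
            int ch;
            while ((ch = in.read()) != -1) {
                boolean inSet = set.indexOf(ch) != -1;
                if (inSet != keepWhileInSet) {
                    in.unread(ch); // leave the stopping character in the stream
                    break;
                }
                sb.append((char) ch);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PushbackReader in = new PushbackReader(new StringReader("***item|rest"), 16);
        System.out.println(readWhile(in, "*"));  // ***
        System.out.println(readUntil(in, "|"));  // item
    }
}
```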


setInputReader

public Reader setInputReader(Reader in)
Replaces the current input character stream with a new one.
param:
in New source for input. If null, this method does nothing.
return:
the old stream


startBlockLevel

private void startBlockLevel()
Starts a block level element, therefore closing a potential open paragraph tag.


unwindGeneralList

private org.jdom.Element unwindGeneralList()


 
