Groovy Documentation

java.gate.yam.convert
Class HtmlToYamConverter

java.lang.Object
  java.gate.yam.convert.HtmlToYamConverter

class HtmlToYamConverter

Convert HTML to YAM. The bulk of the conversion work is done by an XSLT stylesheet, but there is a small amount of pre-processing done in Java to fix up things that are very difficult or impossible to do in XSLT. In particular, for lists that are nested inside other lists, e.g.:

 <ul>
   <li>A list item
     <ul>
       <li>Nested list</li>
     </ul></li>
 </ul>
 
we must strip the whitespace between the parent li text and the opening nested ul tag (here, the newline and indentation following "A list item"); otherwise the list nesting is lost in the generated YAM.
author:
Valentin Tablan, modified by Ian Roberts
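
The nested-list pre-processing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the class's actual code: the class name NestedListWhitespaceSketch, the LIST_TAGS contents and both method names are hypothetical, and the JDK XML parser stands in for NekoHTML so that the example has no external dependencies.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NestedListWhitespaceSketch {

    // Hypothetical stand-in for the listTags field: list element names, upper case.
    private static final Set<String> LIST_TAGS = new HashSet<>(Arrays.asList("UL", "OL"));

    /** Recursively trims trailing whitespace from text that directly precedes a nested list. */
    static void stripWhitespaceBeforeNestedLists(Node node) {
        Node parent = node.getParentNode();
        boolean isNestedList = node.getNodeType() == Node.ELEMENT_NODE
                && isListTag(node)
                && parent != null
                && "LI".equalsIgnoreCase(parent.getNodeName());
        if (isNestedList) {
            Node previous = node.getPreviousSibling();
            if (previous != null && previous.getNodeType() == Node.TEXT_NODE) {
                // Drop the newline and indentation between the li text and the nested list.
                previous.setNodeValue(previous.getNodeValue().replaceAll("\\s+$", ""));
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            stripWhitespaceBeforeNestedLists(children.item(i));
        }
    }

    private static boolean isListTag(Node node) {
        // Normalise case here because the JDK parser keeps the case used in the source.
        return LIST_TAGS.contains(node.getNodeName().toUpperCase(Locale.ROOT));
    }

    /** Parses a well-formed XHTML fragment with the JDK parser (an assumption for this sketch). */
    static Document parse(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
                "<ul>\n  <li>A list item\n    <ul>\n      <li>Nested list</li>\n    </ul></li>\n</ul>");
        stripWhitespaceBeforeNestedLists(doc);
        Node parentItem = doc.getElementsByTagName("li").item(0);
        // Brackets make any remaining whitespace visible.
        System.out.println("[" + parentItem.getFirstChild().getNodeValue() + "]");
    }
}
```

Only the text node immediately before the nested list is modified; whitespace elsewhere in the tree is left for the stylesheet to handle.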


Field Summary
private static String XSL_ENCODING

The encoding used for the XSL documents

private static Set listTags

Set containing the HTML element names that represent lists.

private static Logger log

private static Transformer transformer

The XSL transformer used for HTML to YAM conversions

 
Constructor Summary
HtmlToYamConverter()

 
Method Summary
static String domToString(org.w3c.dom.Document input)

Transforms a DOM document into a String representation in YAM format.

private static void initTransformer()

static String jdomToString(def input)

Transforms a JDOM document into a String representation in YAM format.

static void main(String[] args)

Test code - DO NOT USE!

static String readerToString(Reader htmlReader)

Converts HTML source supplied by a Reader to YAM format, returned as a String.

static String stringToString(String htmlSource)

Converts HTML source provided as a String to YAM format, returned as a String.

 
Methods inherited from class Object
wait, wait, wait, hashCode, getClass, equals, toString, notify, notifyAll
 

Field Detail

XSL_ENCODING

private static final String XSL_ENCODING
The encoding used for the XSL documents


listTags

private static Set listTags
Set containing the HTML element names that represent lists. Tag names must be in upper case, as the DOM documents produced by NekoHTML report their tag names in upper case regardless of the original case used in the HTML.


log

private static final Logger log


transformer

private static Transformer transformer
The XSL transformer used for HTML to YAM conversions


 
Constructor Detail

HtmlToYamConverter

HtmlToYamConverter()


 
Method Detail

domToString

public static String domToString(org.w3c.dom.Document input)
Transforms a DOM document into a String representation in YAM format. Does some minor pre-processing of the DOM tree to clean up some things that are extremely difficult in XSLT.
param:
input the input DOM document, in HTML
return:
a String containing the document in YAM format
throws:
TransformerException
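
The DOM-to-String step can be illustrated with the standard javax.xml.transform API. This is a hedged sketch, not the converter's implementation: the class name DomToTextSketch, its methods and the TOY_XSL stylesheet are assumptions made for illustration, and the toy stylesheet simply joins list item text with semicolons rather than producing real YAM.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class DomToTextSketch {

    // Toy stylesheet (an assumption): emits the text of each li followed by ";".
    private static final String TOY_XSL =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/'>"
          + "<xsl:for-each select='//li'>"
          + "<xsl:value-of select='normalize-space(.)'/><xsl:text>;</xsl:text>"
          + "</xsl:for-each>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    /** Runs the stylesheet over a DOM tree and collects the text output in a String. */
    static String toText(Document input) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(TOY_XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(input), new StreamResult(out));
        return out.toString();
    }

    /** Builds a namespace-aware DOM from well-formed markup using the JDK parser. */
    static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<ul><li>one</li><li> two </li></ul>");
        System.out.println(toText(doc));
    }
}
```

The sketch builds a new Transformer on every call for simplicity. A Transformer instance must not be used by several threads concurrently, so code that keeps one in a static field, as this class does, has to serialise access to it.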


initTransformer

private static void initTransformer()


jdomToString

public static String jdomToString(def input)
Transforms a JDOM document into a String representation in YAM format. Does some minor pre-processing of the JDOM tree to clean up some things that are extremely difficult in XSLT.
param:
input the input JDOM document, in HTML
return:
a String in YAM format
throws:
TransformerException


main

public static void main(String[] args)
Test code - DO NOT USE! Given an HTML file args[0], writes the corresponding YAM file to the directory args[1].
param:
args


readerToString

public static String readerToString(Reader htmlReader)
Converts HTML source supplied by a Reader to YAM format, returned as a String.
param:
htmlReader the Reader supplying the HTML source document
return:
a String representing a document in YAM format
throws:
SAXException
throws:
IOException
throws:
TransformerException


stringToString

public static String stringToString(String htmlSource)
Converts HTML source provided as a String to YAM format, returned as a String.
param:
htmlSource the String representation of the input HTML document.
return:
a String representing a document in YAM format
throws:
SAXException
throws:
IOException
throws:
TransformerException
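
The relationship between the String and Reader entry points, and where each declared exception can arise, might look like the sketch below. Everything in it is an assumption for illustration: the class EntryPointSketch only mirrors the two documented method names, the JDK XML parser replaces the HTML parser, and TOY_XSL is a placeholder that outputs normalised text rather than YAM. Whether the real stringToString delegates to readerToString is not stated in this documentation.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class EntryPointSketch {

    // Placeholder stylesheet (an assumption): outputs the document text, whitespace-normalised.
    private static final String TOY_XSL =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/'><xsl:value-of select='normalize-space(.)'/></xsl:template>"
          + "</xsl:stylesheet>";

    static String readerToString(Reader htmlReader)
            throws SAXException, IOException, TransformerException {
        Document dom;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Parsing is the step that can raise SAXException or IOException.
            dom = factory.newDocumentBuilder().parse(new InputSource(htmlReader));
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("JDK XML parser unavailable", e);
        }
        // Compiling and applying the stylesheet is the source of TransformerException.
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(TOY_XSL)))
                .transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }

    static String stringToString(String htmlSource)
            throws SAXException, IOException, TransformerException {
        // Wrap the String in a Reader so that both entry points share one code path.
        return readerToString(new StringReader(htmlSource));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(stringToString("<html><body><p>Hello   world</p></body></html>"));
    }
}
```

Unlike an HTML parser, the JDK XML parser used here rejects markup that is not well formed, so this sketch throws SAXException for input such as an unclosed tag.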


 
