gate.yam.convert
Class HtmlToYamConverter

java.lang.Object
  extended by gate.yam.convert.HtmlToYamConverter

public class HtmlToYamConverter
extends Object

Converts HTML to YAM. An XSLT stylesheet does the bulk of the conversion work, but a small amount of pre-processing is done in Java to fix up things that are very difficult or impossible to do in XSLT. In particular, for lists that are nested inside other lists, e.g.:

 <ul>
   <li>A list item
     <ul>
       <li>Nested list</li>
     </ul></li>
 </ul>
 
we must strip the whitespace between the parent li text ("A list item<newline><four spaces>") and the opening nested ul tag; otherwise the list nesting is lost in the generated YAM.
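
The whitespace fix described above can be sketched with the standard org.w3c.dom API. This is a minimal illustration, not the class's actual implementation: the class name NestedListWhitespaceSketch and both method names are hypothetical, the input is assumed to be well-formed markup with lower-case tag names, and treating ol the same way as ul is an assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch; not part of gate.yam.convert.
class NestedListWhitespaceSketch {

    /**
     * For every ul/ol whose parent is an li, trims trailing whitespace from
     * the text node that immediately precedes the nested list.
     */
    static void stripWhitespaceBeforeNestedLists(Document doc) {
        // Handling "ol" as well as "ul" is an assumption of this sketch.
        for (String listTag : new String[] {"ul", "ol"}) {
            NodeList lists = doc.getElementsByTagName(listTag);
            for (int i = 0; i < lists.getLength(); i++) {
                Node list = lists.item(i);
                Node parent = list.getParentNode();
                Node previous = list.getPreviousSibling();
                boolean nestedInItem = parent != null && "li".equals(parent.getNodeName());
                if (nestedInItem && previous != null
                        && previous.getNodeType() == Node.TEXT_NODE) {
                    // Only text content changes, so the live NodeList stays valid.
                    previous.setNodeValue(previous.getNodeValue().replaceAll("\\s+$", ""));
                }
            }
        }
    }

    /** Parses well-formed markup into a DOM document. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
                "<ul><li>A list item\n    <ul><li>Nested list</li></ul></li></ul>");
        stripWhitespaceBeforeNestedLists(doc);
        Node parentItemText = doc.getDocumentElement().getFirstChild().getFirstChild();
        System.out.println("[" + parentItemText.getNodeValue() + "]");
    }
}
```

After the call, the text node of the parent li ends directly at the nested ul, so a stylesheet that relies on adjacency can still detect the nesting.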

Author:
Valentin Tablan, modified by Ian Roberts

Constructor Summary
HtmlToYamConverter()
           
 
Method Summary
static String domToString(Document input)
          Transforms a DOM document into a String representation in YAM format.
static String jdomToString(org.jdom.Document input)
          Transforms a JDOM document into a String representation in YAM format.
static void main(String[] args)
          Test code - DO NOT USE!
static String readerToString(Reader htmlReader)
          Converts HTML source provided by a Reader to YAM format, returned as a String.
static String stringToString(String htmlSource)
          Converts HTML source provided as String to YAM format returned as String.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HtmlToYamConverter

public HtmlToYamConverter()
Method Detail

stringToString

public static String stringToString(String htmlSource)
                             throws SAXException,
                                    IOException,
                                    TransformerException
Converts HTML source provided as String to YAM format returned as String.

Parameters:
htmlSource - the String representation of the input HTML document.
Returns:
a String representing a document in YAM format
Throws:
SAXException
IOException
TransformerException
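
Because the method declares three checked exceptions, callers need a handling strategy. The following is a minimal sketch of one option, assuming the caller prefers unchecked exceptions. The class StringConversionSketch, the interface HtmlStringConverter and the method convertOrFail are hypothetical names introduced for illustration; the interface only mirrors the signature shown above so that the sketch compiles without the GATE library. The meaning attached to each exception in the messages is an assumption, since this page does not document when each one is thrown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.xml.transform.TransformerException;
import org.xml.sax.SAXException;

// Hypothetical sketch; not part of gate.yam.convert.
class StringConversionSketch {

    /** Stand-in with the same shape as stringToString(String). */
    @FunctionalInterface
    interface HtmlStringConverter {
        String convert(String htmlSource)
                throws SAXException, IOException, TransformerException;
    }

    /** Calls the converter and rethrows each declared failure as an unchecked exception. */
    static String convertOrFail(HtmlStringConverter converter, String htmlSource) {
        try {
            return converter.convert(htmlSource);
        } catch (SAXException e) {
            // Assumed meaning: the HTML input could not be parsed.
            throw new IllegalArgumentException("HTML could not be parsed", e);
        } catch (IOException e) {
            throw new UncheckedIOException("I/O failure during conversion", e);
        } catch (TransformerException e) {
            // Assumed meaning: the XSLT step failed.
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // A stand-in converter that returns its input unchanged.
        System.out.println(convertOrFail(html -> html, "<p>Hello</p>"));
    }
}
```

With the GATE library on the classpath, the stand-in lambda would presumably be replaced by the method reference HtmlToYamConverter::stringToString.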

readerToString

public static String readerToString(Reader htmlReader)
                             throws SAXException,
                                    IOException,
                                    TransformerException
Converts HTML source provided by a Reader to YAM format, returned as a String.

Parameters:
htmlReader - the Reader supplying the html source document
Returns:
a String representing a document in YAM format
Throws:
SAXException
IOException
TransformerException
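
Since the method takes a Reader, the caller decides how bytes are decoded. The sketch below shows one way to feed a file to a Reader-based converter. All names (ReaderConversionSketch, HtmlReaderConverter, convertFile, readAll) are hypothetical; UTF-8 is an assumed encoding; and because this page does not say whether the reader is closed by the method, the sketch closes it itself with try-with-resources.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.TransformerException;
import org.xml.sax.SAXException;

// Hypothetical sketch; not part of gate.yam.convert.
class ReaderConversionSketch {

    /** Stand-in with the same shape as readerToString(Reader). */
    @FunctionalInterface
    interface HtmlReaderConverter {
        String convert(Reader htmlReader)
                throws SAXException, IOException, TransformerException;
    }

    /** Opens the file as UTF-8 (assumed), converts it, and always closes the reader. */
    static String convertFile(Path htmlFile, HtmlReaderConverter converter)
            throws SAXException, IOException, TransformerException {
        try (Reader reader = Files.newBufferedReader(htmlFile, StandardCharsets.UTF_8)) {
            return converter.convert(reader);
        }
    }

    /** Trivial stand-in converter: returns the characters it reads, unchanged. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        int ch;
        while ((ch = reader.read()) != -1) {
            text.append((char) ch);
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("sketch", ".html");
        try {
            Files.write(tmp, "<p>Hello</p>".getBytes(StandardCharsets.UTF_8));
            System.out.println(convertFile(tmp, ReaderConversionSketch::readAll));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

With the GATE library available, HtmlToYamConverter::readerToString would presumably take the place of the readAll stand-in.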

domToString

public static String domToString(Document input)
                          throws TransformerException
Transforms a DOM document into a String representation in YAM format. Does some minor pre-processing of the DOM tree to clean up some things that are extremely difficult in XSLT.

Parameters:
input - the input DOM document, in HTML
Returns:
a String representing the document in YAM format
Throws:
TransformerException
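
To illustrate the general mechanism of turning a DOM document into text with XSLT, here is a minimal sketch using the standard javax.xml.transform API. It is not the converter's implementation: the class DomToTextSketch and its methods are hypothetical, and the embedded stylesheet is a toy that prints each li as a "- " bullet line. That output is an arbitrary plain-text stand-in, not real YAM.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch; not part of gate.yam.convert.
class DomToTextSketch {

    // Toy stylesheet (NOT the YAM stylesheet): one "- " line per li element,
    // all other text suppressed.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='li'>"
        + "<xsl:text>- </xsl:text>"
        + "<xsl:value-of select='normalize-space(.)'/>"
        + "<xsl:text>&#10;</xsl:text>"
        + "</xsl:template>"
        + "<xsl:template match='text()'/>"
        + "</xsl:stylesheet>";

    /** Applies the toy stylesheet to a DOM document and returns the text output. */
    static String domToText(Document input) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(input), new StreamResult(out));
        return out.toString();
    }

    /** Parses well-formed markup into a namespace-aware DOM document. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // XSLT processors expect namespace-aware DOMs
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        System.out.print(domToText(parse("<ul><li>one</li><li>two</li></ul>")));
    }
}
```

The sketch keeps the list flat on purpose: handling nested lists is exactly where, according to the description above, extra DOM pre-processing is needed before the stylesheet runs.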

jdomToString

public static String jdomToString(org.jdom.Document input)
                           throws TransformerException
Transforms a JDOM document into a String representation in YAM format. Does some minor pre-processing of the JDOM tree to clean up some things that are extremely difficult in XSLT.

Parameters:
input - the input JDOM document, in HTML
Returns:
a String in YAM format
Throws:
TransformerException

main

public static void main(String[] args)
                 throws Exception
Test code - DO NOT USE! Given an HTML file args[0], writes out its YAM file to the directory args[1].

Parameters:
args - args[0] is the input HTML file; args[1] is the output directory
Throws:
Exception