Update tagsoup to 1.2.1

- For a list of changes / bug fixes, see CHANGES.
- Moved LICENSE, MODULE_LICENSE_APACHE2 to the root of the repository to be
  consistent with the rest of Android.
- Rename GOOGLE_README.txt to the more standard README.android.
- Preserve the definitions / ant build rules that generate
  HTMLScanner.java/HTMLSchema.java in case anyone's interested.
  src/definitions & src/templates were moved to top level directories to
  avoid having to introduce complicated source filter rules in the makefile.

Tested: android.text.HtmlTest (coretests), android.text.cts.HtmlTest (CTS)

https://code.google.com/p/android/issues/detail?id=60999

Change-Id: I402b5244fb6396c9f087393fa6e1e8653387ba0a
author: Narayan Kamath <narayan@google.com> 2013-10-21 12:26:25 +0100
committer: Narayan Kamath <narayan@google.com> 2013-10-22 12:06:49 +0100
commit: 70dce01b47b7ef16f67b6bd17ee66fca72b42ef1 (patch)
tree: 73378e7789f5643fd27bd4cc97183d6629c3113e /CHANGES
parent: 68c2ec9e0acdb3214b7fb91dbab8c9fab8736817 (diff)
download: tagsoup-70dce01b47b7ef16f67b6bd17ee66fca72b42ef1.tar.gz
1 files changed, 309 insertions, 0 deletions
diff --git a/CHANGES b/CHANGES
new file mode 100644
index 0000000..73e5887
--- /dev/null
+++ b/CHANGES
@@ -0,0 +1,309 @@
+Changes from 1.2 to 1.2.1
+=========================
+Match DOCTYPE case-blind
+Extend PushbackReader's size for oddball cases like & followed by CR
+Leo Sutic's 2x-4x speedup by precompiling HTMLScanner table
+
+Changes from 1.1.3 to 1.2
+=========================
+Changed license to Apache 2.0
+Bogon default model is now ANY, not EMPTY
+Support new DOCTYPE output switches --doctype-system and --doctype-public
+Support new XML declaration output switches --standalone and --version
+New --norootbogons switch makes bogons children of the root
+Don't resolve entity references in attribute values unless semicolon-terminated
+Support character entities above U+FFFF
+Add character entities from the 2007-12-14 draft of xml-entity-names
+Call SAX events startPrefixMapping and endPrefixMapping to report prefixes
+Clean up newline processing, shrinking html.stml considerably
+Allow link elements in the body as well as the head, to avoid excess bodies
+Allow tables inside paragraphs
+Allow cells and forms in thead and tfoot elements without intervening tr element
+The span element is no longer restartable
+Support non-standard elements bgsound, blink, canvas, comment, listing,
+	marquee, nobr, ruby, rbc, rtc, rb, rt, rp, wbr, xmp
+In HTML mode, boolean attributes like checked are output in minimized form
+Correctly handle runs of less-than characters
+Suppress all but the first DOCTYPE declaration
+Modify PI targets containing colons to have underscores instead
+The case of element tags is now canonicalized to the schema
+PI targets are no longer forced to lower case
+
+Changes from 1.1.2 to 1.1.3
+===========================
+Allow Parser.set* methods to accept null
+Allow setting the LexicalHandler feature to be null
+	in both cases means "use default behavior"
+
+Changes from 1.1.1 to 1.1.2
+===========================
+Setting CDATAElementsFeature didn't really set CDATAElements instance variable
+
+Changes from 1.1 to 1.1.1
+=========================
+Removed lexical handler calls to startCDATA/endCDATA from CDATA element handling
+Added lexical handler calls to startCDATA/endCDATA from CDATA section handling
+Added CDATAElementsFeature, the programmatic equivalent of the --nocdata switch
+
+Changes from 1.0.5 to 1.1
+=========================
+Add Tatu Saloranta's JAXP support package
+
+Changes from 1.0.4 to 1.0.5
+===========================
+Major repairs to comment scanning
+Skip leading BOM
+Comment out debugging code in PYXWriter
+Allow &#X as well as &#x
+Add net.sf.saxon to list of supported XSLT engines
+
+Changes from 1.0.3 to 1.0.4
+===========================
+Certain options were mutually exclusive that should not have been
+Blocked XML declaration from specifying an encoding of ""
+--method=html was not doing the right thing
+
+Changes from 1.0.2 to 1.0.3
+===========================
+Fixed build file to use Java target version 1.4
+Fixed --version switch to print the right thing
+
+Changes from 1.0.1 to 1.0.2
+===========================
+Version attribute default value removed from html element
+Leading and trailing hyphens now trimmed properly from comments
+Added --output-encoding switch to control encoding
+If output encoding is Unicode, don't generate character references
+Whitespace compressed and junk stripped from public identifiers
+
+Changes from 1.0 to 1.0.1
+=========================
+Added ignorableWhitespaceFeature and --ignorable to report ignorable whitespace
+	Patch due to David Pashley
+Insert spaces to break up -- in comments
+Change bogus chars in publicids to spaces
+--lexical switch now outputs DOCTYPE if there is one
+Remove unnecessary blank line after XML declaration
+
+Changes from 1.0rc9 to 1.0
+==========================
+Added feature to control restartability
+	Patch due to Nikita Zhuk
+Added corresponding --norestart switch in CommandLine
+Made translate-colons feature actually work
+
+Changes from 1.0rc8 to 1.0rc9
+=============================
+If there is a publicid but no systemid, set systemid to ""
+
+Changes from 1.0rc7 to 1.0rc8
+=============================
+Fixed paper-bag bug (source didn't match binary in release)
+
+Changes from 1.0rc6 to 1.0rc7
+=============================
+LexicalHandler now gets DOCTYPE information (publicid and systemid)
+	Patch due to Mike Bremford
+HTMLScanner now reports more useful debug output when not commented out
+	Patch due to Mike Bremford
+Change "<memberOfAny>" to exclude "<root>" pseudo-element
+	This prevents "script" from being output as a root
+The shared HTMLParser object has been eliminated
+
+Changes from 1.0rc5 to 1.0rc6
+=============================
+If namespaceFeature is false, uri and localname are passed as empty strings
+The namespacePrefixesFeature is now always false
+Command line switch --nons no longer affects namespacePrefixesFeature
+Command line switch --html now implies --nons
+XMLWriter is now told directly to use the schema's URI as default namespace
+XMLWriter now takes the element name from the qname if localname is empty
+
+Changes from 1.0rc4 to 1.0rc5
+=============================
+The --nodefaults switch now removes only default attributes, not all of them
+Added --nocolons switch and translate-colons feature to convert ":"
+	in names to "_" (thus suppressing namespaces other than the basic one)
+The root element can be unknown without problem
+Empty <script/> and <style/> tags now work
+Added all standard SAX2 features to feature hashtable
+Reimplemented namespacePrefixes feature (broken since 1.0rc3)
+
+Changes from 1.0rc3 to 1.0rc4
+=============================
+Remove trailing ? from processing instructions (in case the input is XHTML)
+Added Javadocs for all SAX standard and TagSoup-specific features and properties
+Fixed termination conditions for entity/character references
+Fixed EOF-pushback bug that was generating bogus &#x65535; references
+Added Parser feature and --nodefaults switch to ignore default attribute values
+Added support for SAX Locator
+Updated AFL license to version 3.0
+Scanner buffer size increases as needed, allowing large attribute values
+Look for various XSLT implementations as available (still fails in raw 5.0)
+Clean up handling of XML empty tags and SGML minimized end-tags
+Support proper options and help message internally
+Use Hashtable in CommandLine class instead of HashMap
+Do proper buffering of InputStream and Reader
+Clean up content model of noframes element
+Removed htmlMode in XMLWriter
+Added support for XSLT output options METHOD=html and OMIT_XML_DECLARATION=yes
+Command line option --html sets both of these
+Wrote simple validator for TSSL schemas (tssl/tssl-validator.xslt)
+Removed various validity problems in html.tssl
+When processing a start-tag, don't restart elements that aren't in the new
+	element's content model
+Remove bogus double param in tssl.xslt
+
+Changes from 1.0rc2 to 1.0rc3
+=============================
+Convert CR and CRLF to LF in comments and PIs
+Force empty elements to close immediately
+Match close tags of CDATA elements more precisely (but case-blind)
+Process switches on the command line
+Man page available
+
+Changes from 1.0rc1 to 1.0rc2
+=============================
+Isolated & and &# now don't crash parser
+TagSoup no longer depends on /dev/stdin existing
+Refactored Parser class, removing main method to new CommandLine class
+Changes to content models of form, button, table, and tr elements in html.tssl
+'</scr' + 'ipt>' in a script element no longer terminates it
+Introduced "uncloseability" of form and table elements
+"pyxin" property specifies that input is in PYX format
+Correctly cope with unexpected characters around colons, also with multiple colons
+Correctly output comments with "--" in them (by adding a space)
+
+Changes from 0.10.2 to 1.0rc1
+=============================
+Script can now appear anywhere
+Switch -nocdata correctly implemented
+Eliminated useless M_n constants in Schema
+Introduced <memberOfAny> and <isRoot> as alternatives to
+	<memberOf> in TSSL
+Allow prefixes in element names
+Attributes are now normalized
+Expanded public API for Element and ElementType
+Javadoc improved
+
+Changes from 0.10.1 to 0.10.2
+=============================
+Removed misfeature whereby > terminated a tag even inside quotes
+Added licensing language to XSLT scripts, RELAX NG schemas
+Removed long-standing mishandling of entity references in attributes
+Cleaned up logic for converting junky strings to proper XML Names
+Correctly handle empty tag that has no whitespace or attributes
+Restore correct 0.9.3 handling of an apparent end-tag in a CDATA element
+Added script element to content model of head element
+
+Changes from 0.9.7 to 0.10.1 (there is no 0.10.0):
+==================================================
+Convert to XSLT configuration exclusively;
+	Perl code and tab-separated tables are gone
+Remove xmlns:* attributes
+Append "_" to attribute names ending in ":"
+Don't prepend "_" to an attribute name starting in "_"
+Handle namespace prefixes in attributes:
+	"xml" prefix is handled correctly
+	other prefixes are mapped to "urn:x-prefix:foo"
+Ignore XML declarations
+-Dnocdata=true turns off F_CDATA on script and style elements
+Fixed off-by-one errors in character references that made them uninterpreted
+Start-tags ending in a minimized attribute are no longer being dropped
+XML empty tags are now supported (though slashes are still allowed in
+	unquoted attribute values)
+
+Changes from 0.9.6 to 0.9.7:
+============================
+Upgraded AFL to version 2.1
+Passed through newlines in character content (very old bug)
+
+Changes from 0.9.5 to 0.9.6:
+============================
+Script element can appear directly in body
+">" terminates a start-tag even inside a quoted attribute,
+	to protect against unbalanced quotes
+"_" is prepended to attributes that don't begin with a letter
+Remove "xmlns" attributes from the input
+All standard features can now be set
+	(although there is no effect from doing so)
+New "bogons-empty" feature can be set to false to give bogons
+	 content model of ANY rather than EMPTY;
+	-Dany switch sets this feature to false
+TSSL now has an explicit group element to declare an element group
+STML is a new XML format for modeling state-table changes
+License updated to AFL 2.1
+
+Changes from 0.9.4 to 0.9.5:
+============================
+S in the statetable now means \r and \n and \t as well as space
+	(as was always intended; brain fart!)
+Ins and del elements are now allowed everywhere
+TSSL now correctly supports attributes that are legal on all elements
+
+Changes from 0.9.3 to 0.9.4:
+============================
+Fixed paper-bag bug that revealed attribute type BOOLEAN to applications.
+Obsolete ABSTRACT removed in favor of README.
+Improved implementation of CDATA restart after bogus end-tag.
+Allowed hyphen, underscore, and period in names as well as colon.
+First cut at TagSoup Schema Language -- doesn't do anything yet.
+Support CDATA sections on input.
+Don't generate built-in entities within CDATA elements.
+
+Changes from 0.9.2 to 0.9.3:
+============================
+Convenience main program "tagsoup" in bin directory.
+Begin to integrate tests.
+Introduced BOOLEAN type (currently just converted to NMTOKEN).
+Features that actually work are now named constants in Parser.
+Double root elements are really gone now.
+ID attributes weren't being removed from restarted elements.
+Fixed a bug that made unknown elements disappear in some cases.
+Parser is now safely reusable.
+PYXWriter and XMLWriter now implement LexicalHandler.
+Parser reports comments, startCDATA, and endCDATA events to a LexicalHandler.
+ScanHandler methods now throw only SAXException, not also IOException.
+-Dlexical=true switch sets the ContentHandler as a LexicalHandler as well
+	(XMLWriter prints comments, ignores CDATA sections; PYXWriter ignores all).
+-Dreuse=true switch reuses a single Parser object (no great speed gain).
+We now disallow an a element as the child of another a element.
+An empty input is now treated as zero-length character content.
+HTMLWriter is gone in favor of an extended XMLWriter with get/setHTMLMode methods.
+CDATA elements only terminate with matching end-tags (thanks to Sebastien Bardoux).
+
+Changes from 0.9.1 to 0.9.2:
+============================
+No longer inserts bogus ; after unknown entity reference without ;.
+Consecutive entity references now work correctly.
+Setting namespaces and namespace-prefixes methods now works.
+-Dnons=true option turns off namespace and prefix.
+New feature "http://www.ccil.org/~cowan/tagsoup/features/ignore-bogons"
+	suppresses unknown start-tags (any end-tag will be automatically ignored).
+-Dnobogons=true option turns ignore-bogons on.
+Suppress unknown and/or empty initial start-tag always
+	(prevents double root element).
+Schema now allows style as an inline element, like script.
+Schema now allows tr as a child of table to avoid problems with embedded tables.
+Clear Parser instance variables to make Parsers properly reusable.
+
+Changes from 0.9 to 0.9.1:
+==========================
+Incorporated patch for -jar support by Joseph Walton.
+Incorporated patch for Megginson XMLWriter support by Joseph Walton.
+Changed existing XMLWriter to HTMLWriter.
+Rewrote Parsermain for better features, removed Tester class.
+-Dnewline=true removed, now implied by -DHTML=true.
+-Dfiles=true now used to generate separate outputs (old Tester behavior)
+	with extension xhtml (removing any old extension).
+Fixed nasty bug in HTMLScanner that was failing to fix unusual entities.
+Don't attempt to smash whitespace to spaces any more.
+
+Changes from 0.8 to 0.9:
+========================
+Ant-ified by Martin Rademacher.
+Don't suppress colons in element names.
+Entity problems fixed (I hope).
+Can now set namespace and namespace-prefixes features (without effect).
+Properly templatize HTMLModels.java.
+Attributes are no longer in the HTML namespace.
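The 1.2.1 entry about extending PushbackReader's size refers to java.io.PushbackReader, whose pushback buffer is fixed when the reader is constructed; unreading more characters than the buffer can hold throws an IOException. The sketch below is a minimal, standalone illustration of that constraint, not TagSoup's HTMLScanner code: the class name PushbackDemo, the input string, and the two-character lookahead (an & followed by CR) are assumptions chosen for illustration, and the buffer size TagSoup actually uses is not stated here.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

public class PushbackDemo {
    /**
     * Reads two characters ('&' then CR) and tries to push both back.
     * Returns true if the reader could back up over both characters,
     * false if its pushback buffer was too small.
     */
    static boolean canUnreadTwo(int pushbackSize) {
        // Hypothetical input: an ampersand immediately followed by a carriage return.
        try (PushbackReader in = new PushbackReader(new StringReader("&\rnbsp;"), pushbackSize)) {
            int first = in.read();   // '&'
            int second = in.read();  // '\r'
            // Push back in reverse order so the next read() returns '&' again.
            // PushbackReader throws IOException once its buffer is full.
            in.unread(second);
            in.unread(first);
            return in.read() == '&';
        } catch (IOException overflow) {
            // For a StringReader source, the only IOException here is the overflow.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("size 1: " + canUnreadTwo(1)); // prints "size 1: false"
        System.out.println("size 2: " + canUnreadTwo(2)); // prints "size 2: true"
    }
}
```

With a buffer of 1 (the default for the single-argument PushbackReader constructor), the second unread overflows; a buffer of 2 lets the reader back up over both characters.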