author | Narayan Kamath <narayan@google.com> | 2013-10-21 12:26:25 +0100 |
---|---|---|
committer | Narayan Kamath <narayan@google.com> | 2013-10-22 12:06:49 +0100 |
commit | 70dce01b47b7ef16f67b6bd17ee66fca72b42ef1 (patch) | |
tree | 73378e7789f5643fd27bd4cc97183d6629c3113e /CHANGES | |
parent | 68c2ec9e0acdb3214b7fb91dbab8c9fab8736817 (diff) | |
Update tagsoup to 1.2.1
- For a list of changes / bug fixes, see CHANGES.
- Moved LICENSE, MODULE_LICENSE_APACHE2 to the root of
the repository to be consistent with the rest of android.
- Rename GOOGLE_README.txt to the more standard README.android.
- Preserve the definitions / ant build rules that generate
HTMLScanner.java/HTMLSchema.java in case anyone's interested.
src/definitions & src/templates were moved to top level directories
to avoid having to introduce complicated source filter rules in
the makefile.
Tested :
android.text.HtmlTest (coretests), android.text.cts.HtmlTest (CTS)
https://code.google.com/p/android/issues/detail?id=60999
Change-Id: I402b5244fb6396c9f087393fa6e1e8653387ba0a
Diffstat (limited to 'CHANGES')
-rw-r--r-- | CHANGES | 309 |
1 files changed, 309 insertions, 0 deletions
@@ -0,0 +1,309 @@
+Changes from 1.2 to 1.2.1
+=========================
+Match DOCTYPE case-blind
+Extend PushbackReader's size for oddball cases like & followed by CR
+Leo Sutic's 2x-4x speedup by precompiling HTMLScanner table
+
+Changes from 1.1.3 to 1.2
+=========================
+Changed license to Apache 2.0
+Bogon default model is now ANY, not EMPTY
+Support new DOCTYPE output switches --doctype-system and --doctype-public
+Support new XML declaration output switches --standalone and --version
+New --norootbogons switch makes bogons children of the root
+Don't resolve entity references in attribute values unless semicolon-terminated
+Support character entities above U+FFFF
+Add character entities from the 2007-12-14 draft of xml-entity-names
+Call SAX events startPrefixMapping and endPrefixMapping to report prefixes
+Clean up newline processing, shrinking html.stml considerably
+Allow link elements in the body as well as the head, to avoid excess bodies
+Allow tables inside paragraphs
+Allow cells and forms in thead and tfoot elements without intervening tr element
+The span element is no longer restartable
+Support non-standard elements bgsound, blink, canvas, comment, listing,
+ marquee, nobr, ruby, rbc, rtc, rb, rt, rp, wbr, xmp
+In HTML mode, boolean attributes like checked are output in minimized form
+Correctly handle runs of less-than characters
+Suppress all but the first DOCTYPE declaration
+Modify PI targets containing colons to have underscores instead
+The case of element tags is now canonicalized to the schema
+PI targets are no longer forced to lower case
+
+Changes from 1.1.2 to 1.1.3
+===========================
+Allow Parser.set* methods to accept null
+Allow setting the LexicalHandler feature to be null
+ in both cases means "use default behavior"
+
+Changes from 1.1.1 to 1.1.2
+===========================
+Setting CDATAElementsFeature didn't really set CDATAElements instance variable
+
+Changes from 1.1 to 1.1.1
+=========================
+Removed lexical handler calls to startCDATA/endCDATA from CDATA element handling
+Added lexical handler calls to startCDATA/endCDATA from CDATA section handling
+Added CDATAElementsFeature, the programmatic equivalent of the --nocdata switch
+
+Changes from 1.0.5 to 1.1
+=========================
+Add Tatu Saloranta's JAXP support package
+
+Changes from 1.0.4 to 1.0.5
+===========================
+Major repairs to comment scanning
+Skip leading BOM
+Comment out debugging code in PYXWriter
+Allow &#X as well as &#x
+Add net.sf.saxon to list of supported XSLT engines
+
+Changes from 1.0.4 to 1.0.3
+===========================
+Certain options were mutually exclusive that should not have been
+Blocked XML declaration from specifying an encoding of ""
+--method=html was not doing the right thing
+
+Changes from 1.0.3 to 1.0.2
+===========================
+Fixed build file to use Java target version 1.4
+Fixed --version switch to print the right thing
+
+Changes from 1.0.1 to 1.0.2
+===========================
+Version attribute default value removed from html element
+Leading and trailing hyphens now trimmed properly from comments
+Added --output-encoding switch to control encoding
+If output encoding is Unicode, don't generate character references
+Whitespace compressed and junk stripped from public identifiers
+
+Changes from 1.0 to 1.0.1
+=========================
+Added ignorableWhitespaceFeature and --ignorable to report ignorable whitespace
+ Patch due to David Pashley
+Insert spaces to break up -- in comments
+Change bogus chars in publicids to spaces
+--lexical switch now outputs DOCTYPE if there is one
+Remove unnecessary blank line after XML declaration
+
+Changes from 1.0rc9 to 1.0
+==========================
+Added feature to control restartability
+ Patch due to Nikita Zhuk
+Added corresponding --norestart switch in CommandLine
+Made translate-colons feature actually work
+
+Changes from 1.0rc8 to 1.0rc9
+=============================
+If there is a publicid but no systemid, set systemid to ""
+
+Changes from 1.0rc7 to 1.0rc8
+=============================
+Fixed paper-bag bug (source didn't match binary in release)
+
+Changes from 1.0rc6 to 1.0rc7
+=============================
+LexicalHandler now gets DOCTYPE information (publicid and systemid)
+ Patch due to Mike Bremford
+HTMLScanner now reports more useful debug output when not commented out
+ Patch due to Mike Bremford
+Change "<memberOfAny>" to exclude "<root>" pseudo-element
+ This prevents "script" from being output as a root
+The shared HTMLParser object has been eliminated
+
+Changes from 1.0rc5 to 1.0rc6
+=============================
+If namespaceFeature is false, uri and localname are passed as empty strings
+The namespacePrefixesFeature is now always false
+Command line switch --nons no longer affects namespacePrefixesFeature
+Command line switch --html now implies --nons
+XMLWriter is now told directly to use the schema's URI as default namespace
+XMLWriter now takes the element name from the qname if localname is empty
+
+Changes from 1.0rc4 to 1.0rc5
+=============================
+The --nodefault switch now removes only default attributes, not all of them
+Added --nocolons switch and translate-colons feature to convert ":"
+ in names to "_" (thus suppressing namespaces other than the basic one)
+The root element can be unknown without problem
+Empty <script/> and <style/> tags now work
+Added all standard SAX2 features to feature hashtable
+Reimplemented namespacePrefixes feature (broken since 1.0rc3)
+
+Changes from 1.0rc3 to 1.0rc4
+=============================
+Remove trailing ? from processing instructions (in case the input is XHTML)
+Added Javadocs for all SAX standard and TagSoup-specific features and properties
+Fixed termination conditions for entity/character references
+Fixed EOF-pushback bug that was generating bogus 񥔵 references
+Added Parser feature and --nodefaults switch to ignore default attribute values
+Added support for SAX Locator
+Updated AFL license to version 3.0
+Scanner buffer size increases as needed, allowing large attribute values
+Look for various XSLT implementations as available (still fails in raw 5.0)
+Clean up handling of XML empty tags and SGML minimized end-tags
+Support proper options and help message internally
+Use Hashtable in CommandLine class instead of HashMap
+Do proper buffering of InputStream and Reader
+Clean up content model of noframes element
+Removed htmlMode in XMLWriter
+Added support for XSLT output options METHOD=html and OMIT_XML_DECLARATION=yes
+Command line option --html sets both of these
+Wrote simple validator for TSSL schemas (tssl/tssl-validator.xslt)
+Removed various validity problems in html.tssl
+When processing a start-tag, don't restart elements that aren't in the new
+ element's content model
+Remove bogus double param in tssl.xslt
+
+Changes from 1.0rc2 to 1.0rc3
+=============================
+Convert CR and CRLF to LF in comments and PIs
+Force empty elements to close immediately
+Match close tags of CDATA elements more precisely (but case-blind)
+Process switches on the command line
+Man page available
+
+Changes from 1.0rc1 to 1.0rc2
+=============================
+Isolated & and &# now don't crash parser
+TagSoup no longer depends on /dev/stdin existing
+Refactored Parser class, removing main method to new CommandLine class
+Changes to content models of form, button, table, and tr elements in html.tssl
+'</scr' + 'ipt>' in a script element no longer terminates it
+Introduced "uncloseability" of form and table elements
+"pyxin" property specifies that input is in PYX format
+Correctly cope with unexpected characters around colons, also with multiple colons
+Correctly output comments with "--" in them (by adding a space)
+
+Changes from 0.10.2 to 1.0rc1
+=============================
+Script can now appear anywhere
+Switch -nocdata correctly implemented
+Eliminated useless M_n constants in Schema
+Introduced <memberofAny> and <isRoot> as alternatives to
+ <memberOf> in TSSL
+Allow prefixes in element names
+Attributes are now normalized
+Expanded public API for Element and ElementType
+Javadoc improved
+
+Changes from 0.10.1 to 0.10.2
+=============================
+Removed misfeature whereby > terminated a tag even inside quotes
+Added licensing language to XSLT scripts, RELAX NG schemas
+Removed long-standing mishandling of entity references in attributes
+Cleaned up logic for converting junky strings to proper XML Names
+Correctly handle empty tag that has no whitespace or attributes
+Restore correct 0.9.3 handling of an apparent end-tag in a CDATA element
+Added script element to content model of head element
+
+Changes from 0.9.7 to 0.10.1 (there is no 0.10.0):
+==================================================
+Convert to XSLT configuration exclusively;
+ Perl code and tab-separated tables are gone
+Remove xmlns:* attributes
+Append "_" to attribute names ending in ":"
+Don't prepend "_" to an attribute name starting in "_"
+Handle namespace prefixes in attributes:
+ "xml" prefix is handled correctly
+ other prefixes are mapped to "urn:x-prefix:foo"
+Ignore XML declarations
+-Dnocdata=true turns off F_CDATA on script and style elements
+Fixed off-by-one errors in character references that made them uninterpreted
+Start-tags ending in a minimized attribute are no longer being dropped
+XML empty tags are now supported (though slashes are still allowed in
+ unquoted attribute values)
+
+Changes from 0.9.6 to 0.9.7:
+============================
+Upgraded AFL to version 2.1
+Passed through newlines in character content (very old bug)
+
+Changes from 0.9.5 to 0.9.6:
+============================
+Script element can appear directly in body
+">" terminates a start-tag even inside a quoted attribute,
+ to protect against unbalanced quotes
+"_" is prepended to attributes that don't begin with a letter
+Remove "xmlns" attributes from the input
+All standard features can now be set
+ (although there is no effect from doing so)
+New "bogons-empty" feature can be set to false to give bogons
+ content model of ANY rather than EMPTY;
+ -Dany switch sets this feature to false
+TSSL now has an explicit group element to declare an element group
+STML is a new XML format for modeling state-table changes
+License updated to AFL 2.1
+
+Changes from 0.9.4 to 0.9.5:
+============================
+S in the statetable now means \r and \n and \t as well as space
+ (as was always intended; brain fart!)
+Ins and del elements are now allowed everywhere
+TSSL now correctly supports attributes that are legal on all elements
+
+Changes from 0.9.3 to 0.9.4:
+============================
+Fixed paper-bag bug that revealed attribute type BOOLEAN to applications.
+Obsolete ABSTRACT removed in favor of README.
+Improved implementation of CDATA restart after bogus end-tag.
+Allowed hyphen, underscore, and period in names as well as colon.
+First cut at TagSoup Schema Language -- doesn't do anything yet.
+Support CDATA sections on input.
+Don't generate built-in entities within CDATA elements.
+
+Changes from 0.9.2 to 0.9.3:
+============================
+Convenience main program "tagsoup" in bin directory.
+Begin to integrate tests.
+Introduced BOOLEAN type (currently just converted to NMTOKEN).
+Features that actually work are now named constants in Parser.
+Double root elements are really gone now.
+ID attributes weren't being removed from restarted elements.
+Fixed a bug that made unknown elements disappear in some cases.
+Parser is now safely reusable.
+PYXWriter and XMLWriter now implement LexicalHandler.
+Parser reports comments, startCDATA, and endCDATA events to a LexicalHandler.
+ScanHandler methods now throw only SAXException, not also IOException.
+-Dlexical=true switch sets the ContentHandler as a LexicalHandler as well
+ (XMLWriter prints comments, ignores CDATA sections; PYXWriter ignores all).
+-Dreuse=true switch reuses a single Parser object (no great speed gain).
+We now disallow an a element as the child of another a element.
+An empty input is now treated as zero-length character content.
+HTMLWriter is gone in favor of an extended XMLWriter with get/setHTMLMode methods.
+CDATA elements only terminaate with matching end-tags (thanks to Sebastien Bardoux).
+
+Changes from 0.9.1 to 0.9.2:
+============================
+No longer inserts bogus ; after unknown entity reference without ;.
+Consecutive entity references now work correctly.
+Setting namespaces and namespace-prefixes methods now works.
+-Dnons=true option turns off namespace and prefix.
+New feature http://www.ccil.org/~cowan/tagsoup/features/ignore-bogons"
+ suppresses unknown start-tags (any end-tag will be automatically ignored).
+-Dnobogons=true option turns ignore-bogons on.
+Suppress unknown and/or empty initial start-tag always
+ (prevents double root element).
+Schema now allows style as an inline element, like script.
+Schema now allows tr as a child of table to avoid problems with embedded tables.
+Clear Parser instance variables to make Parsers properly reusable.
+
+Changes from 0.9 to 0.9.1:
+==========================
+Incorporated patch for -jar support by Joseph Walton.
+Incorporated patch for Megginson XMLWriter support by Joseph Walton.
+Changed existing XMLWriter to HTMLWriter.
+Rewrote Parsermain for better features, removed Tester class.
+-Dnewline=true removed, now implied by -DHTML=true.
+-Dfiles=true now used to generate separate outputs (old Tester behavior)
+ with extension xhtml (removing any old extension).
+Fixed nasty bug in HTMLScanner that was failing to fix unusual entities.
+Don't attempt to smash whitespace to spaces any more.
+
+Changes from 0.8 to 0.9:
+========================
+Ant-ified by Martin Rademacher.
+Don't suppress colons in element names.
+Entity problems fixed (I hope).
+Can now set namespace and namespace-prefixes features (without effect).
+Properly templatize HTMLModels.java.
+Attributes are no longer in the HTML namespace.