Changes from M35 to M36
General
- Okapi now targets Java 8; Java 7 is no longer supported.
  We build with Java 8 and do not test on Java 7. We have started using Java 8 APIs and do not intend to backport anything to Java 7.
  Publicly available security fixes and upgrades for Java 7 ceased in April 2015.
- Starting with this version (M36), we will publish the release version of Okapi to Maven Central.
- Changed the SWT library to the official one in Maven Central (3.106.3).
Core
- All classes that implemented `hasNext()` and `next()` are now declared to implement `Iterator<Event>`.
- Deprecated `FilterIterable`. Now that we are on JDK 8, we intend to add real stream support, and this hack will be removed.
- Added a `stream` method to `IFilter`. See `okapi/examples/java/example07` for usage.
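To illustrate why declaring `Iterator<Event>` matters for stream support, here is a minimal sketch that uses only the JDK. It does not use any Okapi class: plain strings stand in for `Event` objects, and the class and method names (`IteratorStreamSketch`, `toStream`, `filterByPrefix`) are hypothetical. It shows one common way to expose a `hasNext()`/`next()` source as a `java.util.stream.Stream`, not necessarily how `IFilter` does it.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical illustration class; not part of Okapi.
class IteratorStreamSketch {

    // Wraps any Iterator in a sequential, ordered Stream of unknown size.
    static <T> Stream<T> toStream(Iterator<T> iterator) {
        Spliterator<T> spliterator =
            Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED);
        return StreamSupport.stream(spliterator, false);
    }

    // Keeps only the items that start with the given prefix.
    // Strings are stand-ins for events here (an assumption for the example).
    static List<String> filterByPrefix(Iterator<String> events, String prefix) {
        return toStream(events)
            .filter(e -> e.startsWith(prefix))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Iterator<String> events = Arrays.asList(
            "START_DOCUMENT", "TEXT_UNIT:Hello", "TEXT_UNIT:World", "END_DOCUMENT").iterator();
        System.out.println(filterByPrefix(events, "TEXT_UNIT:"));
    }
}
```

Once a source is an `Iterator`, the standard `filter`, `map`, and `collect` operations apply without a custom iterable wrapper.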
Filters
- IDML Filter
  - Fixed issue #627, which prevented some “track changes” additions from being extracted for translation.
- Markdown Filter
  - Replaced the use of the inline code finder for handling embedded HTML with an HTML subfilter. This improves the translatability of embedded HTML (for example, translatable attributes, as in issue #651) and allows MathML content to be excluded (issue #645).
  - Issue #684: nested markup is now handled correctly.
  - Fixed issues #685 and #694, which caused duplicate HTML tags within the table element in merged documents.
  - Fixed issue #686 (partially) and issue #728. Quoted paragraphs without HTML tags, quoted lists, and quoted tables are now handled properly. Quoted paragraphs with HTML elements are still not handled properly.
  - Partially fixed issue #687. Blank (empty) lines are retained in most cases. The filter now uses version 0.32.20 of flexmark-java.
  - Implemented issue #692. The user can specify a custom HTML configuration id to be used by the HTML subfilter to process HTML sections within Markdown documents.
  - Fixed issue #701, which caused an inline markup character at the beginning of a line, such as the `"*"` of `"*emphasized part of text*"`, to be separated from the translation unit.
  - Fixed issue #708, which caused an ATX heading that immediately follows a list item to be prepended with extra spaces.
  - Fixed issue #711, where the link reference in the absence of an anchor text (in which case the reference itself works as the anchor text), the anchor text, the image reference's alt text, and the title text in the reference definition were not extracted.
  - Issue #713: The inline code finder was disabled in SNAPSHOT versions made after January 21, 2018, on the assumption that it would conflict with the HTML subfilter. Careful analysis and experimentation showed that the assumption was wrong, and the inline code finder has been reinstated.
  - Fixed issue #714, where text extracted from an anchor text, which can contain inline markup, kept the markup literally (e.g. `'the *important* page'`) instead of replacing it with placeholders (`'the <g id="1"/>important<g id="2"/> page'`).
  - Fixed issue #715, where neighboring markup broke a run of text into two trans-units.
    Example: `Here is **strongly** *emphasized* text.`
  - Fixed issue #716, where a run of text that includes HTML inline tags such as `<b>` was broken up into small trans-units at each tag.
  - Implemented a new feature mentioned in issue #720. When the new configuration parameter `urlToTranslatePattern` is set to a regular expression, only URLs matching the pattern are extracted.
  - Fixed issue #725, where newline characters were lost in the YAML metadata (front matter).
  - Added a new feature that prevents blocks of text matching a specified pattern from being extracted (and thus translated). See issue #726.
  - Fixed issue #727, where a task list item of the form `- [ ] Task to be completed` lost the space between the square brackets when merged. (Note: the Markdown filter does not formally support task lists. The code that handles link reference nodes handles task lists by coincidence.)
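As an illustration of the `urlToTranslatePattern` parameter from issue #720, the sketch below shows how a regular expression can select which URLs are kept. It uses only `java.util.regex` and does not call the Markdown filter. The class name, the sample URLs, and the sample pattern are hypothetical. Whether the filter requires a full match or accepts a partial match is an assumption here; this sketch uses a full match (`Matcher.matches()`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of Okapi.
class UrlPatternSketch {

    // Returns the URLs that match the whole regular expression.
    // Assumption: full-match semantics; the real filter may differ.
    static List<String> selectUrls(List<String> urls, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> selected = new ArrayList<>();
        for (String url : urls) {
            if (pattern.matcher(url).matches()) {
                selected.add(url);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
            "https://docs.example.com/guide",
            "https://cdn.example.com/logo.png",
            "http://docs.example.com/faq");
        // Sample pattern: only URLs on the (made-up) docs host qualify.
        System.out.println(selectUrls(urls, "https?://docs\\.example\\.com/.*"));
    }
}
```

With a pattern like this, links to localized documentation pages would be offered for translation while asset URLs are left alone.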
- OpenXML Filter
  - Fixed [issue #679](https://bitbucket.org/okapiframework/okapi/issues/679): the filter didn't properly escape the value of certain types of content (e.g. watermarks), leading to corrupt target documents.
  - Fixed [issue #703](https://bitbucket.org/okapiframework/okapi/issues/703): when using extended code attributes, the filter would sometimes incorrectly indicate that italic or bold formatting was present.
  - Fixed [issue #734](https://bitbucket.org/okapiframework/okapi/issues/734): multi-line formulas could be truncated when processing XLSX files.
- Table Filter
  - Allowed the FilterConfigurationMapper to be used for the sub-filter mapping.
- XLIFF 1.2 and 2.0 Filter
  - Added a new filter, “XLIFF 1.2 and 2.0 Filter” (`okf_autoxliff`), which automatically detects the XLIFF version and then delegates operations to the XLIFF or XLIFF-2 filter as appropriate.
- XLIFF Filter
  - Issue #662: Added support for the inline code finder when extracting XLIFF content.
  - Added the `okf_xliff-iws` configuration with enhanced support for the “IWSXLIFF” produced by WorldServer in some cases. The filter reads and writes translation status values and exposes IWS-specific segment metadata as Property objects on the translation unit.
  - If the filter finds a note referencing the target and there is no target element, one is now created to preserve the note.
- XLIFF2 Filter
  - Fixed issue #697: crash when parsing XLIFF-2 files with `<group>` elements.
- XML Filter
  - Added support for comment nodes of pointers (e.g. `locNotePointer`).
- TEX Filter
  - Added the initial Beta version of a filter for TEX files.
- Multi-Parsers Filter
  - Added the initial Beta version of a filter for two-level complex formats (e.g. CSV with some columns in Markdown, some in HTML, some in plain text).
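The version detection described for the “XLIFF 1.2 and 2.0 Filter” entry above can be sketched with the JDK's StAX parser. This is an assumption-laden illustration, not the `okf_autoxliff` implementation: the class name, the method names, and the returned labels (`"XLIFF"`, `"XLIFF-2"`, `"unknown"`) are hypothetical. It relies only on the fact that both XLIFF 1.2 and XLIFF 2.0 documents carry a `version` attribute on the root `<xliff>` element.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class; not part of Okapi.
class XliffVersionSketch {

    // Returns the root element's version attribute, or null if the root
    // is not <xliff> or has no version attribute.
    static String detectVersion(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // DTD support is not needed just to peek at the root element.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // The first START_ELEMENT event is the document root.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return "xliff".equals(reader.getLocalName())
                            ? reader.getAttributeValue(null, "version")
                            : null;
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    // Maps the detected version to a made-up delegate label.
    static String chooseDelegate(String xml) {
        String version = detectVersion(xml);
        if (version == null) {
            return "unknown";
        }
        if (version.startsWith("2.")) {
            return "XLIFF-2";
        }
        return version.startsWith("1.") ? "XLIFF" : "unknown";
    }

    public static void main(String[] args) {
        String v12 = "<xliff version=\"1.2\" xmlns=\"urn:oasis:names:tc:xliff:document:1.2\"/>";
        String v20 = "<xliff xmlns=\"urn:oasis:names:tc:xliff:document:2.0\" version=\"2.0\" srcLang=\"en\"/>";
        System.out.println(chooseDelegate(v12));
        System.out.println(chooseDelegate(v20));
    }
}
```

Reading only up to the first start element keeps the check cheap, since the rest of the document is left for the delegate to parse.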
Steps
- Rainbow Translation Kit Creation Step
  - Fixed issue #732, where the input file could not be the output of a previous XSLT Transform step.
  - Fixed issue #733, where the SendOutput option of the Rainbow Kit Extraction Step did not work for XLIFF Packages.
Connectors
- DeepL
  - The connector `DeepLv1Connector` for the production API has been implemented.
  - The connector `DeepLConnector` has also been implemented, but it is for a deprecated API (which still works at this time but may be discontinued at any time).
- Microsoft Translator
  - Fixed an issue where segments containing only whitespace caused an error when passed to the connector.
- KantanMT
  - Deprecated the old connector for v1 of the API.
  - Implemented a new connector for v2.1 of the API. The connector includes extra methods to list, query, start, and stop engines.
Libraries
- Segmentation
  - The Okapi recommended segmentation rules file (`okapi_default_icu4j.srx`) is now embedded in the release .jar.
    This means it can be accessed as a resource stream (`SRXDocument.class.getResourceAsStream("okapi_default_icu4j.srx")`).
    The stream can be used either directly (`srxDoc.loadRules(...the stream...)`) or for `setSourceSrxStream(...)` / `setTargetSrxStream(...)` in the `SegmentationStep` `Parameters`.
    As a result, applications using Okapi from Maven don't have to download and provide their own copy of the recommended `.srx` file.
Applications
- Rainbow
  - Added file extension mapping for `.tsv` to `okf_table_tsv`. This resolves issue #683.
  - Added file extension mappings for `.csv` to `okf_table_csv`, and `.markdown` to `okf_markdown`.
- Tikal
  - Removed the `-x2` and `-m2` options. Extraction using the JSON skeleton is no longer supported.