Changes from M36 to M37

Core

* Deprecated `net.sf.okapi.common.Base64`; use [`java.util.Base64`](https://docs.oracle.com/javase/8/docs/api/java/util/Base64.html) instead.
* Fixed [issue #739](https://bitbucket.org/okapiframework/okapi/issues/739): Code constructors are inconsistent
* Added `IFilter.stream()` for convenience. It is mostly “syntactic sugar”, but it lets you write more “modern”, stream-based code.
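To illustrate what this kind of sugar buys, here is a minimal sketch of how a `stream()` convenience method can wrap a `hasNext()`/`next()` pull loop in a `java.util.stream.Stream`. `SimpleFilter`, `fromList`, `countTextUnits`, and the `String` "events" are hypothetical simplifications, not Okapi's `IFilter` or `Event` types, and the real `IFilter.stream()` may be implemented differently.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class FilterStreamSketch {

    // Hypothetical stand-in for a pull-style filter (NOT Okapi's IFilter):
    // events are plain Strings instead of Okapi Event objects.
    interface SimpleFilter {
        boolean hasNext();

        String next();

        // Convenience method: exposes the hasNext()/next() loop as an ordered, sequential Stream.
        default Stream<String> stream() {
            final SimpleFilter self = this;
            Iterator<String> it = new Iterator<String>() {
                @Override
                public boolean hasNext() {
                    return self.hasNext();
                }

                @Override
                public String next() {
                    return self.next();
                }
            };
            return StreamSupport.stream(
                    Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
        }
    }

    // Toy filter that replays a fixed list of event names.
    static SimpleFilter fromList(List<String> events) {
        final Iterator<String> source = events.iterator();
        return new SimpleFilter() {
            @Override
            public boolean hasNext() {
                return source.hasNext();
            }

            @Override
            public String next() {
                return source.next();
            }
        };
    }

    // Counts TEXT_UNIT events with a stream pipeline instead of an explicit while loop.
    static long countTextUnits(SimpleFilter filter) {
        return filter.stream().filter("TEXT_UNIT"::equals).count();
    }

    public static void main(String[] args) {
        SimpleFilter filter = fromList(Arrays.asList(
                "START_DOCUMENT", "TEXT_UNIT", "TEXT_UNIT", "END_DOCUMENT"));
        System.out.println(countTextUnits(filter)); // prints 2
    }
}
```

The stream is lazy: events are pulled from the underlying filter only as the pipeline consumes them, so nothing changes about how the filter itself works.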

Steps

  • Microsoft Batch Translations

    • Fixed the handling of a batch of events that ends with skipped segments.
    • Implemented an option to add a prefix to the translation candidate text when copying it into the target. The option is off by default, and the prefix text is configurable.
  • External Command

    • Added support for the variables ${srcBCP47} and ${trgBCP47}.
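A minimal sketch of what the new External Command variables do, assuming they expand to the source and target locales as BCP 47 language tags. The `expand` helper and the `mytool` command line are hypothetical, and `java.util.Locale` stands in for Okapi's own locale class; this is not the step's actual substitution code.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class CommandVariablesSketch {

    // Hypothetical helper, NOT the External Command step's actual code:
    // replaces ${srcBCP47} and ${trgBCP47} with BCP 47 language tags.
    // java.util.Locale stands in for Okapi's own locale class.
    static String expand(String template, Locale source, Locale target) {
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("${srcBCP47}", source.toLanguageTag());
        vars.put("${trgBCP47}", target.toLanguageTag());
        String result = template;
        for (Map.Entry<String, String> e : vars.entrySet()) {
            // Literal (non-regex) replacement of every occurrence.
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // "mytool" and its flags are made up for illustration.
        String template = "mytool --from ${srcBCP47} --to ${trgBCP47} input.txt";
        System.out.println(expand(template,
                Locale.forLanguageTag("en-US"), Locale.forLanguageTag("pt-BR")));
        // prints: mytool --from en-US --to pt-BR input.txt
    }
}
```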

Filters

  • HTML Filter (okf_html)

    • Updated the configuration files so that the filter is aware of HTML 5 tags, and reviewed the existing tags and attributes.
      Potentially disruptive changes:
      • The value attribute of the option tag is no longer extracted; extracting it was a bug. The value is not localizable: it is meant to let code (server or client side) determine which option was selected, so translating it can break functionality. See the HTML specification.
      • The dd tags are now handled the same way as the li tags, meaning that empty dd tags are now extracted.
  • IDML Filter

    • Added filter options that let the filter ignore kerning, tracking, and baseline shift properties within a configurable threshold. This improves segment quality by reducing tag noise (issue #785).
  • JSON Filter

    • Addresses the enhancement request in issue #751. The JSON Filter now produces XLIFF <note> elements (and the enclosing <notes> elements when the XLIFF2 Writer is used) from key-value pairs whose key is listed in the new configuration item noteProductionKeys, a comma-separated list of keys.
      Also added a new configuration item, includeIts, to the XLIFF2 Writer (the XLIFF 1.2 Writer already had this option). In Rainbow, it appears as “Includes ITS markup when available.” in the XLIFF2 Writer options.
  • Markdown Filter

    • Addresses issue #741: Replaced the FlexMark Front Matter parser, previously used to translate metadata headers, with a configurable YAML subfilter.
      This improves the translatability of embedded YAML (such as nested keys, keys with spaces, and Markdown or HTML embedded in key-value pairs) and allows specific keys to be excluded.
    • Addresses issue #737: blank lines in the front YAML block were removed when merging.
  • OpenXML Filter

    • Fixed issue #795: a crash that could occur when extracting external hyperlinks.
    • Fixed issue #794: a crash that could occur when extracting PPTX documents produced by LibreOffice.
    • Fixed issue #790: improved the way shading style properties are exposed in code data.
    • Implemented issue #780: added a subfilter option that is applied to the text of unstyled XLSX cells.
    • Fixed issue #743: a file descriptor leak when checking for encrypted DOCX files.
    • Fixed issue #736: hidden PowerPoint slides are now excluded from translation by default, consistent with the handling of other types of hidden text. Note: this change may cause problems when merging kits produced with earlier versions of Okapi.
  • Abstract Markup Filter

    • Partially addresses issue #749: AbstractMarkupFilter no longer populates Code.outerData for runs of text that are both EXCLUDED and INLINE. This impacts all filters that use AbstractMarkupFilter (HTML, XmlStream, etc.). The change is visible in the XLIFF output, for example:
      Original XML format:
      <ph conref="2" translate="no"><?xm-replace_text Phrase?></ph>
      XLIFF content before the change:
      <ph id="2" ctype="x-ph">&lt;?xm-replace_text Phrase?></ph>
      XLIFF content after the change:
      <ph id="2" ctype="x-ph"><ph conref="2" translate="no"><?xm-replace_text Phrase?></ph></ph>
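The reasoning behind the IDML Filter's new threshold options can be sketched as follows: formatting values small enough to ignore are normalized away, so neighboring runs collapse and fewer inline codes are produced. This is a hypothetical model, not the filter's code; it assumes a value is "within the threshold" when its absolute value is at most the threshold, and `Run`, `ignorable`, and `merge` are illustrative names. The actual option names and semantics in the IDML filter parameters may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IdmlThresholdSketch {

    // Hypothetical text run: some text plus one numeric property such as kerning.
    static final class Run {
        final String text;
        final double value;

        Run(String text, double value) {
            this.text = text;
            this.value = value;
        }
    }

    // Assumption: a value counts as "within the threshold" when |value| <= threshold.
    static boolean ignorable(double value, double threshold) {
        return Math.abs(value) <= threshold;
    }

    // Resets ignorable values to 0, then merges neighboring runs that end up with
    // equal values; each remaining boundary would otherwise become an inline code.
    static List<Run> merge(List<Run> runs, double threshold) {
        List<Run> out = new ArrayList<>();
        for (Run r : runs) {
            double v = ignorable(r.value, threshold) ? 0.0 : r.value;
            if (!out.isEmpty() && out.get(out.size() - 1).value == v) {
                Run last = out.remove(out.size() - 1);
                out.add(new Run(last.text + r.text, v));
            } else {
                out.add(new Run(r.text, v));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Run> runs = Arrays.asList(
                new Run("Hel", 0), new Run("lo wor", -5), new Run("ld", 0));
        System.out.println(merge(runs, 10).size()); // prints 1 (one run: "Hello world")
        System.out.println(merge(runs, 1).size());  // prints 3 (kerning of -5 is kept)
    }
}
```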
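For the JSON Filter's noteProductionKeys option, a minimal sketch of the key selection: the comma-separated option value is split into a key set, and the values of matching keys become <note> elements. The helper names are hypothetical, a `Map` stands in for an already-parsed flat JSON object, and it is an assumption that whitespace around keys is trimmed. The sketch does not model how the real filter attaches notes to text units or wraps them in <notes> for XLIFF 2.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class JsonNoteKeysSketch {

    // Parses a comma-separated option value such as "_comment, description".
    // Assumption: surrounding whitespace is trimmed and empty entries are skipped.
    static Set<String> parseKeys(String option) {
        Set<String> keys = new LinkedHashSet<>();
        for (String k : option.split(",")) {
            String trimmed = k.trim();
            if (!trimmed.isEmpty()) {
                keys.add(trimmed);
            }
        }
        return keys;
    }

    // Renders the values of the listed keys as <note> elements, in input order.
    // The map stands in for an already-parsed flat JSON object.
    static List<String> notes(Map<String, String> pairs, Set<String> noteKeys) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            if (noteKeys.contains(e.getKey())) {
                out.add("<note>" + escape(e.getValue()) + "</note>");
            }
        }
        return out;
    }

    // Minimal XML escaping for element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Map<String, String> pairs = new LinkedHashMap<>();
        pairs.put("label", "Save");
        pairs.put("_comment", "Toolbar button & <b>menu</b> item");
        System.out.println(notes(pairs, parseKeys("_comment, description")));
        // prints: [<note>Toolbar button &amp; &lt;b&gt;menu&lt;/b&gt; item</note>]
    }
}
```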

Libraries

  • Translation

    • Changed the base implementation of batchQueryText() to use batchQuery() instead of query().
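A minimal sketch of why this delegation matters: once the base `batchQueryText()` calls `batchQuery()`, a connector that overrides `batchQuery()` (for example, with a single round trip to a batch-capable service) serves plain-text batches the same way, instead of falling back to one `query()` per text. The class names, the `String`-based stand-ins for Okapi's `TextFragment` and `QueryResult` types, and the call counters are hypothetical; the real connector base class uses different types and may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class BatchQuerySketch {

    // Hypothetical, simplified connector base class. Plain Strings stand in for
    // Okapi's TextFragment and QueryResult types; only the method names follow the note.
    abstract static class ConnectorSketch {
        int queryCalls = 0;
        int batchCalls = 0;

        // Looks up a single segment.
        abstract List<String> query(String segment);

        // Default batch lookup: one query() per segment.
        // Connectors backed by a batch-capable service can override this.
        List<List<String>> batchQuery(List<String> segments) {
            batchCalls++;
            List<List<String>> results = new ArrayList<>();
            for (String s : segments) {
                results.add(query(s));
            }
            return results;
        }

        // Base implementation after the change: delegate to batchQuery(),
        // so an overridden batchQuery() also serves plain-text batch lookups.
        List<List<String>> batchQueryText(List<String> plainTexts) {
            return batchQuery(plainTexts);
        }
    }

    // Toy connector with a "native" batch lookup that never calls query().
    static class NativeBatchConnector extends ConnectorSketch {
        @Override
        List<String> query(String segment) {
            queryCalls++;
            return Collections.singletonList(segment.toUpperCase(Locale.ROOT));
        }

        @Override
        List<List<String>> batchQuery(List<String> segments) {
            batchCalls++; // simulates a single round trip for the whole batch
            List<List<String>> results = new ArrayList<>();
            for (String s : segments) {
                results.add(Collections.singletonList(s.toUpperCase(Locale.ROOT)));
            }
            return results;
        }
    }

    public static void main(String[] args) {
        NativeBatchConnector c = new NativeBatchConnector();
        List<List<String>> r = c.batchQueryText(Arrays.asList("hello", "world"));
        System.out.println(r + " batch=" + c.batchCalls + " query=" + c.queryCalls);
        // prints: [[HELLO], [WORLD]] batch=1 query=0
    }
}
```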