Changes from 1.43.0 to 1.44.0

Core

  • Added code in Segments to preserve TextPart Properties and Annotations after segmentation. Also added handling for the deeper segmentation case so that new segments get proper ids: if the parent segment has id=s1 and is segmented further, the child segment ids are s1.1, s1.2, s1.3, etc.
  • A space followed by a backslash at the end of a line can be used as a line break.
    It generates a <br /> without ending the paragraph or the list item.
  • Recent JDK releases (4/2022) have set XPath operator limits to smaller values to enhance security. We override these defaults to allow ITS-based filters to work without limits.
  • Fix to GenericSkeleton to allow deep copy of all parent references (not just “self”)
  • Updates to ISegmenter methods to allow preservation of inline Code ids when joining segments in TextUnitMerger
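
As a rough illustration of the XPath limits entry above: the JDK exposes these limits as system properties, so an override can be sketched as follows. The class and method names are hypothetical, and this is not necessarily how the framework applies the override. The property names, and the convention that a value of 0 or less means "no limit", are assumed from the JDK's JAXP documentation and should be checked against the JDK version in use.

```java
// Hypothetical helper, not part of the framework's API.
final class XPathLimits {
    // System properties assumed to control the JDK's XPath processing limits.
    static final String[] LIMIT_PROPERTIES = {
        "jdk.xml.xpathExprGrpLimit",  // groups per XPath expression
        "jdk.xml.xpathExprOpLimit",   // operators per XPath expression
        "jdk.xml.xpathTotalOpLimit"   // total operators (e.g. across a stylesheet)
    };

    // Sets every limit to "0", which is assumed to mean "no limit".
    static void removeLimits() {
        for (String name : LIMIT_PROPERTIES) {
            System.setProperty(name, "0");
        }
    }

    private XPathLimits() {
    }
}
```

A call such as XPathLimits.removeLimits() would most likely need to run before any XPath factory or expression is created, since the JDK is assumed to read these properties at that point.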

Connectors

  • GlobalSight

    • Removed

Filters

  • PO Filter

    • Fixed an issue which caused bilingual PO files not to merge correctly when a subfilter was applied, PR #605.
  • IDML Filter

    • Improved: initial support for end notes provided: issue #856, styles handling for nested elements.
    • Improved: custom text variables can be optionally translated: issue #1138
    • Improved: index topics can be optionally translated: issue #1139
    • Improved: Rainbow UI for font mappings provided: issue #1149
  • Markdown filter

    • Added support for Admonition syntax: PR #621
  • OpenXML Filter

    • Improved: font mapping for XLSX documents provided: issue #972
    • Improved: revisions automatically accepted in XLSX documents: issue #983
    • Improved: hidden styled text parts extracted as modifiable in PPTX documents: issue #1011
    • Fixed: the handling of cell references in table parts clarified: issue #1143
    • Fixed: differential format reading clarified: issue #1144
    • Improved: Rainbow UI for complex worksheet configurations provided,
      deprecated column exclusion configurations for XLSX documents removed: issue #1147
    • Improved: Rainbow UI for font mappings provided: issue #1150
    • Fixed: empty referent runs handling clarified: issue #1157
  • XLIFF2 Filter

    • Fixed XLIFF2 filter handling of ignorables: the target is auto-created (as a copy of the source) if needed
    • XLIFF2 segment and ignorable TextParts are now given auto-generated ids
    • If the XLIFF2 segment state is not initial, then write the target only if there is no content

Libraries

  • Serialization Library

    • Added a new library, based on Google Protocol Buffers, to serialize TextUnits. The library is used to produce a serialized file in three formats: (1) binary protobuf, (2) textual protobuf, (3) standard JSON. The serialized file can be used in place of XLIFF 1.2 to facilitate extraction and merge using OriginalDocumentTextUnitFlatMergerStep

Steps

  • Segmentation Step

    • Added setDoNotSegmentIfHasTarget option (default is false). If true, segmentation is turned off when the TextUnit has a target. This protects against producing misalignments.
  • XLIFF Word-Count Splitter Step

    • Fixed: context groups copied on splitting: issue #1156
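
The behavior described for the Segmentation Step's setDoNotSegmentIfHasTarget option above can be pictured with a minimal sketch. The SegmentationGuard class and its shouldSegment method are hypothetical stand-ins written for illustration; only the option name and its default come from the entry itself, and the real step may implement the check differently.

```java
// Hypothetical stand-in for the step's decision logic; not the framework's actual classes.
final class SegmentationGuard {
    // Default is false, as stated in the release note.
    private boolean doNotSegmentIfHasTarget = false;

    void setDoNotSegmentIfHasTarget(boolean value) {
        this.doNotSegmentIfHasTarget = value;
    }

    // Returns false only when the option is on and the text unit already has a target,
    // so existing source/target alignment is left untouched.
    boolean shouldSegment(boolean textUnitHasTarget) {
        return !(doNotSegmentIfHasTarget && textUnitHasTarget);
    }
}
```

With the default setting every text unit is segmented; once the option is set to true, text units that already carry a target are skipped.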

Applications

  • Tikal

    • Updated Tikal to preserve whitespace in the extracted XLIFF 1.2