Changes from 1.41.0 to 1.42.0
Core
* Added the attribute `extType` on the `AltTranslation` class (used for example in XTM XLIFF files).
* Major refactor of `syncronizeCodeIds` and `alignAndCopyCodeMetadata`
* Major refactor of all core Okapi resources to consistently handle `Properties` and `IAnnotations`
* Add `TextPart.whiteSpaceStrategy` in order to preserve whitespace handling in original formats (xliff2 specifically)
* Deprecate `ILayerProvider` for removal in the next release
Connectors
-
MyMemory
- Actually use the key parameter to get results from your own translation memories created through the web site.
- Option to send an email to get more quota.
- Removed IP sending, which is now useless.
- Use the `max_hits` parameter when querying the service.
- Set the `creation_date` attribute in results.
- Streamlined internal logic.
Filters
-
IDML Filter
- Issue #629: Merge of empty targets fixed.
-
Markdown Filter
- Changed MIME type to `text/markdown` as officially registered with IETF RFC 7763 since 2016. The old MIME type was `text/x-markdown`.
- Add the “Translate Indented Code Blocks” option to control extraction of indented code blocks, which had previously always been extracted.
-
OpenXML Filter
- Issue #927: Alignment and RTL handling improved.
- Issue #982: Worksheet inline strings extraction provided.
- Issue #1010: The parts related to excluded or hidden presentation slides are now excluded or hidden as well.
- Issue #1058: DrawingML text line break positioning fixed.
- Issue #1059: The extraction of worksheet and row groups provided.
- Issue #1060: Rows exclusion configuration provided.
- Issue #1061: New columns exclusion configuration provided.
- Issue #1062: Metadata rows and columns configuration provided.
- Issue #1080: Documents processing with cross-structure revisions in tables fixed.
- Issue #1083: The handling of multiple instructions in complex fields improved.
- Issue #1085: Empty structural document tag content handling fixed.
- Issue #1095: The processing of tables with blank rows at the end fixed.
- Issue #1102: The merge of paragraphs with absent properties fixed.
-
XLIFF Filter
- Issue #1018: Expose the `cdataSubfilter` option in the filter config UI.
-
XLIFF2 Filter
- Add mrk tag support
- Fix loss of roundtrip whitespace info
- Fix loss of Segment id
- Update XliffWriter to always output xml:space value
- Add setTagType to MTag
- Add full support for subtype and type
- Fix merge bug with ignorable segments being misplaced after merge
-
XML Filter
- Issue #1024: On merge, correctly escape markup inside `CDATA` sections that were extracted using the `inlineCData` option.
- Added a `PROP_XLIFF_FLAVOR` property to the `StartSubDocument` object/event (triggered by `<file>`) indicating the flavor of the document.
- Added a `PROP_REPETITION` property at the segment level indicating (for both SDL and XTM flavors) if the segment was marked as repetition.
-
PO Filter
- Correctly detect plurals when the ‘Plural-Forms’ entry is split on two physical lines.
- Decode escaped characters (`\`, `"`, tabs, newlines, carriage returns, etc.) in message ids and message strings upon reading and encode them back while writing. Unescaped characters are read unaltered but encoded while writing.
- Default inline code finder rules do not capture escaped sequences anymore.
- Update POWriter to use new encoder.
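The decode-on-read / encode-on-write behavior described above can be illustrated with a minimal sketch. This is not the Okapi implementation: the class `PoEscapes` and its methods are hypothetical names, and only the escapes named in the note (`\\`, `\"`, `\t`, `\n`, `\r`) are handled.

```java
// Hypothetical helper, not part of Okapi: illustrates PO-style escaping only.
final class PoEscapes {

    // Turn escape sequences found in a PO string into the characters they stand for.
    // Characters that are not escaped (for example a raw tab) pass through unaltered.
    static String decode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 == s.length()) {
                out.append(c); // ordinary character, or a lone trailing backslash
                continue;
            }
            char next = s.charAt(++i);
            switch (next) {
                case 'n':  out.append('\n'); break;
                case 't':  out.append('\t'); break;
                case 'r':  out.append('\r'); break;
                case '"':  out.append('"');  break;
                case '\\': out.append('\\'); break;
                default:   out.append('\\').append(next); // unknown escape: keep as is
            }
        }
        return out.toString();
    }

    // Escape special characters when writing, whether or not they were
    // escaped in the file that was read.
    static String encode(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': out.append("\\\\"); break;
                case '"':  out.append("\\\""); break;
                case '\t': out.append("\\t");  break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "Line 1\\nsays \\\"hi\\\""; // text as it appears in the PO file
        String decoded = decode(raw);
        System.out.println(decoded);
        System.out.println(encode(decoded).equals(raw));
    }
}
```

With this approach a raw tab in the input is read as a tab and written back as `\t`, which matches the "read unaltered but encoded while writing" rule.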
-
TS Filter
- Added the ability to pick up `comment` and `extracomment` elements from TS files into annotations by default.
-
TMX Filter
- Standardize mapping of TMX inline code ids to Okapi `Code.id` and `Code.originalId`.
- Fix various bugs with matching `bpt` and `ept` inline codes, especially when codes are overlapping.
- Simplify redundant code
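To show why overlapping codes are harder to match, here is a minimal sketch that pairs `bpt` and `ept` codes through the TMX `i` attribute instead of relying on nesting order. It is an assumption-laden illustration, not the filter's actual logic: `TmxPairing`, `Tag` and `pair` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Okapi code: pairs bpt/ept codes by their "i" value.
final class TmxPairing {

    // Minimal stand-in for an inline code: opening (bpt) or closing (ept), plus its "i" value.
    static final class Tag {
        final boolean opening;
        final String i;

        Tag(boolean opening, String i) {
            this.opening = opening;
            this.i = i;
        }
    }

    // Returns a map from the position of each bpt to the position of its matching ept.
    // A stack would mis-pair overlapping codes; a lookup keyed by "i" does not.
    static Map<Integer, Integer> pair(List<Tag> tags) {
        Map<String, Integer> open = new HashMap<>();
        Map<Integer, Integer> pairs = new LinkedHashMap<>();
        for (int pos = 0; pos < tags.size(); pos++) {
            Tag tag = tags.get(pos);
            if (tag.opening) {
                open.put(tag.i, pos);
            } else {
                Integer start = open.remove(tag.i);
                if (start != null) {
                    pairs.put(start, pos);
                }
                // An ept without a matching bpt is simply left unpaired here.
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Overlapping sequence: bpt i=1, bpt i=2, ept i=1, ept i=2
        List<Tag> tags = Arrays.asList(
                new Tag(true, "1"), new Tag(true, "2"),
                new Tag(false, "1"), new Tag(false, "2"));
        System.out.println(pair(tags));
    }
}
```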
-
SDLPackage Filter
- Issue #1093: SDLXLIFF Files in sub-folders are now processed.
Libraries
-
Segmentation
- Fixed bug with icu4j segmentation rules option. All icu4j rules should now work when combined with SRX rules
Steps
-
Rainbow Translation Kit Merging Step
- Issue #1017: Redundant parameters removed.
-
Simple TM Batch Leveraging Step
- Issue #1015: Now both `IQuery` and `ITMQuery` connectors can be used with the step.
-
Text Modification Step
- A larger set of ASCII characters is replaced with Extended Latin characters.
Connectors
-
Pensieve TM
- Issue #837: Pensieve TM now uses Lucene 8.8 libraries, upgraded from 3.3.
- Slight changes in the TM behavior are expected but the TM should largely behave similarly. Testing before production use is strongly encouraged.
- Note: These public classes have been removed from `okapi-lib-search`: `AlphabeticNgramTokenizer`, `ConcordanceFuzzyQuery`, `ConcordanceFuzzyScorer`, `FuzzySimilarity`, `SimpleConcordanceFuzzyQuery`, `SimpleConcordanceFuzzyScorer`, `SortableToken`
Applications
-
Tikal
- Moved sources from `okapi` to `applications`, Maven `groupId` changed from `net.sf.okapi` to `net.sf.okapi.applications`.
-
Tikal & Rainbow
- Using our own slf4j logger (so that we can change level, and show results in GUI).
OSes
-
Windows
- Updated the launchers (`.exe` files) to use the java in `JAVA_HOME` or `PATH`, if available.
-
macOS
- Added build for aarch64 (ARM 64 bit, Apple M1 chip)
-
Build
- Cleaned and unified the various build scripts, for all platforms. Now `deployment/maven` contains only two scripts: `build` (with parameters, try `build help`), and `clean` (`.bat` and `.sh` versions).
- We merged the integration tests from their own separate repository into the Okapi repo (`integration-tests` folder).