6 The iconv library

The Recode library is able to use the capabilities of an external, pre-installed iconv library, usually as provided by GNU libc or the portable libiconv written by Bruno Haible. In fact, many capabilities of the Recode library are duplicated in an external iconv library, as the two likely share many charsets. This section discusses the issues related to this duplication, and other peculiarities specific to the iconv library.

The RECODE_STRICT_MAPPING_FLAG option, corresponding to the ‘--strict’ flag, is implemented by adding iconv option //IGNORE to the ‘after’ encoding. This has the side effect that untranslatable input is only signalled at the end of the conversion, whereas with Recode’s built-in conversion routines the error will be signalled immediately.
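The late error reporting can be observed with the iconv command-line program itself. The following is only a sketch: it assumes an iconv executable that accepts the //IGNORE suffix, the helper name to_ascii_ignore is hypothetical, and the exact behaviour depends on the iconv implementation and version.

```shell
# Hypothetical helper: convert UTF-8 on stdin to ASCII, asking iconv to
# skip characters it cannot convert (the //IGNORE suffix).
to_ascii_ignore() {
    iconv -f UTF-8 -t ASCII//IGNORE
}

# Guarded demo: only runs when an iconv program is installed.
if command -v iconv >/dev/null 2>&1; then
    # "naïve" encoded as UTF-8; the ï has no ASCII equivalent.
    if printf 'na\303\257ve\n' | to_ascii_ignore; then
        echo "no error reported"
    else
        # Some implementations write the convertible text first and only
        # then report the problem through a non-zero exit status.
        echo "error reported after conversion (status $?)"
    fi
fi
```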

If the string -translit is appended to the after encoding, characters being converted are transliterated when needed and possible. This means that when a character cannot be represented in the target character set, it can be approximated through one or several similar looking characters. Characters that are outside of the target character set and cannot be transliterated are replaced with a question mark (?) in the output. This corresponds to the iconv option //TRANSLIT.
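As a sketch of the two spellings of the same request, the following runs a small sample through the iconv program with //TRANSLIT and through recode with the -translit suffix. The charset names UTF-8 and ASCII, the sample text and the helper name iconv_translit are assumptions chosen for illustration; the characters actually produced depend on the iconv implementation and possibly on the locale.

```shell
# Hypothetical helper: transliterate UTF-8 on stdin to ASCII using the
# iconv program and its //TRANSLIT suffix.
iconv_translit() {
    iconv -f UTF-8 -t ASCII//TRANSLIT
}

# "café" encoded as UTF-8, sent through both tools when they are installed.
if command -v iconv >/dev/null 2>&1; then
    printf 'caf\303\251\n' | iconv_translit \
        || echo "iconv reported an error" >&2
fi
if command -v recode >/dev/null 2>&1; then
    # Same request, with -translit appended to the 'after' encoding.
    printf 'caf\303\251\n' | recode UTF-8..ASCII-translit \
        || echo "recode reported an error" >&2
fi
```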

To check whether iconv is used for a particular conversion, use the ‘-v’ or ‘--verbose’ option (see Controlling how files are recoded) and check whether ‘:iconv:’ appears as an intermediate charset.
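This check can be scripted. The sketch below assumes that the verbose report is written to standard error (both streams are captured to be safe) and that it names the pivot literally as ‘:iconv:’; the helper name mentions_iconv_pivot is hypothetical.

```shell
# Hypothetical helper: exit 0 if the text on stdin mentions the :iconv: pivot.
mentions_iconv_pivot() {
    grep -q ':iconv:'
}

# Guarded demo: only runs when recode is installed.
if command -v recode >/dev/null 2>&1; then
    # Empty input is enough; only the verbose report matters here.
    if recode -v l1..1250 </dev/null 2>&1 | mentions_iconv_pivot; then
        echo "l1..1250 goes through the external iconv library"
    else
        echo "l1..1250 does not mention :iconv:"
    fi
fi
```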

The :iconv: charset represents a conceptual pivot charset within the external iconv library (in fact, this pivot exists, but is not directly reachable). This charset has : (a mere colon) and :libiconv: as aliases. Recoding directly from or to this charset is not allowed. But when this charset is selected as an intermediate, usually by automatic means, the external iconv library is called to handle the transformations. With an ‘--ignore=:iconv:’ option on the recode call or, equivalently but more simply, ‘-x:’, Recode is instructed to avoid this charset as an intermediate, with the consequence that the external iconv library is not used. Conversely, the ‘--prefer-iconv’ option makes Recode use iconv whenever possible. Consider these calls:

recode l1..1250 < input > output
recode -x: l1..1250 < input > output
recode --prefer-iconv l1..1250 < input > output

All three should transform input from ISO-8859-1 to CP1250 on output. The first call might use the external iconv library, while the second call definitely avoids it. The third call uses the external iconv library if it supports the required conversion. Whatever the path used, the results should normally be identical. However, there might be observable differences. Most of them are likely to result from reversibility issues, as the external iconv engine is unlikely to address reversibility in the same way. Much less likely, some differences might result from slight errors in the tables used; such differences should then be reported as bugs.
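One way to look for such differences is to run the three calls on the same input and compare the results byte for byte. This is a minimal sketch: the sample text, the temporary file names and the helper same_bytes are illustrative assumptions, and a version of recode without ‘--prefer-iconv’ will simply report a failure for the third call.

```shell
# Hypothetical helper: exit 0 when the two files have identical contents.
same_bytes() {
    cmp -s "$1" "$2"
}

# Guarded demo: only runs when recode is installed.
if command -v recode >/dev/null 2>&1; then
    tmp=$(mktemp -d)
    printf 'caf\351\n' > "$tmp/input"    # "café" encoded as ISO-8859-1

    recode l1..1250 < "$tmp/input" > "$tmp/default" \
        || echo "default call failed" >&2
    recode -x: l1..1250 < "$tmp/input" > "$tmp/builtin" \
        || echo "-x: call failed" >&2
    recode --prefer-iconv l1..1250 < "$tmp/input" > "$tmp/iconv" \
        || echo "--prefer-iconv call failed" >&2

    # Compare the two explicit paths against the default one.
    for f in builtin iconv; do
        if same_bytes "$tmp/default" "$tmp/$f"; then
            echo "$f: identical to default"
        else
            echo "$f: differs from default"
        fi
    done
    rm -rf "$tmp"
fi
```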

Discrepancies might be seen in the area of error detection and recovery. The Recode library usually tries to detect canonicity errors in input, and production of ambiguous output, but the external iconv library does not necessarily do so in the same way. Moreover, the Recode library may not always recover as nicely as possible when the external iconv library has no translation for a given character.

The external iconv libraries may offer different sets of charsets and aliases from one library to another, and also between successive versions of a single library. It is best to check the documentation of the external iconv library, as of the time Recode was installed, to know which charsets and aliases are being provided.
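Besides reading the documentation, the installed tools can be asked directly. The sketch below assumes that ‘iconv -l’ and ‘recode -l’ print charset names separated by spaces, commas or slashes, which is what common implementations do; the helper name charset_listed is hypothetical. Note that the iconv program found on the path is not necessarily backed by the same library that Recode was built against.

```shell
# Hypothetical helper: exit 0 if the charset name given as $1 appears as a
# whole word in the listing read from stdin (case-insensitive).
charset_listed() {
    tr -s ' ,/' '\n' | grep -qixF -- "$1"
}

# Guarded demos: each one only runs when the tool is installed.
if command -v iconv >/dev/null 2>&1; then
    if iconv -l | charset_listed CP1250; then
        echo "iconv lists CP1250"
    else
        echo "iconv does not list CP1250"
    fi
fi
if command -v recode >/dev/null 2>&1; then
    if recode -l | charset_listed CP1250; then
        echo "recode lists CP1250"
    else
        echo "recode does not list CP1250"
    fi
fi
```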

The ‘--ignore=:iconv:’ or ‘-x:’ options might be useful when a recoding needs to be more exactly repeatable between machines or installations, the idea being to remove the variance possibly introduced by the various implementations of an external iconv library. These options might also help decide whether some recoding problem is genuine to Recode or is induced by the external iconv library.