  SWORD / API-198

UTF8GreekAccents filter has several issues

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.8.0
    • Component/s: filters
    • Labels: None

      Description

      The issues fall into two categories.

      1. A Greek accent that is not filtered out
      2. A non-diacritic character that is filtered out but should not be

      In #1 the accent is    U+0345   ͅ    COMBINING GREEK YPOGEGRAMMENI
      In #2 the character is U+2019   ’   RIGHT SINGLE QUOTATION MARK

      The latter is NOT a Greek accent. AFAIK, there's no valid reason to filter this out.
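
      The distinction between the two codepoints can be checked against the Unicode character database. The following is a minimal sketch using Python's standard unicodedata module, not the SWORD filter itself; the helper names describe and strip_combining_marks are hypothetical. It assumes a filter that drops every character with a non-zero canonical combining class.

```python
import unicodedata

def describe(ch):
    """Return (general category, canonical combining class) for one character."""
    return unicodedata.category(ch), unicodedata.combining(ch)

def strip_combining_marks(text):
    """Decompose canonically, drop characters with a non-zero combining class, recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)
    return unicodedata.normalize("NFC", kept)

print(describe("\u0345"))  # ('Mn', 240): a combining mark
print(describe("\u2019"))  # ('Pf', 0): punctuation, not a mark
```

      Under that assumption, U+1FB3 (alpha with ypogegrammeni) reduces to a plain alpha, while U+2019 passes through untouched because it is punctuation rather than a mark.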

      For reporting details, please refer to the recent discussion in sword-devel.

      For detailed background, please refer to Greek diacritics

      Beyond these two particular issues, there is a broader concern: the filter's scope is not restricted to Greek text. Some of the accents it removes are not specific to Greek but are general combining characters used by other languages too, so when the filter is applied to non-Greek text, it removes diacritics that should be retained in that context.
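
      One possible way to restrict the scope is to remove a combining mark only when the base letter it attaches to is Greek. The sketch below is an illustration in Python, not a proposal for the actual SWORD code; the function name strip_accents_greek_only is hypothetical, and treating "Unicode character name starts with GREEK" as the test for a Greek base letter is an assumption made for brevity.

```python
import unicodedata

def strip_accents_greek_only(text):
    """Remove combining marks that follow a Greek base letter; keep all others."""
    out = []
    base_is_greek = False
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            if base_is_greek:
                continue  # drop the mark: it modifies a Greek letter
        else:
            # A new base character: remember whether it is Greek.
            base_is_greek = unicodedata.name(ch, "").startswith("GREEK")
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))

# Greek accents are removed, the acute accent on the Latin "e" is retained.
print(strip_accents_greek_only("\u03bb\u03cc\u03b3\u03bf\u03c2 caf\u00e9"))
```

      Because only canonical normalization (NFD, then NFC) is used here, non-Greek text comes back in its composed form.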

      Furthermore, because the filter normalizes the text to NFKC before removing the combining characters, some unusual codepoints are not restored afterwards: compatibility decomposition is not reversible for every codepoint.

      Example: U+00BE   ¾   VULGAR FRACTION THREE QUARTERS becomes 3⁄4 (digit 3, U+2044 FRACTION SLASH, digit 4) and is not restored
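
      The side effect can be reproduced outside SWORD. This is a minimal sketch with Python's unicodedata module; the helper names are hypothetical, and it only illustrates the difference between the normalization forms, not how the filter calls its Unicode library.

```python
import unicodedata

def roundtrip_canonical(text):
    """Canonical decomposition followed by canonical composition (NFD, then NFC)."""
    return unicodedata.normalize("NFC", unicodedata.normalize("NFD", text))

def to_nfkc(text):
    """Compatibility decomposition followed by canonical composition (NFKC)."""
    return unicodedata.normalize("NFKC", text)

three_quarters = "\u00be"
print(roundtrip_canonical(three_quarters) == three_quarters)  # True: unchanged
print([hex(ord(c)) for c in to_nfkc(three_quarters)])  # ['0x33', '0x2044', '0x34']
```

      The canonical forms leave U+00BE alone, whereas NFKC replaces it with three separate characters that no later composition step merges back.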

      NB. "Affects Versions" seems to be out of date in the CrossWire Tracker.
      My observations were made using Xiphos 4.0.4 and diatheke version 4.7

            People

            • Assignee: refdoc Peter von Kaehne
            • Reporter: dfh David Haslam
