SOLVED Unicode Category of Glyph

  • Hey there,

    I am looking for the best/easiest way to get the unicode and the unicode category of a glyph. The unicode itself is part of the RGlyph object, but not the unicode category, right?

    I tried investigating @erik’s glyphNameFormatter and @jens’s RFUnicodeInfo, but have not found the right thing yet. Should I be looking elsewhere?

    Thanks in advance for your help!

  • admin


    from import unicodeCategories

  • @benedikt
    FWIW, the glyphNameFormatter in RF has a couple of functions that might be useful. It uses the glyphNamesToUnicodeAndCategories.txt names list, which is buried somewhere in RF. This is not the full Unicode list (I think CJK is not included), but it contains a lot of good stuff.

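    import glyphNameFormatter.reader
    name = "flyingSaucer"
    value = 0x1F6F8
    # unicode to name
    print(glyphNameFormatter.reader.u2n(value))
    > "flyingSaucer"
    # name to unicode
    print(glyphNameFormatter.reader.n2u(name))
    > 128760
    # unicode to category
    print(glyphNameFormatter.reader.u2c(value))
    > "So"
    # name to category
    print(glyphNameFormatter.reader.n2c(name))
    > "So"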

  • Wow, awesome. That’s exactly what I was looking for. Thanks for the pointers!

  • admin


    from import unicodeCategories

  • @frederik this returns the unicode range for a glyph, not the unicode category :)

    for example, the first codepoint in your script belongs to the Basic Latin range, and the second to Cyrillic. but both glyphs belong to the category Letter.
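
    The range/category distinction can be checked without any extra data file. What follows is only a minimal sketch, assuming plain Python 3 and the standard-library unicodedata module (which is not what the posts in this thread use). The helper names general_category and major_class are hypothetical, and the two code points are the ones from the script being discussed:

```python
import unicodedata

# First letter of the Unicode general category -> major class name.
MAJOR_CLASSES = {
    "L": "Letter", "M": "Mark", "N": "Number", "P": "Punctuation",
    "S": "Symbol", "Z": "Separator", "C": "Other",
}

def general_category(value):
    """Return the two-letter general category for an integer code point."""
    return unicodedata.category(chr(value))

def major_class(value):
    """Return the major class name (hypothetical helper, for illustration)."""
    return MAJOR_CLASSES[general_category(value)[0]]

# 65 is in Basic Latin, 1234 is in Cyrillic; the category ignores the range.
for value in (65, 1234):
    print("U+%04X" % value, general_category(value), major_class(value))
```

    In RoboFont the same helpers should accept g.unicode directly, since that value is an integer (check for None first).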

  • admin

    I guess glyphNameFormatter has the data you are looking for :)

    from glyphNameFormatter import GlyphName
    g = GlyphName(65)
    g = GlyphName(1234)

  • there’s a catch: in order to look up the category, you’ll need to convert the unicode value from integer to hex:

    # load categories data from txt file
    filePath = '/Users/gferreira/Desktop/Categories.txt'
    with open(filePath, 'r') as f:
        rawData = f.readlines()
    # convert raw data into dict
    categories = {}
    for line in rawData:
        uni, gc, level1, level2, level3, level4, name = line.split('\t')
        categories[uni] = level1, level2, level3, level4
    # get unicode for glyph
    g = CurrentGlyph()
    if g.unicode is not None:
        # convert unicode integer to hexadecimal
        uni = "%X" % g.unicode
        uni = uni.zfill(4)
        # get category for unicode value
        if uni in categories:
            print(g.name)
            print(g.unicode)
            print(uni)
            print(categories[uni])
    >>> fi
    >>> 64257
    >>> FB01
    >>> ('Letter', 'Ligature', '', '')
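
    The integer-to-hex lookup above can be tried outside RoboFont with a small stand-in for the data file. This is a sketch under assumptions: the single data row in SAMPLE is made up to match the seven tab-separated columns the script unpacks and the output shown for the fi ligature, so the real layout of Categories.txt may differ. parse_categories and hex_key are hypothetical helper names.

```python
# Made-up stand-in for Categories.txt: one comment line and one data row
# with seven tab-separated fields (assumed layout, see the note above).
SAMPLE = (
    "# code\tgc\tlevel1\tlevel2\tlevel3\tlevel4\tname\n"
    "FB01\tLl\tLetter\tLigature\t\t\tLATIN SMALL LIGATURE FI\n"
)

def parse_categories(lines):
    """Map hex code strings to their four category levels."""
    categories = {}
    for line in lines:
        # Strip the newline so it does not end up in the last field.
        fields = line.rstrip("\n").split("\t")
        # Skip comments, blank lines and anything else that is not a data row.
        if len(fields) != 7 or fields[0].startswith("#"):
            continue
        uni, gc, level1, level2, level3, level4, name = fields
        categories[uni] = (level1, level2, level3, level4)
    return categories

def hex_key(value):
    """Format an integer code point as uppercase hex, at least 4 digits."""
    return "%04X" % value

categories = parse_categories(SAMPLE.splitlines())
print(hex_key(64257))
print(categories.get(hex_key(64257)))
```

    Using "%04X" does the zero-padding in one step, and .get() returns None rather than raising an error when a code point is missing from the table.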

  • hello @benedikt

    to get the unicode for a glyph you can use RGlyph.autoUnicodes.

    to get unicode categories I would try using the data provided by the Unicode Consortium. I’ve found these:

    you can write a script to load data from Categories.txt, and then search for the unicode value to get its category.

    hope this helps! let us know if it works…
