    I am looking for the best/easiest way to get the unicode and the unicode category of a glyph. The unicode itself is part of the RGlyph object, but not the unicode category, right?

    I tried investigating @erik’s glyphNameFormatter and @jensRFUnicodeInfo, but did not find the right thing yet. Should I be looking elsewhere?

    hello @benedikt

    to get the unicode for a glyph you can read RGlyph.unicode; if it is not set yet, RGlyph.autoUnicodes() can assign it based on the glyph name.

    to get unicode categories I would try using the data provided by the Unicode Consortium. I’ve found these:

    you can write a script to load data from Categories.txt, and then search for the unicode value to get its category.

    hope this helps! let us know if it works…

    there’s a catch: in order to look up the category, you’ll need to convert the unicode value from integer to hex:

    # load categories data from txt file
    filePath = '/Users/gferreira/Desktop/Categories.txt'
    with open(filePath, 'r') as f:
        rawData = f.readlines()
    # convert raw data into dict
    categories = {}
    for line in rawData:
        # strip the trailing newline so it does not end up in the name field
        uni, gc, level1, level2, level3, level4, name = line.rstrip('\n').split('\t')
        categories[uni] = level1, level2, level3, level4
    # get unicode for glyph
    g = CurrentGlyph()
    if g.unicode is not None:
        print(g.name)
        print(g.unicode)
        # convert unicode integer to hexadecimal
        uni = "%X" % g.unicode
        uni = uni.zfill(4)
        print(uni)
        # get category for unicode value
        if uni in categories:
            print(categories[uni])

    >>> fi
    >>> 64257
    >>> FB01
    >>> ('Letter', 'Ligature', '', '')
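The same lookup can be wrapped in two small functions, which makes it easier to reuse outside of RoboFont. This is only a sketch: the names `parse_categories` and `lookup_category` are hypothetical, the sample row is illustrative, and it assumes the 7-column tab-separated layout and uppercase hex keys used in the script above.

```python
def parse_categories(lines):
    """Build a {hex codepoint: (level1, level2, level3, level4)} dict."""
    categories = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # skip blank lines, comments, or rows that do not have 7 columns
        if len(fields) != 7 or fields[0].startswith("#"):
            continue
        uni, gc, level1, level2, level3, level4, name = fields
        categories[uni] = (level1, level2, level3, level4)
    return categories


def lookup_category(codepoint, categories):
    """Return the category tuple for an integer code point, or None."""
    # zero-padded uppercase hex, e.g. 64257 -> "FB01"
    key = format(codepoint, "04X")
    return categories.get(key)


# illustrative row in the assumed layout, not read from the real file
sample = ["FB01\tLl\tLetter\tLigature\t\t\tLATIN SMALL LIGATURE FI\n"]
categories = parse_categories(sample)
print(lookup_category(0xFB01, categories))
```

With the real file you would pass the open file object (or `f.readlines()`) to `parse_categories` instead of the sample list. Returning `None` for unknown code points avoids a `KeyError` for glyphs whose unicode is not in the table.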

    I guess glyphNameFormatter has the data you are looking for :)

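    from glyphNameFormatter import GlyphName
    g = GlyphName(65)
    # print the name of the unicode range (attribute name assumed to be uniRangeName)
    print(g.uniRangeName)
    g = GlyphName(1234)
    print(g.uniRangeName)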

    @frederik this returns the unicode range for a glyph, not the unicode category :)

    for example, the first codepoint in your script belongs to the Basic Latin range, and the second to Cyrillic. but both glyphs belong to the category Letter.
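To see the difference between range and category without any extra data files, Python's standard `unicodedata` module can report the General Category of a code point. This is a minimal sketch, not something suggested in the thread itself; note that it returns the two-letter codes (such as `Lu`), not the descriptive levels from Categories.txt, and the helper name `general_category` is hypothetical.

```python
import unicodedata


def general_category(codepoint):
    """Return the two-letter Unicode General Category for an integer code point."""
    return unicodedata.category(chr(codepoint))


# the two code points from the script above: different ranges, both letters
for cp in (65, 1234):
    print(cp, unicodedata.name(chr(cp)), general_category(cp))
```

The first letter of the code gives the major class, so checking `general_category(cp).startswith("L")` is one way to test for "Letter". The results depend on the Unicode version bundled with your Python.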

    from glyphNameFormatter.data import unicodeCategories

  • Wow, awesome. That’s exactly what I was looking for. Thanks for the pointers!

  • @benedikt
    FWIW, the glyphNameFormatter in RF has a couple of functions that might be useful. It uses the glyphNamesToUnicodeAndCategories.txt names list, which is buried somewhere in RF. This is not the full unicode list (I think CJK is not included), but it contains a lot of good stuff.

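    import glyphNameFormatter.reader
    name = "flyingSaucer"
    value = 0x1F6F8
    # unicode to name
    print(glyphNameFormatter.reader.u2n(value))
    > "flyingSaucer"
    # name to unicode
    print(glyphNameFormatter.reader.n2u(name))
    > 128760
    # unicode to category
    print(glyphNameFormatter.reader.u2c(value))
    > "So"
    # name to category
    print(glyphNameFormatter.reader.n2c(name))
    > "So"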