Find the unicode range name for a glyph (and other fun unicode things)

erik

When you're building a large multiscript character set, it may be necessary to find out which unicode range a specific glyph belongs to. Sometimes you can guess from the glyph name, but that is not very reliable.

The glyphNameFormatter.reader module (built into RF, thanks!) has a couple of useful functions (full list here). The Unicode To Range function, u2r, is particularly helpful for this.

glyphNameFormatter.reader also has a couple of special dicts: uni2name maps all unicodes with a GNUFL entry to their names. Its keys are then "all the unicodes we can do something with".

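# Import only the names used below.
from glyphNameFormatter.reader import uni2name, u2r

# All unicode values that have a GNUFL entry.
allUnis = list(uni2name.keys())

# Print the last 200 values with their character and range name.
for v in allUnis[-200:]:
    print(hex(v), chr(v), u2r(v))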

This will print something like this:

0x1f6e3 🛣 Transport and Map Symbols
0x1f6e4 🛤 Transport and Map Symbols
...
0x1f6f8 🛸 Transport and Map Symbols
0x1f6f9 🛹 Transport and Map Symbols

Suppose you want to find all the unicode values that belong to the Armenian range (for instance).

from glyphNameFormatter.reader import uni2name, u2r

allUnis = list(uni2name.keys())

# Keep only the values whose range name is exactly "Armenian".
for v in allUnis:
    if u2r(v) == "Armenian":
        print(hex(v), chr(v), u2r(v))

Prints this:

0x531 Ա Armenian
...
0x58f ֏ Armenian
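
If you need this for more than one range, it may be easier to group everything in one pass instead of filtering per name. Below is a minimal sketch of that idea. Note that groupByRange and fakeU2R are hypothetical names made up for this example: fakeU2R only stands in for u2r (and only knows two blocks) so the snippet runs outside RoboFont. Inside RoboFont you would presumably pass u2r and the uni2name keys instead. I'm also assuming the lookup may return None for values it doesn't know.

```python
from collections import defaultdict

def groupByRange(values, lookup):
    # Collect unicode values per range name, using whatever lookup is passed in.
    groups = defaultdict(list)
    for v in values:
        groups[lookup(v)].append(v)
    return dict(groups)

def fakeU2R(v):
    # Hypothetical stand-in for u2r: it only knows two blocks.
    if 0x0530 <= v <= 0x058F:
        return "Armenian"
    if 0x0000 <= v <= 0x007F:
        return "Basic Latin"
    return None

groups = groupByRange([0x41, 0x531, 0x58F, 0x1F6F9], fakeU2R)
for name, values in groups.items():
    print(name, [hex(v) for v in values])
```

Passing the lookup in as an argument keeps the grouping code independent of where the range names come from.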

This is how you can get a list of all the supported range names:

from glyphNameFormatter.reader import rangeNames
print(rangeNames)

['Basic Latin', 'Latin-1 Supplement', 'Latin Extended-A', 'Latin Extended-B', 'IPA Extensions', 'Spacing Modifier Letters', 'Combining Diacritical Marks', 'Greek and Coptic', 'Cyrillic', 'Cyrillic Supplement', 'Armenian', 'Hebrew', 'Arabic', 'Arabic Supplement', 'Devanagari', 'Bengali', 'Gurmukhi', 'Gujarati', 'Oriya', 'Tamil', 'Telugu', 'Kannada', 'Malayalam', 'Sinhala', 'Thai', 'Tibetan', 'Hangul Jamo', 'Ethiopic', 'Cherokee', 'Runic', 'Mongolian', 'Vedic Extensions', 'Phonetic Extensions', 'Phonetic Extensions Supplement', 'Combining Diacritical Marks Supplement', 'Latin Extended Additional', 'Greek Extended', 'General Punctuation', 'Superscripts and Subscripts', 'Currency Symbols', 'Letterlike Symbols', 'Number Forms', 'Arrows', 'Mathematical Operators', 'Miscellaneous Technical', 'Control Pictures', 'Optical Character Recognition', 'Enclosed Alphanumerics', 'Box Drawing', 'Block Elements', 'Geometric Shapes', 'Miscellaneous Symbols', 'Dingbats', 'Miscellaneous Mathematical Symbols-A', 'Braille Patterns', 'Glagolitic', 'Latin Extended-C', 'Supplemental Punctuation', 'CJK Symbols and Punctuation', 'Hiragana', 'Katakana', 'Bopomofo', 'Hangul Compatibility Jamo', 'Enclosed CJK Letters and Months', 'CJK Compatibility', 'Latin Extended-D', 'Javanese', 'Latin Extended-E', 'Cherokee Supplement', 'Private Use Area', 'Alphabetic Presentation Forms', 'Arabic Presentation Forms-A', 'Vertical Forms', 'CJK Compatibility Forms', 'Small Form Variants', 'Arabic Presentation Forms-B', 'Halfwidth and Fullwidth Forms', 'Specials', 'Zanabazar Square', 'Domino Tiles', 'Playing Cards', 'Enclosed Alphanumeric Supplement', 'Miscellaneous Symbols and Pictographs', 'Emoticons', 'Transport and Map Symbols']
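
Because the comparison with u2r is a plain string match, a range name that is slightly off ("armenian", "Cyrilic") will silently find nothing. Here is a small sketch of a helper that looks a name up in the list first. findRangeName is a hypothetical helper, not part of glyphNameFormatter, and sampleRangeNames is just a few names copied from the output above so the snippet runs on its own. In RoboFont you would pass rangeNames itself.

```python
import difflib

# A few names copied from the printed list; use rangeNames in RoboFont.
sampleRangeNames = [
    'Basic Latin', 'Latin-1 Supplement', 'Latin Extended-A',
    'Greek and Coptic', 'Cyrillic', 'Armenian', 'Hebrew', 'Arabic',
]

def findRangeName(query, names):
    # Try a case-insensitive exact match first.
    lowered = {n.lower(): n for n in names}
    if query.lower() in lowered:
        return [lowered[query.lower()]]
    # Otherwise fall back to the closest spellings, best match first.
    return difflib.get_close_matches(query, names, n=3, cutoff=0.6)

print(findRangeName("armenian", sampleRangeNames))
print(findRangeName("Cyrilic", sampleRangeNames))
```

The function returns a list so the caller can see when there are several candidates or none at all.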

Finally, if you want to find out more about these really useful functions in glyphNameFormatter.reader, have a look at the GitHub repo.

Thanks!