SOLVED Unicode Category of Glyph
-
Hey there,
I am looking for the best/easiest way to get the unicode and the unicode category of a glyph. The unicode itself is part of the RGlyph object, but not the unicode category, right?
I tried investigating @erik’s glyphNameFormatter and @jens’ RFUnicodeInfo, but did not find the right thing yet. Should I be looking elsewhere?
Thanks in advance for your help!
-
try:

    from glyphNameFormatter.data import unicodeCategories
    print(unicodeCategories[65])
-
@benedikt
FWIW, the glyphNameFormatter in RF has a couple of functions that might be useful. It uses the glyphNamesToUnicodeAndCategories.txt names list, which is buried somewhere in RF. This is not the full unicode list (I think CJK is not included), but it contains a lot of good stuff.

    import glyphNameFormatter.reader

    name = "flyingSaucer"
    value = 0x1F6F8

    # unicode to name
    print(glyphNameFormatter.reader.u2n(value))
    # > "flyingSaucer"

    # name to unicode
    print(glyphNameFormatter.reader.n2u(name))
    # > 128760

    # unicode to category
    print(glyphNameFormatter.reader.u2c(value))
    # > "So"

    # name to category
    print(glyphNameFormatter.reader.n2c(name))
    # > "So"
-
Wow, awesome. That’s exactly what I was looking for. Thanks for the pointers!
-
@frederik this returns the unicode range for a glyph, not the unicode category :)
for example, the first codepoint in your script belongs to the Basic Latin range, and the second to Cyrillic. but both glyphs belong to the category Letter.
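To make the range/category distinction concrete, here is a minimal sketch that uses Python's standard `unicodedata` module rather than glyphNameFormatter (a swapped-in technique, not something used elsewhere in this thread). The helper `general_category` and the `MAJOR_CLASSES` table are names made up for this example. Both codepoints from the script (65 and 1234) should come back as letters, even though they sit in different ranges.

```python
import unicodedata

# First letter of the two-letter general category code -> major class name.
# (Illustrative table written for this example.)
MAJOR_CLASSES = {
    "L": "Letter",
    "M": "Mark",
    "N": "Number",
    "P": "Punctuation",
    "S": "Symbol",
    "Z": "Separator",
    "C": "Other",
}


def general_category(codepoint):
    """Return (two-letter code, major class) for an integer codepoint."""
    code = unicodedata.category(chr(codepoint))
    return code, MAJOR_CLASSES[code[0]]


# the two codepoints from the script: one Basic Latin, one Cyrillic
for value in (65, 1234):
    print(value, general_category(value))
```

Note that the result depends on the Unicode version bundled with your Python, so very recent characters may be reported as unassigned on older installs.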
-
I guess glyphNameFormatter has the data you are looking for :)

    from glyphNameFormatter import GlyphName

    g = GlyphName(65)
    print(g.uniRangeName)

    g = GlyphName(1234)
    print(g.uniRangeName)
-
there’s a catch: in order to look up the category, you’ll need to convert the unicode value from an integer to a hex string:

    # load categories data from txt file
    filePath = '/Users/gferreira/Desktop/Categories.txt'
    with open(filePath, 'r') as f:
        rawData = f.readlines()

    # convert raw data into dict
    categories = {}
    for line in rawData:
        uni, gc, level1, level2, level3, level4, name = line.split('\t')
        categories[uni] = level1, level2, level3, level4

    # get unicode for glyph
    g = CurrentGlyph()
    g.autoUnicodes()
    print(g.name)
    print(g.unicode)

    if g.unicode is not None:
        # convert unicode integer to hexadecimal
        uni = "%X" % g.unicode
        uni = uni.zfill(4)
        print(uni)
        # get category for unicode value
        if uni in categories:
            print(categories[uni])

    >>> fi
    >>> 64257
    >>> FB01
    >>> ('Letter', 'Ligature', '', '')
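The same lookup can be tried outside RoboFont with a small self-contained sketch. It assumes the seven tab-separated fields that the script above unpacks; the `SAMPLE` row is reconstructed from the output shown above and is not copied from the real Categories.txt, and `parse_categories` and `lookup` are hypothetical helper names.

```python
# Hypothetical one-row sample in the layout the script above unpacks:
# hex codepoint, general category, four category levels, character name.
# The row is rebuilt from the printed result above, not taken from the file.
SAMPLE = "FB01\tLl\tLetter\tLigature\t\t\tLATIN SMALL LIGATURE FI\n"


def parse_categories(text):
    """Build a dict mapping hex codepoint strings to their category levels."""
    table = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # skip blank or malformed lines
        uni, gc, level1, level2, level3, level4, name = fields
        table[uni] = (level1, level2, level3, level4)
    return table


def lookup(table, codepoint):
    """Look up an integer codepoint using a zero-padded uppercase hex key."""
    key = format(codepoint, "04X")  # 64257 -> 'FB01', 65 -> '0041'
    return table.get(key)


table = parse_categories(SAMPLE)
print(lookup(table, 64257))
```

`format(codepoint, "04X")` does the hex conversion and the zero-padding in one step, and `dict.get` returns `None` for codepoints that are missing from the table instead of raising an error.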
-
hello @benedikt
to get the unicode for a glyph you can use RGlyph.autoUnicodes.
to get unicode categories I would try using the data provided by the Unicode Consortium. I’ve found these:
you can write a script to load data from Categories.txt, and then search for the unicode value to get its category.
hope this helps! let us know if it works…