SOLVED Unicode Category of Glyph
- 
					
					
					
					
 Hey there, I am looking for the best/easiest way to get the unicode and the unicode category of a glyph. The unicode itself is part of the RGlyph object, but not the unicode category, right? I tried investigating @erik’s glyphNameFormatter and @jens’ RFUnicodeInfo, but did not find the right thing yet. Should I be looking elsewhere? Thanks in advance for your help! 
 
- 
					
					
					
					
 try this:

    from glyphNameFormatter.data import unicodeCategories
    print(unicodeCategories[65])
 
- 
					
					
					
					
 @benedikt 
 FWIW, the glyphNameFormatter in RF has a couple of functions that might be useful. It uses the glyphNamesToUnicodeAndCategories.txt names list which is buried somewhere in RF. This is not the full unicode list (I think CJ and K is not included), but it contains a lot of good stuff.

    import glyphNameFormatter.reader

    name = "flyingSaucer"
    value = 0x1F6F8

    # unicode to name
    print(glyphNameFormatter.reader.u2n(value))
    > "flyingSaucer"

    # name to unicode
    print(glyphNameFormatter.reader.n2u(name))
    > 128760

    # unicode to category
    print(glyphNameFormatter.reader.u2c(value))
    > "So"

    # name to category
    print(glyphNameFormatter.reader.n2c(name))
    > "So"
 
- 
					
					
					
					
 Wow, awesome. That’s exactly what I was looking for. Thanks for the pointers! 
 
- 
					
					
					
					
 @frederik this returns the unicode range for a glyph, not the unicode category :) for example, the first codepoint in your script belongs to the Basic Latin range, and the second to Cyrillic. but both glyphs belong to the category Letter.
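
 To make the range-versus-category point concrete, here is a minimal sketch that uses Python's standard `unicodedata` module instead of glyphNameFormatter, so it is a swapped-in technique and not what the scripts in this thread do. The helper names `general_category` and `major_class` and the `MAJOR_CLASSES` table are made up for illustration, and the results depend on the Unicode version bundled with your Python.

```python
import unicodedata

# First letter of the two-letter general category -> major class name.
# This small table is an illustrative helper, not part of unicodedata.
MAJOR_CLASSES = {
    "L": "Letter", "M": "Mark", "N": "Number", "P": "Punctuation",
    "S": "Symbol", "Z": "Separator", "C": "Other",
}


def general_category(codepoint):
    """Return the two-letter Unicode general category, e.g. 'Lu'."""
    return unicodedata.category(chr(codepoint))


def major_class(codepoint):
    """Return the major class name for an integer codepoint."""
    return MAJOR_CLASSES[general_category(codepoint)[0]]


# The two codepoints discussed above: one Basic Latin, one Cyrillic.
for cp in (65, 1234):
    print("%04X" % cp, general_category(cp), major_class(cp))
```

 Both codepoints should come out as Letter, even though they sit in different unicode ranges.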
 
- 
					
					
					
					
 I guess glyphNameFormatter has the data you are looking for :)

    from glyphNameFormatter import GlyphName

    g = GlyphName(65)
    print(g.uniRangeName)

    g = GlyphName(1234)
    print(g.uniRangeName)
 
- 
					
					
					
					
 there’s a catch: in order to look up the category, you’ll need to convert the unicode value from integer to hex:

    # load categories data from txt file
    filePath = '/Users/gferreira/Desktop/Categories.txt'
    with open(filePath, 'r') as f:
        rawData = f.readlines()

    # convert raw data into dict
    categories = {}
    for line in rawData:
        uni, gc, level1, level2, level3, level4, name = line.split('\t')
        categories[uni] = level1, level2, level3, level4

    # get unicode for glyph
    g = CurrentGlyph()
    g.autoUnicodes()
    print(g.name)
    print(g.unicode)

    if g.unicode is not None:
        # convert unicode integer to hexadecimal
        uni = "%X" % g.unicode
        uni = uni.zfill(4)
        print(uni)
        # get category for unicode value
        if uni in categories:
            print(categories[uni])

    >>> fi
    >>> 64257
    >>> FB01
    >>> ('Letter', 'Ligature', '', '')
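
 The parsing and lookup steps above can also be wrapped into two small functions that run outside RoboFont. This is only a sketch: the names `load_categories` and `category_for` are hypothetical, and the sample row is an assumption about the Categories.txt layout (seven tab-separated fields), inferred from the script and its output above. With the real file you would pass the opened file object instead of `SAMPLE_LINES`.

```python
def load_categories(lines):
    """Build a dict mapping hex codepoint strings to their category levels."""
    categories = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 7:
            continue  # skip blank or malformed lines
        uni, gc, level1, level2, level3, level4, name = fields
        categories[uni] = (level1, level2, level3, level4)
    return categories


def category_for(codepoint, categories):
    """Look up an integer codepoint; return None if it is missing."""
    if codepoint is None:
        return None
    # zero-padded uppercase hex, same result as "%X" followed by zfill(4)
    key = format(codepoint, "04X")
    return categories.get(key)


# Assumed sample data for illustration, not copied from the real file.
SAMPLE_LINES = [
    "FB01\tLl\tLetter\tLigature\t\t\tLATIN SMALL LIGATURE FI\n",
    "\n",
]

categories = load_categories(SAMPLE_LINES)
print(category_for(64257, categories))
```

 In RoboFont the codepoint would come from `CurrentGlyph().unicode`, as in the script above. Returning None for a missing value keeps the caller from having to check `uni in categories` separately.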
 
- 
					
					
					
					
 hello @benedikt

 to get the unicode for a glyph you can use RGlyph.autoUnicodes.

 to get unicode categories I would try using the data provided by the Unicode Consortium. I’ve found these:

 you can write a script to load data from Categories.txt, and then search for the unicode value to get its category.

 hope this helps! let us know if it works…
 
 
			
		 
			
		