So it looks like the problem breaks down into creating a suitable data structure from the contents of the UnicodeData.txt file. CPython uses a Python script (what else?) to generate a C header file containing the various data structures. So all I probably need to do is one of the following:
- Use the same approach, generate a very similar structure and port the existing C code that accesses the data structures into Java. Not very appealing.
- Do something similar, and create a class for each code point. Probably not very good from a resource perspective (OK, potentially premature optimisation since I haven't measured it, but the UnicodeData.txt file is 817k, so that's some data structure). That could be useful from a LearningTest perspective though; e.g. can I use java.lang.Character, or do I need something else entirely?
- Use something in existing core Java libraries.
- Use a third-party library.
- Something else that I haven't thought of yet.
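To get a feel for the size question, here is a minimal sketch (in Python, since that is what the tests are written in) of turning UnicodeData.txt records into a lookup table. It assumes the usual semicolon-separated layout in which fields 6, 7 and 8 hold the decimal, digit and numeric values; the names NumericInfo, parse_line and build_table are made up for illustration, and the two sample records should be checked against your own copy of the file.

```python
from collections import namedtuple
from fractions import Fraction

# Hypothetical record type: only the fields needed for numeric lookups are kept.
NumericInfo = namedtuple("NumericInfo", "name decimal digit numeric")


def parse_line(line):
    """Split one UnicodeData.txt record into (code point, NumericInfo)."""
    fields = line.strip().split(";")
    code_point = int(fields[0], 16)
    decimal = int(fields[6]) if fields[6] else None
    digit = int(fields[7]) if fields[7] else None
    # The numeric field may hold a fraction such as "1/2", so go via Fraction.
    numeric = float(Fraction(fields[8])) if fields[8] else None
    return code_point, NumericInfo(fields[1], decimal, digit, numeric)


def build_table(lines):
    """Build a dict of code point -> NumericInfo, skipping non-numeric characters."""
    table = {}
    for line in lines:
        if not line.strip():
            continue
        code_point, info = parse_line(line)
        if info.numeric is not None:
            table[code_point] = info
    return table


# Two sample records; in practice the lines would come from open("UnicodeData.txt").
SAMPLE = [
    "0039;DIGIT NINE;Nd;0;EN;;9;9;9;N;;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
]
table = build_table(SAMPLE)
```

Keeping only the characters that have a numeric value is one way to keep the structure small; whether that matters is something to measure rather than guess.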
Just one problem: I don't fully understand what is required yet. From reading the UnicodeData commentary, I think I can see why the tests below are fine.
2468;CIRCLED DIGIT NINE;No;0;EN;
(UnicodeData.txt entry for code-point 0x2468)
verify(unicodedata.decimal(u'\u2468',None) is None)
verify(unicodedata.digit(u'\u2468') == 9)
verify(unicodedata.numeric(u'\u2468') == 9.0)
and those tests pass (for CPython - I haven't implemented the Jython version yet!). From the file entry and commentary, that code-point appears to have no decimal digit value, a digit value of 9 and a numeric value of 9. The tests confirm that. What I don't understand is why the following don't all pass as well.
325F;CIRCLED NUMBER THIRTY FIVE;No;0;ON;
(UnicodeData.txt entry for code-point 0x325F)
verify(unicodedata.decimal(u'\u325F',None) is None)
verify(unicodedata.digit(u'\u325F', None) is None)
verify(unicodedata.numeric(u'\u325F') == 35.0)
The last one fails with:
Traceback (most recent call last):
ValueError: not a numeric character
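For reference, the behaviour the tests rely on can be sketched as follows: each lookup returns the converted field if it is present, returns the default if one was passed, and raises ValueError otherwise. This is only my reading of the tests, not CPython's actual code. The record dicts and helper names are hypothetical, and only the "not a numeric character" message comes from the traceback above; the other two messages are placeholders.

```python
from fractions import Fraction

_NO_DEFAULT = object()  # sentinel: tells "no default given" apart from default=None


def _lookup(field, convert, default, message):
    """Return the converted field, else the default, else raise ValueError."""
    if field:
        return convert(field)
    if default is _NO_DEFAULT:
        raise ValueError(message)
    return default


def decimal(record, default=_NO_DEFAULT):
    return _lookup(record["decimal"], int, default, "not a decimal")  # placeholder message


def digit(record, default=_NO_DEFAULT):
    return _lookup(record["digit"], int, default, "not a digit")  # placeholder message


def numeric(record, default=_NO_DEFAULT):
    # Go via Fraction so that values written as "1/2" convert cleanly.
    return _lookup(record["numeric"], lambda s: float(Fraction(s)),
                   default, "not a numeric character")


# Field values as described in the text for U+2468: no decimal, digit 9, numeric 9.
CIRCLED_NINE = {"decimal": "", "digit": "9", "numeric": "9"}
# A hypothetical record with no numeric data at all, e.g. a character the table lacks.
NO_NUMERIC_DATA = {"decimal": "", "digit": "", "numeric": ""}
```

Under this reading, a ValueError from numeric() with no default means the table the interpreter is using has nothing in the numeric field for that character.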
Evidently I need to delve deeper into the spec, or start asking more knowledgeable people some questions.
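One cheap thing to check before asking around is what the interpreter's own database says about each character, and which Unicode version that database was built from. The sketch below assumes a CPython recent enough to expose unicodedata.unidata_version; describe is a made-up helper name. A category of "Cn" would mean the code point is unassigned in that database, which is one possible (unconfirmed) explanation for the ValueError.

```python
import unicodedata


def describe(ch):
    """Collect what the interpreter's Unicode database knows about one character."""
    return {
        "name": unicodedata.name(ch, None),
        "category": unicodedata.category(ch),
        "decimal": unicodedata.decimal(ch, None),
        "digit": unicodedata.digit(ch, None),
        "numeric": unicodedata.numeric(ch, None),
    }


print("Unicode database version: %s" % unicodedata.unidata_version)
for ch in (u"\u2468", u"\u325f"):
    print("U+%04X %r" % (ord(ch), describe(ch)))
```

If the name comes back as None and the category as "Cn" for U+325F, the data file I have been reading is probably newer than the database compiled into the interpreter.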