NIIZA, Japan--(BUSINESS WIRE)--The CJK Dictionary Institute, Inc. (CJKI), which specializes in the compilation of large-scale CJK (Chinese, Japanese and Korean) and Arabic lexical resources, today announced a major update to its Database of Arabic Names, or DAN for short. Compiled using linguistic knowledge together with sophisticated data collection and validation techniques, DAN is expected to be ready for release in March of this year, and will contain over a million unique romanized names.
As reported in a September 2007 story on CNN, a Justice Department inspector general's report on the performance of the Terrorist Screening Center (TSC), which maintains the consolidated watchlist database used by all government agencies, found serious shortcomings. According to Glenn Fine, Inspector General for the U.S. Department of Justice, "It is critical that the TSC further improve the quality of its watchlist data because of the consequences of inaccurate or missing information. Inaccurate, incomplete, and obsolete watchlist information can increase the risk of not identifying known or suspected terrorists, and it can also increase the risk that innocent persons will be stopped or detained."
One reason the watchlists contain errors is the huge number of name variants that arise when names are transcribed into English. This problem is particularly acute with Arabic names, because of the high level of ambiguity and great complexity of the Arabic script. For example, the name Mohammed can be transcribed in over 100 ways.
"While the U.S. Congress some time ago passed a law requiring the transcription of Arabic names to be standardized, this has been bogged down by difficulties because of the inherent ambiguities of unvocalized Arabic and the bewildering variety of ways to spell Arabic names in English," according to Jack Halpern, CEO of The CJK Dictionary Institute, Inc., who has been compiling large-scale databases for over 10 years.
These difficulties, together with the recent upsurge in demand for comprehensive Arabic linguistic resources, have led Halpern and his team of specially trained native Arabic-speaking editors to dramatically expand CJKI's Database of Arabic Names (DAN). According to CJKI's Tyler Reid, who is managing the project, "For the last four years we have been compiling DAN on the basis of various resources. In addition, we have a set of tools fine-tuned over the years for processing Arabic names, as well as the cooperation of specialists in Arabic information processing. It's important to note that the database contains unique first and last names, like 'Mohamed' and 'ElBaradei,' and not full names like 'Mohamed ElBaradei'."
According to a CNN report in October of last year, the terrorist watchlist now contains over 755,000 records and continues to grow substantially each month. Halpern expects that, as the list grows, so will the need both to eliminate false positives and to reduce the risk of failing to identify those who are on the list.
In addition to its importance in security applications, DAN can also be used for anti-money laundering and customer identity management systems. More information on DAN, including database samples which highlight the number of potential variants, can be found at http://www.cjk.org/cjk/arabic/dan.htm.
About The CJK Dictionary Institute, Inc.
The CJK Dictionary Institute, Inc. is one of the world's prime sources of CJK (Chinese, Japanese and Korean) and Arabic lexical resources. It contributes to information processing technology by providing high-quality lexical resources and consulting services to some of the world's leading software developers and IT companies, including Fujitsu, Sony, Google, Microsoft, Yahoo and Amazon. For more information, please visit http://www.cjk.org.