osx - Working with Arabic in Python on macOS
I'm working on a project that has data in Arabic. One task requires me to create database mapping dicts. I don't read Arabic, but with the help of Google Translate and the original English versions of the data, I'm able to surmise which Arabic strings map to which database columns.
The problem I'm facing is that Python / macOS / something seems to be converting ligatures (?) in the Arabic when I use copy/paste on them, which leads to the code not recognizing some of the dict keys.
I believe I have a way around the problem, given the nature of the work I'm doing, but I'd like to understand what is happening.
The original Arabic key looks like this:
However, when I copy/paste it on macOS, it converts to the following:
Google Translate, macOS, Safari, etc. seem to think these are equivalent text, but Python disagrees and throws a KeyError when it encounters the original (due to the system having converted it to the second version). If I paste it here, it converts to: الفئة
Is there a way to work with the text at the system level so that it does not end up being converted into something Python doesn't recognize?
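One way to see what actually changed between two strings that look alike is to compare them code point by code point. This is a minimal diagnostic sketch, not an explanation of what macOS does: `key_a` is the string that appears in this post, written with escapes so nothing can alter it, while `key_b` is a made-up variant for illustration (the original key is not reproduced here). The helper names are hypothetical.

```python
import unicodedata


def describe(text):
    """Return a (code point, Unicode name) pair for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


def first_difference(a, b):
    """Return the index where a and b first differ, or None if they are equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


def same_after_nfc(a, b):
    """Return True if the two strings are equal after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The string shown in this post, spelled out as escapes.
key_a = "\u0627\u0644\u0641\u0626\u0629"
# Hypothetical variant (an assumption): last letter replaced by a similar one.
key_b = "\u0627\u0644\u0641\u0626\u0647"

idx = first_difference(key_a, key_b)
print("equal:", key_a == key_b)
print("first difference at index", idx)
print("  a:", describe(key_a)[idx])
print("  b:", describe(key_b)[idx])
print("equal after NFC:", same_after_nfc(key_a, key_b))
```

If `same_after_nfc` returns True for a real pair, the difference is only in normalization form and normalizing both the dict keys and the record keys should be enough. If it returns False, as it does for this made-up pair, the characters themselves were changed.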
In case anyone finds this and runs into a similar problem...
What I needed to do was parse through 350k structured Arabic records (though not all with the same schema), extract the key values, map them to English database column names, and insert the original records into a table. Thinking laziness would work, I created a set of the unique keys, printed it to the screen, copy/pasted it into a text editor, converted it to a dict, and used the Arabic words as the dict keys and the English column names as the values. Except I did not notice that when I pasted the set of Arabic field names, the system "fixed" the Arabic misspellings, resulting in key names that were no longer recognized when parsing the records.
To fix the problem, instead of printing the Arabic column names (there are 32 of them) to the screen, I created a SQLite database and inserted them into a table that included a blank "standardized" column. I then went into SQLite and updated the records to map the English to the Arabic. I then read the table into Python and created a lookup dict that I used when parsing the full data payload. Inserting the Arabic into SQLite did not "correct" the misspellings for me, and hence the records extracted from there served as an accurate lookup.
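The steps above can be sketched roughly as follows. This is only an illustration under assumptions: the table name `field_map`, the function names, the sample record, and the English name `category` are all hypothetical, and an in-memory database stands in for the real file. The point is that the Arabic keys go from the parsed records into SQLite and back into Python without ever passing through the clipboard.

```python
import sqlite3


def store_keys(arabic_keys, db_path=":memory:"):
    """Insert each unique Arabic key with a blank 'standardized' column."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS field_map ("
        "arabic TEXT PRIMARY KEY, standardized TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO field_map (arabic) VALUES (?)",
        [(key,) for key in arabic_keys],
    )
    conn.commit()
    return conn


def load_lookup(conn):
    """Read back the rows that have been mapped as {arabic: english}."""
    rows = conn.execute(
        "SELECT arabic, standardized FROM field_map "
        "WHERE standardized IS NOT NULL"
    )
    return dict(rows.fetchall())


# Hypothetical sample record; the key is written with escapes for safety.
records = [{"\u0627\u0644\u0641\u0626\u0629": "books"}]

# Collect the unique keys straight from the data, never from pasted text.
unique_keys = {key for record in records for key in record}
conn = store_keys(unique_keys)

# Stand-in for the manual step of filling in English names inside SQLite.
for key in unique_keys:
    conn.execute(
        "UPDATE field_map SET standardized = ? WHERE arabic = ?",
        ("category", key),
    )
conn.commit()

lookup = load_lookup(conn)
translated = [{lookup[k]: v for k, v in record.items()} for record in records]
print(translated)
```

In practice the update step would be done by hand in a SQLite client, typing only the English names next to the stored Arabic values.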
The lookup table ended up looking like this:
In spite of trying, I never figured out how to get macOS to stop correcting misspelled Arabic.