Old 01-08-2019, 05:51 AM   #1
amagalma
Human being with feelings
 
Join Date: Apr 2011
Posts: 1,455
Lua os. functions with Unicode. How?

Hello!


Lua does not seem to support Unicode characters in file names when using os. functions like os.rename, os.remove etc.

Does anyone know how to work around this?


Thanks!
Old 01-09-2019, 12:48 PM   #2
amagalma

Any ideas?
Old 01-10-2019, 10:06 AM   #3
mespotine
Human being with feelings
 
Join Date: May 2017
Location: Leipzig, Germany
Posts: 679

As far as I know, Lua's only Unicode support is for UTF-8:

https://www.lua.org/manual/5.3/manual.html#6.5
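
To make that concrete: Lua strings are plain byte sequences, and the utf8 library from the manual section linked above (Lua 5.3 and later) only helps with decoding and encoding them. It does not change how the os and io functions hand those bytes to the operating system. A minimal sketch, with a made-up file name:

```lua
-- Lua strings are byte strings; each Greek letter below takes two bytes in UTF-8.
-- The name is written with \u{...} escapes so the source file encoding does not matter.
local name = "\u{03B1}\u{03B2}.txt"   -- "αβ.txt" (hypothetical file name)

print(#name)            -- byte length: 8
print(utf8.len(name))   -- character count: 6

-- Decode every character to its Unicode code point.
local codepoints = {}
for _, cp in utf8.codes(name) do
  codepoints[#codepoints + 1] = cp
end

-- utf8.char is the inverse: code points back to UTF-8 bytes.
local rebuilt = utf8.char(table.unpack(codepoints))
print(rebuilt == name)  -- true
```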
__________________
Ultraschall-API: https://forum.cockos.com/showthread....98#post2067798
Reaper Internals - Developerdocs for Reaper: https://forum.cockos.com/showthread.php?t=207635
Old 01-10-2019, 10:41 AM   #4
Xenakios
Human being with feelings
 
Join Date: Feb 2007
Location: Oulu, Finland
Posts: 7,527

On Windows the problem is probably that Lua's os library uses the C functions rename() etc., which do not support UTF-8. It could be made to work if the incoming UTF-8 strings from the scripts were first converted into UTF-16 wide strings and the wide versions of the functions, like _wrename, were used. (Obviously this is not a script-side solution: the C code of Lua itself would need to be changed, and ReaScript users can't do that.)

Interesting that this hasn't been reported as a problem to the Lua developers a long time ago...?

A possible solution in the context of ReaScript could be for Reaper to provide its own functions for file renaming, deletion etc.
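
For illustration only, the conversion step described above (UTF-8 in, UTF-16 code units out, which is the form the wide Windows functions such as _wrename take) can be sketched in Lua 5.3 itself. This only shows the encoding. A script still has no way to pass the result to _wrename, which is why the real fix has to live in C. The function name utf8_to_utf16_units is made up for this sketch.

```lua
-- Hypothetical helper: decode a UTF-8 string and return its UTF-16 code units
-- as a list of integers. utf8.codes raises an error on invalid UTF-8.
local function utf8_to_utf16_units(s)
  local units = {}
  for _, cp in utf8.codes(s) do
    if cp < 0x10000 then
      -- Code points in the Basic Multilingual Plane are a single unit.
      units[#units + 1] = cp
    else
      -- Everything above becomes a surrogate pair.
      cp = cp - 0x10000
      units[#units + 1] = 0xD800 + (cp >> 10)
      units[#units + 1] = 0xDC00 + (cp & 0x3FF)
    end
  end
  return units
end

-- Greek small alpha (U+03B1) followed by an emoji (U+1F600)
local units = utf8_to_utf16_units("\u{03B1}\u{1F600}")
for _, u in ipairs(units) do
  print(string.format("0x%04X", u))   -- 0x03B1, 0xD83D, 0xDE00
end
```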
__________________
For info on SWS Reaper extension plugin (including Xenakios' previous extension/actions) :
http://www.sws-extension.org/
https://github.com/Jeff0S/sws
--
Xenakios blog (about HourGlass, Paul(X)Stretch and λ) :
http://xenakios.wordpress.com/
Old 01-10-2019, 11:23 AM   #5
Xenakios

I was able to do a fix for rename() in Lua's C code. But it would probably take years of discussions and code reviews to get this included in the Lua language code base... So it's easiest and fastest if Cockos fixes this in some way for ReaScript.

edit : Justin is on it!

Last edited by Xenakios; 01-10-2019 at 11:45 AM.
Old 01-10-2019, 01:40 PM   #6
amagalma

Thank you guys for your responses!

Quote:
Originally Posted by Xenakios
edit : Justin is on it!

Great!

Meanwhile, a fine gentleman at Stack Overflow made it possible to use Lua's os and io functions with the Greek codepage 1253 using only stock Lua.

I modified the code very slightly so that it can be used with other codepages, as long as 1) the mapping table is provided (it's not so difficult to create it with regex) and 2) the wanted codepage is defined internally in the script by the user.

But of course, if Justin could come up with new functions supporting Unicode out of the box for Reaper, that would be the best!


Code:
if reaper.GetOS():match("Win") then

local char, byte, table_insert, table_concat = string.char, string.byte, table.insert, table.concat

-- TABLES OF CODEPAGES

local cp1253 = { -- GREEK
[0x20AC] = 0x80, -- EURO SIGN
[0x201A] = 0x82, -- SINGLE LOW-9 QUOTATION MARK
[0x0192] = 0x83, -- LATIN SMALL LETTER F WITH HOOK
[0x201E] = 0x84, -- DOUBLE LOW-9 QUOTATION MARK
[0x2026] = 0x85, -- HORIZONTAL ELLIPSIS
[0x2020] = 0x86, -- DAGGER
[0x2021] = 0x87, -- DOUBLE DAGGER
[0x2030] = 0x89, -- PER MILLE SIGN
[0x2039] = 0x8B, -- SINGLE LEFT-POINTING ANGLE QUOTATION MARK
[0x2018] = 0x91, -- LEFT SINGLE QUOTATION MARK
[0x2019] = 0x92, -- RIGHT SINGLE QUOTATION MARK
[0x201C] = 0x93, -- LEFT DOUBLE QUOTATION MARK
[0x201D] = 0x94, -- RIGHT DOUBLE QUOTATION MARK
[0x2022] = 0x95, -- BULLET
[0x2013] = 0x96, -- EN DASH
[0x2014] = 0x97, -- EM DASH
[0x2122] = 0x99, -- TRADE MARK SIGN
[0x203A] = 0x9B, -- SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
[0x00A0] = 0xA0, -- NO-BREAK SPACE
[0x0385] = 0xA1, -- GREEK DIALYTIKA TONOS
[0x0386] = 0xA2, -- GREEK CAPITAL LETTER ALPHA WITH TONOS
[0x00A3] = 0xA3, -- POUND SIGN
[0x00A4] = 0xA4, -- CURRENCY SIGN
[0x00A5] = 0xA5, -- YEN SIGN
[0x00A6] = 0xA6, -- BROKEN BAR
[0x00A7] = 0xA7, -- SECTION SIGN
[0x00A8] = 0xA8, -- DIAERESIS
[0x00A9] = 0xA9, -- COPYRIGHT SIGN
[0x00AB] = 0xAB, -- LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
[0x00AC] = 0xAC, -- NOT SIGN
[0x00AD] = 0xAD, -- SOFT HYPHEN
[0x00AE] = 0xAE, -- REGISTERED SIGN
[0x2015] = 0xAF, -- HORIZONTAL BAR
[0x00B0] = 0xB0, -- DEGREE SIGN
[0x00B1] = 0xB1, -- PLUS-MINUS SIGN
[0x00B2] = 0xB2, -- SUPERSCRIPT TWO
[0x00B3] = 0xB3, -- SUPERSCRIPT THREE
[0x0384] = 0xB4, -- GREEK TONOS
[0x00B5] = 0xB5, -- MICRO SIGN
[0x00B6] = 0xB6, -- PILCROW SIGN
[0x00B7] = 0xB7, -- MIDDLE DOT
[0x0388] = 0xB8, -- GREEK CAPITAL LETTER EPSILON WITH TONOS
[0x0389] = 0xB9, -- GREEK CAPITAL LETTER ETA WITH TONOS
[0x038A] = 0xBA, -- GREEK CAPITAL LETTER IOTA WITH TONOS
[0x00BB] = 0xBB, -- RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
[0x038C] = 0xBC, -- GREEK CAPITAL LETTER OMICRON WITH TONOS
[0x00BD] = 0xBD, -- VULGAR FRACTION ONE HALF
[0x038E] = 0xBE, -- GREEK CAPITAL LETTER UPSILON WITH TONOS
[0x038F] = 0xBF, -- GREEK CAPITAL LETTER OMEGA WITH TONOS
[0x0390] = 0xC0, -- GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
[0x0391] = 0xC1, -- GREEK CAPITAL LETTER ALPHA
[0x0392] = 0xC2, -- GREEK CAPITAL LETTER BETA
[0x0393] = 0xC3, -- GREEK CAPITAL LETTER GAMMA
[0x0394] = 0xC4, -- GREEK CAPITAL LETTER DELTA
[0x0395] = 0xC5, -- GREEK CAPITAL LETTER EPSILON
[0x0396] = 0xC6, -- GREEK CAPITAL LETTER ZETA
[0x0397] = 0xC7, -- GREEK CAPITAL LETTER ETA
[0x0398] = 0xC8, -- GREEK CAPITAL LETTER THETA
[0x0399] = 0xC9, -- GREEK CAPITAL LETTER IOTA
[0x039A] = 0xCA, -- GREEK CAPITAL LETTER KAPPA
[0x039B] = 0xCB, -- GREEK CAPITAL LETTER LAMDA
[0x039C] = 0xCC, -- GREEK CAPITAL LETTER MU
[0x039D] = 0xCD, -- GREEK CAPITAL LETTER NU
[0x039E] = 0xCE, -- GREEK CAPITAL LETTER XI
[0x039F] = 0xCF, -- GREEK CAPITAL LETTER OMICRON
[0x03A0] = 0xD0, -- GREEK CAPITAL LETTER PI
[0x03A1] = 0xD1, -- GREEK CAPITAL LETTER RHO
[0x03A3] = 0xD3, -- GREEK CAPITAL LETTER SIGMA
[0x03A4] = 0xD4, -- GREEK CAPITAL LETTER TAU
[0x03A5] = 0xD5, -- GREEK CAPITAL LETTER UPSILON
[0x03A6] = 0xD6, -- GREEK CAPITAL LETTER PHI
[0x03A7] = 0xD7, -- GREEK CAPITAL LETTER CHI
[0x03A8] = 0xD8, -- GREEK CAPITAL LETTER PSI
[0x03A9] = 0xD9, -- GREEK CAPITAL LETTER OMEGA
[0x03AA] = 0xDA, -- GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
[0x03AB] = 0xDB, -- GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
[0x03AC] = 0xDC, -- GREEK SMALL LETTER ALPHA WITH TONOS
[0x03AD] = 0xDD, -- GREEK SMALL LETTER EPSILON WITH TONOS
[0x03AE] = 0xDE, -- GREEK SMALL LETTER ETA WITH TONOS
[0x03AF] = 0xDF, -- GREEK SMALL LETTER IOTA WITH TONOS
[0x03B0] = 0xE0, -- GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
[0x03B1] = 0xE1, -- GREEK SMALL LETTER ALPHA
[0x03B2] = 0xE2, -- GREEK SMALL LETTER BETA
[0x03B3] = 0xE3, -- GREEK SMALL LETTER GAMMA
[0x03B4] = 0xE4, -- GREEK SMALL LETTER DELTA
[0x03B5] = 0xE5, -- GREEK SMALL LETTER EPSILON
[0x03B6] = 0xE6, -- GREEK SMALL LETTER ZETA
[0x03B7] = 0xE7, -- GREEK SMALL LETTER ETA
[0x03B8] = 0xE8, -- GREEK SMALL LETTER THETA
[0x03B9] = 0xE9, -- GREEK SMALL LETTER IOTA
[0x03BA] = 0xEA, -- GREEK SMALL LETTER KAPPA
[0x03BB] = 0xEB, -- GREEK SMALL LETTER LAMDA
[0x03BC] = 0xEC, -- GREEK SMALL LETTER MU
[0x03BD] = 0xED, -- GREEK SMALL LETTER NU
[0x03BE] = 0xEE, -- GREEK SMALL LETTER XI
[0x03BF] = 0xEF, -- GREEK SMALL LETTER OMICRON
[0x03C0] = 0xF0, -- GREEK SMALL LETTER PI
[0x03C1] = 0xF1, -- GREEK SMALL LETTER RHO
[0x03C2] = 0xF2, -- GREEK SMALL LETTER FINAL SIGMA
[0x03C3] = 0xF3, -- GREEK SMALL LETTER SIGMA
[0x03C4] = 0xF4, -- GREEK SMALL LETTER TAU
[0x03C5] = 0xF5, -- GREEK SMALL LETTER UPSILON
[0x03C6] = 0xF6, -- GREEK SMALL LETTER PHI
[0x03C7] = 0xF7, -- GREEK SMALL LETTER CHI
[0x03C8] = 0xF8, -- GREEK SMALL LETTER PSI
[0x03C9] = 0xF9, -- GREEK SMALL LETTER OMEGA
[0x03CA] = 0xFA, -- GREEK SMALL LETTER IOTA WITH DIALYTIKA
[0x03CB] = 0xFB, -- GREEK SMALL LETTER UPSILON WITH DIALYTIKA
[0x03CC] = 0xFC, -- GREEK SMALL LETTER OMICRON WITH TONOS
[0x03CD] = 0xFD, -- GREEK SMALL LETTER UPSILON WITH TONOS
[0x03CE] = 0xFE, -- GREEK SMALL LETTER OMEGA WITH TONOS
}

-- On Windows the locale name usually ends with the ANSI codepage number, e.g. "Greek_Greece.1253"
local locale = tonumber(string.match(os.setlocale() or "", "(%d+)$"))
local CODEPAGE

-- Use appropriate locale
-- (each table referenced here has to be defined above this point; only cp1253 is included in this post)
if locale == 1250 then -- CENTRAL/EASTERN EUROPEAN
  CODEPAGE = cp1250
elseif locale == 1251 then -- CYRILLIC
  CODEPAGE = cp1251
elseif locale == 1253 then -- GREEK
  CODEPAGE = cp1253
elseif locale == 1254 then -- TURKISH
  CODEPAGE = cp1254
elseif locale == 1255 then -- HEBREW
  CODEPAGE = cp1255
-- etc
end

-- Decode one UTF-8 character starting at byte position pos
local function utf8_to_unicode(utf8str, pos)
  -- pos = starting byte position inside input string (default 1)
  pos = pos or 1
  local code, size = byte(utf8str, pos), 1
  if code >= 0xC0 and code < 0xFE then
    -- lead byte of a multi-byte sequence: fold in the continuation bytes
    local mask = 64
    code = code - 128
    repeat
      local next_byte = byte(utf8str, pos + size) or 0
      if next_byte >= 0x80 and next_byte < 0xC0 then
        code, size = (code - mask - 2) * 64 + next_byte, size + 1
      else
        -- invalid sequence: fall back to the raw first byte
        code, size = byte(utf8str, pos), 1
      end
      mask = mask * 32
    until code < mask
  end
  -- returns code, number of bytes in this utf8 char
  return code, size
end

local function utf8_to_codepage(utf8str)
  -- No table for the current locale: leave the string untouched instead of
  -- failing with "attempt to index a nil value"
  if not CODEPAGE then return utf8str end
  local pos, result_codepage = 1, {}
  while pos <= #utf8str do
    local code, size = utf8_to_unicode(utf8str, pos)
    pos = pos + size
    -- ASCII passes through; characters missing from the table become '?'
    code = code < 128 and code or CODEPAGE[code] or byte('?')
    table_insert(result_codepage, char(code))
  end
  return table_concat(result_codepage)
end

-- Wrap the stock functions so that every file name / command is converted first

local orig_os_rename = os.rename

function os.rename(old, new)
  return orig_os_rename(utf8_to_codepage(old), utf8_to_codepage(new))
end

local orig_os_remove = os.remove

function os.remove(filename)
  return orig_os_remove(utf8_to_codepage(filename))
end

local orig_os_execute = os.execute

function os.execute(command)
  if command then
    command = utf8_to_codepage(command)
  end
  return orig_os_execute(command)
end

local orig_io_open = io.open

function io.open(filename, ...)
  return orig_io_open(utf8_to_codepage(filename), ...)
end

local orig_io_popen = io.popen

function io.popen(prog, ...)
  return orig_io_popen(utf8_to_codepage(prog), ...)
end

local orig_io_lines = io.lines

function io.lines(filename, ...)
  if filename then
    filename = utf8_to_codepage(filename)
  end
  return orig_io_lines(filename, ...)
end

end
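
On Lua 5.3 or newer (which provides the utf8 library linked earlier in the thread), the hand-written decoder could probably be swapped for utf8.codes. The following is a sketch of that variation, not the Stack Overflow code: the three-entry table and the name to_codepage are made up for the example, and unlike the original it raises an error on invalid UTF-8 instead of passing the bytes through.

```lua
-- Tiny excerpt of a cp1253-style table: Unicode code point -> codepage byte.
local sample_cp = {
  [0x03B1] = 0xE1, -- GREEK SMALL LETTER ALPHA
  [0x03B2] = 0xE2, -- GREEK SMALL LETTER BETA
  [0x03B3] = 0xE3, -- GREEK SMALL LETTER GAMMA
}

-- Hypothetical helper: convert a UTF-8 string to single-byte codepage text.
local function to_codepage(utf8str, codepage)
  local out = {}
  for _, cp in utf8.codes(utf8str) do
    -- ASCII passes through; unmapped characters become "?"
    local b = cp < 128 and cp or codepage[cp] or string.byte("?")
    out[#out + 1] = string.char(b)
  end
  return table.concat(out)
end

local converted = to_codepage("\u{03B1}\u{03B2}.txt", sample_cp)
print(converted == "\xE1\xE2.txt")              -- true
print(to_codepage("\u{4E2D}.txt", sample_cp))   -- ?.txt (character not in the table)
```

The wrappers for os.rename, io.open and the rest would stay the same; only the conversion function changes.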

Last edited by amagalma; 01-10-2019 at 03:25 PM.
Old 01-10-2019, 03:27 PM   #7
amagalma

And some other codepage tables:

Code:
  local cp1250 = { -- CENTRAL/EASTERN EUROPEAN
    [0x20AC] = 0x80,  -- EURO SIGN
    [0x201A] = 0x82,  -- SINGLE LOW-9 QUOTATION MARK
    [0x201E] = 0x84,  -- DOUBLE LOW-9 QUOTATION MARK
    [0x2026] = 0x85,  -- HORIZONTAL ELLIPSIS
    [0x2020] = 0x86,  -- DAGGER
    [0x2021] = 0x87,  -- DOUBLE DAGGER
    [0x2030] = 0x89,  -- PER MILLE SIGN
    [0x0160] = 0x8A,  -- LATIN CAPITAL LETTER S WITH CARON
    [0x2039] = 0x8B,  -- SINGLE LEFT-POINTING ANGLE QUOTATION MARK
    [0x015A] = 0x8C,  -- LATIN CAPITAL LETTER S WITH ACUTE
    [0x0164] = 0x8D,  -- LATIN CAPITAL LETTER T WITH CARON
    [0x017D] = 0x8E,  -- LATIN CAPITAL LETTER Z WITH CARON
    [0x0179] = 0x8F,  -- LATIN CAPITAL LETTER Z WITH ACUTE
    [0x2018] = 0x91,  -- LEFT SINGLE QUOTATION MARK
    [0x2019] = 0x92,  -- RIGHT SINGLE QUOTATION MARK
    [0x201C] = 0x93,  -- LEFT DOUBLE QUOTATION MARK
    [0x201D] = 0x94,  -- RIGHT DOUBLE QUOTATION MARK
    [0x2022] = 0x95,  -- BULLET
    [0x2013] = 0x96,  -- EN DASH
    [0x2014] = 0x97,  -- EM DASH
    [0x2122] = 0x99,  -- TRADE MARK SIGN
    [0x0161] = 0x9A,  -- LATIN SMALL LETTER S WITH CARON
    [0x203A] = 0x9B,  -- SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
    [0x015B] = 0x9C,  -- LATIN SMALL LETTER S WITH ACUTE
    [0x0165] = 0x9D,  -- LATIN SMALL LETTER T WITH CARON
    [0x017E] = 0x9E,  -- LATIN SMALL LETTER Z WITH CARON
    [0x017A] = 0x9F,  -- LATIN SMALL LETTER Z WITH ACUTE
    [0x00A0] = 0xA0,  -- NO-BREAK SPACE
    [0x02C7] = 0xA1,  -- CARON
    [0x02D8] = 0xA2,  -- BREVE
    [0x0141] = 0xA3,  -- LATIN CAPITAL LETTER L WITH STROKE
    [0x00A4] = 0xA4,  -- CURRENCY SIGN
    [0x0104] = 0xA5,  -- LATIN CAPITAL LETTER A WITH OGONEK
    [0x00A6] = 0xA6,  -- BROKEN BAR
    [0x00A7] = 0xA7,  -- SECTION SIGN
    [0x00A8] = 0xA8,  -- DIAERESIS
    [0x00A9] = 0xA9,  -- COPYRIGHT SIGN
    [0x015E] = 0xAA,  -- LATIN CAPITAL LETTER S WITH CEDILLA
    [0x00AB] = 0xAB,  -- LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
    [0x00AC] = 0xAC,  -- NOT SIGN
    [0x00AD] = 0xAD,  -- SOFT HYPHEN
    [0x00AE] = 0xAE,  -- REGISTERED SIGN
    [0x017B] = 0xAF,  -- LATIN CAPITAL LETTER Z WITH DOT ABOVE
    [0x00B0] = 0xB0,  -- DEGREE SIGN
    [0x00B1] = 0xB1,  -- PLUS-MINUS SIGN
    [0x02DB] = 0xB2,  -- OGONEK
    [0x0142] = 0xB3,  -- LATIN SMALL LETTER L WITH STROKE
    [0x00B4] = 0xB4,  -- ACUTE ACCENT
    [0x00B5] = 0xB5,  -- MICRO SIGN
    [0x00B6] = 0xB6,  -- PILCROW SIGN
    [0x00B7] = 0xB7,  -- MIDDLE DOT
    [0x00B8] = 0xB8,  -- CEDILLA
    [0x0105] = 0xB9,  -- LATIN SMALL LETTER A WITH OGONEK
    [0x015F] = 0xBA,  -- LATIN SMALL LETTER S WITH CEDILLA
    [0x00BB] = 0xBB,  -- RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
    [0x013D] = 0xBC,  -- LATIN CAPITAL LETTER L WITH CARON
    [0x02DD] = 0xBD,  -- DOUBLE ACUTE ACCENT
    [0x013E] = 0xBE,  -- LATIN SMALL LETTER L WITH CARON
    [0x017C] = 0xBF,  -- LATIN SMALL LETTER Z WITH DOT ABOVE
    [0x0154] = 0xC0,  -- LATIN CAPITAL LETTER R WITH ACUTE
    [0x00C1] = 0xC1,  -- LATIN CAPITAL LETTER A WITH ACUTE
    [0x00C2] = 0xC2,  -- LATIN CAPITAL LETTER A WITH CIRCUMFLEX
    [0x0102] = 0xC3,  -- LATIN CAPITAL LETTER A WITH BREVE
    [0x00C4] = 0xC4,  -- LATIN CAPITAL LETTER A WITH DIAERESIS
    [0x0139] = 0xC5,  -- LATIN CAPITAL LETTER L WITH ACUTE
    [0x0106] = 0xC6,  -- LATIN CAPITAL LETTER C WITH ACUTE
    [0x00C7] = 0xC7,  -- LATIN CAPITAL LETTER C WITH CEDILLA
    [0x010C] = 0xC8,  -- LATIN CAPITAL LETTER C WITH CARON
    [0x00C9] = 0xC9,  -- LATIN CAPITAL LETTER E WITH ACUTE
    [0x0118] = 0xCA,  -- LATIN CAPITAL LETTER E WITH OGONEK
    [0x00CB] = 0xCB,  -- LATIN CAPITAL LETTER E WITH DIAERESIS
    [0x011A] = 0xCC,  -- LATIN CAPITAL LETTER E WITH CARON
    [0x00CD] = 0xCD,  -- LATIN CAPITAL LETTER I WITH ACUTE
    [0x00CE] = 0xCE,  -- LATIN CAPITAL LETTER I WITH CIRCUMFLEX
    [0x010E] = 0xCF,  -- LATIN CAPITAL LETTER D WITH CARON
    [0x0110] = 0xD0,  -- LATIN CAPITAL LETTER D WITH STROKE
    [0x0143] = 0xD1,  -- LATIN CAPITAL LETTER N WITH ACUTE
    [0x0147] = 0xD2,  -- LATIN CAPITAL LETTER N WITH CARON
    [0x00D3] = 0xD3,  -- LATIN CAPITAL LETTER O WITH ACUTE
    [0x00D4] = 0xD4,  -- LATIN CAPITAL LETTER O WITH CIRCUMFLEX
    [0x0150] = 0xD5,  -- LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
    [0x00D6] = 0xD6,  -- LATIN CAPITAL LETTER O WITH DIAERESIS
    [0x00D7] = 0xD7,  -- MULTIPLICATION SIGN
    [0x0158] = 0xD8,  -- LATIN CAPITAL LETTER R WITH CARON
    [0x016E] = 0xD9,  -- LATIN CAPITAL LETTER U WITH RING ABOVE
    [0x00DA] = 0xDA,  -- LATIN CAPITAL LETTER U WITH ACUTE
    [0x0170] = 0xDB,  -- LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
    [0x00DC] = 0xDC,  -- LATIN CAPITAL LETTER U WITH DIAERESIS
    [0x00DD] = 0xDD,  -- LATIN CAPITAL LETTER Y WITH ACUTE
    [0x0162] = 0xDE,  -- LATIN CAPITAL LETTER T WITH CEDILLA
    [0x00DF] = 0xDF,  -- LATIN SMALL LETTER SHARP S
    [0x0155] = 0xE0,  -- LATIN SMALL LETTER R WITH ACUTE
    [0x00E1] = 0xE1,  -- LATIN SMALL LETTER A WITH ACUTE
    [0x00E2] = 0xE2,  -- LATIN SMALL LETTER A WITH CIRCUMFLEX
    [0x0103] = 0xE3,  -- LATIN SMALL LETTER A WITH BREVE
    [0x00E4] = 0xE4,  -- LATIN SMALL LETTER A WITH DIAERESIS
    [0x013A] = 0xE5,  -- LATIN SMALL LETTER L WITH ACUTE
    [0x0107] = 0xE6,  -- LATIN SMALL LETTER C WITH ACUTE
    [0x00E7] = 0xE7,  -- LATIN SMALL LETTER C WITH CEDILLA
    [0x010D] = 0xE8,  -- LATIN SMALL LETTER C WITH CARON
    [0x00E9] = 0xE9,  -- LATIN SMALL LETTER E WITH ACUTE
    [0x0119] = 0xEA,  -- LATIN SMALL LETTER E WITH OGONEK
    [0x00EB] = 0xEB,  -- LATIN SMALL LETTER E WITH DIAERESIS
    [0x011B] = 0xEC,  -- LATIN SMALL LETTER E WITH CARON
    [0x00ED] = 0xED,  -- LATIN SMALL LETTER I WITH ACUTE
    [0x00EE] = 0xEE,  -- LATIN SMALL LETTER I WITH CIRCUMFLEX
    [0x010F] = 0xEF,  -- LATIN SMALL LETTER D WITH CARON
    [0x0111] = 0xF0,  -- LATIN SMALL LETTER D WITH STROKE
    [0x0144] = 0xF1,  -- LATIN SMALL LETTER N WITH ACUTE
    [0x0148] = 0xF2,  -- LATIN SMALL LETTER N WITH CARON
    [0x00F3] = 0xF3,  -- LATIN SMALL LETTER O WITH ACUTE
    [0x00F4] = 0xF4,  -- LATIN SMALL LETTER O WITH CIRCUMFLEX
    [0x0151] = 0xF5,  -- LATIN SMALL LETTER O WITH DOUBLE ACUTE
    [0x00F6] = 0xF6,  -- LATIN SMALL LETTER O WITH DIAERESIS
    [0x00F7] = 0xF7,  -- DIVISION SIGN
    [0x0159] = 0xF8,  -- LATIN SMALL LETTER R WITH CARON
    [0x016F] = 0xF9,  -- LATIN SMALL LETTER U WITH RING ABOVE
    [0x00FA] = 0xFA,  -- LATIN SMALL LETTER U WITH ACUTE
    [0x0171] = 0xFB,  -- LATIN SMALL LETTER U WITH DOUBLE ACUTE
    [0x00FC] = 0xFC,  -- LATIN SMALL LETTER U WITH DIAERESIS
    [0x00FD] = 0xFD,  -- LATIN SMALL LETTER Y WITH ACUTE
    [0x0163] = 0xFE,  -- LATIN SMALL LETTER T WITH CEDILLA
    [0x02D9] = 0xFF,  -- DOT ABOVE
  }
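
As mentioned earlier in the thread, tables like this can be generated with pattern matching rather than typed by hand. A rough sketch follows, assuming the input is a list of lines in the form "codepage byte, Unicode code point, #name" (the layout used by the CPxxxx.TXT mapping files published by the Unicode Consortium, as far as I can tell). The function name parse_mapping and the three sample lines are made up for the example.

```lua
-- Assumed input format, one mapping per line:
--   0x<codepage byte>  0x<Unicode code point>  #<character name>
-- Undefined bytes have no code point and are skipped.
local sample = table.concat({
  "0x80 0x20AC #EURO SIGN",
  "0x81        #UNDEFINED",
  "0xE1 0x03B1 #GREEK SMALL LETTER ALPHA",
}, "\n")

-- Hypothetical helper: returns the lookup table and the matching Lua source lines.
local function parse_mapping(text)
  local map, lines = {}, {}
  for line in text:gmatch("[^\r\n]+") do
    local byte_hex, cp_hex, name = line:match("^0x(%x+)%s+0x(%x+)%s+#%s*(.-)%s*$")
    if byte_hex then
      local b, cp = tonumber(byte_hex, 16), tonumber(cp_hex, 16)
      if b >= 0x80 then  -- bytes below 0x80 are plain ASCII and need no entry
        map[cp] = b
        lines[#lines + 1] = string.format("[0x%04X] = 0x%02X, -- %s", cp, b, name)
      end
    end
  end
  return map, table.concat(lines, "\n")
end

local map, source = parse_mapping(sample)
print(source)
```

The printed lines can be pasted between `local cpXXXX = {` and `}`.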
Old 01-10-2019, 03:28 PM   #8
amagalma

Code:
  local cp1251 = { -- CYRILLIC
    [0x0402] = 0x80,  -- CYRILLIC CAPITAL LETTER DJE
    [0x0403] = 0x81,  -- CYRILLIC CAPITAL LETTER GJE
    [0x201A] = 0x82,  -- SINGLE LOW-9 QUOTATION MARK
    [0x0453] = 0x83,  -- CYRILLIC SMALL LETTER GJE
    [0x201E] = 0x84,  -- DOUBLE LOW-9 QUOTATION MARK
    [0x2026] = 0x85,  -- HORIZONTAL ELLIPSIS
    [0x2020] = 0x86,  -- DAGGER
    [0x2021] = 0x87,  -- DOUBLE DAGGER
    [0x20AC] = 0x88,  -- EURO SIGN
    [0x2030] = 0x89,  -- PER MILLE SIGN
    [0x0409] = 0x8A,  -- CYRILLIC CAPITAL LETTER LJE
    [0x2039] = 0x8B,  -- SINGLE LEFT-POINTING ANGLE QUOTATION MARK
    [0x040A] = 0x8C,  -- CYRILLIC CAPITAL LETTER NJE
    [0x040C] = 0x8D,  -- CYRILLIC CAPITAL LETTER KJE
    [0x040B] = 0x8E,  -- CYRILLIC CAPITAL LETTER TSHE
    [0x040F] = 0x8F,  -- CYRILLIC CAPITAL LETTER DZHE
    [0x0452] = 0x90,  -- CYRILLIC SMALL LETTER DJE
    [0x2018] = 0x91,  -- LEFT SINGLE QUOTATION MARK
    [0x2019] = 0x92,  -- RIGHT SINGLE QUOTATION MARK
    [0x201C] = 0x93,  -- LEFT DOUBLE QUOTATION MARK
    [0x201D] = 0x94,  -- RIGHT DOUBLE QUOTATION MARK
    [0x2022] = 0x95,  -- BULLET
    [0x2013] = 0x96,  -- EN DASH
    [0x2014] = 0x97,  -- EM DASH
    [0x2122] = 0x99,  -- TRADE MARK SIGN
    [0x0459] = 0x9A,  -- CYRILLIC SMALL LETTER LJE
    [0x203A] = 0x9B,  -- SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
    [0x045A] = 0x9C,  -- CYRILLIC SMALL LETTER NJE
    [0x045C] = 0x9D,  -- CYRILLIC SMALL LETTER KJE
    [0x045B] = 0x9E,  -- CYRILLIC SMALL LETTER TSHE
    [0x045F] = 0x9F,  -- CYRILLIC SMALL LETTER DZHE
    [0x00A0] = 0xA0,  -- NO-BREAK SPACE
    [0x040E] = 0xA1,  -- CYRILLIC CAPITAL LETTER SHORT U
    [0x045E] = 0xA2,  -- CYRILLIC SMALL LETTER SHORT U
    [0x0408] = 0xA3,  -- CYRILLIC CAPITAL LETTER JE
    [0x00A4] = 0xA4,  -- CURRENCY SIGN
    [0x0490] = 0xA5,  -- CYRILLIC CAPITAL LETTER GHE WITH UPTURN
    [0x00A6] = 0xA6,  -- BROKEN BAR
    [0x00A7] = 0xA7,  -- SECTION SIGN
    [0x0401] = 0xA8,  -- CYRILLIC CAPITAL LETTER IO
    [0x00A9] = 0xA9,  -- COPYRIGHT SIGN
    [0x0404] = 0xAA,  -- CYRILLIC CAPITAL LETTER UKRAINIAN IE
    [0x00AB] = 0xAB,  -- LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
    [0x00AC] = 0xAC,  -- NOT SIGN
    [0x00AD] = 0xAD,  -- SOFT HYPHEN
    [0x00AE] = 0xAE,  -- REGISTERED SIGN
    [0x0407] = 0xAF,  -- CYRILLIC CAPITAL LETTER YI
    [0x00B0] = 0xB0,  -- DEGREE SIGN
    [0x00B1] = 0xB1,  -- PLUS-MINUS SIGN
    [0x0406] = 0xB2,  -- CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
    [0x0456] = 0xB3,  -- CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    [0x0491] = 0xB4,  -- CYRILLIC SMALL LETTER GHE WITH UPTURN
    [0x00B5] = 0xB5,  -- MICRO SIGN
    [0x00B6] = 0xB6,  -- PILCROW SIGN
    [0x00B7] = 0xB7,  -- MIDDLE DOT
    [0x0451] = 0xB8,  -- CYRILLIC SMALL LETTER IO
    [0x2116] = 0xB9,  -- NUMERO SIGN
    [0x0454] = 0xBA,  -- CYRILLIC SMALL LETTER UKRAINIAN IE
    [0x00BB] = 0xBB,  -- RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
    [0x0458] = 0xBC,  -- CYRILLIC SMALL LETTER JE
    [0x0405] = 0xBD,  -- CYRILLIC CAPITAL LETTER DZE
    [0x0455] = 0xBE,  -- CYRILLIC SMALL LETTER DZE
    [0x0457] = 0xBF,  -- CYRILLIC SMALL LETTER YI
    [0x0410] = 0xC0,  -- CYRILLIC CAPITAL LETTER A
    [0x0411] = 0xC1,  -- CYRILLIC CAPITAL LETTER BE
    [0x0412] = 0xC2,  -- CYRILLIC CAPITAL LETTER VE
    [0x0413] = 0xC3,  -- CYRILLIC CAPITAL LETTER GHE
    [0x0414] = 0xC4,  -- CYRILLIC CAPITAL LETTER DE
    [0x0415] = 0xC5,  -- CYRILLIC CAPITAL LETTER IE
    [0x0416] = 0xC6,  -- CYRILLIC CAPITAL LETTER ZHE
    [0x0417] = 0xC7,  -- CYRILLIC CAPITAL LETTER ZE
    [0x0418] = 0xC8,  -- CYRILLIC CAPITAL LETTER I
    [0x0419] = 0xC9,  -- CYRILLIC CAPITAL LETTER SHORT I
    [0x041A] = 0xCA,  -- CYRILLIC CAPITAL LETTER KA
    [0x041B] = 0xCB,  -- CYRILLIC CAPITAL LETTER EL
    [0x041C] = 0xCC,  -- CYRILLIC CAPITAL LETTER EM
    [0x041D] = 0xCD,  -- CYRILLIC CAPITAL LETTER EN
    [0x041E] = 0xCE,  -- CYRILLIC CAPITAL LETTER O
    [0x041F] = 0xCF,  -- CYRILLIC CAPITAL LETTER PE
    [0x0420] = 0xD0,  -- CYRILLIC CAPITAL LETTER ER
    [0x0421] = 0xD1,  -- CYRILLIC CAPITAL LETTER ES
    [0x0422] = 0xD2,  -- CYRILLIC CAPITAL LETTER TE
    [0x0423] = 0xD3,  -- CYRILLIC CAPITAL LETTER U
    [0x0424] = 0xD4,  -- CYRILLIC CAPITAL LETTER EF
    [0x0425] = 0xD5,  -- CYRILLIC CAPITAL LETTER HA
    [0x0426] = 0xD6,  -- CYRILLIC CAPITAL LETTER TSE
    [0x0427] = 0xD7,  -- CYRILLIC CAPITAL LETTER CHE
    [0x0428] = 0xD8,  -- CYRILLIC CAPITAL LETTER SHA
    [0x0429] = 0xD9,  -- CYRILLIC CAPITAL LETTER SHCHA
    [0x042A] = 0xDA,  -- CYRILLIC CAPITAL LETTER HARD SIGN
    [0x042B] = 0xDB,  -- CYRILLIC CAPITAL LETTER YERU
    [0x042C] = 0xDC,  -- CYRILLIC CAPITAL LETTER SOFT SIGN
    [0x042D] = 0xDD,  -- CYRILLIC CAPITAL LETTER E
    [0x042E] = 0xDE,  -- CYRILLIC CAPITAL LETTER YU
    [0x042F] = 0xDF,  -- CYRILLIC CAPITAL LETTER YA
    [0x0430] = 0xE0,  -- CYRILLIC SMALL LETTER A
    [0x0431] = 0xE1,  -- CYRILLIC SMALL LETTER BE
    [0x0432] = 0xE2,  -- CYRILLIC SMALL LETTER VE
    [0x0433] = 0xE3,  -- CYRILLIC SMALL LETTER GHE
    [0x0434] = 0xE4,  -- CYRILLIC SMALL LETTER DE
    [0x0435] = 0xE5,  -- CYRILLIC SMALL LETTER IE
    [0x0436] = 0xE6,  -- CYRILLIC SMALL LETTER ZHE
    [0x0437] = 0xE7,  -- CYRILLIC SMALL LETTER ZE
    [0x0438] = 0xE8,  -- CYRILLIC SMALL LETTER I
    [0x0439] = 0xE9,  -- CYRILLIC SMALL LETTER SHORT I
    [0x043A] = 0xEA,  -- CYRILLIC SMALL LETTER KA
    [0x043B] = 0xEB,  -- CYRILLIC SMALL LETTER EL
    [0x043C] = 0xEC,  -- CYRILLIC SMALL LETTER EM
    [0x043D] = 0xED,  -- CYRILLIC SMALL LETTER EN
    [0x043E] = 0xEE,  -- CYRILLIC SMALL LETTER O
    [0x043F] = 0xEF,  -- CYRILLIC SMALL LETTER PE
    [0x0440] = 0xF0,  -- CYRILLIC SMALL LETTER ER
    [0x0441] = 0xF1,  -- CYRILLIC SMALL LETTER ES
    [0x0442] = 0xF2,  -- CYRILLIC SMALL LETTER TE
    [0x0443] = 0xF3,  -- CYRILLIC SMALL LETTER U
    [0x0444] = 0xF4,  -- CYRILLIC SMALL LETTER EF
    [0x0445] = 0xF5,  -- CYRILLIC SMALL LETTER HA
    [0x0446] = 0xF6,  -- CYRILLIC SMALL LETTER TSE
    [0x0447] = 0xF7,  -- CYRILLIC SMALL LETTER CHE
    [0x0448] = 0xF8,  -- CYRILLIC SMALL LETTER SHA
    [0x0449] = 0xF9,  -- CYRILLIC SMALL LETTER SHCHA
    [0x044A] = 0xFA,  -- CYRILLIC SMALL LETTER HARD SIGN
    [0x044B] = 0xFB,  -- CYRILLIC SMALL LETTER YERU
    [0x044C] = 0xFC,  -- CYRILLIC SMALL LETTER SOFT SIGN
    [0x044D] = 0xFD,  -- CYRILLIC SMALL LETTER E
    [0x044E] = 0xFE,  -- CYRILLIC SMALL LETTER YU
    [0x044F] = 0xFF,  -- CYRILLIC SMALL LETTER YA
  }
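
Since a single mistyped entry in tables this long would silently send the wrong byte to the file system, a quick sanity check before using one might be worthwhile. A small sketch, where check_codepage is a made-up name and the two tiny tables exist only for the demonstration:

```lua
-- Hypothetical helper: report bytes outside 0x80..0xFF and bytes that are
-- assigned to more than one Unicode code point.
local function check_codepage(cp_table)
  local seen, problems = {}, {}
  for codepoint, b in pairs(cp_table) do
    if b < 0x80 or b > 0xFF then
      problems[#problems + 1] =
        string.format("U+%04X maps to out-of-range byte 0x%X", codepoint, b)
    elseif seen[b] then
      problems[#problems + 1] =
        string.format("byte 0x%02X is used by both U+%04X and U+%04X", b, seen[b], codepoint)
    else
      seen[b] = codepoint
    end
  end
  return problems
end

local good = { [0x0410] = 0xC0, [0x0411] = 0xC1 }
local bad  = { [0x0410] = 0xC0, [0x0411] = 0xC0, [0x0412] = 0x41 }

print(#check_codepage(good))  -- 0
for _, msg in ipairs(check_codepage(bad)) do
  print(msg)                  -- one duplicate and one out-of-range report
end
```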