Fix Arabic / complex-script text input (wide WNDCLASS + BMP-aware filters)#2574
Fix Arabic / complex-script text input (wide WNDCLASS + BMP-aware filters)#2574wahidkurdo wants to merge 3 commits intoTheSuperHackers:mainfrom
Conversation
The text-entry widgets fed every WM_CHAR through GameWindowManager::winIsAscii / winIsAlNum, which called iswascii / iswalnum and rejected everything above U+007F. Additionally the main window was created with the ANSI WNDCLASS / CreateWindow / DefWindowProc triplet, which transcodes WM_CHAR through the system ANSI codepage and corrupts non-Latin-1 input. This patch accepts printable BMP code points in winIsAscii / winIsAlNum and switches both Generals and GeneralsMD main windows to the wide (Unicode) class pipeline so WM_CHAR is delivered as UTF-16.
|
| Filename | Overview |
|---|---|
| Core/GameEngine/Source/GameClient/GUI/GameWindowGlobal.cpp | Widens winIsDigit/winIsAscii/winIsAlNum to accept non-ASCII BMP code points; winIsAlNum fallback at 0xA0 admits non-alphanumeric symbols including U+00A0 NON-BREAK SPACE. |
| Generals/Code/Main/WinMain.cpp | Switches window registration and creation to WNDCLASSW/RegisterClassW/CreateWindowW and replaces DefWindowProc with DefWindowProcW so WM_CHAR delivers UTF-16 code units; also includes whitespace/indentation cleanup. |
| GeneralsMD/Code/Main/WinMain.cpp | Same Unicode window-class pipeline change as Generals/Code/Main/WinMain.cpp applied to the Zero Hour main loop; formatting aligned with the Generals counterpart. |
Sequence Diagram
sequenceDiagram
participant KB as Keyboard / IME
participant OS as Windows Message Loop
participant WP as WndProc
participant DF as DefWindowProcW
participant FI as winIsAscii / winIsAlNum
participant TE as GadgetTextEntry buffer
Note over KB,TE: Before patch (ANSI window class)
KB->>OS: Arabic keypress
OS->>WP: WM_CHAR (ANSI codepage transcoded — garbled)
WP->>FI: c = garbled byte
FI-->>WP: 0 (filtered out)
WP->>DF: DefWindowProcA (ANSI)
Note over KB,TE: After patch (Unicode window class)
KB->>OS: Arabic keypress
OS->>WP: WM_CHAR (UTF-16 code point, e.g. 0x0645)
WP->>FI: c = 0x0645
FI-->>WP: 1 (c >= 0xA0, passes)
WP->>TE: Arabic character stored
WP->>DF: DefWindowProcW (Unicode)
Prompt To Fix All With AI
This is a comment left during a code review.
Path: Core/GameEngine/Source/GameClient/GUI/GameWindowGlobal.cpp
Line: 267-268
Comment:
**`winIsAlNum` accepts non-alphanumeric BMP code points**
The `if (c >= 0xA0) return 1` guard accepts every non-control BMP code point above DEL, including non-alphanumeric ones: U+00A0 (NON-BREAK SPACE), U+00A2 (¢ CENT SIGN), U+00B7 (· MIDDLE DOT), currency/math symbols, etc. An `alphaNumericalOnly`-flagged widget would now silently admit these symbols and an invisible non-breaking space. If any such widget feeds a system that treats its output as truly alphanumeric (username uniqueness keys, server-side validation, etc.), these code points will slip through undetected.
Consider narrowing the fallback to Unicode letter/digit general categories (e.g. via `iswctype`) or at minimum excluding known non-alphanumeric ranges in the Latin-1 Supplement block (0xA0–0xBF), which are entirely punctuation and symbols.
How can I resolve this? If you propose a fix, please make it concise.Reviews (3): Last reviewed commit: "Extend winIsDigit to accept script-speci..." | Re-trigger Greptile
Addresses review feedback: the project style rule requires if/else/for/while bodies on their own line for debugger breakpoint placement.
|
Tip: Greploop — Automatically fix all review issues by running Use the Greptile plugin for Claude Code to query reviews, search comments, and manage custom context directly from your terminal. |
|
The aSCIIOnly flag is really just an input filter on the widget side its not doing anything load-bearing for security or the wire format. Strings in the engine are UnicodeString (UTF-16) all the way through, and the places that actually need ASCII (filesystem paths, the old GameSpy/WOL lobby name marshalling, map/script IDs) do their own ASCII check further down the pipeline. So widening winIsAscii doesnt punch through any of those guards, it just stops the widget from eating Arabic/CJK keystrokes. Youre right though that the flag name is misleading after this change. If youd rather keep aSCIIOnly strict, I can go two ways: Rename the widened version to winIsPrintableBMP and put the old strict check back into winIsAscii. Then only the text entries that actually need complex-script input get switched over. Smaller patch, no .wnd changes needed. |
Addresses review feedback: winIsDigit stayed on iswdigit, which in the default C locale only accepts ASCII 0-9. For consistency with the widened winIsAscii / winIsAlNum filters, also accept Arabic-Indic (U+0660-U+0669), Extended Arabic-Indic (U+06F0-U+06F9), Devanagari (U+0966-U+096F) and Bengali (U+09E6-U+09EF) digit ranges so digit-only text entry widgets work for users typing native numerals.
| #ifdef RTS_ENABLE_CRASHDUMP | ||
| #include "Common/MiniDumper.h" | ||
| #endif | ||
| #include "../OnlineServices_Init.h" |
There was a problem hiding this comment.
This patch appears to be optimized for the Generals Online repository fork.
There was a problem hiding this comment.
No changes. AI tools change only the position of the pointer.
There was a problem hiding this comment.
This is a GO file.
No changes. AI tools change only the position of the pointer.
This is not a file in the repository and the compiler (obviously) breaks on it.
| // case WM_MOUSEHOVER: return "WM_MOUSEHOVER"; | ||
| // case WM_MOUSELEAVE: return "WM_MOUSELEAVE"; | ||
| // case WM_MOUSEHOVER: return "WM_MOUSEHOVER"; | ||
| // case WM_MOUSELEAVE: return "WM_MOUSELEAVE"; |
There was a problem hiding this comment.
There are a lot of tab/space diffs
There was a problem hiding this comment.
This is typical AI slop and needs to be redone by the author
| if ( c >= 0x09E6 && c <= 0x09EF ) | ||
| return 1; | ||
| return 0; | ||
|
|
There was a problem hiding this comment.
Remove line before closing parenthesis
| // Bengali digits (U+09E6..U+09EF) | ||
| if ( c >= 0x09E6 && c <= 0x09EF ) | ||
| return 1; | ||
| return 0; |
There was a problem hiding this comment.
Add line break between return 1; and return 0; for readability
| * script-specific decimal digit ranges (Arabic-Indic, Extended | ||
| * Arabic-Indic, Devanagari, Bengali, etc.) so that digit-only text | ||
| * entry widgets work for users typing native numerals. | ||
| */ |
There was a problem hiding this comment.
request a change please if you want to correct something
There was a problem hiding this comment.
request a change please if you want to correct something
As this is your first contribution to this repository and your fairly empty profile, you may not be completely familiar with how things commonly work on github
- You open a PR if you want to make a change to the code, e.g. to fix a bug or add a feature
- As an author, you must ensure that the code meets the standards of the repository
- This is particularly applicable if you use an AI assistant. You need to review the code of the AI first before putting it up for review
- All changes to the code must be in scope. This includes whitespaces, tabs and new lines.
- Once you open a PR, you provide a good description on what you have done and why you want to make the change. If it builds on an open issue, the issue needs to be linked.
- One or more reviewers will review your code and will provide feedback.
- You as the author must pick up on the feedback. Reviewers in general will to provide full solution or changes but steer you in the direction that would improve the code or solution.
- Once you have resolved all the feedback/comments and committed these changes, you can rerequest a review. This is an iterative process - new comments may appear.
- Once the reviewer(s) are happy, they will approve the PR, after which it will be merged with the main code. In this particular repository that may take a few days.
| // case WM_MOUSEHOVER: return "WM_MOUSEHOVER"; | ||
| // case WM_MOUSELEAVE: return "WM_MOUSELEAVE"; | ||
| // case WM_MOUSEHOVER: return "WM_MOUSEHOVER"; | ||
| // case WM_MOUSELEAVE: return "WM_MOUSELEAVE"; |
There was a problem hiding this comment.
This is typical AI slop and needs to be redone by the author
| #ifdef RTS_ENABLE_CRASHDUMP | ||
| #include "Common/MiniDumper.h" | ||
| #endif | ||
| #include "../OnlineServices_Init.h" |
There was a problem hiding this comment.
This is a GO file.
No changes. AI tools change only the position of the pointer.
This is not a file in the repository and the compiler (obviously) breaks on it.
Problem
Typing Arabic (or any non-Latin script) into in-game text fields produced either nothing or Latin-1 gibberish, depending on the system ANSI codepage.
Two independent bugs were responsible:
GameWindowManager::winIsAsciiandwinIsAlNumusediswascii/iswalnum, which reject everything >= 0x80. Every non-ASCII character got filtered out before reaching the text-entry buffer.The main window was registered and created with the ANSI variants (
WNDCLASS,RegisterClass,CreateWindow,DefWindowProc). Windows then transcodesWM_CHARthrough the system ANSI codepage, so on a machine whose ANSI CP is not Windows-1256, Arabic characters are mangled before the game ever sees them.Fix
winIsAscii/winIsAlNum.GeneralsandGeneralsMDmain loops soWM_CHARis delivered as UTF-16.Scope
Pure client-side fix, no asset or protocol changes. Affects both Generals and Zero Hour. No behavioural change for users who only type ASCII.
Testing
Built both targets locally. Verified that Arabic characters typed into lobby/chat widgets are now accepted and stored correctly.
Rendering of complex-script shaping (RTL / ligatures) is a separate concern and not addressed here — this PR is only about getting the code points into the buffer in the first place.