resourcestring data .rsj files in case the source file is interpreted as
UTF-8. Previously, the individual UTF-8 bytes were each stored in a
separate widechar in the Json file (mantis #28717)
* because rstconv didn't use the cwstring unit on Unix, it has until now
just concatenated the bytes stored in the widechars of the Json
file on those platforms, i.e., the strings put in the resource file were
byte for byte equal to what was in the source file. On Windows, these bytes
were interpreted as individual widechars, converted to the
DefaultSystemCodePage and then written. This means that for anything but
ISO-8859-1 (where every widechar from #0000 to #0255 maps to #0 to #255),
the output got corrupted.
In order to keep compatibility with the old behaviour whereby rstconv wrote
the resource strings using the same encoding as in the source file (except
if the data got completely corrupted, in which case compatibility is
useless), we now store all resourcestrings twice in the .rsj file: once as
the exact byte sequence from the source file, and once (properly) encoded
in UTF-16.
By default, rstconv will use the byte string and just write that one to the
resource file. Additionally, there is a new -p option that accepts a code
page name (see rstconv -h for the list of supported names), which can be
used to make rstconv use the UTF-16 version and convert that to the desired
code page (as long as the system on which rstconv runs supports that
codepage).
And this also finally resolves mantis #6477.
git-svn-id: trunk@31881 -
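The behaviour described in the entry above can be modelled in a few lines. This is only a sketch of the idea, not rstconv code: the dictionary layout, the `write_resource` helper and its `codepage` parameter are hypothetical stand-ins for the two stored forms and the -p option, and Python's `cp1252` codec stands in for a non-ISO-8859-1 DefaultSystemCodePage.

```python
# Illustrative model only; none of these names come from rstconv or the .rsj format.

source_text = "é€"                          # what the programmer typed
source_bytes = source_text.encode("utf-8")  # the bytes in a UTF-8 source file

# Old .rsj content: every UTF-8 byte ended up in its own widechar.
old_widechars = "".join(chr(b) for b in source_bytes)

# Old Unix behaviour: concatenate the low bytes again, so the output is
# byte for byte equal to the source file.
unix_output = bytes(ord(c) for c in old_widechars)

# Old Windows behaviour: treat each widechar as a real character and convert
# it to the system code page. With ISO-8859-1 the bytes happen to survive ...
latin1_output = old_widechars.encode("latin-1")
# ... but with another code page (cp1252 as an example) they get corrupted.
cp1252_output = old_widechars.encode("cp1252", errors="replace")

# New scheme: store both the raw source bytes and the real text.
entry = {"bytes": source_bytes, "utf16": source_text}


def write_resource(entry, codepage=None):
    """Default: write the raw bytes. With a code page: convert the real text."""
    if codepage is None:
        return entry["bytes"]
    return entry["utf16"].encode(codepage)
```

Calling `write_resource(entry)` reproduces the source bytes (the compatible default), while `write_resource(entry, "cp1252")` converts the properly decoded text, which is the role the -p option plays.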
o support for the new codepage-aware ansistrings in the jvm branch
o empty ansistrings are now always represented by a nil pointer rather than
by an empty string, because an empty string also has a code page which
can confuse code (although this will make ansistrings harder to use
in Java code)
o more string helpers code shared between the general and jvm rtl
o support for indexbyte/word in the jvm rtl (warning: first parameter
is an open array rather than an untyped parameter there, so
indexchar(pcharvar^,10,0) will be equivalent to
indexchar([pcharvar^],10,0) there, which is different from what is
intended; changing it to an untyped parameter wouldn't help though)
o default() support is not yet complete
o calling fpcres is currently broken due to limitations in
sysutils.executeprocess() regarding handling unix quoting and
the compiler using the same command lines for scripts and directly
calling external programs
o compiling the Java compiler currently requires adding ALLOW_WARNINGS=1
to the make command line
git-svn-id: branches/jvmbackend@20887 -
defcmp: Handle comparison of code-paged string types, taking the code page into account
ncnv: Remove unneeded code page comparison to CP_UTF8, some fixes regarding shortstrings and wide char/string
ncon: For the case of tstringconstnode.changestringtype (ncon.pas) where the code page is CP_NONE or 0, no translation is done, because:
* CP_NONE is compatible to all
* For 0 the raw bytes are just copied.
My changes:
- change ascii2unicode to allow passing the source code page,
- convert in both cases, when the source or the destination is UTF-8
git-svn-id: trunk@19457 -
o support for ansistring constants. It's done via a detour because the
JVM only supports UTF-16 string constants (no array of byte or anything
like that): store every ansicharacter in the lower 8 bits of an
UTF-16 constant string, and at run time copy the characters to an
ansistring. The alternative is to generate code that stores every
character separately to an array.
o the base ansistring support is implemented in a class called
AnsistringClass, and an ansistring is simply an instance of this
class under the hood
o the compiler currently does generate nil pointers as empty
ansistrings unlike for unicodestrings, where we always
explicitly generate an empty string. The reason is that
unicodestrings are the same as JLString and hence common
for Java interoperation, while ansistrings are unlikely to
be used in interaction with external Java code
* fixed indentation
git-svn-id: branches/jvmbackend@18562 -
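The ansistring-constant detour from the entry above (one ansicharacter in the lower 8 bits of each UTF-16 code unit, copied out again at run time) can be sketched as follows. The function names are hypothetical and the code only illustrates the packing idea; it is not the compiler's or the RTL's implementation.

```python
def pack_ansi_constant(raw: bytes) -> str:
    """Compile-time side: one UTF-16 code unit per ansichar, value in the low 8 bits."""
    return "".join(chr(b) for b in raw)


def unpack_ansi_constant(packed: str) -> bytes:
    """Run-time side: copy the low 8 bits of every code unit into a byte string."""
    return bytes(ord(ch) & 0xFF for ch in packed)


# Example: an ansistring constant in some 8-bit code page (cp1252 is an assumption).
raw = "naïve".encode("cp1252")
packed = pack_ansi_constant(raw)
restored = unpack_ansi_constant(packed)
```

Because every code unit stays below 256, the packed string is a valid UTF-16 constant regardless of the ansistring's code page, and the round trip is lossless for all 256 byte values.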
(widechar<->char, widechar<->*string), based on a patch from
Rimgaudas Laucius (mantis #7758)
* no longer perform compile-time widechar/string->char/ansi/
shortstring conversions if they would destroy information
(they can't cope with widechars with ord>=128). This means
that you can now properly use constant widechars/widestrings
in source code with a {$codepage } set without risking that
the compiler will mangle everything afterwards
* support ESysEINVAL return code from iconv (happens if last
multibyte char is incomplete)
* fixed writing of widechars (were converted to char -> lost
information)
git-svn-id: trunk@8274 -
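The rule from the entry above (no compile-time widechar/string to char/ansi/shortstring conversion when a widechar has ord>=128) amounts to a simple guard. The sketch below assumes that the check is exactly "any code unit >= 128"; `fold_wide_to_ansi` is a hypothetical name, not a compiler routine.

```python
def fold_wide_to_ansi(text: str):
    """Return the folded bytes, or None when the conversion must wait until run time."""
    # Characters with ord >= 128 cannot be converted without knowing the
    # target code page, so folding them here would destroy information.
    if any(ord(ch) >= 128 for ch in text):
        return None
    return text.encode("ascii")
```

Returning None models "leave the constant as a widestring and convert at run time", where the actual code page is known.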