UTF8 on LC server

16 messages

UTF8 on LC server

Tim Selander via use-livecode
Hi,

In LC, if I have a field or variable in Japanese (double-byte)
and get, say, the 5th character, it returns the correct
double-byte character. But on LC Server (on-rev hosting),
"character" chunks get single bytes, breaking the Japanese
character and turning it into gibberish.

Is there any way to get LC Server to handle double-byte
characters the same way LC desktop does?

Tim Selander
Tokyo, Japan

_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Re: UTF8 on LC server

Warren Samples via use-livecode
On 05/31/2018 06:43 AM, Tim Selander via use-livecode wrote:
> Is there any way to get LC Server to handle double-byte characters the
> same way LC desktop does?
>
> Tim Selander
> Tokyo, Japan

LC Server serves pages with a default "Content-Type" header of:

         Content-Type: text/html; charset=iso-8859-1

That would prevent the display of Japanese characters.

Try putting:

         put header "Content-Type: text/html; charset=utf-8"

at the top of your lc pages after the <?lc and before any other content.

See if this solves your problem.

Good luck,

Warren
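To make the charset point concrete, here is a minimal Python sketch (Python stands in for the browser's decoding step; it is not LiveCode code). It takes the UTF-8 bytes of one kanji, 夕, and decodes them under each charset label:

```python
# The bytes a UTF-8 page carries for a single kanji.
body = "夕".encode("utf-8")  # b'\xe5\xa4\x95': three bytes

# A browser told "charset=iso-8859-1" maps each byte to its own character.
as_latin1 = body.decode("iso-8859-1")

# A browser told "charset=utf-8" decodes the three bytes as one kanji.
as_utf8 = body.decode("utf-8")

print(repr(as_latin1))  # 'å¤\x95'
print(as_utf8)          # 夕
```

The three Latin-1 characters are the kind of gibberish a wrong charset label produces; the header changes how the browser reads the bytes, not the bytes themselves.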

Re: UTF8 on LC server

Tim Selander via use-livecode
Thanks, Warren.

Yes, I've got that header set up, and UTF8 is working fine, pages look
great. But LC server is not handling character chunking in variables the
same way as LC desktop. In desktop, I can say "put char 1 of variable1"
and I get a Japanese kanji. In LC Server, I only get half a kanji.
"Word" chunks are also not working. Items and lines are OK.

I never got the hang of all the encodes and decodes needed for Japanese
in LC 6 and earlier... but does LC Server require those kinds of text
manipulations?

Tim Selander
Tokyo, Japan
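A plausible reason items and lines kept working, assuming the file was read without decoding: in UTF-8 every byte of a multi-byte character is 0x80 or above, so ASCII delimiters such as return (0x0A) and comma (0x2C) never occur inside a kanji. Splitting the raw bytes on them therefore never cuts a character in half. A small Python sketch of that property (an illustration of UTF-8 only, not of LiveCode internals; the sample text is made up):

```python
# Two lines of sample text; the second uses an ASCII comma as an item delimiter.
text = "一行目、テスト\n二行目,テスト"
raw = text.encode("utf-8")  # undecoded bytes, as if read without textDecode

# Every byte that belongs to a non-ASCII character is >= 0x80.
non_ascii_bytes = [b for ch in text if ord(ch) > 0x7F for b in ch.encode("utf-8")]
print(min(non_ascii_bytes) >= 0x80)  # True

# Splitting the raw bytes on ASCII delimiters yields pieces that decode cleanly.
lines = [line.decode("utf-8") for line in raw.split(b"\n")]
items = [item.decode("utf-8") for item in raw.split(b"\n")[1].split(b",")]
print(lines)  # ['一行目、テスト', '二行目,テスト']
print(items)  # ['二行目', 'テスト']
```

Character chunks, by contrast, count individual bytes of undecoded text, so a range can start or end in the middle of a kanji.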

Re: UTF8 on LC server

Kee Nethery via use-livecode


> On May 31, 2018, at 4:33 PM, Tim Selander via use-livecode <[hidden email]> wrote:
>
> Thanks, Warren.
>
> Yes, I've got that header set up, and UTF8 is working fine, pages look great. But LC server is not handling character chunking in variables the same way as LC desktop. In desktop, I can say "put char 1 of variable1" and I get a Japanese kanji. In LC Server, I only get half a kanji. "Word" chunks are also not working. Items and lines are OK.

Yep, “char” is no longer the thing to use. Use “codepoint”.
        put codepoint 1 of variable1

Kee Nethery


Re: UTF8 on LC server

Tim Selander via use-livecode
Thanks, Kee.

Actually, I had found the reference to codepoint in the dictionary and
tried it. But it seems to work the same as character -- breaking kanji.
My test code is "put codepoint 500 to 550 of variable1" and the webpage
shows: �。こうして夕があり、朝があった。�  The beginning and ending
kanji got split in half. Identical results to "put char 500 to 550."

Tim Selander


Re: UTF8 on LC server

Kee Nethery via use-livecode
I’m assuming you are using “unicode” (aka UTF-16) and not UTF8 to do all your transforming of the data?
Kee

Re: UTF8 on LC server

Alex Tweedly via use-livecode
Hi Tim,

Which version of the LC engine are you using on LC Server?

(And which version on the desktop?)

Alex.


Re: UTF8 on LC server

Tim Selander via use-livecode
Hi Kee and Alex,

The original documents I'm working with are UTF8, so that's what I've
been using. So converting them to UTF16 is recommended? I'll try that.

Alex, desktop is version 8 something, and the server is the one
installed on the on-rev host; can't remember what the key in $_SERVER
for that info is, and Googling failed me this time...

Tim Selander

Re: UTF8 on LC server

Mike Bonner via use-livecode
The on-rev host is 7.0.1; you can get it with version().

Re: UTF8 on LC server

Alex Tweedly via use-livecode


On 01/06/2018 01:17, Mike Bonner via use-livecode wrote:
> on-rev host is 7.0.1  you can get it with version()
Hmmm - I get 7.1.0 (on sage).

And that might just be the problem with Unicode handling... I think it
improved in LC 8 upwards, but that's an area I know nothing about.

You can, I believe, ask on-rev support to upgrade you to a modern
version for each domain (or install your own copy of LC Server).

I've been tempted to submit an on-rev *bug* report that they are still
using an obsolete, no-longer-supported version of LC Server. I really
don't understand why they don't upgrade the default to at least 8.x
STABLE. Perhaps preferably 9.x STABLE - if they really believe 9.x has a
STABLE release.


-- Alex.

Re: UTF8 on LC server

Mike Bonner via use-livecode
Oops, dyslexic typo on my part (7.1.0, not 7.0.1); sorry about that.

Re: UTF8 on LC server

Mark Waddingham via use-livecode
On 2018-06-01 02:14, Tim Selander via use-livecode wrote:
> Hi Kee and Alex,
>
> The original documents I'm working with are UTF8, so that's what I've
> been using. So converting them to UTF16 is recommended? I'll try that.
>
> Alex, desktop is version 8 something, and the server is the one
> installed on the on-rev host; can't remember what the key in $_SERVER
> for that info is, and Googling failed me this time...

You should be fine using 'character' on any Unicode text: it uses the
Unicode grapheme cluster breaking rules (a 'grapheme' being the specific
name for a 'character' as humans think of it) to find the boundaries.

That being said, I think codepoint (from memory) should also be okay on
Japanese text, as I don't think the Japanese/Chinese scripts have any
multi-codepoint characters - they just use codepoints with values > 65535
for less commonly used ideographs (the 'supplementary planes'). [ Korean
text can be encoded with conjoining Hangul jamo, which *does* require the
use of character, as a single Hangul syllable can be composed of up to
three codepoints ].
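The distinction Mark draws can be checked with a short Python sketch (Python is only a neutral illustration here, not LiveCode; the sample characters are chosen for the example). It counts codepoints, UTF-16 code units and UTF-8 bytes for a supplementary-plane kanji and for a Hangul syllable in conjoining-jamo form. It also shows one caveat: kana written with a separate combining voiced sound mark (U+3099) is two codepoints for one visible character, which is another reason to default to 'character' rather than 'codepoint'.

```python
import unicodedata


def describe(s):
    """Return (codepoints, UTF-16 code units, UTF-8 bytes) for a string."""
    return (
        len(s),                           # a Python str counts codepoints
        len(s.encode("utf-16-le")) // 2,  # each UTF-16 code unit is 2 bytes
        len(s.encode("utf-8")),
    )


# A supplementary-plane kanji (U+20BB7): one codepoint, but a surrogate
# pair in UTF-16 and four bytes in UTF-8.
rare_kanji = "\U00020BB7"

# A precomposed Hangul syllable (U+D55C) and its conjoining-jamo form:
# NFD splits it into three codepoints (U+1112, U+1161, U+11AB).
hangul = "\uD55C"
hangul_jamo = unicodedata.normalize("NFD", hangul)

# Kana with a separate combining voiced sound mark: ka (U+304B) + U+3099
# is two codepoints that NFC composes into the single codepoint ga (U+304C).
kana_decomposed = "\u304B\u3099"

print(describe(rare_kanji))   # (1, 2, 4)
print(describe(hangul_jamo))  # (3, 3, 9)
print(unicodedata.normalize("NFC", kana_decomposed) == "\u304C")  # True
```

Under grapheme rules, the jamo sequence and the kana pair should each count as one 'character', even though a codepoint count sees three and two.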

The fact it is breaking on Japanese text in the way you suggest makes me
think you aren't textDecode()'ing your UTF-8 input files:

e.g.
    put textDecode(url ("binfile:<pathtofile>"), "utf-8") into tText

Without decoding as utf-8, the engine will think your file is 'native'
(single-byte encoded), so each byte of the file will be seen as a
separate character.
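What "each byte seen as a separate character" does to a range such as char 500 to 550 can be sketched in Python (an illustration only; `chunk_range` is a hypothetical helper mimicking LiveCode's 1-based, inclusive chunk ranges, and the offsets are arbitrary):

```python
def chunk_range(seq, first, last):
    """Hypothetical helper: 1-based, inclusive range, like 'char first to last'."""
    return seq[first - 1:last]


# Sample text from Tim's output; each kana/kanji here is 3 bytes in UTF-8.
text = "こうして夕があり、朝があった。"
raw = text.encode("utf-8")  # roughly what an undecoded UTF-8 file holds

# Chunking the raw bytes: the range starts and ends mid-character, so the
# edges cannot be decoded and come out as U+FFFD replacement characters.
undecoded = chunk_range(raw, 2, 14).decode("utf-8", errors="replace")

# Decoding first, then chunking: every "char" is a whole kana or kanji.
decoded = chunk_range(text, 2, 5)

print(undecoded)
print(decoded)  # うして夕
```

The replacement characters at both ends of the first result correspond to the � marks in Tim's output.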

Internally, the engine uses either a single-byte or a double-byte encoding
for strings (the latter being UTF-16). This is not user-visible; you
just need to make sure that incoming data is decoded correctly.

Can you share the code you are using to read in the text data and code
which is breaking on server?

Warmest Regards,

Mark.

P.S. 'word' in LC is still any sequence of non-space characters
separated by spaces, or any sequence of characters delimited by quotes -
it takes no account of the script of the text, nor actual
word-boundaries. If you want human-style word boundaries then you should
use trueWord (which uses the standard Unicode word breaking rules).
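A quick Python illustration of the 'word' behavior Tim ran into. `str.split()` is only a rough stand-in for a space-delimited 'word' chunk: it splits on any run of Unicode whitespace (including the ideographic space U+3000) and knows nothing about LiveCode's quoted-word rule. Python's standard library has no Unicode word segmentation, so this shows the failure mode rather than what trueWord returns:

```python
# Mixed Latin and Japanese text: only the Latin part contains spaces.
mixed = "LC Server こうして夕があり、朝があった。"
words = mixed.split()

print(words)       # ['LC', 'Server', 'こうして夕があり、朝があった。']
print(len(words))  # 3
```

The whole Japanese sentence comes back as a single "word", since Japanese text does not put spaces between words.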

--
Mark Waddingham ~ [hidden email] ~ http://www.livecode.com/
LiveCode: Everyone can create apps

Re: UTF8 on LC server

Tim Selander via use-livecode
Hi Mark,

Here is the script. The files I'm using are
bamboobabies.com/getjapanesetext.lc, and the text it is getting
is bamboobabies.com/news.txt.

In the script, there are two lines reading the text file that
I've taken turns commenting out....

If you can give me any hints, it would be greatly appreciated.

Tim Selander


<?lc put header "Content-Type: text/html; charset=UTF-8" ?>
<!DOCTYPE HTML>
<html>
     <head>
         <meta http-equiv="Content-type" content="text/html; charset=UTF8">
         <title>workbench</title>
     </head>
<body>

<?lc
--This line loads readable japanese text, but putting char 500 to 550 breaks beginning and ending kanji
put url "http://bamboobabies.com/news.txt" into vText

--When this line is used, none of the put text is readable
--put textDecode(url "binfile:bamboobabies.com/news.txt", "utf-8") into vText

put line 1 of vText

put "<BR><BR><BR><BR>"

put char 500 to 550 of vText
  ?>
</body>
</html>




Re: UTF8 on LC server

Mark Waddingham via use-livecode
On 2018-06-01 12:53, Tim Selander via use-livecode wrote:

> Hi Mark,
>
> Here is the script. The files I'm using are
> bamboobabies.com/getjapanesetext.lc, and the text it is getting is
> bamboobabies.com/news.txt.
>
> In the script, there are two lines reading the text file that I've
> taken turns commenting out....
>
> If you can give me any hints, it would be greatly appreciated.
>
> Tim Selander

Try this:

<?lc set the outputTextEncoding to "utf-8" ?>
<?lc put header "Content-Type: text/html; charset=UTF-8" ?>
<!DOCTYPE HTML>
<html>
     <head>
         <meta http-equiv="Content-type" content="text/html; charset=UTF8">
         <title>workbench</title>
     </head>
<body>
<?lc
--Decode the UTF-8 file first so that char 500 to 550 returns whole kanji
put textDecode(url "http://bamboobabies.com/news.txt", "utf-8") into vText

put line 1 of vText

put "<BR><BR><BR><BR>"

put char 500 to 550 of vText
  ?>
</body>
</html>

The problem you are having is that your text file is UTF-8, but the
engine doesn't know that: you need to explicitly decode it into a
LiveCode string using textDecode. You can then manipulate it as chars
etc. correctly with Unicode. That solves the 'getting data into LiveCode
in the form needed' problem.

The other side of the problem is the text encoding used when you do
'put'. By default this is 'native'; by setting the outputTextEncoding
at the start, you make the engine automatically encode any strings you
'put' with the specified encoding.
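The two halves of the fix (decode on the way in, encode on the way out) can be sketched in Python. `render_page` is a hypothetical stand-in for the .lc page, not a LiveCode API. The sketch assumes that 'native' output behaves like ISO-8859-1 on a Linux server, with unmappable characters replaced by "?" (which is what Python's `errors="replace"` does when encoding); the exact LiveCode behavior may differ.

```python
def render_page(raw_file: bytes, first: int, last: int, output_encoding: str) -> bytes:
    """Hypothetical stand-in for the .lc page: decode, chunk, then encode for output."""
    text = raw_file.decode("utf-8")  # like textDecode(..., "utf-8")
    chunk = text[first - 1:last]     # like char first to last (1-based, inclusive)
    # Like the outputTextEncoding: encode whatever is 'put' as it is written out.
    # Characters the target encoding cannot represent become "?".
    return chunk.encode(output_encoding, errors="replace")


news = "こうして夕があり、朝があった。".encode("utf-8")  # sample file contents

print(render_page(news, 1, 4, "utf-8").decode("utf-8"))  # こうして
print(render_page(news, 1, 4, "iso-8859-1"))             # b'????'
```

The second call plausibly mirrors Tim's "none of the put text is readable" result with textDecode but no outputTextEncoding: the string was decoded correctly, but the output step could not represent it.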

Hope this helps!

Mark.

Re: UTF8 on LC server

Niggemann, Bernd via use-livecode
Also, just FYI: if you are encoding arrays and you need the Unicode
character handling, you need the extra parameter: arrayEncode(myArray, "7.0")



Re: UTF8 on LC server

Tim Selander via use-livecode
Mark,

Success!  Greatly appreciate your walking me through this.

Have a great weekend.

Tim Selander
Tokyo, Japan

