Re: Decoding "quoted-printable" -- Help needed -- Reopened


Brian Milby via use-livecode
Oh, sorry, I was too quick to declare a solution.

Even though the function runs fine and its result also converts back,
"quoted-printable" with UTF-8 expects each codepoint to be encoded in hex
as exactly two ASCII characters.

For example, for the Euro symbol "€" we have three codepoints.
The function below produces "=E2=201A=AC", while it must be "=E2=82=AC".
The "=" sign just introduces each hex pair in quoted-printable.

I do not know what is wrong in my thinking, as I am not getting quite the
expected result.
(The result is ok for other symbols such as 'ü'.)

EXAMPLE:

put "€" into tChar
// First encode to UTF-8:
put textEncode(tChar, "UTF-8") into tCodedChar
// Repeat for each codepoint in the UTF-8 char
repeat for each codePoint tCodePoint in tCodedChar
   // Encode each codepoint to its integer expression and convert to Hex value:
   put "=" & baseConvert(codePointToNum(tCodePoint), 10, 16) after tEncoded
end repeat
put tEncoded into field "Show Codepoints" -- Expected ASCII representing Hex numbers
-- Result: "=E2=201A=AC" -- Instead of "=E2=82=AC", but valid and working.

The actual "correct" UTF-8 result can be tested here:
http://www.endmemo.com/unicode/unicodeconverter.php

What am I missing?

Thanks a lot
Roland
_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Re: Decoding "quoted-printable" -- Help needed -- Reopened - Solved 2nd

Brian Milby via use-livecode
I am very sorry for flooding this list. I keep answering my own
questions.

The function needs to address bytes. I found this looking at some similar
C# code:

# Code snippet from C#
# Source: https://stackoverflow.com/questions/32083334/consecutive-control-characters-in-quoted-printable-not-decoding-correctly
---
string sHex = input;
sHex = sHex.Substring(i + 1, 2);
int hex = Convert.ToInt32(sHex, 16);
byte b = Convert.ToByte(hex);
output.Add(b);
i += 3;
---
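
Why the middle value came out as 201A can be reproduced outside LiveCode.
The sketch below is in Python rather than LiveCode and rests on an
assumption that is not confirmed anywhere in this thread: that the engine
reads the encoded data as Windows-1252 text when it is walked codepoint by
codepoint. Under that assumption the byte 0x82 becomes the character U+201A,
while 0xE2 and 0xAC keep their values, which matches the result reported
earlier. The function names are made up for illustration.

```python
# Assumption: the UTF-8 bytes get reinterpreted as Windows-1252 text before
# the per-codepoint loop runs. Python's "cp1252" codec stands in for that.

def per_codepoint(text: str) -> str:
    """Mimic the first attempt: hex of each code point of the misread bytes."""
    misread = text.encode("utf-8").decode("cp1252")
    return "".join("=%X" % ord(ch) for ch in misread)


def per_byte(text: str) -> str:
    """Hex of each raw UTF-8 byte, always two digits."""
    return "".join("=%02X" % b for b in text.encode("utf-8"))


print(per_codepoint("€"))  # =E2=201A=AC
print(per_byte("€"))       # =E2=82=AC
print(per_codepoint("ü"))  # =C3=BC
print(per_byte("ü"))       # =C3=BC
```

This would also explain why 'ü' looked fine: its two bytes, C3 and BC, map
to code points with the same numeric values, so both loops agree.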

I overlooked that the value must be a byte value. Anyway, that is all new
to me.
So, the correct and tested conversion to and from "quoted-printable" with
UTF-8 encoding in LiveCode >7 is:

---
local tChar
local tItem
local tCodedChar
local tByte
local tEncoded
local tDecoded

set the itemDelimiter to "="

// ENCODE EXAMPLE
put "€" into tChar
put textEncode(tChar, "UTF-8") into tCodedChar
// Walk the encoded data byte by byte, not codepoint by codepoint
repeat for each byte tByte in tCodedChar
   put "=" & baseConvert(byteToNum(tByte), 10, 16) after tEncoded
end repeat
put tEncoded into msg --> "=E2=82=AC", the quoted-printable UTF-8 encoding of the Euro symbol "€"

// DECODE EXAMPLE
put "=E2=82=AC" into tEncoded
delete char 1 of tEncoded
repeat for each item tItem in tEncoded
   put numToByte(baseConvert(tItem, 16, 10)) after tDecoded
end repeat
put textDecode(tDecoded, "UTF-8") into msg --> the Euro symbol "€"
---
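
As a cross-check of the byte-wise approach, here is a minimal Python sketch
(not LiveCode) of the same two steps. The helper names qp_encode_all and
qp_decode_all are hypothetical; quopri is Python's standard quoted-printable
module and is used only to confirm the decoding. The sketch pads every value
to two hex digits, since a byte below 16 (a line feed, for example) has to
appear as "=0A". Whether baseConvert pads like this was not checked here, so
that may need attention in the LiveCode version.

```python
import quopri


def qp_encode_all(text: str) -> str:
    """Write every UTF-8 byte of text as '=' plus two uppercase hex digits."""
    return "".join("=%02X" % b for b in text.encode("utf-8"))


def qp_decode_all(encoded: str) -> str:
    """Inverse of qp_encode_all; expects only '=XX' groups and nothing else."""
    hex_pairs = encoded.split("=")[1:]  # drop the empty item before the first "="
    return bytes(int(pair, 16) for pair in hex_pairs).decode("utf-8")


print(qp_encode_all("€"))           # =E2=82=AC
print(qp_decode_all("=E2=82=AC"))   # €
print(qp_encode_all("\n"))          # =0A (two digits, leading zero kept)

# Confirm against the standard library decoder.
print(quopri.decodestring(b"=E2=82=AC").decode("utf-8"))  # €
```

Like the LiveCode example, this encodes every byte, including plain ASCII
ones, and the decoder accepts only "=XX" groups. Real e-mail bodies mix
literal text with soft line breaks, which this sketch does not handle.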

Thanks to all.

Given a bit of time, I will post a solution for UTF8 quoted-printable
encoded E-Mail blocks of text in the Forum.

Roland


---

On Thu, 14 Nov 2019 at 20:41, R.H. <[hidden email]> wrote:
>
> Oh, sorry, I was too quick declaring a solution.
> [...]