is it safe to rely on the hash of a livecode variable from a character encoding standpoint?

Tom Glod via use-livecode
Hi Everyone,

I want to ask: how likely is it that, at some point in the future, some change
in character encoding could start producing a different hash for the same
sentence? I'm just thinking about the nightmare scenarios facing a project that
heavily uses hashing to verify and address content... in international
characters, to boot.

Any thoughts on this would really help me avoid making a mistake in my
approach.

Thanks,

Tom
_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Re: is it safe to rely on the hash of a livecode variable from a character encoding standpoint?

Mark Waddingham via use-livecode
On 2018-01-26 18:50, Tom Glod via use-livecode wrote:

> Hi Everyone,
>
> I want to ask how likely it is that at some point in the future some change
> in character encoding could start producing a different hash for the same
> sentence? just thinking about the nightmare scenarios facing a project that
> heavily uses hashing to verify and address content......in international
> characters......to boot.

The hash/digest functions (e.g. sha1Digest) operate on binary data. So
if you do:

   put sha1Digest("foobar")

Then "foobar" is first converted to binary data using the native
encoding (i.e. the backwards-compatibility rule we have), then that is
hashed.
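A small sketch of the point above, assuming a LiveCode 7 or later engine (where textEncode and numToCodepoint are available). The handler names `toHex` and `compareEncodings` are hypothetical. It hashes the same string three ways; if the implicit conversion works as described, the implicit digest and the explicit "native" digest should match, while the UTF-8 digest differs because é is a single byte in the native encoding but two bytes (C3 A9) in UTF-8.

```livecode
-- Hypothetical helper: render a binary digest as hex text
function toHex pBinary
   local tHex
   get binaryDecode("H*", pBinary, tHex)
   return tHex
end toHex

-- Hypothetical handler: hash "café" three different ways
on compareEncodings
   local tText, tImplicit, tNative, tUtf8
   -- Build "café" from codepoints (U+00E9 is the precomposed e-acute)
   -- so the encoding of the script file itself plays no part
   put "caf" & numToCodepoint(233) into tText

   -- Implicit: the engine converts the string with the native encoding
   put toHex(sha1Digest(tText)) into tImplicit
   -- Explicit native encoding: expected to match the implicit digest
   put toHex(sha1Digest(textEncode(tText, "native"))) into tNative
   -- Explicit UTF-8: "é" becomes two bytes, so the digest changes
   put toHex(sha1Digest(textEncode(tText, "UTF-8"))) into tUtf8

   put "implicit = native:" && (tImplicit is tNative) & return & \
         "native = UTF-8:" && (tNative is tUtf8)
end compareEncodings
```

If, as we understand the textEncode documentation, "native" means MacRoman on macOS and ISO-8859-1/CP1252 on other platforms, the implicit digest of non-ASCII text could also differ between platforms, which is one more reason not to rely on it.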

In every case where you produce a hash of text you should explicitly choose an
encoding - so pick your favourite (Unicode-friendly!) encoding and do:

   get sha1Digest(textEncode(tMyString, tMyEncoding))
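A minimal sketch of that advice as a reusable function (the names `hexSha1` and `demoExplicitEncoding` are hypothetical). The encoding is an explicit parameter, so nothing falls back to the native encoding, and the binary digest is turned into hex text with binaryDecode so it is easy to store and compare.

```livecode
-- Hypothetical helper: SHA-1 of pText as hex, using an explicitly chosen encoding
function hexSha1 pText, pEncoding
   local tHex
   -- Turn the text into bytes with the caller's encoding; no implicit "native" step
   get binaryDecode("H*", sha1Digest(textEncode(pText, pEncoding)), tHex)
   return tHex
end hexSha1

on demoExplicitEncoding
   -- For plain ASCII text, the UTF-8 bytes are the ASCII bytes, so the first
   -- line should be the standard SHA-1 test vector for "abc"
   put hexSha1("abc", "UTF-8") & return & \
         hexSha1("abc", "UTF-16")  -- same text, different bytes, different digest
end demoExplicitEncoding
```

In practice you would pick one encoding (UTF-8 is a common, Unicode-friendly choice) and keep it fixed for the life of the project, since changing it later changes every stored digest.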

If you are generating hashes of strings to send to existing systems, then
their documentation should say *somewhere* what encoding to use before
applying the hash.
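As an illustration, suppose another system documents its digests as "SHA-1 over the UTF-8 bytes, hex-encoded". The constant `kPartnerEncoding` and the function `matchesPartnerDigest` below are hypothetical; the point is that the encoding comes from the other system's documentation rather than from LiveCode's default.

```livecode
-- Hypothetical: taken from the other system's docs ("SHA-1 over UTF-8 bytes, hex")
constant kPartnerEncoding = "UTF-8"

-- Hypothetical check of our text against a digest the other system sent us
function matchesPartnerDigest pText, pPartnerHex
   local tHex
   get binaryDecode("H*", sha1Digest(textEncode(pText, kPartnerEncoding)), tHex)
   -- caseSensitive is false by default, so upper- and lowercase hex compare equal
   return tHex is pPartnerHex
end matchesPartnerDigest
```

Comparing hex text rather than raw bytes keeps the check readable and tolerant of the letter case the other side happens to use.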

Also be aware that Unicode allows the 'same' string to be encoded in
multiple ways - so it's probably wise to choose a normalization form
first too (see normalizeText) - otherwise you could have two strings
which look the same (e.g. 'e' followed by a combining acute accent vs. the
precomposed 'é') but hash to a different value.
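A sketch combining both pieces of advice: normalize, then encode, then hash. The function and handler names are hypothetical, and NFC is an assumed choice; NFD would work equally well as long as it never changes. The demo builds é both as the single codepoint U+00E9 and as "e" plus the combining acute accent U+0301.

```livecode
-- Hypothetical helper: canonical content hash = NFC, then UTF-8, then SHA-1 (hex)
function canonicalSha1 pText
   local tHex
   -- NFC folds "e" + combining acute (U+0301) into the single codepoint U+00E9
   put normalizeText(pText, "NFC") into pText
   get binaryDecode("H*", sha1Digest(textEncode(pText, "UTF-8")), tHex)
   return tHex
end canonicalSha1

-- Hypothetical helper without normalization, for comparison
function rawSha1 pText
   local tHex
   get binaryDecode("H*", sha1Digest(textEncode(pText, "UTF-8")), tHex)
   return tHex
end rawSha1

on demoNormalization
   local tComposed, tDecomposed
   put numToCodepoint(233) into tComposed          -- "é" as one codepoint
   put "e" & numToCodepoint(769) into tDecomposed  -- "e" + combining acute
   put "raw hashes equal:" && (rawSha1(tComposed) is rawSha1(tDecomposed)) & return & \
         "canonical hashes equal:" && (canonicalSha1(tComposed) is canonicalSha1(tDecomposed))
end demoNormalization
```

Without normalization the two forms produce different UTF-8 bytes (C3 A9 versus 65 CC 81) and therefore different digests; after NFC both become U+00E9 and the digests should match. The compatibility forms (NFKC/NFKD) fold more characters together, which may or may not be what a content-addressing scheme wants.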

Warmest Regards,

Mark.

--
Mark Waddingham ~ [hidden email] ~ http://www.livecode.com/
LiveCode: Everyone can create apps


Re: is it safe to rely on the hash of a livecode variable from a character encoding standpoint?

Tom Glod via use-livecode
Wow, I am glad I asked... thanks for the detailed answer, Mark.

On Fri, Jan 26, 2018 at 1:31 PM, Mark Waddingham via use-livecode <[hidden email]> wrote:
