Re: use-livecode Digest, Vol 169, Issue 18

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: use-livecode Digest, Vol 169, Issue 18

Bob Sneidar via use-livecode
I agree that the redundant indexing is an expensive approach, however I have found that this abnormal structure of a repeat for each loop can be a lot faster than the other loop forms in some circumstances.

I can't find the reference that first highlighted to me the lack of guaranteed sequence of chunks, but I've assumed this was a restriction for some time. It would be great if this is incorrect as I've got some heavyweight looping in several of my apps that would benefit from this!

Can anyone from LiveCode give a definitive answer to this please?

Peter
--
Peter Reid
Loughborough, UK

> On 12 Oct 2017, at 4:36pm, [hidden email] wrote:
>
> Message: 17
> Date: Thu, 12 Oct 2017 08:35:50 -0700
> From: Richard Gaskin <[hidden email]>
> To: [hidden email]
> Subject: Re: Atkinson dither algorithm & 'for each' loop
> Message-ID: <[hidden email]>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Peter Reid wrote:
>
>> One note of caution regarding the use of the "repeat for each" loop,
>> whilst you will get a loop iteration for every value in the collection
>> (fldhexa3 in your example), you are not guaranteed the order in which
>> they will occur.
>
> Maybe I misunderstand, but are you thinking of arrays there?
>
> With string containers a "repeat for each" expression should parse from
> beginning to end, sequentially.
>
> Any exception to that would be prohibitively unpredictable, and should
> be reported as a bug.
>
>
>> The following adjusted loop guarantees the sequence at the expense of
>> speed:
>>
>>  put 1 into i
>>  repeat for each word theWord in fldhexa3
>>     put word i of fldhexa3 into theWord
>
> The speed lost there appears to be the result of a redundancy, and a
> particularly expensive one:
>
> We love the convenience of chunk expressions, but in loops we want to
> remain mindful of "<chunk> <number> of <container>" because satisfying
> such expressions will require the engine to start from the beginning of
> the container, examine every character and counting delimiters, until it
> reaches the number of such delimiters specified in "<number>".
>
> So the "repeat" line above parses the chunk value into theWord, but then
> the next line ignores that that's already happened and reassigns the
> same theWord value using an expression that requires then engine to
> count words from the beginning of fldhexa3 each time through the loop.
>
>
> With this specific algo I believe there may be further benefits in using
> a chunk type other than word (based on a a hunch derived from word
> chunks being parsed more flexibly than items or lines), and perhaps not
> converting the binary data to hex at all, instead parsing bytes directly
> with a "step 4" option in the loop to keep track of the four components
> that define each pixel.
>
> --
>  Richard Gaskin
>  Fourth World Systems
>  Software Design and Development for the Desktop, Mobile, and the Web
>  ____________________________________________________________________
>  [hidden email]                http://www.FourthWorld.com


_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: use-livecode Digest, Vol 169, Issue 18

Bob Sneidar via use-livecode
On 10/12/2017 09:38 AM, Peter Reid via use-livecode wrote:

> I can't find the reference that first highlighted to me the lack of guaranteed sequence of chunks, but I've assumed this was a restriction for some time. It would be great if this is incorrect as I've got some heavyweight looping in several of my apps that would benefit from this!

I have quite a lot of code that would fail miserably if the 'repeat for
each' construct would get things out of sequence. But if you're
referring to the 'repeat for each element' or 'repeat for each key'
constructs, then there's no ordered sequence in the array to begin with.

--
  Mark Wieder
  [hidden email]

_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Repeat For Each Iteration Order (was Re: use-livecode Digest, Vol 169, Issue 18)

Bob Sneidar via use-livecode
In reply to this post by Bob Sneidar via use-livecode
On 2017-10-12 18:38, Peter Reid via use-livecode wrote:

> I agree that the redundant indexing is an expensive approach, however
> I have found that this abnormal structure of a repeat for each loop
> can be a lot faster than the other loop forms in some circumstances.
>
> I can't find the reference that first highlighted to me the lack of
> guaranteed sequence of chunks, but I've assumed this was a restriction
> for some time. It would be great if this is incorrect as I've got some
> heavyweight looping in several of my apps that would benefit from
> this!
>
> Can anyone from LiveCode give a definitive answer to this please?

Yes - the previous posts are correct.

If you are using 'repeat for each' to iterate over something which has
an order, then the iteration will follow that order.

Specifically:

    - string chunks (char, item, line, word, trueWord, sentence,
paragraph, token, codepoint, codeunit) are all ordered

    - byte chunks are ordered

    - key chunks are unordered (they use the hash-order of the array,
which can change arbitrarily even within a session)

    - element chunks are ordered for arrays which are sequences*,
unordered for all other arrays

* An array is a sequence if: it has only integer keys, and keys range
from 1 up to the number of elements in the array.
   e.g. If the keys are 1, 2, 3, 4 - then that is a sequence; 0, 1, 2, 3
is not; 1, 3, 4 is not.

Just to reiterate something Richard mentioned - the performance
advantage of a 'repeat for each' loop is completely lost if you use a
normal chunk within the loop to fetch another part (or the same part!)
of the container:

   put 0 into tIndex
   repeat for each word tWord in tString
     add 1 to tIndex
     put word somefunc(tIndex) of tString into tSameWord
     ...
   end repeat

This is because the chunk evaluation must step through somefunc(tIndex)
words in the string *again*. (It takes N steps to access the N'th
string/byte chunk in a container - repeat for each gets its performance
advantage because it remembers where it got the previous chunk from so
it can fetch the next, so it is just 1 step each iteration).

The only types of access which are guaranteed not to have this N step
behavior are codeunit, byte and array accesses:

   - a LiveCode string is (internally) an array of codeunits; which means
that it takes only one step to get any codeunit in it.

   - a LiveCode binary string is (internally) an array of bytes; so it
only takes one step to get any byte from it

   - LiveCode arrays are hash-tables, so you can look up any key in one
step

Note: If you are using the 'byte' chunk, make sure that the thing they
are accessing *is* actually a binary string before using it on them. The
conversion from string -> binary string (which exists for compatibility
with pre-7 scripts) will cause a performance bump.

In terms of the codeunit/codepoint/character chunks - the engine *tries*
to ensure it does as little work as possible when accessing these.
Internally, the engine sets flags on a string so that it can optimize
these lookups. In particular:

   1) If a string contains only Unicode codepoints from the first 64k
Unicode codes *without* surrogates (they give access to everything above
64k) then codepoint access on that string will be 1 step.

   2) If a string contains characters which are only composed of single
codepoints in the first 64k Unicode code *without* surrogates, then char
access on that string will be 1 step.

In particular, if the string you are processing actually came from a
native string; then (1) and (2) are guaranteed to be the case. However,
if you have arbitrary Unicode strings, then it won't generally be the
case:

   a) any characters which have a Unicode code > 64k must be represented
by two codes < 64k (surrogate pairs); this means that the engine has to
step through the string codeunit by codeunit to count the codepoints so
it can find the one you asked for (N step again)

   b) any characters which are composed of multiple codepoints are in the
same boat

Case (b) can arise quite often even in Roman script languages. For
example, in Unicode you can write e-acute as EITHER e,combining-acute
(two codepoints) OR as e-with-acute (one codepoint). For some languages,
most characters will be multiple codepoints - e.g. the Indic languages.
There are also a lot of esoteric languages (for scholarly / academic
purposes) which are entirely defined by Unicode codepoints > 64k.

Upshot is that if you are doing char by char string processing of
general Unicode strings, then try and use repeat for each char; and if
you *can't* do that for some reason, then try and work out how you can
do the processing by using the codeunit chunk.

However, as Knuth is famed for saying - "Premature optimization is the
root of all evil" - don't spend time trying to optimize an algorithm
until you have a working version and can intimately analyse where the
slow parts are. Trying to optimize ahead of time will likely just cause
a lot of time being spent needlessly, and potential hair loss when
figuring out why the (probably much more complicated code) doesn't work
quite right, or why it didn't produce the speed improvement you were
expecting!

Warmest Regards,

Mark.

--
Mark Waddingham ~ [hidden email] ~ http://www.livecode.com/
LiveCode: Everyone can create apps

_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: Repeat For Each Iteration Order (was Re: use-livecode Digest, Vol 169, Issue 18)

Bob Sneidar via use-livecode

On 12/10/2017 18:16, Mark Waddingham via use-livecode wrote:
>     - element chunks are ordered for arrays which are sequences*, unordered
> for all other arrays
>
> * An array is a sequence if: it has only integer keys, and keys range from 1
> up to the number of elements in the array.
>    e.g. If the keys are 1, 2, 3, 4 - then that is a sequence; 0, 1, 2, 3 is
> not; 1, 3, 4 is not.

Ooh - I didn't know that! That's very useful, especially now that I've
switched to using the standard JSON library where we're frequently iterating
over such sequences.

Ben

_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: Repeat For Each Iteration Order (was Re: use-livecode Digest, Vol 169, Issue 18)

Bob Sneidar via use-livecode
In reply to this post by Bob Sneidar via use-livecode
Hi All

I feel my IQ goes up by at lest 10 points when I read some of Mark's
detailed answers concerning Livecode internals - and the thought processes
that go into them.

Mark's and many others' answers are worth the price of admission to this
list - thanks a lot Mark for such an erudite exposition of this and other
arcane Livecode concepts.

I have "Joel on Software"  but a" Mark on software" would be a great read
as well.

Lagi

On 12 October 2017 at 18:16, Mark Waddingham via use-livecode <
[hidden email]> wrote:

> On 2017-10-12 18:38, Peter Reid via use-livecode wrote:
>
>> I agree that the redundant indexing is an expensive approach, however
>> I have found that this abnormal structure of a repeat for each loop
>> can be a lot faster than the other loop forms in some circumstances.
>>
>> I can't find the reference that first highlighted to me the lack of
>> guaranteed sequence of chunks, but I've assumed this was a restriction
>> for some time. It would be great if this is incorrect as I've got some
>> heavyweight looping in several of my apps that would benefit from
>> this!
>>
>> Can anyone from LiveCode give a definitive answer to this please?
>>
>
> Yes - the previous posts are correct.
>
> If you are using 'repeat for each' to iterate over something which has an
> order, then the iteration will follow that order.
>
> Specifically:
>
>    - string chunks (char, item, line, word, trueWord, sentence, paragraph,
> token, codepoint, codeunit) are all ordered
>
>    - byte chunks are ordered
>
>    - key chunks are unordered (they use the hash-order of the array, which
> can change arbitrarily even within a session)
>
>    - element chunks are ordered for arrays which are sequences*, unordered
> for all other arrays
>
> * An array is a sequence if: it has only integer keys, and keys range from
> 1 up to the number of elements in the array.
>   e.g. If the keys are 1, 2, 3, 4 - then that is a sequence; 0, 1, 2, 3 is
> not; 1, 3, 4 is not.
>
> Just to reiterate something Richard mentioned - the performance advantage
> of a 'repeat for each' loop is completely lost if you use a normal chunk
> within the loop to fetch another part (or the same part!) of the container:
>
>   put 0 into tIndex
>   repeat for each word tWord in tString
>     add 1 to tIndex
>     put word somefunc(tIndex) of tString into tSameWord
>     ...
>   end repeat
>
> This is because the chunk evaluation must step through somefunc(tIndex)
> words in the string *again*. (It takes N steps to access the N'th
> string/byte chunk in a container - repeat for each gets its performance
> advantage because it remembers where it got the previous chunk from so it
> can fetch the next, so it is just 1 step each iteration).
>
> The only types of access which are guaranteed not to have this N step
> behavior are codeunit, byte and array accesses:
>
>   - a LiveCode string is (internally) an array of codeunits; which means
> that it takes only one step to get any codeunit in it.
>
>   - a LiveCode binary string is (internally) an array of bytes; so it only
> takes one step to get any byte from it
>
>   - LiveCode arrays are hash-tables, so you can look up any key in one step
>
> Note: If you are using the 'byte' chunk, make sure that the thing they are
> accessing *is* actually a binary string before using it on them. The
> conversion from string -> binary string (which exists for compatibility
> with pre-7 scripts) will cause a performance bump.
>
> In terms of the codeunit/codepoint/character chunks - the engine *tries*
> to ensure it does as little work as possible when accessing these.
> Internally, the engine sets flags on a string so that it can optimize these
> lookups. In particular:
>
>   1) If a string contains only Unicode codepoints from the first 64k
> Unicode codes *without* surrogates (they give access to everything above
> 64k) then codepoint access on that string will be 1 step.
>
>   2) If a string contains characters which are only composed of single
> codepoints in the first 64k Unicode code *without* surrogates, then char
> access on that string will be 1 step.
>
> In particular, if the string you are processing actually came from a
> native string; then (1) and (2) are guaranteed to be the case. However, if
> you have arbitrary Unicode strings, then it won't generally be the case:
>
>   a) any characters which have a Unicode code > 64k must be represented by
> two codes < 64k (surrogate pairs); this means that the engine has to step
> through the string codeunit by codeunit to count the codepoints so it can
> find the one you asked for (N step again)
>
>   b) any characters which are composed of multiple codepoints are in the
> same boat
>
> Case (b) can arise quite often even in Roman script languages. For
> example, in Unicode you can write e-acute as EITHER e,combining-acute (two
> codepoints) OR as e-with-acute (one codepoint). For some languages, most
> characters will be multiple codepoints - e.g. the Indic languages. There
> are also a lot of esoteric languages (for scholarly / academic purposes)
> which are entirely defined by Unicode codepoints > 64k.
>
> Upshot is that if you are doing char by char string processing of general
> Unicode strings, then try and use repeat for each char; and if you *can't*
> do that for some reason, then try and work out how you can do the
> processing by using the codeunit chunk.
>
> However, as Knuth is famed for saying - "Premature optimization is the
> root of all evil" - don't spend time trying to optimize an algorithm until
> you have a working version and can intimately analyse where the slow parts
> are. Trying to optimize ahead of time will likely just cause a lot of time
> being spent needlessly, and potential hair loss when figuring out why the
> (probably much more complicated code) doesn't work quite right, or why it
> didn't produce the speed improvement you were expecting!
>
> Warmest Regards,
>
> Mark.
>
> --
> Mark Waddingham ~ [hidden email] ~ http://www.livecode.com/
> LiveCode: Everyone can create apps
>
> _______________________________________________
> use-livecode mailing list
> [hidden email]
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
>
_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode