Optimization can be tricky

classic Classic list List threaded Threaded
18 messages Options
Reply | Threaded
Open this post in threaded view
|

Optimization can be tricky

dunbarxx via use-livecode
​I have a routine that takes about a minute to run for the test case I
created, and who knows how long for the real use case. Given that I want to
run it several thousand times for actual work, I need it to run (much)
faster.

Roughly, the routine gathers together a bunch of data from arrays, sorts
the data based on its relationship to other arrays, and then returns a
subset of the result.

My first pass at the routine looked roughly like this:

   repeat for each line T in the keys of interestArray[uID]
      repeat for each line S in storyArray[T]
         if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \
               and userSeenArray[uID][item 1 of S] < 4
         then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \
               abs(item 2 of S - item 1 of interestArray[uID][T]) - \
               item 2 of interestArray[uID][T]),T,S & cr after candidateList
      end repeat
   end repeat
   sort lines of candidateList numeric by random(item 1 of each)

In simple terms: parse through the individual lines of all the entries that
possibly work, calculating a relevant value for each and appending that
value and the line to an interim list, which then gets sorted, randomly
favoring lower values.

I assumed the problem was all the line-by-line parsing, and I thought of a
clever way to accomplish the same thing. That led to this:

   put storyArray into R
   intersect R with interestArray[uID]
   combine R using cr and comma
   sort lines of R by (101 + userSeenArray[uID][item 2 of each] * 30 + 5 * \
         abs(item 3 of each - item 1 of interestArray[uID][item 1 of each])
\
         - item 2 of interestArray[uID][item 1 of each])

Much simpler, albeit that's a heck of a "sort by" -- more complex by far
than any I had previously created, and a testament to the power and
flexibility of "each". Alas, despite condensing my code and removing
parsing and loops, that version took ten seconds more than the previous
version, I'm guessing because the previous version has that "if" statement
that weeds out undesirable entries before the sort has to deal with them.

(I'm writing this email as I parse through this figuring it out)

So it turns out that the crazy use of "each" is only part of the problem --
changing that line to:

   sort lines of R by random(10000)

still takes over 20 seconds -- 3x as fast, but still far too slow. It turns
out that the source data numbers anywhere from 1,000 to 2,000 lines, so
sorting it in any way to randomly select the several lines I need is a
really bad choice. removing the sort step but keeping everything else cuts
the execution time down to under a second.

Hmm. How to select several lines at weighted random from among a couple
thousand? I'll think on this and post a follow-up.
_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: Optimization can be tricky

dunbarxx via use-livecode
please do goeff.... this subject is very interesting to me.  i have some
problems where i need to optimize in a similar way.  which kind of repeat
look have u found works fastest? have u tried ? repeat for each key
this_key in array? is that slower?

i love saving milliseconds. :) makes a big diff at scale.

On Mon, Jun 11, 2018 at 8:21 PM, Geoff Canyon via use-livecode <
[hidden email]> wrote:

> ​I have a routine that takes about a minute to run for the test case I
> created, and who knows how long for the real use case. Given that I want to
> run it several thousand times for actual work, I need it to run (much)
> faster.
>
> Roughly, the routine gathers together a bunch of data from arrays, sorts
> the data based on its relationship to other arrays, and then returns a
> subset of the result.
>
> My first pass at the routine looked roughly like this:
>
>    repeat for each line T in the keys of interestArray[uID]
>       repeat for each line S in storyArray[T]
>          if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \
>                and userSeenArray[uID][item 1 of S] < 4
>          then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \
>                abs(item 2 of S - item 1 of interestArray[uID][T]) - \
>                item 2 of interestArray[uID][T]),T,S & cr after
> candidateList
>       end repeat
>    end repeat
>    sort lines of candidateList numeric by random(item 1 of each)
>
> In simple terms: parse through the individual lines of all the entries that
> possibly work, calculating a relevant value for each and appending that
> value and the line to an interim list, which then gets sorted, randomly
> favoring lower values.
>
> I assumed the problem was all the line-by-line parsing, and I thought of a
> clever way to accomplish the same thing. That led to this:
>
>    put storyArray into R
>    intersect R with interestArray[uID]
>    combine R using cr and comma
>    sort lines of R by (101 + userSeenArray[uID][item 2 of each] * 30 + 5 *
> \
>          abs(item 3 of each - item 1 of interestArray[uID][item 1 of each])
> \
>          - item 2 of interestArray[uID][item 1 of each])
>
> Much simpler, albeit that's a heck of a "sort by" -- more complex by far
> than any I had previously created, and a testament to the power and
> flexibility of "each". Alas, despite condensing my code and removing
> parsing and loops, that version took ten seconds more than the previous
> version, I'm guessing because the previous version has that "if" statement
> that weeds out undesirable entries before the sort has to deal with them.
>
> (I'm writing this email as I parse through this figuring it out)
>
> So it turns out that the crazy use of "each" is only part of the problem --
> changing that line to:
>
>    sort lines of R by random(10000)
>
> still takes over 20 seconds -- 3x as fast, but still far too slow. It turns
> out that the source data numbers anywhere from 1,000 to 2,000 lines, so
> sorting it in any way to randomly select the several lines I need is a
> really bad choice. removing the sort step but keeping everything else cuts
> the execution time down to under a second.
>
> Hmm. How to select several lines at weighted random from among a couple
> thousand? I'll think on this and post a follow-up.
> _______________________________________________
> use-livecode mailing list
> [hidden email]
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: Optimization can be tricky

dunbarxx via use-livecode
In reply to this post by dunbarxx via use-livecode
At first glance, it looks like you might save some time by grabbing interestArray[uID][T] as soon as you have T, and then use what you grabbed in the 3 later places instead of re-figuring interestArray[uID][T] each time.

As in:
repeat for each line T in the keys of interestArray[uID]
     put interestArray[uID][T] into AuT —- grab it here
     repeat for each line S in storyArray[T]
        if abs(item 2 of S - item 1 of AuT) < 20 \
              and userSeenArray[uID][item 1 of S] < 4
        then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \
              abs(item 2 of S - item 1 of AuT) - \
              item 2 of AuT),T,S & cr after candidateList
     end repeat
  end repeat
  sort lines of candidateList numeric by random(item 1 of each)

It looks like you could do a similar thing with userSeenArray[uID][item 1 of S] to avoid repeating the same calculation.
Also with item 1 of AuT, and item 2 of S, with lesser gains but while you’re at it…
That all will make it easier (or harder) to read, depending on if you can find descriptive variable names for the intermediate values.

It won’t help the sort speed, but saves a lot of array un-hashing.
I haven’t figured out what the algoraithm is doing, or the idea of the randomness in the sort.

All untested, of course!
Cheers,
Jerry

> On Jun 11, 2018, at 5:21 PM, Geoff Canyon via use-livecode <[hidden email]> wrote:
>
> My first pass at the routine looked roughly like this:
>
>   repeat for each line T in the keys of interestArray[uID]
>      repeat for each line S in storyArray[T]
>         if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \
>               and userSeenArray[uID][item 1 of S] < 4
>         then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \
>               abs(item 2 of S - item 1 of interestArray[uID][T]) - \
>               item 2 of interestArray[uID][T]),T,S & cr after candidateList
>      end repeat
>   end repeat
>   sort lines of candidateList numeric by random(item 1 of each)


_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: Optimization can be tricky

dunbarxx via use-livecode
Do you need the entire list sorted randomly or are you just needing to select N random entries from the list (weighted)? How does the speed change if you do a simple numeric sort?

You could use a function on a random number to weight things - would need to work out the right math to get what you want. Something like random(n)^2/n^2 possibly. Depending on how heavily you want to prefer one end, you could use log or ln.

Your initial code is calculating “item 1 of interestArray[uID][T]” once or twice for every line S. I’m guessing that if you evaluate that to a variable once per repeat it would provide a small gain.

Depending on how often the if turns out to be true, some additional variables could possibly reduce the number of duplicate calculations enough to offset the cost of the assignment (what Jerry mentioned).
On Jun 11, 2018, 8:30 PM -0500, Jerry Jensen via use-livecode , wrote:

> At first glance, it looks like you might save some time by grabbing interestArray[uID][T] as soon as you have T, and then use what you grabbed in the 3 later places instead of re-figuring interestArray[uID][T] each time.
>
> As in:
> repeat for each line T in the keys of interestArray[uID]
> put interestArray[uID][T] into AuT —- grab it here
> repeat for each line S in storyArray[T]
> if abs(item 2 of S - item 1 of AuT) < 20 \
> and userSeenArray[uID][item 1 of S] < 4
> then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \
> abs(item 2 of S - item 1 of AuT) - \
> item 2 of AuT),T,S & cr after candidateList
> end repeat
> end repeat
> sort lines of candidateList numeric by random(item 1 of each)
>
> It looks like you could do a similar thing with userSeenArray[uID][item 1 of S] to avoid repeating the same calculation.
> Also with item 1 of AuT, and item 2 of S, with lesser gains but while you’re at it…
> That all will make it easier (or harder) to read, depending on if you can find descriptive variable names for the intermediate values.
>
> It won’t help the sort speed, but saves a lot of array un-hashing.
> I haven’t figured out what the algoraithm is doing, or the idea of the randomness in the sort.
>
> All untested, of course!
> Cheers,
> Jerry
>
> > On Jun 11, 2018, at 5:21 PM, Geoff Canyon via use-livecode <[hidden email]> wrote:
> >
> > My first pass at the routine looked roughly like this:
> >
> > repeat for each line T in the keys of interestArray[uID]
> > repeat for each line S in storyArray[T]
> > if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \
> > and userSeenArray[uID][item 1 of S] < 4
> > then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \
> > abs(item 2 of S - item 1 of interestArray[uID][T]) - \
> > item 2 of interestArray[uID][T]),T,S & cr after candidateList
> > end repeat
> > end repeat
> > sort lines of candidateList numeric by random(item 1 of each)
>
>
> _______________________________________________
> use-livecode mailing list
> [hidden email]
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

RE: Optimization can be tricky

dunbarxx via use-livecode
In reply to this post by dunbarxx via use-livecode
I know that this might be a different use-case but I have a CA Lottery - Fantasy five lottery parser that collects lottery data and loads it into a matrix and provides the total number of  hits for each number. This also has 4 optimization buttons to the show differences. This aupplication was optimized by our Pasadena LiveCode user's group. Drop me an email and I will send you a zipped copy of the Script with a sample lotto data file with approximately 8000 lines of picks.
Email: [hidden email]

-----Original Message-----
From: use-livecode <[hidden email]> On Behalf Of Geoff Canyon via use-livecode
Sent: Monday, June 11, 2018 5:22 PM
To: How to use LiveCode <[hidden email]>
Cc: Geoff Canyon <[hidden email]>
Subject: Optimization can be tricky

​I have a routine that takes about a minute to run for the test case I created, and who knows how long for the real use case. Given that I want to run it several thousand times for actual work, I need it to run (much) faster.

Roughly, the routine gathers together a bunch of data from arrays, sorts the data based on its relationship to other arrays, and then returns a subset of the result.

My first pass at the routine looked roughly like this:

   repeat for each line T in the keys of interestArray[uID]
      repeat for each line S in storyArray[T]
         if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \
               and userSeenArray[uID][item 1 of S] < 4
         then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \
               abs(item 2 of S - item 1 of interestArray[uID][T]) - \
               item 2 of interestArray[uID][T]),T,S & cr after candidateList
      end repeat
   end repeat
   sort lines of candidateList numeric by random(item 1 of each)

In simple terms: parse through the individual lines of all the entries that possibly work, calculating a relevant value for each and appending that value and the line to an interim list, which then gets sorted, randomly favoring lower values.

I assumed the problem was all the line-by-line parsing, and I thought of a clever way to accomplish the same thing. That led to this:

   put storyArray into R
   intersect R with interestArray[uID]
   combine R using cr and comma
   sort lines of R by (101 + userSeenArray[uID][item 2 of each] * 30 + 5 * \
         abs(item 3 of each - item 1 of interestArray[uID][item 1 of each]) \
         - item 2 of interestArray[uID][item 1 of each])

Much simpler, albeit that's a heck of a "sort by" -- more complex by far than any I had previously created, and a testament to the power and flexibility of "each". Alas, despite condensing my code and removing parsing and loops, that version took ten seconds more than the previous version, I'm guessing because the previous version has that "if" statement that weeds out undesirable entries before the sort has to deal with them.

(I'm writing this email as I parse through this figuring it out)

So it turns out that the crazy use of "each" is only part of the problem -- changing that line to:

   sort lines of R by random(10000)

still takes over 20 seconds -- 3x as fast, but still far too slow. It turns out that the source data numbers anywhere from 1,000 to 2,000 lines, so sorting it in any way to randomly select the several lines I need is a really bad choice. removing the sort step but keeping everything else cuts the execution time down to under a second.

Hmm. How to select several lines at weighted random from among a couple thousand? I'll think on this and post a follow-up.
_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: Optimization can be tricky

dunbarxx via use-livecode
Hi Geoff,

One thing to try in your original code, which should be significantly
faster if the array is big, is using

> repeat for each key T in interestArray[uID]

instead of

> repeat for each line T in the keys of interestArray[uID]

The latter has to allocate memory for a string containing all the keys, AND
iterate through the lines of that string, whereas repeating for each
key directly
accesses the keys of the array in hash order, and thus no extra allocation
will occur and the iteration is faster than finding the next line delimiter
each time.



On Tue, Jun 12, 2018 at 8:35 AM Clarence Martin via use-livecode <
[hidden email]> wrote:

> I know that this might be a different use-case but I have a CA Lottery -
> Fantasy five lottery parser that collects lottery data and loads it into a
> matrix and provides the total number of  hits for each number. This also
> has 4 optimization buttons to the show differences. This aupplication was
> optimized by our Pasadena LiveCode user's group. Drop me an email and I
> will send you a zipped copy of the Script with a sample lotto data file
> with approximately 8000 lines of picks.
> Email: [hidden email]
>
> -----Original Message-----
> From: use-livecode <[hidden email]> On Behalf Of
> Geoff Canyon via use-livecode
> Sent: Monday, June 11, 2018 5:22 PM
> To: How to use LiveCode <[hidden email]>
> Cc: Geoff Canyon <[hidden email]>
> Subject: Optimization can be tricky
>
> ​I have a routine that takes about a minute to run for the test case I
> created, and who knows how long for the real use case. Given that I want to
> run it several thousand times for actual work, I need it to run (much)
> faster.
>
> Roughly, the routine gathers together a bunch of data from arrays, sorts
> the data based on its relationship to other arrays, and then returns a
> subset of the result.
>
> My first pass at the routine looked roughly like this:
>
>    repeat for each line T in the keys of interestArray[uID]
>       repeat for each line S in storyArray[T]
>          if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \
>                and userSeenArray[uID][item 1 of S] < 4
>          then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \
>                abs(item 2 of S - item 1 of interestArray[uID][T]) - \
>                item 2 of interestArray[uID][T]),T,S & cr after
> candidateList
>       end repeat
>    end repeat
>    sort lines of candidateList numeric by random(item 1 of each)
>
> In simple terms: parse through the individual lines of all the entries
> that possibly work, calculating a relevant value for each and appending
> that value and the line to an interim list, which then gets sorted,
> randomly favoring lower values.
>
> I assumed the problem was all the line-by-line parsing, and I thought of a
> clever way to accomplish the same thing. That led to this:
>
>    put storyArray into R
>    intersect R with interestArray[uID]
>    combine R using cr and comma
>    sort lines of R by (101 + userSeenArray[uID][item 2 of each] * 30 + 5 *
> \
>          abs(item 3 of each - item 1 of interestArray[uID][item 1 of
> each]) \
>          - item 2 of interestArray[uID][item 1 of each])
>
> Much simpler, albeit that's a heck of a "sort by" -- more complex by far
> than any I had previously created, and a testament to the power and
> flexibility of "each". Alas, despite condensing my code and removing
> parsing and loops, that version took ten seconds more than the previous
> version, I'm guessing because the previous version has that "if" statement
> that weeds out undesirable entries before the sort has to deal with them.
>
> (I'm writing this email as I parse through this figuring it out)
>
> So it turns out that the crazy use of "each" is only part of the problem
> -- changing that line to:
>
>    sort lines of R by random(10000)
>
> still takes over 20 seconds -- 3x as fast, but still far too slow. It
> turns out that the source data numbers anywhere from 1,000 to 2,000 lines,
> so sorting it in any way to randomly select the several lines I need is a
> really bad choice. removing the sort step but keeping everything else cuts
> the execution time down to under a second.
>
> Hmm. How to select several lines at weighted random from among a couple
> thousand? I'll think on this and post a follow-up.
> _______________________________________________
> use-livecode mailing list
> [hidden email]
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
>
>
> _______________________________________________
> use-livecode mailing list
> [hidden email]
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: Optimization can be tricky

dunbarxx via use-livecode
In reply to this post by dunbarxx via use-livecode
You could try the following:

repeat for each key T in interestArray[uID]
  put item 1 of interestArray[uID][T] into i1
  put item 2 of interestArray[uID][T] into i2
  repeat for each line S in storyArray[T]
    put userSeenArray[uID][item 1 of S] into s1
    put abs(item 2 of S - i1) into s2
    if s2 < 20 and s1 < 4 then
      put random(101 + s1 * 30 + 5 * s2 - i2),T,S & cr \
      after candidateList
    end if
  end repeat
end repeat
sort candidateList numeric by item 1 of each


> Ali wrote:
> repeat for each key T in interestArray[uID]
>
>> Geoff wrote:
>> repeat for each line T in the keys of interestArray[uID]
>>   repeat for each line S in storyArray[T]
>> if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \
>>   and userSeenArray[uID][item 1 of S] < 4
>> then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \
>>   abs(item 2 of S - item 1 of interestArray[uID][T]) - \
>>   item 2 of interestArray[uID][T]),T,S & cr after candidateList
>>   end repeat
>> end repeat
>> sort lines of candidateList numeric by random(item 1 of each)

_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: Optimization can be tricky

dunbarxx via use-livecode
In reply to this post by dunbarxx via use-livecode
Sorry, I forgot to mention Jerry who already proposed to pull out computations
from the inner loop and also to use variables in the inner loop. My experience
says to avoid getting items as often as possible. And to use the random() in the
inner repeat loop instead of in the final sort may be worth a trial with a large
data set.

I wrote:

> You could try the following.
>
> repeat for each key T in interestArray[uID]
>   put item 1 of interestArray[uID][T] into i1
>   put item 2 of interestArray[uID][T] into i2
>   repeat for each line S in storyArray[T]
>     put userSeenArray[uID][item 1 of S] into s1
>     put abs(item 2 of S - i1) into s2
>     if s2 < 20 and s1 < 4 then
>       put random(101 + s1 * 30 + 5 *  s2 - i2),T,S & cr after candidateList
>     end if
>   end repeat
> end repeat
> sort candidateList numeric by item 1 of each
>
>
> Ali wrote:
> repeat for each key T in interestArray[uID]
>
> Geoff wrote:
> repeat for each line T in the keys of interestArray[uID]
>   repeat for each line S in storyArray[T]
> if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \
>   and userSeenArray[uID][item 1 of S] < 4
> then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \
>   abs(item 2 of S - item 1 of interestArray[uID][T]) - \
>   item 2 of interestArray[uID][T]),T,S & cr after candidateList
>   end repeat
> end repeat
> sort lines of candidateList numeric by random(item 1 of each)

_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: Optimization can be tricky

dunbarxx via use-livecode
In reply to this post by dunbarxx via use-livecode
I don't know if these are good ideas or BAD ideas .... and without some
suitable data, not sure if I could find out - so I'll throw the ideas
over and let you decide if they are worth trying :-)


1. Two ideas combined

  - avoid scanning for the "item 1" of each line
  - use the (hopefully optimised enough) numeric-index array lookups

So, instead of

       put random(101 + s1 * 30 + 5 * s2 - i2),T,S & cr \
       after candidateList

try

       put T,S  into  candidateArray[count]
       put random(101 + s1 * 30 + 5 * s2 - i2) into candidateValue[count]
       add 1 to count

And then later
    put the keys of candidateValue into candidatelist
    sort numeric the lines of candidatelist by candidateValue of each
and the access via the candidateArray.

2. (even wilder)
Assume you have N entries to consider, and that you only need M results
The *IF* M << N (i.e. much less than - you only need a few results from a large input set).

You might try something like this for the inner loop :
(going back to using lines ...)

   repeat for each line S in storyArray[T]
     put userSeenArray[uID][item 1 of S] into s1
     put abs(item 2 of S - i1) into s2
     if s2 < 20 and s1 < 4 then
       put random(101 + s1 * 30 + 5 * s2 - i2),T,S & cr \
           after candidateList
       add 1 to lineCount
       if lineCount > 4 * M then    -- (why 4 ? - maybe 2, maybe 8 ??)
           sort candidateList numeric by item 1 of each
           delete line M+1 to -1 of candidateList
           put M into lineCount
       end if
     end if
   end repeat


3. even wilder still ....
  - create K different candidateLists (where K is some power of 2)
  - sort each one,
  - mergesort in pairs, stopping when you get to M ...

put 8 into KK    -- (why 8  ? - maybe 16, or 32 ...)
repeat for each key T in interestArray[uID]
   put item 1 of interestArray[uID][T] into i1
   put item 2 of interestArray[uID][T] into i2
   repeat for each line S in storyArray[T]
     put userSeenArray[uID][item 1 of S] into s1
     put abs(item 2 of S - i1) into s2
     if s2 < 20 and s1 < 4 then
       add 1 to setCount
       put random(101 + s1 * 30 + 5 * s2 - i2),T,S & cr \
           after candidateList[setCount]
       if setCount > KK then
           put 0 into setCount
       end if
     end if
   end repeat
repeat with i = 1 to KK
    sort numeric lines of candidateList[i] by item 1 of each
end repeat
and then write / use a mergesort that takes two sorted lists, and returns the first N of the merged list.

(or even better, returns the first N, with a limit value; then each mergesort after the first can use the max value from the N'th entry of earlier merges to allow for early exit...)

That is left as an exercise for the reader :-) :-)

As I say - could be totally crazy, could be worthwhile.

I'd be happy to try out benchmarking it - if I had some representative data to play with.

Alex.

On 12/06/2018 12:03, hh via use-livecode wrote:

> You could try the following:
>
> repeat for each key T in interestArray[uID]
>    put item 1 of interestArray[uID][T] into i1
>    put item 2 of interestArray[uID][T] into i2
>    repeat for each line S in storyArray[T]
>      put userSeenArray[uID][item 1 of S] into s1
>      put abs(item 2 of S - i1) into s2
>      if s2 < 20 and s1 < 4 then
>        put random(101 + s1 * 30 + 5 * s2 - i2),T,S & cr \
>        after candidateList
>      end if
>    end repeat
> end repeat
> sort candidateList numeric by item 1 of each
>
>
>> Ali wrote:
>> repeat for each key T in interestArray[uID]
>>
>>> Geoff wrote:
>>> repeat for each line T in the keys of interestArray[uID]
>>>    repeat for each line S in storyArray[T]
>>> if abs(item 2 of S - item 1 of interestArray[uID][T]) < 20 \
>>>   and userSeenArray[uID][item 1 of S] < 4
>>> then put (101 + userSeenArray[uID][item 1 of S] * 30 + 5 * \
>>>   abs(item 2 of S - item 1 of interestArray[uID][T]) - \
>>>   item 2 of interestArray[uID][T]),T,S & cr after candidateList
>>>    end repeat
>>> end repeat
>>> sort lines of candidateList numeric by random(item 1 of each)
> _______________________________________________
> use-livecode mailing list
> [hidden email]
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode


_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: Optimization can be tricky

dunbarxx via use-livecode
In reply to this post by dunbarxx via use-livecode

Put yourself in the computer's shoes, and also clarify what you need to
accomplish. You are asking it to sort the entire list of 2000 records,
but (if I understand) you only want a handful of those.

And it has already gone through all the records once before the sort. If
you asked a human to do that, it would be a risky interaction. There
might be raised voices when they present the neatly arranged file
cabinet with files in a special order, and you deliver the punch line
that you only need to pull a few. If you only need to pull 5 records,
most people don't want to go through all 2000 files - twice.

So generally, don't use a sort after doing a complete loop on a big text
list, use another method. Do unto your computer as you hope it will do
unto you when they run the world. :)

(But I would be interested in the speed of your original code on LC 9 vs
LC 6.7, just to see how engine optimization is faring. I'm surprised it
took a minute.)

Best wishes,

Curry Kenworthy

Custom Software Development
LiveCode Training and Consulting
http://livecodeconsulting.com/

_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: Optimization can be tricky

dunbarxx via use-livecode
In reply to this post by dunbarxx via use-livecode
The scenario Geoff described is roughly to get the
top ten (a handful) of 2000 records comparing a certain
numeric value of each.
To get that unknown value of each one has to go once through
all the records *and then sort* for that ranking.
(LiveCode is very fast with a simple numeric sort!)
Any other method of ranking while going through the data
will generously waste CPU time.

> Curry wrote:
> Put yourself in the computer's shoes, and also clarify what you need to
> accomplish. You are asking it to sort the entire list of 2000 records,
> but (if I understand) you only want a handful of those.
>
> And it has already gone through all the records once before the sort. If
> you asked a human to do that, it would be a risky interaction. There
> might be raised voices when they present the neatly arranged file
> cabinet with files in a special order, and you deliver the punch line
> that you only need to pull a few. If you only need to pull 5 records,
> most people don't want to go through all 2000 files - twice.
>
> So generally, don't use a sort after doing a complete loop on a big text
> list, use another method. Do unto your computer as you hope it will do
> unto you when they run the world. :)


_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: Optimization can be tricky

dunbarxx via use-livecode
Then there's something about Geoff's problem (or problem statement) that
I don't understand.

What on earth is in those records for sorting a mere 2000 of them to be
noticable? (I had thought he had to be talking about magnitudes more
lines than that).

LC (9.0) sorts 20,000 lines of >1000 chars each in 65ms or so, on my
aging MacBook Pro. So the delay can't be the sort - it must be in the
extracting of the data from the arrays into his 'candidateList'.

It would be good to get that clarified before we spend more time
optimizing the wrong thing :-)

Alex.



On 12/06/2018 21:39, hh via use-livecode wrote:

> The scenario Geoff described is roughly to get the
> top ten (a handful) of 2000 records comparing a certain
> numeric value of each.
> To get that unknown value of each one has to go once through
> all the records *and then sort* for that ranking.
> (LiveCode is very fast with a simple numeric sort!)
> Any other method of ranking while going through the data
> will generously waste CPU time.
>
>> Curry wrote:
>> Put yourself in the computer's shoes, and also clarify what you need to
>> accomplish. You are asking it to sort the entire list of 2000 records,
>> but (if I understand) you only want a handful of those.
>>
>> And it has already gone through all the records once before the sort. If
>> you asked a human to do that, it would be a risky interaction. There
>> might be raised voices when they present the neatly arranged file
>> cabinet with files in a special order, and you deliver the punch line
>> that you only need to pull a few. If you only need to pull 5 records,
>> most people don't want to go through all 2000 files - twice.
>>
>> So generally, don't use a sort after doing a complete loop on a big text
>> list, use another method. Do unto your computer as you hope it will do
>> unto you when they run the world. :)
>
> _______________________________________________
> use-livecode mailing list
> [hidden email]
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode


_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

RE: Optimization can be tricky

dunbarxx via use-livecode
In reply to this post by dunbarxx via use-livecode
Filter is lightning fast. I have the list of all 40,000+ cities in the US
and have a predictive search field. I tried DBs, in memory DB and other
methods until I stumbled on to the filter operator. Load a text file into a
var and the start filtering. There is no perceptible delay when typing the
first few letters of a city name. I even carry some meta data along for the
ride.
I don't know if it can  be used here. I have not had the time to look at the
code or the problem presented.

Ralph DiMola
IT Director
Evergreen Information Services
[hidden email]


-----Original Message-----
From: use-livecode [mailto:[hidden email]] On Behalf
Of hh via use-livecode
Sent: Tuesday, June 12, 2018 4:39 PM
To: [hidden email]
Cc: hh
Subject: Re: Optimization can be tricky

The scenario Geoff described is roughly to get the top ten (a handful) of
2000 records comparing a certain numeric value of each.
To get that unknown value of each one has to go once through all the records
*and then sort* for that ranking.
(LiveCode is very fast with a simple numeric sort!) Any other method of
ranking while going through the data will generously waste CPU time.

> Curry wrote:
> Put yourself in the computer's shoes, and also clarify what you need
> to accomplish. You are asking it to sort the entire list of 2000
> records, but (if I understand) you only want a handful of those.
>
> And it has already gone through all the records once before the sort.
> If you asked a human to do that, it would be a risky interaction.
> There might be raised voices when they present the neatly arranged
> file cabinet with files in a special order, and you deliver the punch
> line that you only need to pull a few. If you only need to pull 5
> records, most people don't want to go through all 2000 files - twice.
>
> So generally, don't use a sort after doing a complete loop on a big
> text list, use another method. Do unto your computer as you hope it
> will do unto you when they run the world. :)


_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: Optimization can be tricky

dunbarxx via use-livecode
In reply to this post by dunbarxx via use-livecode

Optimizing scripts in LC is not the same as running reports in other
software suites. If you only need 10 results, you probably don't want to
handle all items twice.

I hate to loop through all items even once. But if I do, I may be done!
I'm not making a full report to print out; I'm just getting my 10 items.
If I use sort or filter, likewise, I will do whatever necessary to keep
it short and sweet, even if that means adjusting some of the starting
assumptions.

If the sort really is taking more time than the loop, something is
bogging it down. Look at the random() and arithmetic attached to the
sort. That's potentially like running a whole lot of LCS. So if you
sort, try a plain numeric sort with no fancy stuff added. Then go grab
your 10 - the other requirements can be addressed before or after the sort.

Best wishes,

Curry Kenworthy

Custom Software Development
LiveCode Training and Consulting
http://livecodeconsulting.com/

_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: Optimization can be tricky

dunbarxx via use-livecode
Thanks for the tip Ralph....love the sound of that filer function.

On Tue, Jun 12, 2018 at 7:00 PM, Curry Kenworthy via use-livecode <
[hidden email]> wrote:

>
> Optimizing scripts in LC is not the same as running reports in other
> software suites. If you only need 10 results, you probably don't want to
> handle all items twice.
>
> I hate to loop through all items even once. But if I do, I may be done!
> I'm not making a full report to print out; I'm just getting my 10 items. If
> I use sort or filter, likewise, I will do whatever necessary to keep it
> short and sweet, even if that means adjusting some of the starting
> assumptions.
>
> If the sort really is taking more time than the loop, something is bogging
> it down. Look at the random() and arithmetic attached to the sort. That's
> potentially like running a whole lot of LCS. So if you sort, try a plain
> numeric sort with no fancy stuff added. Then go grab your 10 - the other
> requirements can be addressed before or after the sort.
>
> Best wishes,
>
> Curry Kenworthy
>
> Custom Software Development
> LiveCode Training and Consulting
> http://livecodeconsulting.com/
>
> _______________________________________________
> use-livecode mailing list
> [hidden email]
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
>
_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: Optimization can be tricky

dunbarxx via use-livecode
Thanks for all the suggestions -- I tried multiple ideas without much
improvement, then decided to rethink the problem. Roughly:

I have a bunch of source data that I categorized by two numbers, the first
from 1-N where N might be anywhere from 100-5,000, and the second from
1-100. For any given first number, there might be about fifty entries, with
varying values for the second number.

Then for a given query I was trying to:

1. Take a list of 20-100 values for the first number, and for each of
those, a value for the second number,
2. Return a set of about 10-20 entries, where each entry matches one of the
first numbers from the list, and then probabilistically selected based on
the proximity of the second number from the list to the second number of
the item.

(The userSeen aspect was an additional wrinkle)

Since there are about 50 entries for each of the first numbers, the entries
that *might* be returned number [20-100] * 50, or about 1,000-5,000
entries. And everything I tried was too slow, since ideally I want to be
able to do this anywhere from 10-100 times per second.

So I reframed at the source. Instead of keeping all the candidates in a
single array keyed by the first number, I kept them in sub-arrays keyed by
the first digit of the second number.

Then, instead of gathering all the possible values based on the first
number matches, I rotate the list of first number matches each time I
retrieve, and then parse through the resulting matches in order, gathering
any that work, until I have enough to return.

This is certainly not identical to the original method, but it's close
enough, and runs in a small fraction of a second.

thanks again,

gc

On Tue, Jun 12, 2018 at 6:04 PM, Tom Glod via use-livecode <
[hidden email]> wrote:

> Thanks for the tip Ralph....love the sound of that filer function.
>
> On Tue, Jun 12, 2018 at 7:00 PM, Curry Kenworthy via use-livecode <
> [hidden email]> wrote:
>
> >
> > Optimizing scripts in LC is not the same as running reports in other
> > software suites. If you only need 10 results, you probably don't want to
> > handle all items twice.
> >
> > I hate to loop through all items even once. But if I do, I may be done!
> > I'm not making a full report to print out; I'm just getting my 10 items.
> If
> > I use sort or filter, likewise, I will do whatever necessary to keep it
> > short and sweet, even if that means adjusting some of the starting
> > assumptions.
> >
> > If the sort really is taking more time than the loop, something is
> bogging
> > it down. Look at the random() and arithmetic attached to the sort. That's
> > potentially like running a whole lot of LCS. So if you sort, try a plain
> > numeric sort with no fancy stuff added. Then go grab your 10 - the other
> > requirements can be addressed before or after the sort.
> >
> > Best wishes,
> >
> > Curry Kenworthy
> >
> > Custom Software Development
> > LiveCode Training and Consulting
> > http://livecodeconsulting.com/
> >
> > _______________________________________________
> > use-livecode mailing list
> > [hidden email]
> > Please visit this url to subscribe, unsubscribe and manage your
> > subscription preferences:
> > http://lists.runrev.com/mailman/listinfo/use-livecode
> >
> _______________________________________________
> use-livecode mailing list
> [hidden email]
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
>
_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: Optimization can be tricky

dunbarxx via use-livecode

Geoff wrote:

 > This is certainly not identical to the original method, but it's
 > close enough, and runs in a small fraction of a second.

Good approach! Optimized code rules.

Best wishes,

Curry K.


_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: Optimization can be tricky

dunbarxx via use-livecode
Nice....the proof is in the pudding. Thats why the say a good programmer
spends most of their time staring at the ceiling ....... thinking.  Who
they are ...I do not know.

On Tue, Jun 19, 2018 at 12:47 AM, Curry Kenworthy via use-livecode <
[hidden email]> wrote:

>
> Geoff wrote:
>
> > This is certainly not identical to the original method, but it's
> > close enough, and runs in a small fraction of a second.
>
> Good approach! Optimized code rules.
>
> Best wishes,
>
> Curry K.
>
>
>
> _______________________________________________
> use-livecode mailing list
> [hidden email]
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
>
_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode