Intersecting data question/challenge

see3d
Hello clever Rev programmers,

I have a simple question or maybe it is a simple challenge.

I have two lists of integers.  The lists are not long, perhaps 5 to 50
items, e.g.:
list1="310,423,522,211,107,340,"
list2="311,312,313,318,320,323,325,330,333,337,340,"

I want to find if any of the items in list1 have a match in list2.
I know I could do it with a repeat, something like this:

get false
repeat for each item theItem in list1
   if theItem is not among the items of list2 then next repeat
   get true
   exit repeat
end repeat

I was wondering if it could be done faster without a repeat through  
some Rev trick.

Dennis
_______________________________________________
use-revolution mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Intersecting data question/challenge

Eric Chatonet
Hi Dennis,

You astonish me!

Aren't you the guy who is fascinated by arrays?
And, maybe, by the intersect command?

Best Regards from Paris,

Eric Chatonet.
----------------------------------------------------------------
So Smart Software

For institutions, companies and associations
Built-to-order applications: management, multimedia, internet, etc.
Windows, Mac OS and Linux... With the French touch

Free plugins and tutorials on my website
----------------------------------------------------------------
Web site        http://www.sosmartsoftware.com/
Email        [hidden email]/
Phone        33 (0)1 43 31 77 62
Mobile        33 (0)6 20 74 50 86
----------------------------------------------------------------


Re: Intersecting data question/challenge

jbv-2
In reply to this post by see3d


Dennis,

Use arrays; something like:

-- build an array keyed by the items of each list
put empty into myT1
put empty into myT2
repeat for each item i in list1
    put 1 into myT1[i]
end repeat
repeat for each item i in list2
    put 1 into myT2[i]
end repeat

-- look for a key of myT1 that is also a key of myT2
put the keys of myT1 into theKeys
get false
repeat for each line j in theKeys
    if myT2[j] = 1 then
        get true
        exit repeat
    end if
end repeat

JB


Re: Intersecting data question/challenge

see3d
In reply to this post by Eric Chatonet
Eric,

Yes, I looked at the intersect command, but it performs the action on  
the keys not the data from a list.  I would have to create an array  
element for each integer in the list with the integer as the key.  
Sounded like two loops that would run even slower than my example:

repeat for each item theItem in list1
   put empty into myArray1[theItem]
end repeat
repeat for each item theItem in list2
   put empty into myArray2[theItem]
end repeat
intersect myArray1 with myArray2
if the keys of myArray1 is empty then get false else get true

In actual practice, the above example runs 3 times slower than my
original repeat-loop example for the sample data shown.
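
A comparison like this can be reproduced with a simple timing loop. The sketch below is only one assumed way to set up such a measurement: the repetition count and the variable names reps, startTime and elapsed are made up for illustration, and it times the plain repeat-loop version on the sample data.

```livecode
-- Sketch only: time many runs of the simple repeat-loop test.
put "310,423,522,211,107,340" into list1
put "311,312,313,318,320,323,325,330,333,337,340" into list2
put 10000 into reps  -- hypothetical repetition count
put the milliseconds into startTime
repeat with n = 1 to reps
   get false
   repeat for each item theItem in list1
      if theItem is among the items of list2 then
         get true
         exit repeat
      end if
   end repeat
end repeat
-- elapsed time in milliseconds for all repetitions
put the milliseconds - startTime into elapsed
```

Swapping the inner block for the array version and comparing the two elapsed values gives the kind of ratio quoted here.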

However, knowing that Rev had such a command for the keys, I thought  
perhaps someone knew of a more clever way to use it, or maybe there  
was another way to intersect data.

Dennis


Re: Intersecting data question/challenge

Eric Chatonet
Hello Dennis,

You are right.
I was a little too quick.
Quicker than 2 repeat loops, at least ;-)

Best Regards from Paris,

Eric Chatonet.

Re: Intersecting data question/challenge

see3d
In reply to this post by jbv-2
JB,

It will work, but it is not a speed improvement.
It will run about 8 times slower than my original example.
It takes Rev a lot of work (time) to create an array element.

Dennis


Re: Intersecting data question/challenge

Chris Sheffield
In reply to this post by see3d
Could you make use of the split command somehow?  You would have to  
format your lists a little differently, but if you did it right and  
specified a primary and secondary delimiter, you might be able to get  
quick results and still take advantage of the intersect command.  
Anyway, just another idea.
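
A minimal sketch of this idea, under some assumptions: "|" is chosen as a secondary delimiter that never occurs in the data, so that each item should become a key with an empty value (worth checking in your version of Rev), and the trailing comma is removed first so that no empty key is created. The variable hasMatch is made up for illustration.

```livecode
-- Sketch only: build key-only arrays with split, then intersect them.
put "310,423,522,211,107,340," into myArray1
put "311,312,313,318,320,323,325,330,333,337,340," into myArray2
-- drop a trailing comma so split does not produce an empty key
if the last char of myArray1 is comma then delete the last char of myArray1
if the last char of myArray2 is comma then delete the last char of myArray2
-- assumption: "|" never appears in the data, so each item becomes a key
split myArray1 by comma and "|"
split myArray2 by comma and "|"
-- keep only the keys of myArray1 that are also keys of myArray2
intersect myArray1 with myArray2
put (the keys of myArray1 is not empty) into hasMatch
```

This avoids the two explicit repeat loops, although split still has to visit every item internally.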

Chris Sheffield


------------------------------------------
Chris Sheffield
Read Naturally
The Fluency Company
http://www.readnaturally.com
------------------------------------------



Re: Intersecting data question/challenge

jbv-2
In reply to this post by see3d


Dennis,

Using arrays will always be a speed improvement.

My script creates arrays from your lists of items with loops
just for the purpose of demonstration. As someone else
suggested, you can use "split" to create your arrays, or better:
drop your lists of items and configure your data as arrays
from the beginning.

As an anecdote, I have a CGI script that makes extensive
use of arrays (and only arrays), and it builds 16-page PDF
files of about 1.4 MB in roughly 0.1 second, while the previous
version (using item lists) took 3 to 4 seconds...

JB


Re: Intersecting data question/challenge

Alex Tweedly
In reply to this post by see3d
Dennis Brown wrote:

> Hello clever Rev programmers,
>
> I have a simple question or maybe it is a simple challenge.
>
> I have two lists of integers.  The lists are not long, perhaps 5 to 50
> items, e.g.:
> list1="310,423,522,211,107,340,"
> list2="311,312,313,318,320,323,325,330,333,337,340,"
>
Are the items in each list known to be unique or not?
I.e., could I have list1 = "310,423,310"?

> I want to find if any of the items in list1 have a match in list2.
> I know I could do it with a repeat, something like this:
>
> get false
> repeat for each item theItem in list1
>   if theItem is not among the items of list2 then next repeat
>   get true
>   exit repeat
> end repeat
>
> I was wondering if it could be done faster without a repeat through  
> some Rev trick.
>
For data samples that small, I doubt if there will be anything faster.

For data large enough to overcome the cost of some set-up time (say
upwards of 20,000 items in each set), you might get faster with either
an array/intersect based scheme, or simply by sorting and stepwise
comparing each one. The array method is especially appealing if there
can be repeated entries.

For *large* data sets, you might  be best with binary-search comparing
of sorted lists - especially if you can arrange things such that the
lists are sorted ahead of time.
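
The sort-and-step idea can be sketched roughly as follows. This is an illustration only, not a tuned implementation: the function name listsShareItem and its variable names are made up, and because "item i of" rescans the string on each access, a version meant for large lists would probably want to hold the sorted values in numerically keyed arrays instead.

```livecode
-- Sketch only: returns true if two comma-delimited lists share an item.
-- listsShareItem and its variable names are hypothetical.
function listsShareItem list1, list2
   sort items of list1 ascending numeric
   sort items of list2 ascending numeric
   put the number of items of list1 into count1
   put the number of items of list2 into count2
   put 1 into i
   put 1 into j
   repeat while i <= count1 and j <= count2
      put item i of list1 into value1
      put item j of list2 into value2
      if value1 = value2 then return true
      -- advance whichever list currently holds the smaller value
      if value1 < value2 then
         add 1 to i
      else
         add 1 to j
      end if
   end repeat
   return false
end listsShareItem
```

Called with the two sample lists from the original question, this should report a match, since 340 appears in both.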

And for 5 to 50 items - who cares : the simple solution doesn't take
long enough to warrant any time spent optimizing it :-)

--
Alex Tweedly       http://www.tweedly.net




Re: Intersecting data question/challenge

Alex Tweedly
In reply to this post by Chris Sheffield
Chris Sheffield wrote:

> Could you make use of the split command somehow?  You would have to  
> format your lists a little differently, but if you did it right and  
> specified a primary and secondary delimiter, you might be able to get  
> quick results and still take advantage of the intersect command.  
> Anyway, just another idea.
>
I tried that out; its timing is very similar to the other array method.
(That includes the time to reformat the lists into something that can
be "split" as you need it; if the wider context could be changed, that
could change.)

JB said:

> put "" into myT1
> put "" into myT2
> repeat for each item i in list1
>     put 1 into myT1[i]
> end repeat
> repeat for each item i in list2
>     put 1 into myT2[i]
> end repeat
>
> get the keys of myT1
> repeat for each line j in it
>     if myT2[j]=1 then
>         get true
>         exit repeat
>     end if
> end repeat

If all you do with myT1 is take the keys of it, you don't need to create
that array - you can put list2 into an array (either element by element,
or using split), and then do a (fast) repeat for each item of list1.

Might be worth it if list2 is large enough ...
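
A minimal sketch of that variation, assuming list2 is turned into an array element by element; the function name anyItemInBoth and the array name inList2 are made up for illustration.

```livecode
-- Sketch only: one lookup array for list2, one pass over list1.
-- anyItemInBoth and inList2 are hypothetical names.
function anyItemInBoth list1, list2
   -- build the lookup array keyed by the items of list2
   repeat for each item theItem in list2
      put true into inList2[theItem]
   end repeat
   -- a key that was never set yields empty, which is not true
   repeat for each item theItem in list1
      if inList2[theItem] is true then return true
   end repeat
   return false
end anyItemInBoth
```

The cost of building inList2 is paid once, so this only pays off when list2 is large or is reused for many tests.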


--
Alex Tweedly       http://www.tweedly.net




Re: Intersecting data question/challenge

see3d
In reply to this post by jbv-2
JB,

You are right in that if I already had the lists set up as keys in an
array it could run faster.  In that case your example with a
repeat loop runs about the same speed as my original example with a
repeat loop.  However, using the intersect command instead of a
repeat loop would run 4-8 times as fast as either loop version and is
fast enough to use for my intended purpose as a direct test to see
which blocks of code to execute inside a large loop.

Unfortunately, an array is not an easy thing to use as a constant to  
check against.  I would have to pre-build all my arrays beforehand.  
If I have to do that, I might just as well build an execution test  
matrix using the simple lists as parameters.  I just wanted to avoid  
one more level of indirection for the sake of speed (I have so many  
already).  However, I think I understand some new tricks now with  
your help and from Eric, Alex, and Chris.

Thanks,
Dennis


Re: Intersecting data question/challenge

Jon-3
In reply to this post by Chris Sheffield
How about loading a string with the numbers from one list, represented
as strings (1 ==> "001", etc) appended and separated by spaces or commas.

Then run through the second number list searching for each number in the
above string?

Hugely clunky, due to Rev's sloth, but it might be faster.

I *@(*%# hate it when one has to jump through these kinds of hoops to
make something work fast enough.  I have better things to do with my
time.  Sigh.

:)

Jon



Re: Intersecting data question/challenge

Raymond E. Griffith
In reply to this post by see3d
Dennis,

I have a suggestion. It isn't perfect, but it does appear to be relatively
fast.

I start by creating return-delimited lists. The lists have 5000 and 2000
elements, respectively, although because of repeated values the number of
distinct customkeys is significantly smaller.

One problem is that you cannot set the keys of a variable directly. You can,
however, set the customkeys of an object directly, then put those
customproperties into a variable.

Then use intersect.

As I said, this appears to me to be relatively fast.

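on mouseUp
  -- build two return-delimited lists of random integers
  put 5000 into n1
  put 2000 into n2
  repeat with i = 1 to n1
    put random(10000) & cr after A
  end repeat
  repeat with i = 1 to n2
    put random(10000) & cr after B
  end repeat
  -- start timing after the test data has been generated
  put the long milliseconds into ms
  -- setting the customkeys creates one custom property per distinct line
  set the customkeys of fld "LA" to A
  set the customkeys of fld "LB" to B
  -- copy the custom properties into ordinary array variables
  put the customproperties of fld "LA" into arrA
  put the customproperties of fld "LB" into arrB
  -- keep only the keys of arrA that are also keys of arrB
  intersect arrA with arrB
  answer keys(arrA) & return & "___" & the long milliseconds - ms
end mouseUp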

Perhaps someone can try comparing this idea with others for time trials?

I hope this helps.

Raymond E. Griffith



Re: Intersecting data question/challenge

see3d
Raymond,

Good idea.  It does get around the need to iterate to get the keys.
Unfortunately, that operation seems to be very slow in Rev.  If I use
the data from the previous tests, and do everything starting with the
lists, it is 10 times slower than my first example.  If I save the
constant array first, it is 5 times slower.  If I save both arrays
and only get the customProperties into the array in the timing
loop, it is still twice as slow, which is 17 times slower than the
fastest way.

So you met the challenge of no loops --good job.  But setting
customKeys and getting customProperties in Rev seems to be much slower
than any other operations tested.  They must be using the crawl method
for those operations.

Dennis

On Jul 8, 2005, at 8:26 PM, Raymond E. Griffith wrote:

> on mouseUp
>   put 5000 into n1
>   put 2000 into n2
>   repeat with i = 1 to n1
>     put random(10000) & cr after A
>   end repeat
>   repeat with i = 1 to n2
>     put random(10000) & cr after B
>   end repeat
>   put the long milliseconds into ms
>   set the customkeys of fld "LA" to A
>   set the customkeys of fld "LB" to B
>   put the customproperties of fld "LA" into arrA
>   put the customproperties of fld "LB" into arrB
>   intersect arrA with arrB
>   answer keys(arrA) & return & "___" & the long milliseconds - ms
> end mouseUp
>


Re: Intersecting data question/challenge

see3d
In reply to this post by Jon-3
Jon,

Unless I am not understanding your suggestion, that is the method  
used to start off this thread.

Dennis

On Jul 8, 2005, at 6:49 PM, Jon wrote:

> How about loading a string with the numbers from one list,  
> represented as strings (1 ==> "001", etc) appended and separated by  
> spaces or commas.
>
> Then run through the second number list searching for each number  
> in the above string?
>
> Hugely clunky, due to Rev's sloth, but it might be faster.
>
> I *@(*%# hate it when one has to jump through these kinds of hoops  
> to make something work fast enough.  I have better things to do  
> with my time.  Sigh.
>
> :)
>
> Jon
>
>
> Chris Sheffield wrote:
>
>
>> Could you make use of the split command somehow?  You would have  
>> to  format your lists a little differently, but if you did it  
>> right and  specified a primary and secondary delimiter, you might  
>> be able to get  quick results and still take advantage of the  
>> intersect command.   Anyway, just another idea.
>>
>> Chris Sheffield
>>
>>
>> On Jul 8, 2005, at 12:50 PM, Dennis Brown wrote:
>>
>>
>>> Eric,
>>>
>>> Yes, I looked at the intersect command, but it performs the  
>>> action  on the keys not the data from a list.  I would have to  
>>> create an  array element for each integer in the list with the  
>>> integer as the  key.  Sounded like two loops that would run even  
>>> slower than my  example:
>>>
>>> repeat for each item theItem in list1
>>>   put empty into myArray1[theItem]
>>> end repeat
>>> repeat for each item theItem in list2
>>>   put empty into myArray2[theItem]
>>> end repeat
>>> intersect myArray1 with myArray2
>>> if the keys of myArray1 is empty then get false else get true
>>>
>>> In actual practice, the above example runs 3 times slower than  
>>> the  below example for the sample data shown.
>>>
>>> However, knowing that Rev had such a command for the keys, I  
>>> thought perhaps someone knew of a more clever way to use it, or  
>>> maybe there was another way to intersect data.
>>>
>>> Dennis
>>>
>>
>>
>> ------------------------------------------
>> Chris Sheffield
>> Read Naturally
>> The Fluency Company
>> http://www.readnaturally.com
>> ------------------------------------------
>>

Re: Intersecting data question/challenge

Jon-3
Whoops!  Sorry!

:)


Dennis Brown wrote:

> Jon,
>
> Unless I am not understanding your suggestion, that is the method  
> used to start off this thread.
>
> Dennis

Re: Intersecting data question/challenge

Wouter-12
In reply to this post by Raymond E. Griffith

On 09 Jul 2005, at 02:26, Raymond E. Griffith wrote:

> Dennis,
>
> I have a suggestion. It isn't perfect, but it does appear to be  
> relatively
> fast.
>
> I start by creating return-delimited lists. The lists have 5000  
> elements in
> them and 2000 elements in them, respectively, although due to  
> repeats the
> customkeys are significantly less.
>
> One problem is that you cannot set the keys of a variable directly.  
> You can,
> however, set the customkeys of an object directly, then put those
> customproperties into a variable.
>
> Then use intersect.
>
> As I said, this appears to me to be relatively fast.
>
> on mouseUp
>   put 5000 into n1
>   put 2000 into n2
>   repeat with i = 1 to n1
>     put random(10000) & cr after A
>   end repeat
>   repeat with i = 1 to n2
>     put random(10000) & cr after B
>   end repeat
>   put the long milliseconds into ms
>   set the customkeys of fld "LA" to A
>   set the customkeys of fld "LB" to B
>   put the customproperties of fld "LA" into arrA
>   put the customproperties of fld "LB" into arrB
>   intersect arrA with arrB
>   answer keys(arrA) & return & "___" & the long milliseconds - ms
> end mouseUp
>
> Perhaps someone can try comparing this idea with others for time  
> trials?
>
> I hope this helps.
>
> Raymond E. Griffith


Hi Raymond, Dennis and everybody else,


The way proposed by Dennis is indeed the fastest on data sets that are
not too large.
So it is only fair to test the other way around too, and try Dennis's
proposal on the same amount of data that Raymond used for his handler.
Raymond's handler is a neat trick, though it is at least 2 times
slower than a replace + split method.

I adapted Raymond's  method slightly to be able to produce a "one  
button copy-paste script" test for comparison :

on mouseUp
   ### filling the vars
   repeat 5000
     put random(10000) & cr after A
   end repeat
   repeat 2000
     put random(10000) & cr after B
   end repeat
   put A into x
   put B into y
   ### custom prop method
   put the long millisecs into zap
   set the customkeys of me to A
   put the customproperties of me into arrA
   set the customkeys of me to B
   put the customproperties of me into arrB
   intersect arrA with arrB
   put keys(arrA) into tKeys1
   put the long millisecs - zap into time1
   set the customkeys of me to ""
   ### replace split method
   put the long millisecs into zap
   replace cr with tab & cr in A
   split A with cr and tab
   replace cr with tab & cr in B
   split B with cr and tab
   intersect A with B
   put keys(A) into tKeys2
   put the long millisecs - zap into time2
   ### repeat for each + is not among method
   replace cr with comma in x
   replace cr with comma in y
   put the long millisecs into zap
   repeat for each item i in x
     if i is not among the items of y then put i & cr after tList
   end repeat
   put the long millisecs - zap into time3
   put time1 &cr& time2 & cr & time3
end mouseUp


Greetings,
Wouter


Re: Intersecting data question/challenge

Wouter-12
Since the array methods remove duplicate values from each of the
data sets, it is only fair to do the same in the repeat for each method.
So here is a small change to the test handler.

Test with different amounts in the data sets and see when the  
differences set in.


on mouseUp
   ### filling the data sets
   ### change by hand or by using a scrollbar
   repeat 500  --round(thumbpos of sb "A")
     put random(10000) & cr after A
   end repeat
   repeat 200   --round(thumbpos of sb "B")
     put random(10000) & cr after B
   end repeat
   put 0 into time1
   put 0 into time2
   put 0 into time3
   repeat 10
     put A into x
     put B into y
     ### custom prop method
     put the long seconds into zap
     set the customkeys of me to x
     put the customproperties of me into arrA
     set the customkeys of me to y
     put the customproperties of me into arrB
     intersect arrA with arrB
     put keys(arrA) into tKeys1
     add the long seconds - zap to time1
     set the customkeys of me to ""
     ### replace split method
     put the long seconds into zap
     replace cr with tab & cr in x
     split x with cr and tab
     replace cr with tab & cr in y
     split y with cr and tab
     intersect x with y
     put keys(y) into tKeys2
     add the long seconds - zap to time2
     ### repeat for each + is not among method
     put A into x
     put B into y
     replace cr with comma in x
     replace cr with comma in y
     put "" into tList
     put the long seconds into zap
     repeat for each item i in x
       if i is not among the items of y and i is not among the lines of tList then put i & cr after tList
     end repeat
     add the long seconds - zap to time3
   end repeat
   put time1 &cr& time2 & cr & time3
end mouseUp


Greetings,
Wouter




Re: Intersecting data question/challenge

see3d
In reply to this post by Wouter-12

On Jul 8, 2005, at 10:57 PM, Buster wrote:

> Hi Raymond, Dennis and everybody else,
>
>
> The way proposed by Dennis is indeed the fastest on data sets that are
> not too large.
> So it is only fair to test the other way around too, and try Dennis's
> proposal on the same amount of data that Raymond used for his handler.
> Raymond's handler is a neat trick, though it is at least 2 times
> slower than a replace + split method.
>
> Greetings,
> Wouter

Wouter,

Nice script for comparing the different methods.  However, you have
changed the original problem from a true/false test of whether any
items in the two lists match to returning an intersect of the
data.  That is OK and a useful operation to understand, but it does
skew the results in a different way.  If I apply the original
criteria, the repeat for each completes as soon as it finds the first
match.  With the random data in your example, that usually happens in
much less time than the other two methods.
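
For reference, the original true/false test can be sketched as a function that returns as soon as it finds the first match.  This is only a minimal sketch: the handler name listsShareAnItem and its parameter names are made up for illustration and do not come from any script in this thread.

```livecode
-- Hypothetical helper: true if any item of pList1 also occurs in pList2.
-- It stops at the first match, so it never builds the full intersection.
function listsShareAnItem pList1, pList2
   repeat for each item tItem in pList1
      if tItem is among the items of pList2 then return true
   end repeat
   return false
end listsShareAnItem
```

With the sample lists from the start of the thread, listsShareAnItem(list1, list2) returns true, because 340 appears in both lists.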

Dennis

Re: Intersecting data question/challenge

Dick Kriesel
In reply to this post by Wouter-12
On 7/9/05 2:49 AM, "Buster" <[hidden email]> wrote:

> on mouseUp
>    ### filling the data sets
>    ### change by hand or by using a scrollbar
>    repeat 500  --round(thumbpos of sb "A")
>      put random(10000) & cr after A
>    end repeat
>    repeat 200   --round(thumbpos of sb "B")
>      put random(10000) & cr after B
>    end repeat
>    put 0 into time1
>    put 0 into time2
>    put 0 into time3
>    repeat 10
>      put A into x
>      put B into y
>      ### custom prop method
>      put the long seconds into zap
>      set the customkeys of me to x
>      put the customproperties of me into arrA
>      set the customkeys of me to y
>      put the customproperties of me into arrB
>      intersect arrA with arrB
>      put keys(arrA) into tKeys1
>      add the long seconds - zap to time1
>      set the customkeys of me to ""
>      ### replace split method
>      put the long seconds into zap
>      replace cr with tab & cr in x
>      split x with cr and tab
>      replace cr with tab & cr in y
>      split y with cr and tab
>      intersect x with y
>      put keys(y) into tKeys2
>      add the long seconds - zap to time2
>      ### repeat for each + is not among method
>      put A into x
>      put B into y
>      replace cr with comma in x
>      replace cr with comma in y
>      put "" into tList
>      put the long seconds into zap
>      repeat for each item i in x
>        if i is not among the items of y and i is not among the lines of tList then put i & cr after tList
>      end repeat
>      add the long seconds - zap to time3
>    end repeat
>    put time1 &cr& time2 & cr & time3
> end mouseUp


The times reported for the second and third methods above are suspect,
because tKeys2 and tList do not match tKeys1.

>      put keys(y) into tKeys2
should be "put keys(x) into tKeys2"

>        if i is not among the items of y
should be "if i is among the items of y"

Also, the time for the second method is overstated, because the code
includes two unnecessary statements:
>      replace cr with tab & cr in x
>      replace cr with tab & cr in y

Despite the inaccuracies, the conclusions remain:
  the first method loses
  the second method wins for long lists
  the third method wins for short lists

Informal testing suggests that the second method takes over once the shorter of the
two input lists has around fifty or more lines.  So a general, optimized
handler can choose the method by inspecting the input lists.
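
Such a handler might look roughly like the sketch below.  It is untested and rests on assumptions: the name intersectLists and its parameters are hypothetical, the inputs are return-delimited lists, and the cutoff of 50 lines is only the rough figure from the informal testing, read as the point where split + intersect starts to win.

```livecode
-- Hypothetical helper: returns the lines that pListA and pListB have in common.
-- The cutoff of 50 is an assumed value taken from informal testing, not a measured constant.
function intersectLists pListA, pListB
   local tShorter, tResult
   put min(the number of lines of pListA, the number of lines of pListB) into tShorter
   if tShorter < 50 then
      -- short lists: repeat for each + is among, skipping values already collected
      repeat for each line tLine in pListA
         if tLine is among the lines of pListB and tLine is not among the lines of tResult then
            put tLine & cr after tResult
         end if
      end repeat
      delete the last char of tResult -- drop the trailing return
   else
      -- long lists: each line becomes an array key, then intersect the keys
      split pListA with cr and tab
      split pListB with cr and tab
      intersect pListA with pListB
      put the keys of pListA into tResult
   end if
   return tResult
end intersectLists
```

The two branches can return the matches in a different order, since the order of array keys is not guaranteed, so a caller that needs a stable order would have to sort the result.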

Further insights, anyone?

-- Dick

