surprising filter benchmarks


surprising filter benchmarks

Richard Gaskin

I figured the filter command would carry at least some overhead for its
convenience, but I had no idea how much!

I wrote the test below to compare it with walking through a list line by
line, and the results were surprising:

on mouseUp
   put fwdbCurTableData() into s -- gets 10,800 lines of
   --                               tab-delimited data
   --
   -- Method 1: filter command
   --
   put format("*a*\t*r*\tr\t*\t*\t*\t*\t*") into tFilter
   put s into result1
   put the millisecs into t
   filter result1 with tFilter
   put the millisecs - t into t1
   --
   -- Method 2: repeat for each
   --
   set the itemdel to tab
   put the millisecs into t
   repeat for each line tLine in s
     if item 1 of tLine contains "a" \
         AND item 2 of tLine contains "r" \
         AND item 3 of tLine is "r" then
       put tLine & cr after result2
     end if
   end repeat
   delete last char of result2
   put the millisecs - t into t2
   --
   put result1 into fld "result"
   put result2 into fld "result2"
   --
   put "Filter: " & t1 & cr & "Repeat: " & t2
end mouseUp



Results -
    Filter: 745
    Repeat: 40

Did I miss something, or am I just seeing the penalty for the filter
command's generalization?

--
  Richard Gaskin
  Managing Editor, revJournal
  _______________________________________________________
  Rev tips, tutorials and more: http://www.revJournal.com
_______________________________________________
use-revolution mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: surprising filter benchmarks

Eric Chatonet
Hi Richard,

I think the speed depends on the filter complexity.
For instance:

on mouseUp
   repeat 100000
     if random (2) = 1 then put "zaz" & cr after tList
     else put "zbz" & cr after tList
   end repeat
   -----
   put the milliseconds into tStart1
   filter tList with "*a*"
   put the milliseconds - tStart1 into tResult1
   -----
   put the milliseconds into tStart2
   repeat for each line tLine in tList
     if "a" is in tList then put tLine & cr after tNewList
   end repeat
   delete char -1 of tNewList
   put the milliseconds - tStart2 into tResult2
   -----
   put "Filter: " && tResult1 & cr &"Repeat:" &&  tResult2
end mouseUp

Results -
    Filter: 41
    Repeat: 117

So maybe we have to choose the right method according to the context.
Two cents that do not make life easier :-)

On 12 Jul 2005, at 22:26, Richard Gaskin wrote:

> I figured the filter command would carry at least some overhead for  
> its convenience, but I had no idea how much!

Best Regards from Paris,

Eric Chatonet.
----------------------------------------------------------------
So Smart Software

For institutions, companies and associations
Built-to-order applications: management, multimedia, internet, etc.
Windows, Mac OS and Linux... With the French touch

Free plugins and tutorials on my website
----------------------------------------------------------------
Web site        http://www.sosmartsoftware.com/
Email        [hidden email]
Phone        33 (0)1 43 31 77 62
Mobile        33 (0)6 20 74 50 86
----------------------------------------------------------------


Re: surprising filter benchmarks

Richard Gaskin
Eric Chatonet wrote:

> Hi Richard,
>
> I think the speed depends on the filter complexity.
> For instance:
>
> on mouseUp
>   repeat 100000
>     if random (2) = 1 then put "zaz" & cr after tList
>     else put "zbz" & cr after tList
>   end repeat
>   -----
>   put the milliseconds into tStart1
>   filter tList with "*a*"
>   put the milliseconds - tStart1 into tResult1
>   -----
>   put the milliseconds into tStart2
>   repeat for each line tLine in tList
>     if "a" is in tList then put tLine & cr after tNewList
>   end repeat
>   delete char -1 of tNewList
>   put the milliseconds - tStart2 into tResult2
>   -----
>   put "Filter: " && tResult1 & cr &"Repeat:" &&  tResult2
> end mouseUp
>
> Results -
>    Filter: 41
>    Repeat: 117

To get cleaner results I think the second test's "is in tList" should be
"is in tLine", which also cuts execution time down dramatically.
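The slip matters for correctness as well as speed: testing the whole variable means the condition is true for every line once any line contains "a". A minimal Python analogue (illustrative names):

```python
lines = ["zaz", "zbz", "zaz"]
whole_list = "\n".join(lines)

# Buggy test: looks at the whole list, so it passes for every line
buggy = [line for line in lines if "a" in whole_list]
# Corrected test: looks at the current line only
fixed = [line for line in lines if "a" in line]

assert buggy == ["zaz", "zbz", "zaz"]  # everything kept, wrongly
assert fixed == ["zaz", "zaz"]
```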

But the central point remains:  with a small number of criteria the
filter command does a fine job compared to repeat loops, but for complex
criteria (in my app it's rare that we'll ever have fewer than three
distinct comparisons) "repeat for each" does well.

Another advantage of "repeat for each" is that it allows "or" in addition
to "and", which would require multiple passes with "filter", and it makes
it easy to structure comparisons using parentheses to control the order
of precedence.
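The "or" point can be sketched the same way (Python, made-up data): one pass handles a parenthesised mix of conditions, where a wildcard filter would need one pass per alternative plus a merge:

```python
lines = ["apple\tred", "plum\tred", "plum\tblue"]

def keep(line):
    a, b = line.split("\t")[:2]
    # mixed "and"/"or" with parentheses, resolved in a single pass
    return ("a" in a and b == "red") or b == "blue"

matches = [line for line in lines if keep(line)]
assert matches == ["apple\tred", "plum\tblue"]
```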

For the moment I'm sticking with the repeat loop for the situation I'm
currently using it in, but it's good to know that filter is quick for
simple searches.

--
  Richard Gaskin
  Fourth World Media Corporation
  ___________________________________________________________
  [hidden email]       http://www.FourthWorld.com




Re: surprising filter benchmarks

Alex Tweedly
In reply to this post by Richard Gaskin
Richard Gaskin wrote:

>
> I figured the filter command would carry at least some overhead for
> its convenience, but I had no idea how much!
>
> I wrote the test below to compare it with walking through a list line
> by line, and the results were surprising:
>
> on mouseUp
>   put  fwdbCurTableData() into s -- gets 10,800 lines of
>   --                                tab-delimited data
>   --
>   -- Method 1: filter command
>   --
>   put format("*a*\t*r*\tr\t*\t*\t*\t*\t*") into tFilter  
>   put s into result1

Richard, I'm too busy (lazy??) to create some data and test this for
timings right now, but this filter is more complex than you need - it
also verifies that the line has the right number of tabs. You could
simplify it to

put format("*a*\t*r*\tr\t*") into tFilter

and should get the same results more quickly.

-- Alex.

--
Alex Tweedly       http://www.tweedly.net





Re: surprising filter benchmarks

Eric Chatonet
In reply to this post by Richard Gaskin
Hi Richard,

You are right, I made an error writing tList instead of tLine (which
gains about 25-30%).
But unfortunately that's not enough to say that one method or the other
is always better.
That would be good news...

On 12 Jul 2005, at 23:10, Richard Gaskin wrote:

> To get cleaner results I think the second test's "is in tList"  
> should be "is in tLine", which also cuts execution time down  
> dramatically.


Re: surprising filter benchmarks

Alex Tweedly
In reply to this post by Alex Tweedly
Alex Tweedly wrote:

>    You could simplify it to
> put format("*a*\t*r*\tr\t*") into tFilter
>
> and should get the same results more quickly.
>
No, you can't - I'll go back to sleep now...
(You need all those \t* sequences to ensure the *a* is in the first
item - right?)
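The retraction can be checked concretely. Python's `fnmatch` module uses the same `*` wildcard, and since `*` also matches tab characters, the shortened pattern lets a match leak across item boundaries (made-up row):

```python
from fnmatch import fnmatchcase

# Item 2 ("b") contains no "r" and item 3 is "c", so this row should fail
row = "xa\tb\tc\tr\tr\tx\tx\tx"

short_pat = "*a*\t*r*\tr\t*"              # the proposed simplification
full_pat = "*a*\t*r*\tr\t*\t*\t*\t*\t*"   # the original pattern

assert fnmatchcase(row, short_pat)      # wrongly matches: "*" strays past tabs
assert not fnmatchcase(row, full_pat)   # the seven tabs pin each item in place
```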

--
Alex Tweedly       http://www.tweedly.net





Re: surprising filter benchmarks

Walton Sumner
In reply to this post by Richard Gaskin
The difference is quite a bit smaller if the loop checks the number of items,
as the filter pattern does. It is still usually a 3- to 6-fold difference,
with the loop being faster.

...
if item 1 of tLine contains "a" \
        AND item 2 of tLine contains "r" \
        AND item 3 of tLine is "r" \
        AND the number of items of tLine is 8 then -- <-- new test
...

Filter: 277
Repeat: 61
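Walton's fairer loop, including the item-count test that the wildcard pattern implies, can be sketched in Python (illustrative data; the pattern's seven tabs force the eight-item shape):

```python
def keep(line):
    items = line.split("\t")
    return (len(items) == 8          # the new test, mirroring the "\t*" tail
            and "a" in items[0]
            and "r" in items[1]
            and items[2] == "r")

assert keep("xa\tbr\tr\tx\tx\tx\tx\tx")   # right shape, right contents
assert not keep("xa\tbr\tr\tx")           # too few items: rejected
```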
