Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)


Pi Digital via use-livecode
Hi Geoff,

thank you for this beautiful script.

I modified it a bit to accept a multi-character search string and to support case sensitivity.

It definitely is a lot faster for unicode text than anything I have seen.

-----------------------------
function offsetList D,S, pCase
   -- returns a comma-delimited list of the offsets of D in S
   -- pCase is a boolean for caseSensitive
   set the caseSensitive to pCase
   set the itemDel to D
   put the length of D into tDelimLength
   repeat for each item i in S
      add length(i) + tDelimLength to C
      put C - (tDelimLength - 1),"" after R
   end repeat
   set the itemDel to comma
   if char -tDelimLength to -1 of S is D then return char 1 to -2 of R
   put length(C - (tDelimLength - 1)) + 1 into lenC
   put length(R) into lenR
   if lenC = lenR then return 0
   return char 1 to lenR - lenC - 1 of R
end offsetList
------------------------------

Kind regards
Bernd





>
> Date: Thu, 1 Nov 2018 00:15:37 -0700
> From: Geoff Canyon
> To: How to use LiveCode <[hidden email]>
> Subject: Re: How to find the offset of the last instance of a
> repeating character in a string?
>
> I was curious if using the itemDelimiter might work for this, so I wrote
> the below code out of curiosity; but in my quick testing with single-byte
> characters it was only about 30% faster than the above methods, so I didn't
> bother to post it.
>
> But Ben Rubinstein just posted about a terrible slow-down doing pretty much
> this same thing for text with unicode characters. So I ran a simple test
> with 8000-character strings that start with a single unicode
> character; this is about 15x faster than offset() with skip. For
> 100,000-character lines it's about 300x faster, so it seems to be immune to
> the line-painter issues skip is subject to. So for what it's worth:
>
> function offsetList D,S
>   -- returns a comma-delimited list of the offsets of D in S
>   set the itemDel to D
>   repeat for each item i in S
>      add length(i) + 1 to C
>      put C,"" after R
>   end repeat
>   set the itemDel to comma
>   if char -1 of S is D then return char 1 to -2 of R
>   put length(C) + 1 into lenC
>   put length(R) into lenR
>   if lenC = lenR then return 0
>   return char 1 to lenR - lenC - 1 of R
> end offsetList
>


_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

Pi Digital via use-livecode
Nice! I *just* finished creating a github repository for it, and adding
support for multi-char search strings, much as you did. I was coming to the
list to post the update when I saw your post.

Here's the GitHub link: https://github.com/gcanyon/offsetlist

Here's my updated version:

function offsetList D,S,pCase
   -- returns a comma-delimited list of the offsets of D in S
   set the caseSensitive to pCase is true
   set the itemDel to D
   put length(D) into dLength
   put 1 - dLength into C
   repeat for each item i in S
      add length(i) + dLength to C
      put C,"" after R
   end repeat
   set the itemDel to comma
   if char -dLength to -1 of S is D then return char 1 to -2 of R
   put length(C) + 1 into lenC
   put length(R) into lenR
   if lenC = lenR then return 0
   return char 1 to lenR - lenC - 1 of R
end offsetList
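
For anyone following along, here is a quick trace of what the function above returns. This is a minimal sketch, assuming the offsetList handler above is in the message path; the handler name testOffsetList and the sample strings are illustrative assumptions, not part of the repository.

```livecode
on testOffsetList
   local tOut
   -- "a,bb,ccc" split on "," gives the items "a", "bb", "ccc".
   -- C runs 2, 5, 9; the final 9 points one past the end of the
   -- string (no delimiter follows "ccc"), so it is trimmed off.
   put offsetList(",", "a,bb,ccc") & return after tOut   -- 2,5
   -- Multi-char search string; the string ends with the delimiter,
   -- so every entry is a real match and only the trailing comma goes.
   put offsetList("ab", "xxabyyab") & return after tOut   -- 3,7
   -- No match: R holds a single entry, so the function returns 0.
   put offsetList("z", "abc") after tOut   -- 0
   put tOut
end testOffsetList
```

The trailing-entry trim works because a delimiter at the very end of the string does not produce an extra empty item.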

On Thu, Nov 1, 2018 at 8:28 AM Niggemann, Bernd via use-livecode <
[hidden email]> wrote:

> Hi Geoff,
>
> thank you for this beautiful script.
>
> I modified it a bit to accept multi-character search string and also for
> case sensitivity.
>
> It definitely is a lot faster for unicode text than anything I have seen.

Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

Pi Digital via use-livecode
It probably should be named listOffset, like itemOffset or lineOffset.

Bob S


> On Nov 1, 2018, at 17:04 , Geoff Canyon via use-livecode <[hidden email]> wrote:
>
> Nice! I *just* finished creating a github repository for it, and adding
> support for multi-char search strings, much as you did. I was coming to the
> list to post the update when I saw your post.

Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

Pi Digital via use-livecode
All of those return a single value; I wanted to convey the concept of
returning multiple values. To me listOffset implies it does the same thing
as itemOffset, since items come in a list. How about:

offsets -- not my favorite because it's almost indistinguishable from offset
offsetsOf -- seems a tad clumsy

On Fri, Nov 2, 2018 at 7:41 AM Bob Sneidar via use-livecode <
[hidden email]> wrote:

> It probably should be named listOffset, like itemOffset or lineOffset.
>
> Bob S

Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

Pi Digital via use-livecode
how about allOffsets?

Bob S


> On Nov 2, 2018, at 09:16 , Geoff Canyon via use-livecode <[hidden email]> wrote:
>
> All of those return a single value; I wanted to convey the concept of
> returning multiple values. To me listOffset implies it does the same thing
> as itemOffset, since items come in a list. How about:
>
> offsets -- not my favorite because it's almost indistinguishable from offset
> offsetsOf -- seems a tad clumsy

Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

Pi Digital via use-livecode
I like that, changing it. Now available at
https://github.com/gcanyon/alloffsets

One thing I don't see how to do without significantly impacting performance
is to return all offsets if there are overlapping strings. For example:

allOffsets("aba","abababa")

would return 1,5, when it might be reasonable to expect it to return 1,3,5.
Using the offset function with numToSkip would make that easy; adapting
allOffsets to do so would be harder to do cleanly I think.
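
For comparison, a minimal sketch of the offset-with-numToSkip approach mentioned above, which does pick up overlapping matches. The name overlappingOffsets and its variables are illustrative assumptions, not part of the repository, and going by the timings earlier in this thread it is likely to be much slower on unicode text.

```livecode
function overlappingOffsets pNeedle, pHaystack
   local tSkip, tPos, tResult
   put 0 into tSkip
   repeat forever
      -- offset() returns the match position relative to the skipped chars
      put offset(pNeedle, pHaystack, tSkip) into tPos
      if tPos = 0 then exit repeat
      -- tSkip becomes the absolute position where this match starts
      add tPos to tSkip
      put tSkip & comma after tResult
   end repeat
   if tResult is empty then return 0
   return char 1 to -2 of tResult
end overlappingOffsets
```

overlappingOffsets("aba","abababa") should give 1,3,5, since each search resumes one character past the start of the previous match rather than past its end.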

gc

On Fri, Nov 2, 2018 at 12:17 PM Bob Sneidar via use-livecode <
[hidden email]> wrote:

> how about allOffsets?
>
> Bob S

Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

Pi Digital via use-livecode

On 03/11/2018 00:43, Geoff Canyon via use-livecode wrote:

> I like that, changing it. Now available at
> https://github.com/gcanyon/alloffsets
>
> One thing I don't see how to do without significantly impacting performance
> is to return all offsets if there are overlapping strings. For example:
>
> allOffsets("aba","abababa")
>
> would return 1,5, when it might be reasonable to expect it to return 1,3,5.
> Using the offset function with numToSkip would make that easy; adapting
> allOffsets to do so would be harder to do cleanly I think.
>
Can I suggest changing it to "someOffsets()" :-) :-)

But seriously, can you not iteratively run "allOffsets" ?
something like .... (typed straight into email - totally untested)

function allOffsets pDel, pStr
   repeat with c = 1 to 255  -- or some other upper limit ?
      if NOT pDel contains numtochar(c) then
         put numtochar(c) into c
         exit repeat
      end if
   end repeat
   repeat forever
      put someOffsets(pDel, pStr) into newR
      if the number of items in newR = 0 then exit repeat
      repeat for each item I in newR
         put c into char I of newR
      end repeat
      put newR after R
   end repeat
   sort items of R numeric
   return R
end allOffsets

-- Alex.


Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

Pi Digital via use-livecode
Oh dear - answering my own posts .... rarely a good sign :-)


On 03/11/2018 02:10, Alex Tweedly via use-livecode wrote:

>
> On 03/11/2018 00:43, Geoff Canyon via use-livecode wrote:
>> One thing I don't see how to do without significantly impacting
>> performance
>> is to return all offsets if there are overlapping strings. For example:
>>
>> allOffsets("aba","abababa")
>>
>> would return 1,5, when it might be reasonable to expect it to return
>> 1,3,5.
>> Using the offset function with numToSkip would make that easy; adapting
>> allOffsets to do so would be harder to do cleanly I think.
>>
> Can I suggest changing it to "someOffsets()" :-) :-)
>
> But seriously, can you not iteratively run "allOffsets" ?
>
Answer : NO. That doesn't work.
However, there is a more efficient way that does work - but it needs to
be tested before I post it.

-- Alex.


Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

Pi Digital via use-livecode
Here is something... probably needs some optimization

function allOffsets2 D,S,pCase
   local dLength, C, R
   -- returns a comma-delimited list of the offsets of D in S
   set the caseSensitive to pCase is true
   set the itemDel to D
   put length(D) into dLength
   put 1 - dLength into C

   if dLength > 1 then
      local n, i, j, D2, L2
      put 0 into n
      repeat with i = 2 to dLength
         if char i to -1 of D is char 1 to -i of D then
            add 1 to n
            put char (1-i) to -1 of D into D2[n]
            put i-1 into L2[n]
         end if
      end repeat
   end if

   repeat for each item i in S
      if C > 0 and n > 0 then
         repeat with j = 1 to n
            if i&D begins with D2[j] then
               put C+L2[j],"" after R
            end if
         end repeat
      end if
      add length(i) + dLength to C
      put C,"" after R
   end repeat
   set the itemDel to comma
   delete char -1 of R

   if item -1 of R > len(S) then
      if the number of items of R is 1 then
         return 0
      else
         delete item -1 of R
      end if
   end if

   if char -dLength to -1 of S is D then
      return R
   end if

   repeat with j = n down to 1
      if char -len(D2[j]) to -1 of S is D2[j] then
         delete item -1 of R
      end if
   end repeat
   return R
end allOffsets2


I think a couple of private functions would be good.  One for 0 overlap,
one for a single overlap, then a final general one for any number of
overlaps (the core of the above).  After the loop that generates D2/L2 I
would branch based on n to avoid the additional comparisons inside the loop.
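
To make the branching concrete: the D2/L2 loop is really collecting the shifts at which D matches a copy of itself, and n is how many of them there are. A rough sketch of that step pulled out on its own, assuming a hypothetical helper name selfOverlapShifts (it is not part of the posted code):

```livecode
function selfOverlapShifts D
   -- hypothetical helper: returns the shifts k (1 <= k < length(D))
   -- at which D overlaps a copy of itself, as a comma-delimited list
   local k, tLen, tShifts
   put length(D) into tLen
   repeat with k = 1 to tLen - 1
      -- D shifted right by k lines up with itself when its tail
      -- (from char k + 1) equals its head of the same length
      if char k + 1 to -1 of D is char 1 to tLen - k of D then
         put k & comma after tShifts
      end if
   end repeat
   return char 1 to -2 of tShifts
end selfOverlapShifts
```

selfOverlapShifts("abc") is empty (n = 0, the plain case), selfOverlapShifts("aba") is 2 (n = 1, a single overlap) and selfOverlapShifts("aaaa") is 1,2,3 (the general case), which are the three branches suggested above.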

Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

Pi Digital via use-livecode
Hi Geoff,

unfortunately the impact of overlapping delimiter strings is more severe
than simply not finding them. The code on github gets the wrong answer
if there is an overlapping string at the very end of the search string, e.g.

alloffsets("aaaa", "aaaaaaaaa")    wrongly gives  1,5,10

I suspect the test for

  if char -dLength to -1 of S is D then return char 1 to -2 of R
should be (something like)
   if item -1 of S is empty then return char 1 to -2 of R
but to be honest, I'm not 10% certain of that.
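
A quick way to see where the 10 comes from, assuming the offsetList code Geoff posted earlier: split the haystack on the delimiter and look at the item lengths. The handler name below is made up for illustration.

```livecode
on showSplit
   local S, tItem, tLengths
   put "aaaaaaaaa" into S            -- 9 chars
   set the itemDel to "aaaa"
   repeat for each item tItem in S
      put length(tItem) & space after tLengths
   end repeat
   set the itemDel to comma
   -- delimiters are consumed at 1 and 5, leaving "a" as the last item,
   -- so the lengths are 0 0 1 and the running counter goes 1, 5, 10
   put tLengths
end showSplit
```

The last four chars of S (6 to 9) are still "aaaa", so the trailing test is true even though S did not end on a delimiter boundary, and the phantom 10 is never trimmed.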

Alex.



Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

Pi Digital via use-livecode
Good catch Alex.  My code was closer, but didn't handle repeating
characters correctly.  Here is an updated version.

function allOffsets2 D,S,pCase
   local dLength, C, R
   -- returns a comma-delimited list of the offsets of D in S
   set the caseSensitive to pCase is true
   set the itemDel to D
   put length(D) into dLength
   put 1 - dLength into C

   if dLength > 1 then
      local n, i, j, D2, L2
      put 0 into n
      repeat with i = 2 to dLength
         if char i to -1 of D is char 1 to -i of D then
            add 1 to n
            put char (1-i) to -1 of D into D2[n]
            put i-1 into L2[n]
         end if
      end repeat
   end if

   repeat for each item i in S
      if C > 0 and n > 0 then
         repeat with j = 1 to n
            if i&D begins with D2[j] then
               put C+L2[j],"" after R
            end if
         end repeat
      end if
      add length(i) + dLength to C
      put C,"" after R
   end repeat
   set the itemDel to comma
   delete char -1 of R

   if item -1 of R > len(S) then
      if the number of items of R is 1 then
         return 0
      else
         delete item -1 of R
      end if
   end if

   if len(i) > 0 then
      repeat with j = n down to len(i)+1
         if char -len(D2[j]) to -1 of S is D2[j] then
            delete item -1 of R
         end if
      end repeat
   end if
   return R
end allOffsets2


Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

Pi Digital via use-livecode
I've posted a binary stack version that includes my version.  I cloned and
made a "bwm" branch in my clone.  Here's the direct link to the script with
the posted code (updated to use private functions):

https://github.com/bwmilby/alloffsets/blob/bwm/bwm/allOffsets_Scripts/stack_allOffsets_button_id_1009.livecodescript

The binary stack can be found here:

https://github.com/bwmilby/alloffsets/tree/bwm/bwm

There are 3 buttons across the top.  The first is Geoff's version.  The
second is my combined version.  The third is the one with private functions
added.  The first button replaces the results field.  The second and third
add their results to the results field.

The top field is the string to find (needle), the second is the string to
search (haystack), the third is for the results.
Everything is in a background group so you can add cards for unique
searches.

On Sat, Nov 3, 2018 at 9:17 AM Brian Milby <[hidden email]> wrote:

> Good catch Alex.  My code was closer, but didn't handle repeating
> characters correctly.  Here is an updated version.
>
> function allOffsets2 D,S,pCase
>    local dLength, C, R
>    -- returns a comma-delimited list of the offsets of D in S
>    set the caseSensitive to pCase is true
>    set the itemDel to D
>    put length(D) into dLength
>    put 1 - dLength into C
>
>    if dLength > 1 then
>       local n, i, j, D2, L2
>       put 0 into n
>       repeat with i = 2 to dLength
>          if char i to -1 of D is char 1 to -i of D then
>             add 1 to n
>             put char (1-i) to -1 of D into D2[n]
>             put i-1 into L2[n]
>          end if
>       end repeat
>    end if
>
>    repeat for each item i in S
>       if C > 0 and n > 0 then
>          repeat with j = 1 to n
>             if i&D begins with D2[j] then
>                put C+L2[j],"" after R
>             end if
>          end repeat
>       end if
>       add length(i) + dLength to C
>       put C,"" after R
>    end repeat
>    set the itemDel to comma
>    delete char -1 of R
>
>    if item -1 of R > len(S) then
>       if the number of items of R is 1 then
>          return 0
>       else
>          delete item -1 of R
>       end if
>    end if
>
>    if len(i) > 0 then
>       repeat with j = n down to len(i)+1
>          if char -len(D2[j]) to -1 of S is D2[j] then
>             delete item -1 of R
>          end if
>       end repeat
>    end if
>    return R
> end allOffsets2
>
>
> On Sat, Nov 3, 2018 at 8:33 AM Alex Tweedly via use-livecode <
> [hidden email]> wrote:
>
>> Hi Geoff,
>>
>> unfortunately the impact of overlapping delimiter strings is more severe
>> than simply not finding them. The code on github gets the wrong answer
>> if there is an overlapping string at the very end of the search string,
>> e.g.
>>
>> alloffsets("aaaa", "aaaaaaaaa")    wrongly gives  1,5,10
>>
>> I suspect the test for
>>
>>   if char -dLength to -1 of S is D then return char 1 to -2 of R
>> should be (something like)
>>    if item -1 of S is empty then return char 1 to -2 of R
>> but to be honest, I'm not 10% certain of that.
>>
>> Alex.
>>
>>
>>
>> On 03/11/2018 00:43, Geoff Canyon via use-livecode wrote:
>> > I like that, changing it. Now available at
>> > https://github.com/gcanyon/alloffsets
>> >
>> > One thing I don't see how to do without significantly impacting
>> performance
>> > is to return all offsets if there are overlapping strings. For example:
>> >
>> > allOffsets("aba","abababa")
>> >
>> > would return 1,5, when it might be reasonable to expect it to return
>> 1,3,5.
>> > Using the offset function with numToSkip would make that easy; adapting
>> > allOffsets to do so would be harder to do cleanly I think.
>> >
>> > gc
>> >
>> > On Fri, Nov 2, 2018 at 12:17 PM Bob Sneidar via use-livecode <
>> > [hidden email]> wrote:
>> >
>> >> how about allOffsets?
>> >>
>> >> Bob S
>> >>
>> >>
>> >>> On Nov 2, 2018, at 09:16 , Geoff Canyon via use-livecode <
>> >> [hidden email]> wrote:
>> >>> All of those return a single value; I wanted to convey the concept of
>> >>> returning multiple values. To me listOffset implies it does the same
>> >> thing
>> >>> as itemOffset, since items come in a list. How about:
>> >>>
>> >>> offsets -- not my favorite because it's almost indistinguishable from
>> >> offset
>> >>> offsetsOf -- seems a tad clumsy
>> >>>
>> >>> On Fri, Nov 2, 2018 at 7:41 AM Bob Sneidar via use-livecode <
>> >>> [hidden email]> wrote:
>> >>>
>> >>>> It probably should be named listOffset, like itemOffset or
>> lineOffset.
>> >>>>
>> >>>> Bob S
>> >>>>
>> >>>>
>> >>>>> On Nov 1, 2018, at 17:04 , Geoff Canyon via use-livecode <
>> >>>> [hidden email]> wrote:
>> >>>>> Nice! I *just* finished creating a github repository for it, and
>> adding
>> >>>>> support for multi-char search strings, much as you did. I was
>> coming to
>> >>>> the
>> >>>>> list to post the update when I saw your post.
>> >>>>>
>> >>>>> Here's the GitHub link: https://github.com/gcanyon/offsetlist
>> >>>>>
>> >>>>> Here's my updated version:
>> >>>>>
>> >>>>> function offsetList D,S,pCase
>> >>>>>   -- returns a comma-delimited list of the offsets of D in S
>> >>>>>   set the caseSensitive to pCase is true
>> >>>>>   set the itemDel to D
>> >>>>>   put length(D) into dLength
>> >>>>>   put 1 - dLength into C
>> >>>>>   repeat for each item i in S
>> >>>>>      add length(i) + dLength to C
>> >>>>>      put C,"" after R
>> >>>>>   end repeat
>> >>>>>   set the itemDel to comma
>> >>>>>   if char -dLength to -1 of S is D then return char 1 to -2 of R
>> >>>>>   put length(C) + 1 into lenC
>> >>>>>   put length(R) into lenR
>> >>>>>   if lenC = lenR then return 0
>> >>>>>   return char 1 to lenR - lenC - 1 of R
>> >>>>> end offsetList
>> >>>>>
>> >>>>> On Thu, Nov 1, 2018 at 8:28 AM Niggemann, Bernd via use-livecode <
>> >>>>> [hidden email]> wrote:
>> >>>>>
>> >>>>>> Hi Geoff,
>> >>>>>>
>> >>>>>> thank you for this beautiful script.
>> >>>>>>
>> >>>>>> I modified it a bit to accept multi-character search string and
>> also
>> >> for
>> >>>>>> case sensitivity.
>> >>>>>>
>> >>>>>> It definitely is a lot faster for unicode text than anything I have
>> >>>> seen.
>> >>>>>> -----------------------------
>> >>>>>> function offsetList D,S, pCase
>> >>>>>>   -- returns a comma-delimited list of the offsets of D in S
>> >>>>>>   -- pCase is a boolean for caseSensitive
>> >>>>>>   set the caseSensitive to pCase
>> >>>>>>   set the itemDel to D
>> >>>>>>   put the length of D into tDelimLength
>> >>>>>>   repeat for each item i in S
>> >>>>>>      add length(i) + tDelimLength to C
>> >>>>>>      put C - (tDelimLength - 1),"" after R
>> >>>>>>   end repeat
>> >>>>>>   set the itemDel to comma
>> >>>>>>   if char -1 of S is D then return char 1 to -2 of R
>> >>>>>>   put length(C) + 1 into lenC
>> >>>>>>   put length(R) into lenR
>> >>>>>>   if lenC = lenR then return 0
>> >>>>>>   return char 1 to lenR - lenC - 1 of R
>> >>>>>> end offsetList
>> >>>>>> ------------------------------
>> >>>>>>
>> >>>>>> Kind regards
>> >>>>>> Bernd
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>> Date: Thu, 1 Nov 2018 00:15:37 -0700
>> >>>>>>> From: Geoff Canyon
>> >>>>>>> To: How to use LiveCode <[hidden email]>
>> >>>>>>> Subject: Re: How to find the offset of the last instance of a
>> >>>>>>>      repeating       character in a string?
>> >>>>>>>
>> >>>>>>> I was curious if using the itemDelimiter might work for this, so I
>> >>>> wrote
>> >>>>>>> the below code out of curiosity; but in my quick testing with
>> >>>> single-byte
>> >>>>>>> characters it was only about 30% faster than the above methods,
>> so I
>> >>>>>> didn't
>> >>>>>>> bother to post it.
>> >>>>>>>
>> >>>>>>> But Ben Rubinstein just posted about a terrible slow-down doing
>> >> pretty
>> >>>>>> much
>> >>>>>>> this same thing for text with unicode characters. So I ran a
>> simple
>> >>>> test
>> >>>>>>> with 8000 character long strings that start with a single unicode
>> >>>>>>> character, this is about 15x faster than offset() with skip. For
>> >>>>>>> 100,000-character lines it's about 300x faster, so it seems to be
>> >>>> immune
>> >>>>>> to
>> >>>>>>> the line-painter issues skip is subject to. So for what it's
>> worth:
>> >>>>>>>
>> >>>>>>> function offsetList D,S
>> >>>>>>> -- returns a comma-delimited list of the offsets of D in S
>> >>>>>>> set the itemDel to D
>> >>>>>>> repeat for each item i in S
>> >>>>>>>     add length(i) + 1 to C
>> >>>>>>>     put C,"" after R
>> >>>>>>> end repeat
>> >>>>>>> set the itemDel to comma
>> >>>>>>> if char -1 of S is D then return char 1 to -2 of R
>> >>>>>>> put length(C) + 1 into lenC
>> >>>>>>> put length(R) into lenR
>> >>>>>>> if lenC = lenR then return 0
>> >>>>>>> return char 1 to lenR - lenC - 1 of R
>> >>>>>>> end offsetList
>> >>>>>>>
_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

Pi Digital via use-livecode
Alex, good catch! The code below and at
https://github.com/gcanyon/alloffsets now puts a stop character after the
string to prevent the error you found. I also added a "with overlaps"
option. I think this is correct, and about as efficient as possible, but
thanks to anyone who finds a bug or a faster way.

gc


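function allOffsets D,S,pCase,pWithOverlaps
   -- returns a comma-delimited list of the offsets of D in S
   -- pCase: true for a case-sensitive search
   -- pWithOverlaps: true to also report matches that overlap an earlier match
   set the caseSensitive to pCase is true
   put length(D) into dLength
   -- append a stop character that differs from the last char of D,
   -- so that S can never end with D
   put numtochar(chartonum(char -1 of D) mod 2 + 1) after S
   if pWithOverlaps then
      -- find each shift i at which D overlaps a copy of itself
      repeat with i = 1 to dLength - 1
         if not (char i + 1 to -1 of D is char 1 to dLength - i of D) then next repeat
         put char -i to -1 of D into OV[i]
         put i & cr after kList
      end repeat
   end if
   set the itemDel to D
   -- C holds the offset of the most recent match; it starts at 1 - dLength
   -- so that the first add lands on the first match
   put 1 - dLength into C
   if pWithOverlaps then
      repeat for each item i in S
         -- check for a match that starts K chars after the previous one
         repeat for each line K in kList
            if char 1 to K of (i & D) is OV[K] then put (C + K),"" after R
         end repeat
         add length(i) + dLength to C
         put C,"" after R
      end repeat
   else
      repeat for each item i in S
         add length(i) + dLength to C
         put C,"" after R
      end repeat
   end if
   set the itemDel to comma
   -- drop any non-positive offsets generated before the first real match
   repeat until item 1 of R > 0
      delete item 1 of R
   end repeat
   -- drop the entry generated by the final item, which ends in the stop character
   delete item -1 of R
   if R is empty then return 0 else return char 1 to -2 of R
end allOffsets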

Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

Pi Digital via use-livecode
The logic matches my solution. I also validated my solution against one
that uses just the offset function. The speed hit for the with-overlaps
case is similar. One possible optimization:

put kList is not empty into pWithOverlaps

If overlaps were requested but the delimiter cannot overlap itself, kList
is empty, so the flag becomes false and the extra loops are skipped.
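
For what it's worth, the reason this is safe: kList only gets entries when
some trailing part of the delimiter equals its leading part. A minimal
standalone sketch of that test is below. hasSelfOverlap is a hypothetical
helper name, not something in either repository, and the comparison follows
the current caseSensitive setting just as the one in allOffsets does.

```livecode
function hasSelfOverlap pFind
   -- hypothetical helper, not part of either repository
   -- true if some trailing part of pFind equals the leading part
   -- of the same length, i.e. pFind can overlap a copy of itself
   local tLen, i
   put length(pFind) into tLen
   repeat with i = 1 to tLen - 1
      if char i + 1 to -1 of pFind is char 1 to tLen - i of pFind then return true
   end repeat
   return false
end hasSelfOverlap
```

With this, hasSelfOverlap("aba") is true (the trailing "a" matches the
leading "a"), while hasSelfOverlap("abc") and any single-character
delimiter give false, so the overlap loops could be skipped for them.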

Adding a character to the end is clever.  I'll need to incorporate that and
see what it does to my method.

My take on the code updates is here:
https://github.com/bwmilby/alloffsets/blob/bwm/bwm/allOffsets_Scripts/stack_allOffsets_button_id_1026.livecodescript

Stack and index of scripts here:
https://github.com/bwmilby/alloffsets/tree/bwm/bwm


Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

Pi Digital via use-livecode
On 11/4/18 10:40 AM, Geoff Canyon via use-livecode wrote:
> I also added a "with overlaps" option.

My problem with the pWithOverlaps parameter is that it requires a priori
knowledge of the data being consumed. If you already know there are
overlaps then you'd set the parameter to true. If you don't know whether
or not there are overlaps, then you'd need to set it to true so you
don't miss anything (aside, of course, from the trivial case where you
don't care whether or not there are overlaps - is there a use case for
this?).

The only time you would set it to false is after you've already
determined that there are no overlaps, and the time spent on that would
probably more than offset the extra processing in the function.

--
  Mark Wieder
  [hidden email]


Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

Pi Digital via use-livecode
My updated solution always checks the delimiter for overlaps, but if none are found it uses an optimized version of the search (private functions rather than code inside the main function). I special-case both no overlap and a single overlap in the delimiter. It is about the same speed as Geoff’s.

Thanks,
Brian

Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

Pi Digital via use-livecode
On Sun, Nov 4, 2018 at 4:34 PM Mark Wieder via use-livecode <
[hidden email]> wrote:

> On 11/4/18 10:40 AM, Geoff Canyon via use-livecode wrote:
> > I also added a "with overlaps" option.
>
> My problem with the pWithOverlaps parameter is that it requires a priori
> knowledge of the data being consumed. If you already know there are
> overlaps then you'd set the parameter to true. If you don't know whether
> or not there are overlaps, then you'd need to set it to true so you
> don't miss anything (aside, of course, from the trivial case where you
> don't care whether or not there are overlaps - is there a use case for
> this?).
>
> The only time you would set it to false is after you've already
> determined that there are no overlaps, and the time spent on that would
> probably more than offset the extra processing in the function.


I'm not sure I agree that it would be so unlikely to know that overlaps
won't occur (or that it's unreasonable to not want them). If I'm looking
for every instance of "romeo" in Romeo and Juliet, then obviously I'm not
expecting, nor do I want, overlaps. Likewise, overlaps can only occur if
the search string allows for them, so "romeo" makes them impossible from
the get-go.

That said, it seems reasonable to default overlaps to true rather than
false. I'll set it up that way when I add the modification below.

On Sun, Nov 4, 2018 at 4:02 PM Brian Milby via use-livecode <
[hidden email]> wrote:

>
> put kList is not empty into pWithOverlaps
>

Good point -- I suppose it also makes sense (albeit that the speed
improvement would be trivial) to not bother even building kList if the term
to be found is a single character.

gc

Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

Pi Digital via use-livecode
On 11/4/18 6:49 PM, Geoff Canyon via use-livecode wrote:

> I'm not sure I agree that it would be so unlikely to know that overlaps
> won't occur (or that it's unreasonable to not want them). If I'm looking
> for every instance of "romeo" in romeo and juliet, then obviously I'm not
> expecting, nor do I want, overlaps.
Sure, but in that case you'd be better off using the faster 'offset'
function. Or do you mean every instance of 'romeo' in the play itself?
There I can see why you'd want to set it to false for speed.

My point isn't really whether pWithOverlaps should default to true or
false, but that you need detailed knowledge of the corpus of data before
calling the function.

If you're looking for 'romeo' in pText, would you set pWithOverlaps to
true or to false?

--
  Mark Wieder
  [hidden email]


Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

Pi Digital via use-livecode
Simply add 1 to the last offset pointer. If the first iteration returns 1, then resume the search at character 2 instead of at offset + len(searchString), if you take my meaning.
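
A minimal sketch of that idea, assuming the built-in offset function's optional third argument (the number of characters to skip) and using a hypothetical handler name, allOffsetsViaOffset. Because offset returns a position relative to the skipped characters, the running total doubles as the next skip value, so each search resumes one character after the start of the previous match:

```livecode
function allOffsetsViaOffset pFind, pText
   -- hypothetical handler name; returns a comma-delimited list of every
   -- offset of pFind in pText, including overlapping matches, or 0 if none
   local tSkip, tPos, tResult
   put 0 into tSkip
   repeat forever
      -- offset() returns the match position relative to the skipped chars
      put offset(pFind, pText, tSkip) into tPos
      if tPos = 0 then exit repeat
      -- tSkip becomes the absolute position of this match; skipping that
      -- many chars starts the next search one char after the match began
      add tPos to tSkip
      put tSkip & comma after tResult
   end repeat
   if tResult is empty then return 0
   return char 1 to -2 of tResult
end allOffsetsViaOffset
```

With this, allOffsetsViaOffset("aba","abababa") returns 1,3,5. It is the plain offset-with-skip approach, so it should be expected to show the same slow-down on unicode text that started this thread.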

Bob S


> On Nov 2, 2018, at 17:43 , Geoff Canyon via use-livecode <[hidden email]> wrote:
>
> I like that, changing it. Now available at
> https://github.com/gcanyon/alloffsets
>
> One thing I don't see how to do without significantly impacting performance
> is to return all offsets if there are overlapping strings. For example:
>
> allOffsets("aba","abababa")
>
> would return 1,5, when it might be reasonable to expect it to return 1,3,5.
> Using the offset function with numToSkip would make that easy; adapting
> allOffsets to do so would be harder to do cleanly I think.
>
> gc

Re: How to find the offset of the last instance of a repeating character in a string? (Geoff Canyon)

Pi Digital via use-livecode
On 11/4/18 4:45 PM, Brian Milby via use-livecode wrote:
> My updated solution always looks for overlap but if none are found it uses optimized versions of the search (private functions instead of inside the main function). I special case for no overlap and a single overlap in the delimiter. It is about the same speed as Geoff’s.

Nice. I tried to get tricky and replace that 'repeat with' loop with a
'repeat for each' loop, but ended up about 20% slower. Not at all what I
expected.

--
  Mark Wieder
  [hidden email]
