How to find the offset of the last instance of a repeating character in a string?

dunbarxx via use-livecode
Folks,
Is there a simple way to find the offset of a character from the ‘right’ end of a string, rather than the beginning - or alternatively get a list of all occurrences?

I’m trying to separate paths & pages from a list of URLs and so looking to identify the position of the last ‘/‘ character.

Thanks & regards,
Keith
_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Re: How to find the offset of the last instance of a repeating character in a string?

dunbarxx via use-livecode
On Oct 29, 2018, at 9:32 AM, Keith Clarke via use-livecode <[hidden email]> wrote:
>
> Folks,
> Is there a simple way to find the offset of a character from the ‘right’ end of a string, rather than the beginning - or alternatively get a list of all occurrences?
>
> I’m trying to separate paths & pages from a list of URLs and so looking to identify the position of the last ‘/‘ character.
>
> Thanks & regards,
> Keith


There was a discussion on this topic on the list a few years ago, and I saved these functions in my script library:

From Peter Brigham:
These are utility functions I use constantly for text processing. Offsets(str,cntr) returns a comma-delimited list of all the offsets of str in cntr. Lineoffsets(str,cntr) does the same with lineoffsets. Then you can iterate over the list of offsets to do whatever you want to each instance of str in cntr. I keep them in a utility stack that is in the stacksInUse, so they are available to all stacks. I don't use regex, as I have never gotten the regex syntax to stick in my head firmly enough to find it natural, and in any case doing it by script turns out to be as fast or faster.

Peter's lineOffsets function returns a line number for each found char offset. I added a function that returns only unique line numbers.

function offsets str,cntr
    -- returns a comma-delimited list of
    -- all the offsets of str in cntr
    put "" into oList
    put 0 into startPoint
    repeat
        put offset(str,cntr,startPoint) into os
        if os = 0 then exit repeat
        add os to startPoint
        put startPoint & "," after oList
    end repeat
    if oList = "" then return "0"
    return item 1 to -1 of oList
end offsets

function lineOffsetsAll str,cntr
    -- returns a comma-delimited list of
    -- all the lineoffsets of str in cntr
    # (returns a line number for ALL instances)
    put offsets(str,cntr) into charList
    if charList = "0" then return "0"
    put the number of items of charList into nbr
    put "" into oList
    repeat for each item n in charList
        put the number of lines of (char 1 to n of cntr) \
                & "," after oList
    end repeat
    return item 1 to -1 of oList
end lineOffsetsAll
   
# added by Devin Asay
function lineOffsets pStr,pSearchTxt
    # (returns only unique line numbers)
    put empty into tList
    put 0 into tStartLine
    repeat
        put lineOffset(pStr,pSearchTxt,tStartLine) into tLineNum
        if tLineNum = 0 then exit repeat
        add tLineNum to tStartLine
        put tStartLine & "," after tList
    end repeat
    if tList is empty then return "0"
    return item 1 to -1 of tList
end lineOffsets

Hope this helps.

Devin

Devin Asay
Director
Office of Digital Humanities
Brigham Young University
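
For the original question, the last entry in the list returned by offsets() is the position of the rightmost match. A minimal usage sketch, assuming the offsets() function above is available in the message path; the variable names are made up for illustration and the sample URL is borrowed from later in this thread:

```livecode
on mouseUp
   -- sample URL borrowed from later in the thread
   put "https://www.my.org/assets/general/may/2018.zip" into tURL
   put offsets("/", tURL) into tAllSlashes
   -- offsets() returns "0" when nothing is found, so item -1 is then 0 too
   put item -1 of tAllSlashes into tLastSlash
   if tLastSlash > 0 then
      -- everything up to and including the last "/" is the path
      put char 1 to tLastSlash of tURL into tPath
      -- everything after it is the page
      put char (tLastSlash + 1) to -1 of tURL into tPage
   end if
end mouseUp
```

Building the full list just to read its last entry does more work than strictly needed, but it reuses a function that is already in the library.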

Re: How to find the offset of the last instance of a repeating character in a string?

dunbarxx via use-livecode
Perfect, thanks Devin - I was hoping to see ‘offsets’ in the docs under ‘offset’, so this will do nicely! :-)
Best,
Keith  

Re: How to find the offset of the last instance of a repeating character in a string?

dunbarxx via use-livecode
Looks like Devin beat me to it. :-)

Bob S


Re: How to find the offset of the last instance of a repeating character in a string?

dunbarxx via use-livecode
In dBase/Foxpro they had an AT function synonymous (roughly) with our offset function. They also had a RAT (Reverse AT) function. I needed something like this many moons ago.

To get all occurrences, I maintain a "pointer" variable holding the position of the last character of the most recent match, which is also the number of characters offset() should skip on the next pass. Because offset() counts from the end of the skipped characters, you have to add the pointer to the offset to get the actual position in the original text, like so:

put 0 into tPointer
repeat
        put offset(tVar, tTextChunk, tPointer) into tNextPos
        if tNextPos = 0 then exit repeat
        add tPointer to tNextPos -- convert to a position in the full text
        put char tNextPos to (tNextPos + length(tVar) - 1) of tTextChunk into aFoundChunks[tNextPos][length(tVar)]
        put tNextPos + length(tVar) - 1 into tPointer -- skip past this match
end repeat

Something along those lines. Not tested, but you get the idea.

Bob S
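
A RAT-style reverse search can also be sketched directly by scanning backwards from the end of the string. This is only an illustration: reverseOffset is a hypothetical name, not a built-in LiveCode function, and the comparison is case-insensitive unless the caseSensitive property is set.

```livecode
function reverseOffset pNeedle, pHaystack
   -- hypothetical helper: returns the start position of the rightmost
   -- occurrence of pNeedle in pHaystack, or 0 if there is none
   put length(pNeedle) into tLen
   if tLen = 0 then return 0
   -- start at the last position where pNeedle could still fit
   repeat with i = length(pHaystack) - tLen + 1 down to 1
      if char i to (i + tLen - 1) of pHaystack is pNeedle then return i
   end repeat
   return 0
end reverseOffset
```

For a single character near the end of a URL this usually stops after a few iterations, though repeated char access may be slow on very long strings.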


Re: How to find the offset of the last instance of a repeating character in a string?

dunbarxx via use-livecode
On 10/29/2018 08:32 AM, Keith Clarke via use-livecode wrote:

> I’m trying to separate paths & pages from a list of URLs and so looking to identify the position of the last ‘/‘ character.

function rightmostSlashOf pText
    set the itemdelimiter to "/"
    return offset(item -1 of pText, pText)
end rightmostSlashOf

--
  Mark Wieder
  [hidden email]

Re: How to find the offset of the last instance of a repeating character in a string?

dunbarxx via use-livecode
That will only give him the item, not the character position. But it's a start. You can now get the number of characters of item 1 to -2 of pText +1. I didn't know the text you were searching had regular delimiters, and you were searching for the last delimiter. That makes things *much* easier.

Bob S
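
The character count suggested above could be sketched as follows. The function name is hypothetical, and the sketch assumes the text does not end with "/", since a trailing delimiter does not produce an extra empty item:

```livecode
function lastSlashByItems pText
   -- hypothetical name; assumes pText does not end with "/"
   set the itemDelimiter to "/"
   -- fewer than two items means there is no "/" between items
   if the number of items of pText < 2 then return 0
   -- length of everything before the last item's slash, plus the slash itself
   return (the number of chars of item 1 to -2 of pText) + 1
end lastSlashByItems
```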



Re: How to find the offset of the last instance of a repeating character in a string?

dunbarxx via use-livecode
On 10/29/2018 03:55 PM, Bob Sneidar via use-livecode wrote:
> That will only give him the item, not the character position.

Nope. It returns the position.

--
  Mark Wieder
  [hidden email]

Re: How to find the offset of the last instance of a repeating character in a string?

dunbarxx via use-livecode
Oh right you are!

Bob S


Re: How to find the offset of the last instance of a repeating character in a string?

dunbarxx via use-livecode
"toplevel/somename/another/somename"


On 29/10/2018 22:32, Mark Wieder via use-livecode wrote:

> On 10/29/2018 08:32 AM, Keith Clarke via use-livecode wrote:
>
>> I’m trying to separate paths & pages from a list of URLs and so
>> looking to identify the position of the last ‘/‘ character.
>
> function rightmostSlashOf pText
>    set the itemdelimiter to "/"
>    return offset(item -1 of pText, pText)
> end rightmostSlashOf
>
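
The string at the top of this message reads as a counterexample to the quoted function: the last item, "somename", also occurs earlier in the string, so offset() reports the earlier occurrence. A small sketch of the check, assuming rightmostSlashOf is defined as quoted; the handler name is made up:

```livecode
on testRepeatedName
   put "toplevel/somename/another/somename" into tText
   -- offset() finds the FIRST "somename", which starts at char 10,
   -- whereas the last "/" in tText is char 26
   put rightmostSlashOf(tText) into tResult
   answer tResult
end testRepeatedName
```

Even when the last item is unique, offset() gives the position of the item's first character, which is one past the slash.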


Re: How to find the offset of the last instance of a repeating character in a string?

dunbarxx via use-livecode
On 29/10/2018 22:32, Mark Wieder via use-livecode wrote:
> On 10/29/2018 08:32 AM, Keith Clarke via use-livecode wrote:
>
>> I’m trying to separate paths & pages from a list of URLs and so
>> looking to identify the position of the last ‘/‘ character.
>
How about ....
> function rightmostSlashOf p
>    set the itemdelimiter to "/"
>    return (the number of chars in p) - (the number of chars in item -1 of p)
> end rightmostSlashOf
>
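
One case the length-based version does not cover is a string that ends in "/", because the trailing delimiter does not create an empty last item. A hedged variant that checks for this first; lastSlashPos is a hypothetical name used only for this sketch:

```livecode
function lastSlashPos pText
   -- hypothetical variant of the length-based idea above
   -- a trailing "/" is itself the last slash
   if char -1 of pText is "/" then return the number of chars of pText
   set the itemDelimiter to "/"
   -- a single item (or an empty string) contains no "/" at all
   if the number of items of pText < 2 then return 0
   return (the number of chars of pText) - (the number of chars of item -1 of pText)
end lastSlashPos
```

For "toplevel/somename/another/somename" this gives 34 - 8 = 26, the position of the final slash.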

Re: How to find the offset of the last instance of a repeating character in a string?

dunbarxx via use-livecode
On Tue, Oct 30, 2018 at 2:33 AM Keith Clarke via use-livecode
<[hidden email]> wrote:
>
> I’m trying to separate paths & pages from a list of URLs and so looking to identify the position of the last ‘/‘ character.
>
If that is all you are after then I think setting the itemDelimiter to
"/" and separating the 'item -1' (page) from 'items 1 to -2' (path)
would give you a very simple and readable solution.  The only problem is
if you have the unlikely but not impossible situation where you have
paths that contain no pages.  Because of the known gotcha with LC and
how it counts items when the last item is empty you may need to
include an 'if' statement.
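
As a quick illustration of that gotcha (a sketch only; the handler name is made up and the two sample URLs come from this thread):

```livecode
on showTrailingItemGotcha
   set the itemDelimiter to "/"
   put "https://www.my.org/assets/general/march/" into tNoPage
   put "https://www.my.org/assets/general/may/2018.zip" into tWithPage
   -- a trailing delimiter does not start a new, empty item,
   -- so item -1 of tNoPage is "march" rather than empty
   put item -1 of tNoPage into tLastA
   -- here the last item really is the page: "2018.zip"
   put item -1 of tWithPage into tLastB
   answer tLastA & cr & tLastB
end showTrailingItemGotcha
```

This is why a test on 'char -1' of the line is needed before treating 'item -1' as the page.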

Try this, create a new Stack with a field and a button.

Into the field load the following text:

https://www.my.org/assets/general/february/
https://www.my.org/assets/general/march/
https://www.my.org/assets/general/april/2018.zip
https://www.my.org/assets/general/may/2018.zip
https://www.my.org/assets/general/june/2018.zip
https://www.my.org/assets/general/july/2018.zip
https://www.my.org/assets/general/july/2017.html
https://www.my.org/assets/general/july/2016.text
https://www.my.org/assets/general/july/2015.jpg
https://www.my.org/assets/general/august/2018.zip
https://www.my.org/assets/general/september/2018.zip
https://www.my.org/assets/general/october/2018.zip
https://www.my.org/assets/general/november/
https://www.my.org/assets/general/december/

Into the button load the following script (be careful of line breaks
there are 16 lines of code):

on mouseUp
   put fld 1 into tText
   set the itemDelimiter to "/"
   repeat for each line tLine in tText
      if (char -1 of tLine = "/") then --usual problem with dealing with empty last items
         put empty into tPath[tLine]
      else
         if (tPath[item 1 to -2 of tLine] = empty) then  --initial entry
            put item -1 of tLine into tPath[item 1 to -2 of tLine]
         else  --multiple entries
            put tPath[item 1 to -2 of tLine] & cr & item -1 of tLine into tPath[item 1 to -2 of tLine]
         end if
      end if
   end repeat
   breakpoint
end mouseUp

There is a breakpoint at the end so the script will pause and you can
inspect the variables.  You'll see that an array is created with each
unique path as a key and its pages as the element.  In the case of 'july'
you will see that four pages are all listed, one per line.

From there it should open a world of possibilities to arrange, sort
and sift through the paths.

HTH

Re: How to find the offset of the last instance of a repeating character in a string?

dunbarxx via use-livecode
I was curious if using the itemDelimiter might work for this, so I wrote
the below code out of curiosity; but in my quick testing with single-byte
characters it was only about 30% faster than the above methods, so I didn't
bother to post it.

But Ben Rubinstein just posted about a terrible slow-down doing pretty much
this same thing for text with unicode characters. So I ran a simple test
with 8000 character long strings that start with a single unicode
character, this is about 15x faster than offset() with skip. For
100,000-character lines it's about 300x faster, so it seems to be immune to
the line-painter issues skip is subject to. So for what it's worth:

function offsetList D,S
   -- returns a comma-delimited list of the offsets of D in S
   set the itemDel to D
   repeat for each item i in S
      add length(i) + 1 to C
      put C,"" after R
   end repeat
   set the itemDel to comma
   if char -1 of S is D then return char 1 to -2 of R
   put length(C) + 1 into lenC
   put length(R) into lenR
   if lenC = lenR then return 0
   return char 1 to lenR - lenC - 1 of R
end offsetList
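
A possible way to use this for the original question, assuming offsetList() as written above and a single-character delimiter (the length arithmetic adds 1 per item, so a longer D would need adjusting). The sample string is the one posted earlier in this thread:

```livecode
on mouseUp
   put "toplevel/somename/another/somename" into tText
   put offsetList("/", tText) into tOffsets
   -- the itemDelimiter is local to each handler, so it is still comma here;
   -- offsetList() returns 0 when the delimiter does not occur
   put item -1 of tOffsets into tLastSlash
   answer tLastSlash
end mouseUp
```

Tracing the function by hand on this string gives the list 9,18,26, so the last entry is the position of the final slash.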
