Searching for a word when it's more than one word

Stephen MacLean via use-livecode
Hi All,

First, followed Keith Clarke’s thread and got a lot out of it, thank you all. That’s gone into my code snippets!

Now I know, the title is not technically true, if it’s 2 words, they are distinct and different. Maybe it’s because I’ve been banging my head against this and some other things too long and need to step back, but I’m having issues getting this all to work reliably.

I’m searching for town names in various text from a list of towns. Most names are one word, easy to find and count. Some names are 2 or 3 words, like East Hartford or West Palm Beach. Those go against distinct towns like Hartford and Palm Beach. Others have their names inside of other town names like Colchester and Chester.

“is among the words of” or “is among the trueWords of” works great to find single words, but only works on single words and doesn’t consider “Chester’s” to be “Chester” (which, strictly, it isn’t).

“is in” works great for finding multiple words like “East Hartford” and “West Palm Beach”, and finds “Chester” in “Chester’s”, but also finds “chester” in “Colchester”.

At this point, I’ve been using different methods for single word towns vs multi-word towns and while generally effective, trying to accommodate for these and other oddities has made it a complete mess of code.

If someone has done something similar, or can point me in the right direction, it would be greatly appreciated.

TIA,

Steve MacLean





_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Re: Searching for a word when it's more than one word

Keith Clarke via use-livecode
Very interesting Steve, your use case is actually very close to what I’m trying to achieve, which is to identify keywords and phrases within a corpus of text - think prioritised ’tag cloud’ metadata.

My original plan (as a non-programmer) was to identify the most popular unique words within the corpus and then go back in to find the words either side and check their popularity, etc.

However, from what I’ve learned here, my current pseudo-logic is:

1. Parse the whole source into 1, 2, 3 and 4 trueWord chunks (ideally in one pass but I’m still struggling with my array learning curve, so probably via lists & fields so I can see my workings)  
2. Remove lines containing noise words and any punctuation that would, by definition terminate the keyword/phrase
3. Count & deduplicate the remaining lines
4. Sense-check against a ‘current keywords’ list (which appears to resonate with your town names problem?)
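
A minimal sketch of steps 1 and 3 above, assuming the source fits comfortably in memory. The handler name countWordRuns and its parameters are hypothetical, not from any library. Because trueWord chunks skip punctuation, a run can span a sentence boundary here, so step 2 (dropping noise words and punctuation-terminated runs) would still need to be applied to the result.

```livecode
-- Hypothetical helper: counts every run of 1 to pMaxWords consecutive
-- trueWords in pText and returns lines of "count <tab> phrase",
-- most frequent first.
function countWordRuns pText, pMaxWords
   local tWords, tCount, tCountsA, tStart, tLength
   local tWord, tKey, tPhrase, tResult

   -- Collect the trueWords, lower-cased, into a numerically keyed array
   put 0 into tCount
   repeat for each trueWord tWord in pText
      add 1 to tCount
      put toLower(tWord) into tWords[tCount]
   end repeat

   -- Build each run of 1..pMaxWords words starting at tStart and tally it
   repeat with tStart = 1 to tCount
      put empty into tPhrase
      repeat with tLength = 1 to pMaxWords
         if tStart + tLength - 1 > tCount then exit repeat
         if tPhrase is not empty then put space after tPhrase
         put tWords[tStart + tLength - 1] after tPhrase
         add 1 to tCountsA[tPhrase]
      end repeat
   end repeat

   -- Using the phrase as an array key deduplicates; now list the tallies
   repeat for each key tKey in tCountsA
      put tCountsA[tKey] & tab & tKey & return after tResult
   end repeat
   delete the last char of tResult

   set the itemDelimiter to tab
   sort lines of tResult numeric descending by item 1 of each
   return tResult
end countWordRuns
```

It could be called as, say, `put countWordRuns(field "Source", 4) into field "Counts"`, where both field names are placeholders.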

From the unique words results I’ve found, I also note issues around singular/plural, synonyms, alternative spelling, etc. - which speak to ‘fuzzy logic’ or dare one mention NLP (as in Natural Language Processing) capabilities.

I wonder if anyone has experimented with LiveCode accessing / using any libraries for this kind of language processing - probably another Pandora’s box containing infinity + 1 cans of worms! :-)      

Back to basics, I’ll share my workings as I blunder forward and would welcome any insights the community experts have to offer.
Best,
Keith    

Re: Searching for a word when it's more than one word

Mark Waddingham via use-livecode
On 2018-09-01 06:48, Stephen MacLean via use-livecode wrote:

> Hi All,
>
> First, followed Keith Clarke’s thread and got a lot out of it, thank
> you all. That’s gone into my code snippets!
>
> Now I know, the title is not technically true, if it’s 2 words, they
> are distinct and different. Maybe it’s because I’ve been banging my
> head against this and some other things too long and need to step
> back, but I’m having issues getting this all to work reliably.
>
> I’m searching for town names in various text from a list of towns .
> Most names are one word, easy to find and count. Some names are 2 or 3
> words, like East Hartford or West Palm Beach. Those go against
> distinct towns like Hartford and Palm Beach. Others have their names
> inside of other town names like Colchester and Chester.

So the problem you are trying to solve sounds like this:

Given a source text TEXT, and a list of multi-word phrases PHRASES, find
the longest elements of PHRASES which occur in TEXT when reading from
left to right.

One way to do this is to preprocess the source TEXT and PHRASES, and
then iterate over it with back-tracking attempting to match each phrase
in the list.

Preprocessing can be done like this:

   // pText is arbitrary language text, where it is presumed 'trueWord' will extract
   // the words we can match against those in PHRASES
   command preprocessText pText, @rWords
     local tWords
     repeat for each trueWord tWord in pText
       -- normalize word variants - e.g. turn Chester's into Chester
       if tWord ends with "'s" then
         put char 1 to -3 of tWord into tWord
       else if ... then
         ...
       else if ... then
         ...
       end if
       put tWord into tWords[the number of elements in tWords + 1]
     end repeat
     put tWords into rWords
   end preprocessText

This gives a sequence of words, in order - where word variants have been
normalized to the 'root' word (the general operation here is called
'stemming' - in your case as you are dealing with fragments of proper
nouns - 's / s suffixes are probably good enough).

The processing for PHRASES is needed to ensure that they all follow a
consistent form:

   // pPhrases is presumed to be a return-delimited list of phrases
   command preprocessPhrases pPhrases, @rPhrases
     -- We accumulate phrases as the keys of tPhrasesA to eliminate duplicates
     local tPhrasesA
     put empty into tPhrasesA

     repeat for each line tLine in pPhrases
       -- Rebuild the phrase from the trueWords of the line, one space apart
       local tPhrase
       put empty into tPhrase
       repeat for each trueWord tWord in tLine
         put tWord & space after tPhrase
       end repeat
       delete the last char of tPhrase

       -- Ignore lines which contained no words at all
       if tPhrase is not empty then
         put true into tPhrasesA[tPhrase]
       end if
     end repeat

     put the keys of tPhrasesA into rPhrases
   end preprocessPhrases

This produces a return-delimited list of phrases, where the individual
words in each phrase are separated by a *single* space with all
punctuation stripped, and no phrase appears twice.

With this pre-processing (note the PHRASES pre-processing only needs to
be done once for any set of PHRASES to match), a naive search algorithm
would be:

   // pText should be a sequence array of words to search (we use an
   // array here because we need fast random access)
   // pPhrases should be a line delimited string-list of multi-word
   // phrases to find
   // rMatches will be a string-list of phrases which have been found
   command searchTextForPhrases pText, pPhrases, @rMatches
     local tMatchesA
     put empty into tMatchesA

     -- Our phrases are single-space delimited, so set the item delimiter
     set the itemDelimiter to space

     -- Loop through pText, by default we bump tIndex by one each time
     -- however, if a match is found, then we can skip the words constituting
     -- the matched phrase.
     local tIndex
     put 1 into tIndex
     repeat until pText[tIndex] is empty
       -- Store the current longest match we have found starting at tIndex
       local tCurrentMatch
       put empty into tCurrentMatch

       -- Check each phrase in turn for a match.
       repeat for each line tPhrase in pPhrases
         -- Assume a match succeeds until it doesn't
         local tPhraseMatched
         put true into tPhraseMatched

         -- Iterate through the items (words) in each phrase, if the sequence of
         -- words in the phrase is not the same as the sequence of words in the text
         -- starting at tIndex, then tPhraseMatched will be false on exit of the loop.
         local tSubIndex
         put tIndex into tSubIndex
         repeat for each item tWord in tPhrase
           -- Failure to match the word at tSubIndex is failure to match the phrase
           if pText[tSubIndex] is not tWord then
             put false into tPhraseMatched
             exit repeat
           end if

           -- The current word of the phrase matches, so move to the next
           add 1 to tSubIndex
         end repeat

         -- We are only interested in the longest match at any point, so only update
         -- the current match if it is longer.
         if tPhraseMatched and the number of items in tPhrase > the number of items in tCurrentMatch then
           put tPhrase into tCurrentMatch
         end if
       end repeat

       -- If a match was found, then we have used up those words in pText, otherwise
       -- we start the search again at the next word in pText.
       if tCurrentMatch is not empty then
         add the number of items in tCurrentMatch to tIndex
         put true into tMatchesA[tCurrentMatch]
       else
         add 1 to tIndex
       end if
     end repeat

     -- At this point, the matched phrases are simply the keys of tMatchesA
     put the keys of tMatchesA into rMatches
   end searchTextForPhrases

Complexity wise, the above requires roughly N*M steps - where N is the
number of words in pText, M is the number of words in pPhrases.

An immediate improvement can be made by sorting pPhrases descending by
the number of items - this then eliminates the need to check phrase
match length - the first match will always be the longest meaning once
you have matched, you don't need to keep iterating through the phrases.

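     ...
     put the keys of tPhrasesA into rPhrases
     -- Phrases are single-space delimited, so count their items using space
     set the itemDelimiter to space
     sort lines of rPhrases numeric descending by the number of items in each
   end preprocessPhrases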

     ...
     -- The phrases are sorted longest first, so the first match found is
     -- the longest one and we can stop checking further phrases.
     if tPhraseMatched then
       put tPhrase into tCurrentMatch
       exit repeat
     end if
   end repeat

There's a lot more that can be done here to make the above a great deal
more efficient (algorithmically speaking). Indeed, the best you can achieve
is probably N*K steps for a source text containing N words - where K is
the maximum difference in length between any two phrases which share a
common prefix.
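
One possible step in that direction, offered only as a sketch: index every leading run of words of every phrase in an array (in effect a flattened trie), so that the work at each text position depends on the length of the longest phrase rather than on how many phrases there are. The handler names buildPhraseIndex and searchTextWithIndex are hypothetical. The inputs are assumed to be the outputs of preprocessPhrases and preprocessText above, and no claim is made that this reaches the N*K bound.

```livecode
-- Hypothetical helper: rPrefixesA gets a key for every leading run of
-- words of every phrase, rPhrasesA gets a key for every complete phrase.
command buildPhraseIndex pPhrases, @rPrefixesA, @rPhrasesA
   local tPhrase, tWord, tPrefix
   put empty into rPrefixesA
   put empty into rPhrasesA
   set the itemDelimiter to space
   repeat for each line tPhrase in pPhrases
      if tPhrase is empty then next repeat
      put empty into tPrefix
      repeat for each item tWord in tPhrase
         if tPrefix is not empty then put space after tPrefix
         put tWord after tPrefix
         put true into rPrefixesA[tPrefix]
      end repeat
      put true into rPhrasesA[tPhrase]
   end repeat
end buildPhraseIndex

-- Hypothetical variant of searchTextForPhrases driven by the index above.
-- pText is the sequence array of words; rMatches is a line-delimited list.
-- Matches are recorded with the casing found in the text.
command searchTextWithIndex pText, pPrefixesA, pPhrasesA, @rMatches
   local tMatchesA, tIndex, tSubIndex, tCandidate, tLongest
   set the itemDelimiter to space
   put 1 into tIndex
   repeat until pText[tIndex] is empty
      put empty into tLongest
      put pText[tIndex] into tCandidate
      put tIndex into tSubIndex
      -- Extend the candidate one word at a time while some phrase starts with it
      repeat while pPrefixesA[tCandidate] is true
         if pPhrasesA[tCandidate] is true then put tCandidate into tLongest
         add 1 to tSubIndex
         if pText[tSubIndex] is empty then exit repeat
         put space & pText[tSubIndex] after tCandidate
      end repeat
      -- As before: skip past a match, otherwise move on by one word
      if tLongest is not empty then
         put true into tMatchesA[tLongest]
         add the number of items in tLongest to tIndex
      else
         add 1 to tIndex
      end if
   end repeat
   put the keys of tMatchesA into rMatches
end searchTextWithIndex
```

The index only needs building once per phrase list, in the same way as the PHRASES pre-processing.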

Warmest Regards,

Mark.

--
Mark Waddingham ~ [hidden email] ~ http://www.livecode.com/
LiveCode: Everyone can create apps

Re: Searching for a word when it's more than one word

Richmond Mathewson via use-livecode
Obviously, when considering names of places such as Colchester,
Rochester and Chester one has
to search for the longer names first and exclude them from later searches.

Richmond.

Re: Searching for a word when it's more than one word

Mark Waddingham via use-livecode
On 2018-09-01 12:05, Richmond Mathewson via use-livecode wrote:
> Obviously, when considering names of places such as Colchester,
> Rochester and Chester one has
> to search for the longer names first and exclude them from later
> searches.

The 'substring' problem (i.e. Chester being 'in' Rochester) isn't
relevant in the above algorithm because we are 'tokenising' input and
phrases - essentially changing the alphabet.

i.e. "Rochester Chester Colchester" is turned into ABC, and we match A,
B or C as atomic units.

I should perhaps point out that the 'preprocessText' operation probably
needs to be a little better in practice - to at least include a 'stop'
token for punctuation. For example:

   "The man walked starting from East Hartford, West Hartford could be
seen in the distance."

In the case where 'Hartford West' and 'Hartford' are the 'known' towns
(and not 'East Hartford') - the proposed tokenization would result in:

   
   The,man,walked,starting,from,East,Hartford,West,Hartford,could,be,seen,in,the,distance

Which means you'd get "Hartford West" and "Hartford" - when you should
only get "Hartford" (assuming you care about the linguistic structure of
the text, at least).

Indeed, the above actually means in preprocessing the text, you can
actually vastly reduce the number of words to search - any sequences of
words which aren't in any phrase (or important punctuation) can be
replaced by "*" say. So the above would become:

   *,East,Hartford,*,West,Hartford,*

The "*" tokens block matching multi-word phrases.
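
A rough sketch of such a pre-processing step, under several assumptions: the handler name preprocessTextWithStops is hypothetical, pVocabA is assumed to be an array with a key for every word used in any phrase, and the punctuation list is only a guess at what counts as 'important'. Treating "." as a break will also split abbreviations such as "St. Louis", and the 's normalization from preprocessText is left out for brevity.

```livecode
-- Hypothetical variant of preprocessText which emits "*" stop tokens.
-- Runs of unknown words and clause-ending punctuation collapse to one "*".
command preprocessTextWithStops pText, pVocabA, @rWords
   local tWords, tCount, tChar, tLine, tWord
   put 0 into tCount

   -- Assumption: these characters end a clause; turn each into a line break
   repeat for each char tChar in ".,;:!?()"
      replace tChar with return in pText
   end repeat

   repeat for each line tLine in pText
      repeat for each trueWord tWord in tLine
         if pVocabA[tWord] is true then
            -- A word which appears in some phrase is kept as-is
            add 1 to tCount
            put tWord into tWords[tCount]
         else if tCount is 0 or tWords[tCount] is not "*" then
            -- An unknown word: emit one stop token for the whole run
            add 1 to tCount
            put "*" into tWords[tCount]
         end if
      end repeat
      -- End of clause: make sure a stop token separates it from the next
      if tCount > 0 and tWords[tCount] is not "*" then
         add 1 to tCount
         put "*" into tWords[tCount]
      end if
   end repeat
   put tWords into rWords
end preprocessTextWithStops
```

The resulting sequence array can be passed to searchTextForPhrases unchanged, since "*" never equals a phrase word.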

Warmest Regards,

Mark.

--
Mark Waddingham ~ [hidden email] ~ http://www.livecode.com/
LiveCode: Everyone can create apps

Re: Searching for a word when it's more than one word

Richmond Mathewson via use-livecode
That's because you lot tend to use a silver teaspoon while I tend to use
a great big shovel:

https://www.dropbox.com/s/00t8oftb1ydm8ni/Text%20analyzer%20X.livecode.zip?dl=0

Richmond.

Re: Searching for a word when it's more than one word

Mark Waddingham via use-livecode
On 2018-09-01 12:35, Richmond Mathewson via use-livecode wrote:
> That's because you lot tend to use a silver teaspoon while I tend to
> use a great big shovel:
>
> https://www.dropbox.com/s/00t8oftb1ydm8ni/Text%20analyzer%20X.livecode.zip?dl=0

Heh, great big shovels are great for coarse work - e.g. for the problem
of finding occurrences of SINGLE WORD towns in the source text - as you
are in your stack.

However, in this case, that wasn't what was asked for - the problem was
to find multi-word town names with the constraints that first and
longest match always wins with no overlap (i.e. as a human would read
them):

e.g. East Hartford West Palm Beach Colchester Newchester West Chester

With a town list of

    East Hartford
    Hartford West
    West Palm Beach
    Palm Beach
    Chester
    West Chester

Should return:

    East Hartford
    West Palm Beach
    West Chester

Warmest Regards,

Mark.

P.S. The problem is actually exactly the same - in the single-word case
your alphabet is the set of characters in the language. In the multi-word
case, your alphabet is the set of words in all phrases, with a 'stop'
word.

--
Mark Waddingham ~ [hidden email] ~ http://www.livecode.com/
LiveCode: Everyone can create apps

Re: Searching for a word when it's more than one word

Richmond Mathewson via use-livecode
I can see that the "problem", which my stack does not address, is with 2
or 3 part place names:

The Rochester/Chester problem is easily dealt with.

While it should be relatively easy to have a subroutine to deal with
words such as "West" (after all, there are no places just called "West"),
places like a town my parents once lived in called "Haselbury Plucknett"
would cause problems.

AND, places such as "Ruyton of the Eleven Towns"
(https://en.wikipedia.org/wiki/Ruyton-XI-Towns)
would really throw a spanner in the works.

Come to think of things . . .

Unless anyone's code can cope with "Ruyton of the Eleven Towns" it won't
stand up: we could even go further and call
this the "Ruyton of the Eleven Towns Test".

More muffled background noises.

Richmond.

Re: Searching for a word when it's more than one word

Richmond Mathewson via use-livecode
Yup: indeed: fairly coarse.

However, see my next posting re "Ruyton of the Eleven Towns"

that should make some folk feel that they need a set of sewing needles
rather than "just" a silver teaspoon.

Richmond.

Re: Searching for a word when it's more than one word

Mark Waddingham via use-livecode
On 2018-09-01 12:50, Richmond Mathewson via use-livecode wrote:
> Yup: indeed: fairly coarse.
>
> However, see my next posting re "Ruyton of the Eleven Towns"
>
> that should make some folk feel that they need a set of sewing needles
> rather than "just" a silver teaspoon.

I think you'll find my 'silver teaspoon' approach (as you put it) deals
with all those cases :D

Interestingly, as I said, the multi-word match problem can be reduced to
your 'shovel' - with pre and post processing.

Let's say that the phrase list is:

   Ruyton of the Eleven Towns
   East Hartfordshire
   Colchester
   Chester

First create a mapping from phrase words to individual characters (the
choice of character is arbitrary):

   Ruyton <-> A
   of <-> B
   the <-> C
   Eleven <-> D
   Towns <-> E
   East <-> F
   Hartfordshire <-> G
   Colchester <-> H
   Chester <-> I

Now iterate through the source text, generating an output text
consisting of letters from the new alphabet, with an 'unknown' letter
'*' for any word that is not in the mapping. For example:

   The man from Ruyton of the Eleven Towns, who is of the order of
shovels, travelled from Chester to Colchester via the towns in East
Hartfordshire

Would become:

   C**ABCDE**BC*B***I*H*CE*FG

The original phrase list is processed similarly to give:

   ABCDE
   FG
   H
   I

Searching the transformed source text using your algorithm with the list
of transformed phrases would give the correct set of found phrases as
required by the original problem.
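
A minimal LiveCode sketch of the mapping and encoding steps, offered as
an illustration rather than a tested implementation. The handler names
buildWordMap and encodeText are hypothetical. Words are lower-cased so
that 'The' and 'the' share a letter, punctuation handling is reduced to
commas and full stops, and the letters simply count up from "A", so it
only suits small phrase lists:

```livecode
-- Hypothetical helper: give every distinct phrase word its own letter
function buildWordMap pPhrases
   local tMap, tNext, tKey
   put 65 into tNext -- code point of "A"
   repeat for each line tPhrase in pPhrases
      repeat for each word tWord in tPhrase
         put toLower(tWord) into tKey
         if tMap[tKey] is empty then
            put numToCodepoint(tNext) into tMap[tKey]
            add 1 to tNext
         end if
      end repeat
   end repeat
   return tMap
end buildWordMap

-- Hypothetical helper: rewrite a text in the new alphabet,
-- using "*" for any word that has no letter in the map
function encodeText pText, pMap
   local tOut
   -- Crude punctuation handling, for this sketch only
   replace comma with space in pText
   replace "." with space in pText
   repeat for each word tWord in pText
      if pMap[toLower(tWord)] is empty then
         put "*" after tOut
      else
         put pMap[toLower(tWord)] after tOut
      end if
   end repeat
   return tOut
end encodeText
```

With the phrase list in tPhrases and the source text in tSource,
encodeText(tSource, buildWordMap(tPhrases)) gives the encoded text, and
calling encodeText on each phrase with the same map gives the
transformed phrase list.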

Warmest Regards,

Mark.

--
Mark Waddingham ~ [hidden email] ~ http://www.livecode.com/
LiveCode: Everyone can create apps


Re: Searching for a word when it's more than one word

Stephen MacLean via use-livecode
I've already shovelled Ruyton of the Eleven Towns quite effectively:

https://www.dropbox.com/s/n7r7u0c2m9ny3eb/Text%20analyzer%20X.livecode.zip?dl=0

No tokenising, in fact very basic stuff indeed.

Not wishing to bang on about over-complicating things . . . . .

Probably time for both Thee and Me to get out and get some fresh air
before we ruin our weekends.

Richmond.



Re: Searching for a word when it's more than one word

Stephen MacLean via use-livecode
On 2018-09-01 13:15, Richmond Mathewson via use-livecode wrote:
> I've already shovelled Ruyton of the Eleven Towns quite effectively:
>
> https://www.dropbox.com/s/n7r7u0c2m9ny3eb/Text%20analyzer%20X.livecode.zip?dl=0
>
> No tokenising, in fact very basic stuff indeed.
>
> Not wishing to bang on about over-complcating things . . . . .

Your revised approach is fine - as long as no town's name is contained
within another town's name.

Add 'Palm Beach West' and 'Palm Beach' to your placeNames list; then
modify your source text to end 'or Palm Beach West' - and your algorithm
does not perform the requested operation.

It reports Palm Beach West *and* Palm Beach as being present - whereas,
only 'Palm Beach West' is present :D
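
To illustrate the failure mode, here is a sketch using a plain
substring test. This is an assumption made for illustration: the
stack's actual script is not reproduced here, and the variable names
and sample text are made up.

```livecode
-- Made-up test data for this sketch
put "We drove through Hartford or Palm Beach West" into tText
put "Palm Beach" & return & "Palm Beach West" into tNames

put empty into tFound
repeat for each line tName in tNames
   -- 'contains' is a substring test, so the shorter name
   -- matches inside the longer one
   if tText contains tName then
      put tName & return after tFound
   end if
end repeat
-- tFound now lists both names, although only Palm Beach West
-- occurs as a town in tText
```

Checking longer names first and blanking out each match before testing
the shorter names avoids the double report.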

Warmest Regards,

Mark.

--
Mark Waddingham ~ [hidden email] ~ http://www.livecode.com/
LiveCode: Everyone can create apps


Re: Searching for a word when it's more than one word

Stephen MacLean via use-livecode


On 1/9/2018 2:25 pm, Mark Waddingham via use-livecode wrote:

> On 2018-09-01 13:15, Richmond Mathewson via use-livecode wrote:
>> I've already shovelled Ruyton of the Eleven Towns quite effectively:
>>
>> https://www.dropbox.com/s/n7r7u0c2m9ny3eb/Text%20analyzer%20X.livecode.zip?dl=0 
>>
>>
>> No tokenising, in fact very basic stuff indeed.
>>
>> Not wishing to bang on about over-complcating things . . . . .
>
> Your revised approach is fine - as long as the names of all the towns
> are distinct in terms of no one town's name is contained within another.

Blast!

Of course "my next trick" is to work out how to delete multi-word names
(i.e. phrases) from a textField.

Richmond.

>
> Add 'Palm Beach West' and 'Palm Beach' to your placeNames list; then
> modify your source text to end 'or Palm Beach West' - and you
> algorithm does not perform the requested operation.
>
> It reports Palm Beach West *and* Palm Beach as being present -
> whereas, only 'Palm Beach West' is present :D
>
> Warmest Regards,
>
> Mark.
>



Re: Searching for a word when it's more than one word

Stephen MacLean via use-livecode
In reply to this post by Stephen MacLean via use-livecode
On 2018-09-01 13:15, Richmond Mathewson via use-livecode wrote:
> I've already shovelled Ruyton of the Eleven Towns quite effectively:
>
> https://www.dropbox.com/s/n7r7u0c2m9ny3eb/Text%20analyzer%20X.livecode.zip?dl=0
>
> No tokenising, in fact very basic stuff indeed.
>
> Not wishing to bang on about over-complcating things . . . . .

There is actually a 'correct' more shovelistic approach (at least I
*think* this is correct):

-- Ensure all punctuation is surrounded by space
repeat for each char tPuncChar in ",.';:()[]{}<>!@£$%^&*-_+=~`?/\|#€" & quote
   replace tPuncChar with space & tPuncChar & space in tText
end repeat

-- Ensure all whitespace is space
replace return with space in tText
replace tab with space in tText

-- Ensure there is never two spaces next to each other in tText
repeat while tText contains "  "
   replace "  " with " " in tText
end repeat

-- Ensure there is only ever one space between words in phrases
repeat while tPhrases contains "  "
   replace "  " with " " in tPhrases
end repeat

-- We can now use an itemDelimiter of space
set the itemDelimiter to space

-- Sort the phrases by descending number of words, so that longer phrases are tried first.
sort lines of tPhrases descending numeric by the number of items in each

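-- Pad the text so that every word, including the first and last, has a space on both sides
put space & tText & space into tText

-- Now check for, and remove each phrase from the source text in turn
set the wholeMatches to true
repeat for each line tPhrase in tPhrases
   -- If the phrase is not present then skip to the next
   if itemOffset(tPhrase, tText) is 0 then
     next repeat
   end if

   -- Accumulate the phrase on the output list
   put tPhrase & return after tFoundPhrases

   -- Remove the phrase from the input text (we assume here that * does not appear in any phrase).
   -- Matching the surrounding spaces stops a phrase such as Chester clobbering part of Colchester.
   replace space & tPhrase & space with space & "*" & space in tText
end repeat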

Warmest Regards,

Mark.

P.S. The above will be reasonably quick for small sets of phrases /
small source texts - but I think as the size of either increases it will
get very slow, very quickly!

--
Mark Waddingham ~ [hidden email] ~ http://www.livecode.com/
LiveCode: Everyone can create apps


Re: Searching for a word when it's more than one word

Stephen MacLean via use-livecode
In reply to this post by Stephen MacLean via use-livecode
It didn't like this:

on mouseDown
    put empty into fld "zText"
    if fld "xText" contains "Ruyton of the Eleven Towns." then
       put fld "xText" into fld "zText"
       put "Ruyton of the Eleven Towns." into CHUNNK
       put empty into CHUNNK of fld "zText"
    end if
end mouseDown

Richmond.
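
One likely reason for the complaint is that CHUNNK holds a string
rather than a chunk expression, so "CHUNNK of fld" cannot be resolved.
A minimal sketch of an alternative, assuming the same two fields and
using the replace command to delete the phrase:

```livecode
on mouseDown
   put empty into fld "zText"
   if fld "xText" contains "Ruyton of the Eleven Towns." then
      put fld "xText" into fld "zText"
      -- replace works on the text of a container;
      -- replacing with empty deletes every occurrence of the phrase
      replace "Ruyton of the Eleven Towns." with empty in fld "zText"
   end if
end mouseDown
```

Note that replace removes every occurrence of the phrase, not just the
first one.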



Re: Searching for a word when it's more than one word

Stephen MacLean via use-livecode
In reply to this post by Stephen MacLean via use-livecode


On 1/9/2018 2:50 pm, Mark Waddingham via use-livecode wrote:

> On 2018-09-01 13:15, Richmond Mathewson via use-livecode wrote:
>> I've already shovelled Ruyton of the Eleven Towns quite effectively:
>>
>> https://www.dropbox.com/s/n7r7u0c2m9ny3eb/Text%20analyzer%20X.livecode.zip?dl=0 
>>
>>
>> No tokenising, in fact very basic stuff indeed.
>>
>> Not wishing to bang on about over-complcating things . . . . .
>
> There is actually a 'correct' more shovelistic approach (at least I
> *think* this is correct):
>
> -- Ensure all punctuation is surrounded by space
> repeat for each char tPuncChar in ",.';:()[]{}<>!@£$%^&*-_+=~`?/\|#€"
> & quote
>   replace tPuncChar with space & tPuncChar & space in tText
> end repeat

That's a "point" (pun intended) as I just fell foul of a full stop.

>
> -- Ensure all whitespace is space
> replace return with space in tText
> replace tab with space in tText
>
> -- Ensure there is never two spaces next to each other in tText
> repeat while tText contains "  "
>   replace "  " with " " in tText
> end repeat
>
> -- Ensure there is only ever one space between words in phrases
> repeat while tPhrases contains "  "
>   replace "  " with " " in tPhrases
> end repeat
>
> -- We can now use an itemDelimiter of space
> set the itemDelimiter to space
>
> -- Sort the phrases by descending word length.
> sort lines of tPhrases descending numeric by the number of items in each
>
> -- Now check for, and remove each phrase from the source text in turn
> set the wholeMatches to true
> repeat for each line tPhrase in tPhrases
>   -- If the phrase is not present then skip to the next
>   if itemOffset(tPhrase, tText) is 0 then
>     next repeat
>   end if
>
>   -- Accumulate the phrase on the output list
>   put tPhrase & return after tFoundPhrases
>
>   -- Remove the phrase from the input text (we assume here that * does
> not appear in any phrase)
>   replace tPhrase with "*" in tText
> end repeat
>
> Warmest Regards,
>
> Mark.
>
> P.S. The above will be reasonable quick for small sets of phrases /
> small source texts - but I think as the size of either increases it
> will get very slow, very quickly!
>



Re: Searching for a word when it's more than one word

Stephen MacLean via use-livecode
Wow, this is awesome, thank you all!!

Sorry, on the road taking my daughter to college, would love to try some of this out.

One thing to keep in mind is that, as I’m checking for names against the town list, I may not know what town I’m actually looking for. Usually I do, but not always.

Therefore I’ve been counting how many of each name I’ve come across and do some calculations at the end to make a best guess.

Really appreciate the responses!!

Thank you,

Steve





Re: Searching for a word when it's more than one word

Stephen MacLean via use-livecode
In reply to this post by Stephen MacLean via use-livecode
There is a town in Texas called West, made infamous a few years ago by a
giant explosion. I don't think you can make assumptions about names of places.

Mark's suggestion to check for words ending in "s" will fail on many towns,
though apostrophe-s may be safe.
--
Jacqueline Landman Gay | [hidden email]
HyperActive Software | http://www.hyperactivesw.com
On September 1, 2018 5:49:30 AM Richmond Mathewson via use-livecode
<[hidden email]> wrote:

> I can see that the "problem", which my stack does not address, is with 2
> or 3 part place names:
>
> The Rochester/Chester problem is easily dealt with.
>
> While it should be relatively easy to have a subroutine to deal with
> words such as "West" (after all, there are no places just called "West"),
> places like a town my parents once lived in called "Haselbury Plucknett"
> would cause problems.
>
> AND, places such as "Ruyton of the Eleven Towns"
> (https://en.wikipedia.org/wiki/Ruyton-XI-Towns)
> would really throw a spanner in the works.
>
> Come to think of things . . .
>
> Unless anyone's code can cope with "Ruyton of the Eleven Towns" it won't
> stand up: we could even go further and call
> this the "Ruyton of the Eleven Towns Test".
>
> More muffled background noises.
>
> Richmond.
>
> On 1/9/2018 1:29 pm, Mark Waddingham via use-livecode wrote:
>> On 2018-09-01 12:05, Richmond Mathewson via use-livecode wrote:
>>> Obviously, when considering names of places such as Colchester,
>>> Rochester and Chester one has
>>> to search for the longer names first and exclude them from later
>>> searches.
>>
>> The 'substring' problem (i.e. Chester being 'in' Rochester) isn't
>> relevant in the above algorithm because we are 'tokenising' input and
>> phrases - essentially changing the alphabet.
>>
>> i.e. "Rochester Chester Colchester" is turned into ABC, and we match
>> A, B or C as atomic units.
>>
>> I should perhaps point out that the 'processText' operation probably
>> needs to be a little better in practice - to at least include a 'stop'
>> token for punctuation. For example:
>>
>> "The man walked starting from East Hartford, West Hartford could be
>> seen in the distance."
>>
>> In the case where 'Hartford West' and 'Hartford' are the 'known' towns
>> (and not 'East Hartford') - the proposed tokenization would result in:
>>
>> The,man,walked,starting,from,East,Hartford,West,Hartford,could,be,seen,in,the,distance
>>
>> Which means you'd get "Hartford West" and "Hartford" - when you should
>> only get "Hartford" (assuming you care about the linguistic structure
>> of the text, at least).
>>
>> Indeed, the above actually means in preprocessing the text, you can
>> actually vastly reduce the number of words to search - any sequences
>> of words which aren't in any phrase (or important punctuation) can be
>> replaced by "*" say. So the above would become:
>>
>> *,East,Hartford,*,West,Hartford,*
>>
>> The "*" tokens block matching multi-word phrases.
>>
>> Warmest Regards,
>>
>> Mark.
>





Re: Searching for a word when it's more than one word

Stephen MacLean via use-livecode
On 09/01/2018 08:39 AM, J. Landman Gay via use-livecode wrote:
> There is a town in Texas called West, made infamous a few years ago by a
> giant explosion. I don't think you can make assumptions about names of
> places.

And thus the distinction between West Texas and West, Texas.

--
  Mark Wieder
  [hidden email]


Re: Searching for a word when it's more than one word

Stephen MacLean via use-livecode
Thankfully, in my case, I do at least know what the state is :)



