
Searching "teh" or "tihs"


hh via use-livecode
Is searching important for your project?
Would you like to ask "Did you mean 'the'?" if a user searches for "teh"?

I've implemented a fuzzySearch algorithm in LiveCode script:
http://forums.livecode.com/viewtopic.php?p=152202#p152202

Now if you wish to look up "the" or "this", fuzzySearch will find
it (among others) when you search for "teh" or "tihs", with a penalty
score of only one for swapping the characters.
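
To illustrate the "penalty of one for a swap" idea, here is a minimal Python sketch of the optimal string alignment distance, the restricted form of the Damerau-Levenshtein distance. It is not the LiveCode script linked above; the names osa_distance, fuzzy_search and max_penalty, and the small word list in the usage note, are assumptions made up for this example.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts insertions, deletions, substitutions and swaps of two
    adjacent characters, each with a penalty of one.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Two adjacent characters swapped, e.g. "eh" versus "he".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def fuzzy_search(query, words, max_penalty=1):
    """Return (penalty, word) pairs within max_penalty, best matches first."""
    scored = [(osa_distance(query, word), word) for word in words]
    return sorted(pair for pair in scored if pair[0] <= max_penalty)
```

With this sketch, osa_distance("teh", "the") and osa_distance("tihs", "this") are both 1, and fuzzy_search("teh", ["the", "this", "then", "put"]) returns [(1, 'the')]. Raising max_penalty makes the search sloppier.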


_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Peter Bogdanoff via use-livecode
hh,

This looks intriguing! I’m working on a commercial project that could use this. What is your license?

Peter Bogdanoff

On Mar 9, 2017, at 4:26 PM, hh via use-livecode <[hidden email]> wrote:

> Is searching important for your project?
> Would you like to ask "Did you mean 'the'?" if a user searches for "teh"?
>
> I've implemented a fuzzySearch algorithm in LiveCode script:
> http://forums.livecode.com/viewtopic.php?p=152202#p152202
>
> Now if you wish to look up "the" or "this", fuzzySearch will find
> it (among others) when you search for "teh" or "tihs", with a penalty
> score of only one for swapping the characters.

hh via use-livecode
> Peter Bogdanoff wrote:
> This looks intriguing! I’m working on a commercial project that
> could use this. What is your license?

The code is based on pseudocode from
https://en.wikipedia.org/wiki/Damerau–Levenshtein_distance

For my part, it is free for non-commercial use; I only ask for
a citation.

For commercial use of my published scripts I would like to have
1. a citation
2. a one-time "at-least" donation to the
+++++++++ CFFL = Community Fund for LiveCoders +++++++++
The fund is for LiveCoders who help the community so much in the forums
or here on the list and who really _need_ some money (I know some).

The donation for this script should be _at least_ $10 (one time).

The fund is a new idea.
Surely Richard Gaskin would be willing to manage such a fund (assuming he
doesn't need such funding himself). OK, Richard?




Andre Garzia via use-livecode
Congratulations on the fuzzySearch. I don't know how you did it, but for the
English language I remember both Soundex and its refinement, Metaphone; both
algorithms are made for this kind of situation. I think that
Levenshtein-distance-based algorithms are the way to go for this stuff these
days, but they are a bit beyond what I am used to developing...




--
http://www.andregarzia.com -- All We Do Is Code.
http://fon.nu -- minimalist url shortening service.

Bob S via use-livecode
There is always the SQL soundex() function: SELECT soundex('the') = soundex('teh') returns true. I'm not sure what the tolerance is, though. Because of the arbitrary nature of languages, this really requires a lookup table of commonly mistyped words, with the ability to "learn" as corrections are made. Then you would also need to be able to "uncorrect" or delete entries. Eventually you end up with something that is likely built into the OS already, so at that point it would be better to write an extension in C or Java.
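
For readers without a database at hand, the comparison above can be sketched in Python. This is a minimal version of the classic American Soundex rules (first letter plus three digits), written for illustration only; the function name soundex is my own choice here, and the soundex() built into a given SQL database can differ in details such as code length.

```python
def soundex(word: str) -> str:
    """Classic American Soundex code: first letter plus three digits."""
    # Map each coded consonant to its Soundex digit.
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [ch for ch in word.upper() if ch.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W are skipped and do not separate equal codes
        digit = codes.get(ch, "")  # vowels and Y have no code
        if digit and digit != prev:
            result += digit
        prev = digit  # a vowel resets prev, so a repeated code after it counts
    return (result + "000")[:4]  # pad or truncate to four characters
```

Here soundex("the") and soundex("teh") both give "T000", so the comparison is true, but soundex("tea") gives "T000" as well. That is one way to see how coarse the tolerance is for short words.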

Bob S





hh via use-livecode
1. The algorithm I implemented is for "fuzzy search" of written/typed words,
not for "similar-sounding words" (soundex); the two are mostly quite different.
My demo is scripted to look up a (mistyped) search string in the 3233 keywords
of LCScript.

2. For me, that's the true value of LiveCode:
Don't talk about possible development -- just do it. Then within a few
hours you have a solution that works on Mac/Win/Linux, using LC 6/7/8/9, and
is often fast enough even for a Raspberry Pi 2/3, independent of current OS
flavours.

If that solution is not good enough or not fast enough for you, then you can
write C or Java extensions. We already have a Java FFI available in LC 9-dp6!
I'm really looking forward to your solution.

In the meantime you can use my approach; it was updated today. I fixed a
small bug in the percentage search, which wasn't sloppy enough ;-)

