Finding common words and phrases in a large block of text?

Re: Numbering lines

Geoff Canyon via use-livecode
What's interesting here is, once again, a comparison of LC 6 against LC 9.

I tested with 10,000 lines of text (King James Bible, Genesis up to 5|16) and
*non-wrapping fields*, separator tab:

LC 9.0.1  needs on average 370 ms for numbering, 330 ms for denumbering,
LC 6.7.11 needs on average 170 ms for numbering, 140 ms for denumbering.

Without the field update (text handling only), this reduces to:

LC 9.0.1  needs on average 75 ms for numbering, 75 ms for denumbering,
LC 6.7.11 needs on average 45 ms for numbering, 18 ms for denumbering.

So LC 9 has become pretty fast but, at least with arrays and field
updates, does not yet reach the speed of LC 6 (hardware used: Mac mini, 2.5 GHz).
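
For anyone wanting to reproduce this kind of comparison, a minimal timing sketch might look like the following. It is not the script used for the figures above: the handler name timeNumbering, the helper numberLines, the field names "Text" and "Output", and the number of runs are all assumptions made for illustration.

```livecode
-- Hypothetical timing harness (all names are illustrative assumptions).
-- Averages pRuns runs of numbering, with and without the field update.
on timeNumbering pRuns
   local tText, tOut, tStart, tTextOnly, tWithField
   put the text of field "Text" into tText
   put 0 into tTextOnly
   put 0 into tWithField
   repeat pRuns times
      -- text handling only
      put the milliseconds into tStart
      put numberLines(tText, tab) into tOut
      add the milliseconds - tStart to tTextOnly
      -- text handling plus writing the result into a field
      put the milliseconds into tStart
      put numberLines(tText, tab) into field "Output"
      add the milliseconds - tStart to tWithField
   end repeat
   put "text only:" && tTextOnly / pRuns && "ms; with field update:" \
         && tWithField / pRuns && "ms"
end timeNumbering

-- Prepends a running number and pSep to each line of pText.
function numberLines pText, pSep
   local tNum, tResult
   put 0 into tNum
   repeat for each line tLine in pText
      add 1 to tNum
      put tNum & pSep & tLine & cr after tResult
   end repeat
   delete the last char of tResult -- drop the trailing return
   return tResult
end numberLines
```

Calling timeNumbering 10 from the message box would print both averages, so the cost of the field update can be read off as the difference between them.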



_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Re: Numbering lines

Alex Tweedly via use-livecode
My apologies Hermann. I had not been following the original thread
closely, and got confused by the embedded quoting in the later messages.
I was looking at Geoff's code - not yours - and he explicitly said "And of
course if retaining the order isn't critical".


So I agree your array method does indeed work properly. However, it is
rather 'splendidly' slow compared to the simpler method recommended
by Mark H and myself :-); it takes almost twice as long on my test cases
(between 30,000 and 300,000 lines of moderate length, highly repetitive,
LC 9.0.0, MacBook Pro circa 2011).
Full code below ....

-- Alex
on mouseUp
    local tText, tWith
    -- build 30,000 identical lines of test data
    repeat 30000 times
       put "this is a medium length line that can be updated" & CR after tText
    end repeat

    -- each function appends its own timing to the message box
    put prependLineNumbers(tText) into tWith
    put prependLineNumbers2(tText) into tWith

    --   put prependLineNumbersProgress(tText) into tWith

end mouseUp
function prependLineNumbers pText
    local timeLastUpdated, time1, time2, temp
    local tCount
    put the number of lines in pText into tCount
    put the millisecs into time1
    local I
    repeat for each line L in pText
       add 1 to I
       put I && L &CR after temp
    end repeat
    put tCount && the millisecs - time1 &CR after msg
    return temp
end prependLineNumbers

function prependLineNumbers2 pText
    local timeLastUpdated, time1, time2, temp
    local tCount, S, K
    put the number of lines in pText into tCount
    put the millisecs into time1

    split pText by return
    put the keys of pText into K
    sort K numeric
    repeat for each line L in K
       put cr & L && pText[L] after S --> change separator here
    end repeat

    put tCount && the millisecs - time1 &CR after msg
    return char 2 to -1 of S
end prependLineNumbers2
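
The array method keeps duplicate lines because split with only a primary delimiter keys the array by line position (1, 2, 3, ...) and not by line content. Duplicates collapse only when the line content itself becomes the key, as in the "as set" form of split. A small sketch of the difference follows; the handler name splitDemo and the sample text are hypothetical.

```livecode
-- Hypothetical demo: positional keys versus content keys.
on splitDemo
   local tText, tByIndex, tAsSet, tKeys
   put "apple" & cr & "pear" & cr & "apple" into tText

   -- Primary delimiter only: keys are the line numbers 1, 2, 3,
   -- so both "apple" lines are kept as separate elements.
   put tText into tByIndex
   split tByIndex by return
   put the keys of tByIndex into tKeys
   put "by return:" && the number of lines in tKeys && "keys" & cr into msg

   -- "as set": each distinct line becomes a key (with the value true),
   -- so repeated lines share one key.
   put tText into tAsSet
   split tAsSet by return as set
   put the keys of tAsSet into tKeys
   put "as set:" && the number of lines in tKeys && "keys" after msg
end splitDemo
```

The keys of an array come back in no guaranteed order, which is why the functions in this thread sort the keys numerically before rebuilding the text.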



On 28/10/2018 20:06, hh via use-livecode wrote:

>> Alex T. wrote:
>> You require to keep the line ordering completely unchanged -
>> and Hermann's superfast method can't meet that need.
>> JLG wrote:
>> You're right, split deletes duplicates. In fact, I use it as a quick way
>> to do just that.
> You are both splendidly wrong:
> Could you please simply try my functions and read the dictionary in order
> to understand why you are wrong? Please!
>
> Is it not yet Halloween ...
>
> -- D is the separator for numbers and text lines, usually space or ": "
> -- T is the input text, delimited with return
> -- prepends the number and separator to each line:
> function addLineNumbers D,T
>    split T by return
>    put the keys of T into K
>    sort K numeric
>    repeat for each line L in K
>      put cr & L & D & T[L] after S
>    end repeat
>    return char 2 to -1 of S
> end addLineNumbers
>
> -- D is the separator for numbers and text lines, usually space or ": "
> -- T is the input text, delimited with return
> -- removes the number and separator from each line:
> function removeLineNumbers D,T
>    split T by return and D
>    put the keys of T into K
>    sort K numeric
>    repeat for each line L in K
>      put cr & T[L] after S
>    end repeat
>    return char 2 to -1 of S
> end removeLineNumbers


Re: Numbering lines

Geoff Canyon via use-livecode
On Sun, Oct 28, 2018 at 5:27 PM Alex Tweedly via use-livecode <
[hidden email]> wrote:

> My apologies Hermann. I had not been following the original thread
> closely, and got confused by the embedded quoting in the later messages.
> I was looking at Geoff's code - not yours - and he explicitly said "And of
> course if retaining the order isn't critical".
>

We have gone wildly down a troublesome path, but for the record, I posted
two different sets of code. The first bit was functions based on Hermann's
original code, and retained the order given. The second code was the
simplified and not-order-retaining version as an alternative.

Re: Numbering lines

Geoff Canyon via use-livecode
(That said, yeah, repeat for each line is going to be faster)

On Sun, Oct 28, 2018 at 5:37 PM Geoff Canyon <[hidden email]> wrote:

>
>
> On Sun, Oct 28, 2018 at 5:27 PM Alex Tweedly via use-livecode <
> [hidden email]> wrote:
>
>> My apologies Hermann. I had not been following the original thread
>> closely, and got confused by the embedded quoting in the later messages.
>> I was looking at Geoff's code - not yours - and he explicitly said "And of
>> course if retaining the order isn't critical".
>>
>
> We have gone wildly down a troublesome path, but for the record, I posted
> two different sets of code. The first bit was functions based on Hermann's
> original code, and retained the order given. The second code was the
> simplified and not-order-retaining version as an alternative.
>

Re: Numbering lines

hh via use-livecode
Alex,

you and JLG are important LiveCoders. What you say has double weight.
From that alone you should double check what you claim to be true.

Wrong assertions are no argument against a method but speed is one,
of course.

Anyway, it is fine that David G. has now a fast way to do his work.

Re: Numbering lines

J. Landman Gay via use-livecode
On 10/28/18 9:15 PM, hh via use-livecode wrote:
> Alex,
>
> you and JLG are important LiveCoders. What you say has double weight.
> From that alone you should double check what you claim to be true.

Well, at least I have finally become "splendid" at something.

I can't decide whether you are being rude or just misunderstanding how
your words appear in a different culture. To an American, the above is
either an insult or a lecture.

You should probably not assign so much weight to what I say here.
Sometimes I'm just human.

--
Jacqueline Landman Gay         |     [hidden email]
HyperActive Software           |     http://www.hyperactivesw.com


Re: Numbering lines

hh via use-livecode
> JLG wrote:
>
> > hh wrote:
> > Alex,
> > you and JLG are important LiveCoders. What you say has double weight.
> > From that alone you should double check what you claim to be true.
>
> Well, at least I have finally become "splendid" at something.
>
> I can't decide whether you are being rude or just misunderstanding how
> your words appear in a different culture. To an American, the above is
> either an insult or a lecture.

The first two sentences are praise, in every culture.

The third sentence is simple advice, not a "lecture".
If it is an insult to you (or Alex) then I deeply apologize.

Although I don't understand what's "insulting" about that.
It can't be the fact that I gave that advice: what is objectively
wrong is wrong, no matter who said it.

> You should probably not assign so much weight to what I say here.
> Sometimes I'm just human.

You don't really want that ...



Re: Numbering lines

Geoff Canyon via use-livecode
Thanks to everyone who helped me on this.  Apologies if I misdirected the discussion, but even that was very instructive.

My mouth hung open when I saw the cost of updating the progress bar every time through a loop.  

I guess I have become sloppy about time saving because most operations seem ‘quick enough’ to me …and also it seems counterintuitive that adding a conditional line inside the loop would speed things up.  Useful information on arrays too.
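
The effect described here can be sketched as follows: the extra conditional costs one cheap comparison per line, but it skips almost all of the expensive scrollbar updates. This is a hypothetical version, not the prependLineNumbersProgress function mentioned earlier (which is not shown in this part of the thread); the function name, the scrollbar name "Progress" and the interval of 1000 lines are assumptions.

```livecode
-- Hypothetical sketch: number the lines, updating a progress bar
-- only once every 1000 lines instead of on every pass.
function numberLinesWithProgress pText
   local tNum, tResult, tTotal
   put 0 into tNum
   put the number of lines in pText into tTotal
   set the endValue of scrollbar "Progress" to tTotal
   repeat for each line tLine in pText
      add 1 to tNum
      put tNum && tLine & cr after tResult
      -- cheap test on every line; costly control update only rarely
      if tNum mod 1000 = 0 then
         set the thumbPosition of scrollbar "Progress" to tNum
      end if
   end repeat
   -- make sure the bar ends at 100 percent
   set the thumbPosition of scrollbar "Progress" to tTotal
   return tResult
end numberLinesWithProgress
```

The interval is a trade-off: a larger value means fewer updates and less overhead, a smaller value gives smoother feedback.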

Cheers,

David G

Re: Numbering lines

Alex Tweedly via use-livecode
Yeah, I said something that was wrong - and I did apologize and will
happily do so again. And I managed to get "two people with one stone"
and mis-describe both your and Geoff's valuable inputs.

So I apologize again to both of you, and anyone else I inadvertently
knocked on the way past.

But, as you say, the important thing is that David got a couple of good
answers, and anyone searching this thread in a few years' time will see
the answer, and not the human frailty of communication :-)

Alex.

On 29/10/2018 02:15, hh via use-livecode wrote:

> Alex,
>
> you and JLG are important LiveCoders. What you say has double weight.
>  From that alone you should double check what you claim to be true.
>
> Wrong assertions are no argument against a method but speed is one,
> of course.
>
> Anyway, it is fine that David G. has now a fast way to do his work.



Re: Numbering lines

Geoff Canyon via use-livecode
In reply to this post by Geoff Canyon via use-livecode
Next goal, trying to be magnificent. :-)

Bob S


> On Oct 28, 2018, at 21:17 , J. Landman Gay via use-livecode <[hidden email]> wrote:
>
> Well, at least I have finally become "splendid" at something.

