Translate metadata to field content

Re: Translate metadata to field content

matthias rebbe via use-livecode
Sorry, forgot that some html entities are not displayed in the list:

So assuming (unusual) should read

-- [1] you use the following for a link target
--     <p hidden>Xtarget10X<p>

where X is ("&" & "#1;") ..., the html translation of numToChar(1).

-- LC translates numToChar(1) to " " should read

-- LC translates numToChar(1) to "X", where X is ("&" & "#1;")
_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Re: Translate metadata to field content

matthias rebbe via use-livecode
So glad you chimed in, Mark. This is pretty impressive. I'll need to use the "for each element"
structure because my tags are not unique, but it still is much faster. When clicking a tag at
the top of the document that links to the last anchor at the bottom of the text, I get a timing
of about 25ms. If I omit the timing for loading the htmltext and the selection of the text at
the end of the handler it brings the timing to almost 0. The test text is long, but not nearly
as long as Bernd's sample.

I need to select the entire range of text covered by the metadata span, not just a single word.
I've got that working, but since we're on a roll here, I wonder if there's a more optimal way
to do it.

I'm using chars instead of codepoints because when I tried it, they both gave the same number.
Should I change that?

   put the styledText of fld 1 into tDataA
   put 0 into tTotalChars
   put 0 into tStartChar
   repeat with i = 1 to the number of elements in tDataA
     put tDataA[i]["runs"] into tRunsA
     repeat with j = 1 to the number of elements in tRunsA
       put tRunsA[j] into tRunA
       add the num of chars in tRunA["text"] to tTotalChars
       if tRunA["metadata"] is pTag then
         if tStartChar = 0 then
           put tTotalChars - len(tRunA["text"]) + 3 into tStartChar
         end if
       else if tStartChar > 0 then
         put tTotalChars - len(tRunA["text"]) into tEndChar
         select char tStartChar to tEndChar of fld 1
         select empty
         set the backcolor of char tStartChar to tEndChar of fld 1 to "yellow"
         return tStartChar & comma & tEndChar
       end if
     end repeat
   end repeat

Also, I had to add 3 to tStartChar to get the right starting point but I can't figure out why.
Otherwise it selects the last character before the metadata span as the starting point.


On 2/20/20 2:13 AM, Mark Waddingham via use-livecode wrote:

> Of course *all* three of my suggested approaches are wrong - I messed up the inner loop in each...
>
> On 2020-02-20 07:56, Mark Waddingham via use-livecode wrote:
>> NON-UNIQUE ANCHORS
>> repeat with i = 1 to the number of elements in tDataA
>>   local tRunsA
>>   put tDataA[i]["runs"] into tRunsA
>>   repeat with j = 1 to the number of elements in tRunsA
>>     if tRunsA[j]["metadata"] is tSearchText then
>>       repeat with m = 1 to j
>>         add the number of words of tRunsA[m]["text"] to tNumWords
>>         put true into tFlagExit
>>         exit repeat
>>       end repeat
>>     end if
>>   end repeat
>>   if tFlagExit then
>>     exit repeat
>>   end if
>> end repeat
>> select word tNumWords of line i of field "x"
>
> Should be:
>
>   repeat with i = 1 to the number of elements in tDataA
>     local tRunsA
>     put tDataA[i]["runs"] into tRunsA
>     repeat with j = 1 to the number of elements in tRunsA
>       if tRunsA[j]["metadata"] is tSearchText then
>         repeat with m = 1 to j
>           add the number of words of tRunsA[m]["text"] to tNumWords
>         end repeat
>         put true into tFlagExit
>         exit repeat
>       end if
>     end repeat
>     if tFlagExit then
>       exit repeat
>     end if
>   end repeat
>   select word tNumWords of line i of field "x"
>
>> UNIQUE ANCHORS
>
>> repeat for each key i in tDataA
>>   local tRunsA
>>   put tDataA[i]["runs"] into tRunsA
>>   repeat for each key j in tRunsA
>>     if tRunsA[j]["metadata"] is tSearchText then
>>       repeat with m = 1 to j
>>         add the number of words of tRunsA[m]["text"] to tNumWords
>>         put true into tFlagExit
>>         exit repeat
>>       end repeat
>>     end if
>>   end repeat
>>   if tFlagExit then
>>     exit repeat
>>   end if
>> end repeat
>> select word tNumWords of line i of field "x"
>
> Should be:
>
>   repeat for each key i in tDataA
>     local tRunsA
>     put tDataA[i]["runs"] into tRunsA
>     repeat for each key j in tRunsA
>       if tRunsA[j]["metadata"] is tSearchText then
>         repeat with m = 1 to j
>           add the number of words of tRunsA[m]["text"] to tNumWords
>         end repeat
>         put true into tFlagExit
>         exit repeat
>       end if
>     end repeat
>     if tFlagExit then
>       exit repeat
>     end if
>   end repeat
>   select word tNumWords of line i of field "x"
>
>> RUN WITH METADATA DEFINES SELECTION - NON-UNIQUE SEARCH
>>
>> repeat with i = 1 to the number of elements in tDataA
>>   local tRunsA
>>   put tDataA[i]["runs"] into tRunsA
>>   repeat with j = 1 to the number of elements in tRunsA
>>     local tRunA
>>     put tRunsA[j] into tRunA
>>     if tRunA["metadata"] is tSearchText then
>>       repeat with m = 1 to j - 1
>>         add the number of codeunits of tRunsA[m]["text"] to tNumCodeunitsBefore
>>         put the number of codeunits in tRunA["text"] into tNumCodeunits
>>         put true into tFlagExit
>>         exit repeat
>>       end repeat
>>     end if
>>   end repeat
>>   if tFlagExit then
>>     exit repeat
>>   end if
>> end repeat
>> select codeunit tNumCodeunitsBefore to tNumCodeunitsBefore + tNumCodeunits - 1 of line i of field "x"
>
> Should be:
>
>   repeat with i = 1 to the number of elements in tDataA
>     local tRunsA
>     put tDataA[i]["runs"] into tRunsA
>     repeat with j = 1 to the number of elements in tRunsA
>       local tRunA
>       put tRunsA[j] into tRunA
>       if tRunA["metadata"] is tSearchText then
>         repeat with m = 1 to j - 1
>           add the number of codeunits of tRunsA[m]["text"] to tNumCodeunitsBefore
>         end repeat
>         put the number of codeunits in tRunA["text"] into tNumCodeunits
>         put true into tFlagExit
>         exit repeat
>       end if
>     end repeat
>     if tFlagExit then
>       exit repeat
>     end if
>   end repeat
>   select codeunit tNumCodeunitsBefore to tNumCodeunitsBefore + tNumCodeunits - 1 of line i of field "x"
>
> Oops!
>
> Mark.
>


--
Jacqueline Landman Gay         |     [hidden email]
HyperActive Software           |     http://www.hyperactivesw.com



Re: Translate metadata to field content

matthias rebbe via use-livecode
On 2020-02-21 00:29, J. Landman Gay via use-livecode wrote:
> So glad you chimed in, Mark. This is pretty impressive. I'll need to
> use the "for each element" structure because my tags are not unique,
> but it still is much faster. When clicking a tag at the top of the
> document that links to the last anchor at the bottom of the text, I
> get a timing of about 25ms. If I omit the timing for loading the
> htmltext and the selection of the text at the end of the handler it
> brings the timing to almost 0. The test text is long, but not nearly
> as long as Bernd's sample.

Glad I could help - although to be fair, all I did was optimize what
Bernd (and Richard) had already proposed.

One thing I did notice through testing was that the actual styled content
makes a great deal of difference to performance. I also tried against the
DataGrid behavior (replicated several times), and then also against some
styled 'Lorem Ipsum' (https://loripsum.net/) of about the same length
(around 8Mb of htmlText, with the anchor being searched for on the last
word). The difference is that the DG has many more style runs
(unsurprisingly) and almost all are single words. So timings need to be
taken against a representative sample of the data you are actually
working with.

> I need to select the entire range of text covered by the metadata
> span, not just a single word. I've got that working, but since we're
> on a roll here, I wonder if there's a more optimal way to do it.

I did wonder if that would be the case...

> I'm using chars instead of codepoints because when I tried it, they
> both gave the same number. Should I change that?

Both characters and codepoints run the risk of requiring a linear scan of
the string to calculate the length - strictly speaking this will occur if
the engine is not sure whether character / codepoint have a 1-1 map to
codeunits (for example if your string has Unicode chars and it hasn't
analysed it). Therefore you should definitely use codeunits.
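
As a rough illustration of how the three counts relate, here is a minimal sketch. CountUnits is a hypothetical helper name (not an engine function), and the sketch assumes a LiveCode 7 or later engine, where the codeunit, codepoint and char chunks and numToCodepoint are available.

```livecode
-- Hypothetical helper: returns "codeunits,codepoints,chars" for a string.
function CountUnits pText
   return (the number of codeunits in pText) & comma & \
         (the number of codepoints in pText) & comma & \
         (the number of chars in pText)
end CountUnits

on TestCountUnits
   -- Native text: all three counts agree.
   put CountUnits("abc") & return into msg            -- 3,3,3
   -- Codepoint 128512 (U+1F600) lies outside the BMP, so it occupies two
   -- UTF-16 codeunits but is a single codepoint and a single char.
   put CountUnits(numToCodepoint(128512)) after msg   -- 2,1,1
end TestCountUnits
```

For all-native text the counts are identical, which matches what Jacque observed; the difference only shows up once the text contains characters that need more than one codeunit.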

> Also, I had to add 3 to tStartChar to get the right starting point but
> I can't figure out why. Otherwise it selects the last character before
> the metadata span as the starting point.

Was the anchor in the third paragraph by any chance?

The styledText representation makes the paragraph separator (return char)
implicit (as it is in the field object internally) - so you need to bump
the tTotalChars by one before the final end repeat to account for that
(as the codeunit ranges the field uses *include* the implicit return char).

So I couldn't help but fettle with this a little more. You mention that
your 'anchors' are not unique in a document. This raises the question of
what happens if there is more than one match...

This handler finds all occurrences of a given anchor in the text. As we
are searching for all of them, it can use repeat for each key iteration
in both loops:

function FindAllAnchors pStyledText, pAnchor
    /* Return-delimited list of results, each line is of the form:
    *     start,finish,line
    * Each of these corresponds to a chunk of the form:
    *      CODEUNIT start TO finish OF LINE line OF field
    */
    local tResults

    /* Iterate over the lines of the text in arbitrary order - the order doesn't
    * matter as we keep the reference to the line any match is in. */
    repeat for each key tLineIndex in pStyledText
       /* Fetch the runs in the line, so we don't have to keep looking it up */
       local tRuns
       put pStyledText[tLineIndex]["runs"] into tRuns

       /* Iterate over the runs in arbitrary order - assuming that the number
       * of potentially matching runs is minuscule compared to the number of
       * non-matching runs, it is faster to iterate in hash-order. */
       repeat for each key tRunIndex in tRuns
          /* If we find a match, work out its offset in the line */
          if tRuns[tRunIndex]["metadata"] is pAnchor then
             /* Calculate the number of codeunits before this run */
             local tCodeunitCount
             put 0 into tCodeunitCount
             repeat with tPreviousRunIndex = 1 to tRunIndex - 1
                add the number of codeunits in tRuns[tPreviousRunIndex]["text"] to tCodeunitCount
             end repeat

             /* Append the result to the results list. */
             put tCodeunitCount + 1, \
                   tCodeunitCount + the number of codeunits in tRuns[tRunIndex]["text"], \
                   tLineIndex & \
                   return after tResults
          end if
       end repeat
    end repeat

    /* We want the results sorted first by line index, then by starting codeunit
    * within the line (so we get a top-to-bottom, left-to-right order). As the
    * 'sort' command is stable, we can do this by first sorting by the secondary
    * factor (codeunit start), then sorting again by the primary factor (line
    * index). */
    sort lines of tResults ascending numeric by item 1 of each
    sort lines of tResults ascending numeric by item 3 of each

    /* Return the set of results. */
    return tResults
end FindAllAnchors

Testing this on 8Mb of styled Lorem Ipsum text, with the same anchor at:
   word 1
   word 1000
   the middle word
   word -1000
   word -1

Then this handler takes slightly less time than searching for a single
anchor at word -1 of the field using 'repeat with' loops.

Whether this is helpful or not depends on whether you need to 'do
something' when there is more than one matching anchor in a document :)
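
One way the result list might be consumed is sketched below. HighlightAnchors is a hypothetical handler name; the sketch assumes FindAllAnchors is available in the message path, that the target is field "x" as in the earlier examples, and that highlighting every match is the desired 'something'.

```livecode
-- Hypothetical caller: highlights every run whose metadata matches pTag.
on HighlightAnchors pTag
   local tMatches
   put FindAllAnchors(the styledText of field "x", pTag) into tMatches

   lock screen
   -- Each line of tMatches is "start,finish,line" (see FindAllAnchors).
   repeat for each line tMatch in tMatches
      set the backgroundColor of codeunit (item 1 of tMatch) to (item 2 of tMatch) \
            of line (item 3 of tMatch) of field "x" to "yellow"
   end repeat
   unlock screen
end HighlightAnchors
```

Because each result carries its own line index, the ranges stay relative to their paragraph and no adjustment for the implicit return characters is needed here.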

Warmest Regards,

Mark.

--
Mark Waddingham ~ [hidden email] ~ http://www.livecode.com/
LiveCode: Everyone can create apps


Re: Translate metadata to field content

matthias rebbe via use-livecode
Hi Jacque,

Jacque wrote:


 > put the styledText of fld 1 into tDataA
 > put 0 into tTotalChars
 > put 0 into tStartChar
 > repeat with i = 1 to the number of elements in tDataA
 >   put tDataA[i]["runs"] into tRunsA
 >   repeat with j = 1 to the number of elements in tRunsA
 >     put tRunsA[j] into tRunA
 >     add the num of chars in tRunA["text"] to tTotalChars
 >     if tRunA["metadata"] is pTag then
 >       if tStartChar = 0 then
 >         put tTotalChars - len(tRunA["text"]) + 3 into tStartChar
 >       end if
 >     else if tStartChar > 0 then
 >       put tTotalChars - len(tRunA["text"]) into tEndChar
 >       select char tStartChar to tEndChar of fld 1
 >       select empty
 >       set the backcolor of char tStartChar to tEndChar of fld 1 to "yellow"
 >       return tStartChar & comma & tEndChar
 >     end if
 >   end repeat
 > end repeat


The styledText array does not include the returns at the end of a line. You have to add them if you address chars/codeUnits of the whole text. Initializing tTotalChars with -1 lets you add 1 to tTotalChars in each iteration of the outer repeat loop; -1 because the first line has no preceding return.
Also add 1 when calculating tStartChar, otherwise you point to the last char of the preceding run.

  put -1 into tTotalChars -- note -1
  put 0 into tStartChar
  repeat with i = 1 to the number of elements in tDataA
    add 1 to tTotalChars -- account for returns
    put tDataA[i]["runs"] into tRunsA

-- note add 1
put tTotalChars - len(tRunA["text"]) +1 into tStartChar -- mark char 1 of target

Additionally in your implementation if the target run with the metadata you look for is the last run of the array nothing is returned.
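
Putting these points together, a minimal sketch of a whole-field search might look like the following. FindAnchorRange is a hypothetical function name; the sketch assumes pTag is not empty, counts in codeunits as Mark suggests, adds one per paragraph for the implicit return, and also returns a result when the tagged run is the last run in the array.

```livecode
-- Hypothetical helper: returns "start,end" as codeunit offsets into the
-- whole field text for the first span tagged pTag, or empty if none.
function FindAnchorRange pStyledText, pTag
   local tTotal, tStart, tEnd, tRunsA, tRunA
   put 0 into tTotal
   put 0 into tStart
   put 0 into tEnd
   repeat with i = 1 to the number of elements in pStyledText
      put pStyledText[i]["runs"] into tRunsA
      repeat with j = 1 to the number of elements in tRunsA
         put tRunsA[j] into tRunA
         if tRunA["metadata"] is pTag then
            -- First tagged run: the span starts one past what came before.
            if tStart is 0 then
               put tTotal + 1 into tStart
            end if
            add the number of codeunits in tRunA["text"] to tTotal
            put tTotal into tEnd
         else
            -- First untagged run after the span: the span is complete.
            if tStart > 0 then
               return tStart & comma & tEnd
            end if
            add the number of codeunits in tRunA["text"] to tTotal
         end if
      end repeat
      add 1 to tTotal -- the implicit return at the end of each paragraph
   end repeat
   -- The span ran to the very end of the text.
   if tStart > 0 then
      return tStart & comma & tEnd
   end if
   return empty
end FindAnchorRange
```

A caller could then use something like: put FindAnchorRange(the styledText of fld 1, pTag) into tRange, followed by select codeunit (item 1 of tRange) to (item 2 of tRange) of fld 1 when tRange is not empty.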

Kind regards
Bernd

Re: Translate metadata to field content

matthias rebbe via use-livecode
Is anyone maintaining the MasterLibrary? Stuff like this really should be added to it when the kinks are worked out.

Bob S


> On Feb 21, 2020, at 24:51 , Mark Waddingham via use-livecode <[hidden email]> wrote:
>
> On 2020-02-21 00:29, J. Landman Gay via use-livecode wrote:
>> So glad you chimed in, Mark. This is pretty impressive. I'll need to
>> use the "for each element" structure because my tags are not unique,
>> but it still is much faster. When clicking a tag at the top of the
>> document that links to the last anchor at the bottom of the text, I
>> get a timing of about 25ms. If I omit the timing for loading the
>> htmltext and the selection of the text at the end of the handler it
>> brings the timing to almost 0. The test text is long, but not nearly
>> as long as Bernd's sample.
>
> Glad I could help - although to be fair, all I did was optimize what
> Bernd (and Richard) had already proposed.



Re: Translate metadata to field content

matthias rebbe via use-livecode
Mark Waddingham wrote:

 >> I'm using chars instead of codepoints because when I tried it, they
 >> both gave the same number. Should I change that?
 >
 > Both characters and codepoints run the risk of requiring a linear scan
 > of the string to calculate the length - strictly speaking his will
 > occur if the engine is not sure whether character / codepoint have a
 > 1-1 map to codeunits (for example if your string has Unicode chars and
 > it hasn't analysed it). Therefore you should definitely use codeunits.

This is an interesting detail.  Is it safe to surmise from this that in
cases where speed is important we should consider using codeunits
instead of chars?

How might we use codeunits with offset()?

--
  Richard Gaskin
  Fourth World Systems
  Software Design and Development for the Desktop, Mobile, and the Web
  ____________________________________________________________________
  [hidden email]                http://www.FourthWorld.com



Re: Translate metadata to field content

matthias rebbe via use-livecode
On 2020-02-21 17:22, Richard Gaskin via use-livecode wrote:
> This is an interesting detail.  Is it safe to surmise from this that
> in cases where speed is important we should consider using codeunits
> instead of chars?

Yes - especially if searching for non-letter chars as delimiters (e.g.
return, space, ':' etc.).

> How might we use codeunits with offset()?

You wouldn't - you would use codeunitOffset instead.

Note: The dictionary entry for codeunitOffset is heinously wrong! The
needle string can be any length, and the return value is *always*
relative to the start of the string (it's not quite the same as offset):

e.g. codeunitOffset("foo", "barfoo", 2) = 4 (not 2 - as would be the
case with offset).

Another Note: In the general case codeunit counts <> codepoint counts <>
character counts, although for native strings they are all the same.
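
To show why an absolute return value is convenient, here is a minimal sketch that collects every position of a needle. AllCodeunitOffsets is a hypothetical function name, and the loop relies on the behaviour described above (the result is relative to the start of the string, not to the skip point), which differs from the dictionary entry.

```livecode
-- Hypothetical helper: returns a return-delimited list of the codeunit
-- positions at which pNeedle starts in pHaystack.
function AllCodeunitOffsets pNeedle, pHaystack
   local tSkip, tFound, tList
   put 0 into tSkip
   repeat forever
      put codeunitOffset(pNeedle, pHaystack, tSkip) into tFound
      if tFound is 0 then
         exit repeat
      end if
      put tFound & return after tList
      -- The result is already absolute, so it can be reused directly
      -- as the number of codeunits to skip on the next pass.
      put tFound into tSkip
   end repeat
   delete the last char of tList
   return tList
end AllCodeunitOffsets
```

With offset() the same loop would need to add the skip value to each result before storing it, since offset() reports positions relative to the skip point.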

Warmest Regards,

Mark.

--
Mark Waddingham ~ [hidden email] ~ http://www.livecode.com/
LiveCode: Everyone can create apps


Re: Translate metadata to field content

matthias rebbe via use-livecode
Aha! Of course. I should have thought of that. Mark pointed out the same
thing. (And yes, my brief test had the metadata in the third paragraph.)

I'll try his handler when I get back to my Mac. If my guess is correct, the
search won't take any measurable time at all and the primary delay will be
loading the htmltext into a variable.

You guys are great.
--
Jacqueline Landman Gay | [hidden email]
HyperActive Software | http://www.hyperactivesw.com
On February 21, 2020 4:36:42 AM "Niggemann, Bernd via use-livecode"
<[hidden email]> wrote:

> Hi Jacque,
>
> Jacque wrote:
>
>
> > put the styledText of fld 1 into tDataA
> > put 0 into tTotalChars
> > put 0 into tStartChar
> > repeat with i = 1 to the number of elements in tDataA
> >   put tDataA[i]["runs"] into tRunsA
> >   repeat with j = 1 to the number of elements in tRunsA
> >     put tRunsA[j] into tRunA
> >     add the num of chars in tRunA["text"] to tTotalChars
> >     if tRunA["metadata"] is pTag then
> >       if tStartChar = 0 then
> >         put tTotalChars - len(tRunA["text"]) + 3 into tStartChar
> >       end if
> >     else if tStartChar > 0 then
> >       put tTotalChars - len(tRunA["text"]) into tEndChar
> >       select char tStartChar to tEndChar of fld 1
> >       select empty
> >       set the backcolor of char tStartChar to tEndChar of fld 1 to "yellow"
> >       return tStartChar & comma & tEndChar
> >     end if
> >   end repeat
> > end repeat
>
>
> the styledArray does not include the returns at the end of a line. You have
> to add them if you address chars/codeUnits of the whole text. Initializing
> tTotalChars with -1 lets you add 1 to tTotalChars in each iterations of the
> outer repeat loop. -1 because the first line is not has no preceding return.
> Also add 1 to calculate tStartChar otherwise you point to the last char of
> preceding run.
>
>  put -1 into tTotalChars -- note -1
>  put 0 into tStartChar
>  repeat with i = 1 to the number of elements in tDataA
>    add 1 to tTotalChars -- account for returns
>    put tDataA[i]["runs"] into tRunsA
>
> -- note add 1
> put tTotalChars - len(tRunA["text"]) +1 into tStartChar -- mark char 1 of
> target
>
> Additionally in your implementation if the target run with the metadata you
> look for is the last run of the array nothing is returned.
>
> Kind regards
> Bernd

Re: Translate metadata to field content

matthias rebbe via use-livecode
I thought Michael Doub was handling the Master Library or
are you talking about something else?

JB

> On Feb 21, 2020, at 7:50 AM, Bob Sneidar via use-livecode <[hidden email]> wrote:
>
> Is anyone maintaining the MasterLibrary? Stuff like this really should be added to it when the kinks are worked out.
>
> Bob S
>
>
>> On Feb 21, 2020, at 24:51 , Mark Waddingham via use-livecode <[hidden email]> wrote:
>>
>> On 2020-02-21 00:29, J. Landman Gay via use-livecode wrote:
>>> So glad you chimed in, Mark. This is pretty impressive. I'll need to
>>> use the "for each element" structure because my tags are not unique,
>>> but it still is much faster. When clicking a tag at the top of the
>>> document that links to the last anchor at the bottom of the text, I
>>> get a timing of about 25ms. If I omit the timing for loading the
>>> htmltext and the selection of the text at the end of the handler it
>>> brings the timing to almost 0. The test text is long, but not nearly
>>> as long as Bernd's sample.
>>
>> Glad I could help - although to be fair, all I did was optimize what
>> Bernd (and Richard) had already proposed.
>
>

Re: Translate metadata to field content

matthias rebbe via use-livecode
Yes, that's it, but not sure if some of these recent gems are getting into the library.

Bob S


> On Feb 21, 2020, at 10:22 , JB via use-livecode <[hidden email]> wrote:
>
> I thought Michael Doub was handling the Master Library or
> are you talking about something else?
>
> JB



Re: Translate metadata to field content

matthias rebbe via use-livecode
Yes, any additions are good to have.
I hope someone updates it.

JB

> On Feb 21, 2020, at 10:41 AM, Bob Sneidar via use-livecode <[hidden email]> wrote:
>
> Yes, that's it, but not sure if some of these recent gems are getting into the library.
>
> Bob S
>
>
>> On Feb 21, 2020, at 10:22 , JB via use-livecode <[hidden email]> wrote:
>>
>> I thought Michael Doub was handling the Master Library or
>> are you talking about something else?
>>
>> JB
>
>

Re: Translate metadata to field content

matthias rebbe via use-livecode
Mark Waddingham wrote:

 > On 2020-02-21 17:22, Richard Gaskin via use-livecode wrote:
 >> This is an interesting detail.  Is it safe to surmise from this that
 >> in cases where speed is important we should consider using codeunits
 >> instead of chars?
 >
 > Yes - especially if searching for non-letter chars as delimiters (e.g.
 > return, space, ':' etc.).

Super - thanks.  Any faster than byteoffset?


 >> How might we use codeunits with offset()?
 >
 > You wouldn't - you would use codeunitOffset instead.

OMG! How did I miss that addition?  Thank you!  That's going into use
this weekend.


 > Note: The dictionary entry for codeunitOffset is heinously wrong! The
 > needle string can be any length, and the return value is *always*
 > relative to the start of the string (its not quite the same as
 > offset):
 >
 > e.g. codeunitOffset("foo", "barfoo", 2) = 4 (not 2 - as would be the
 > case with offset).
 >
 > Another Note: In the general case codeunit counts <> codepoint counts
 > <> character counts - although for native strings they are all the
 > same, though.

Good to know - thanks.

--
  Richard Gaskin
  Fourth World Systems
  Software Design and Development for the Desktop, Mobile, and the Web
  ____________________________________________________________________
  [hidden email]                http://www.FourthWorld.com



Re: Translate metadata to field content

matthias rebbe via use-livecode
Welcome to the party Hermann. :) Unfortunately the HTML isn't under my control and may change
periodically. It's retrieved from a server on demand. The metadata I'm looking for isn't a
link, it's a text property, and is already hidden when displaying text in a field. It would be
similar to looking for a specific instance of bolded text.

This would be good for other uses though, so thanks for the idea.

On 2/20/20 3:12 PM, hh via use-livecode wrote:

> As others try to optimize ("ping") I'll try an improvement too ("pong")
> with using another method that requires to change your link targets ONCE:
>
> Instead of unique targets <a name="target10"> write in your field
>
> "<p hidden>"&numTochar(1)&"target10"&numTochar(1)&"</p>""
>
> Handler replaceTargets below does it (slowly) but you probably don't
> need it on mobile.
>
> So assuming (unusual)
>
> -- [1] you use the following for a link target
> --     <p hidden>&#1;target10&#1;<p>
> -- [2] you use the following for a local page link
> --     <a href="#target10">Target10</a>
> -- [3] you don't use <p hidden> elsewhere. Else add
> --     an additional marker to it to differentiate.
>
>
> Then script your field with the following simple handler:
>
> on linkClicked pUrl
>    put the milliseconds into m1
>    lock messages; lock screen
>    if pUrl begins with "#" then
>      put numToChar(1)& (char 2 to -1 of pUrl) &numToChar(1) into tTarget
>      put 1+offset(tTarget,me)+length(tTarget) into tOff
>      select char tOff to tOff+3 of me -- see it in locked field
>      -- select char tOff of me -- variant
>      scrollSelectionIntoView -- optional
>    end if
>    put the millisecs -m1 into fld "timing1"
> end linkClicked
>
> ---- helpers (optionally needed)
>
> -- Note. LC adds also an additional "<p hidden></p>", we don't mind.
> on replaceTargets -- should be optimized if used often
>    put the millisecs into m1
>    lock messages; lock screen
>    put the htmltext of fld 1 into tHTML
>    set linedel to "<a name="&quote
>    set itemdel to "</a>"
>    put line 1 of tHtml into tI2
>    put numToChar(1) into b1
>    repeat for each line L in (line 2 to -1 of tHtml)
>      put "<p hidden>"&b1&char 1 to offset(quote,L)-1 of L into item 1 of L
>      put offset("</a>",L) into o1
>      put b1&"</p>" into char o1 to o1+3 of L
>      put L after tI2
>    end repeat
>    set htmlText of fld 1 to tI2 -- LC translates numToChar(1) to "&#1;"
>    put the millisecs-m1 into fld "timing2"
> end replaceTargets
>
> on scrollSelectionIntoView
>    put the selectedLoc into tSL
>    put the vscroll of me into tV
>    put item 2 of tSL - the top of me into tDiff
>    if tDiff > 0.75*the height of me then
>      set vscroll of me to tV + 0.4*the height of me
>    else if tDiff < 0.25*the height of me then
>      set vscroll of me to tV - 0.4*the height of me
>    end if
> end scrollSelectionIntoView
>
>


--
Jacqueline Landman Gay         |     [hidden email]
HyperActive Software           |     http://www.hyperactivesw.com


Re: Translate metadata to field content

matthias rebbe via use-livecode
On 2/21/20 2:51 AM, Mark Waddingham via use-livecode wrote:
> Both characters and codepoints run the risk of requiring a linear scan of
> the string to calculate the length - strictly speaking his will occur if
> the engine is not sure whether character / codepoint have a 1-1 map to
> codeunits (for example if your string has Unicode chars and it hasn't
> analysed it). Therefore you should definitely use codeunits.

Right now the text is all Roman but I'll use codeunits anyway to make it future-proof.

> The styledText representation makes the paragraph separator (return char)
> implicit (as it is in the field object internally) - so you need to bump
> the tTotalChars by one before the final end repeat to account for that (as the
> codeunit ranges the field uses *include* the implicit return char)

I should have noticed that; it seems so obvious now. There was no ellipsis in the variable
watcher, which there would have been if a return character was there.
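
For reference, here is a minimal sketch of that adjustment, assuming the array comes from the
styledText of the field. The handler name CodeunitsBeforeLine and its parameters are made up
for illustration; it counts the codeunits in every paragraph before a given one and adds one
per paragraph for the implicit return:

```livecode
function CodeunitsBeforeLine pStyledText, pLineIndex
   -- hypothetical helper: pStyledText is the styledText array of a field
   local tTotal, tRun, i
   put 0 into tTotal
   repeat with i = 1 to pLineIndex - 1
      -- order doesn't matter when summing, so hash-order iteration is fine
      repeat for each element tRun in pStyledText[i]["runs"]
         add the number of codeunits in tRun["text"] to tTotal
      end repeat
      -- account for the implicit return that ends each paragraph
      add 1 to tTotal
   end repeat
   return tTotal
end CodeunitsBeforeLine
```

The first codeunit of paragraph N would then sit at CodeunitsBeforeLine(tDataA, N) + 1.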

>
> So I couldn't help but fettle with this a little more. You mention that your
> 'anchors' are not unique in a document. This raises the question of what
> happens if there is more than one match...
>
> This handler finds all occurrences of a given anchor in the text. As we are
> searching for all of them, it can use repeat for each key iteration in both
> loops:
>
> function FindAllAnchors pStyledText, pAnchor
>     /* Return-delimited list of results, each line is of the form:
>     *     start,finish,line
>     * Each of these corresponds to a chunk of the form:
>     *      CODEUNIT start TO finish OF LINE line OF field
>     */
>     local tResults
>
>     /* Iterate over the lines of the text in arbitrary order - the order doesn't
>     * matter as we keep the reference to the line any match is in. */
>     repeat for each key tLineIndex in pStyledText
>        /* Fetch the runs in the line, so we don't have to keep looking it up */
>        local tRuns
>        put pStyledText[tLineIndex]["runs"] into tRuns
>
>        /* Iterate over the runs in arbitrary order - assuming that the number
>        * of potentially matching runs is minuscule compared to the number of
>        * non-matching runs, it is faster to iterate in hash-order. */
>        repeat for each key tRunIndex in tRuns
>           /* If we find a match, work out its offset in the line */
>           if tRuns[tRunIndex]["metadata"] is pAnchor then
>              /* Calculate the number of codeunits before this run */
>              local tCodeunitCount
>              put 0 into tCodeunitCount
>              repeat with tPreviousRunIndex = 1 to tRunIndex - 1
>                 add the number of codeunits in tRuns[tPreviousRunIndex]["text"] to tCodeunitCount
>              end repeat
>
>              /* Append the result to the results list. */
>              put tCodeunitCount + 1, \
>                    tCodeunitCount + the number of codeunits in tRuns[tRunIndex]["text"], \
>                    tLineIndex & \
>                    return after tResults
>           end if
>        end repeat
>     end repeat
>
>     /* We want the results sorted first by line index, then by starting codeunit
>     * within the line (so we get a top-to-bottom, left-to-right order). As the
>     * 'sort' command is stable, we can do this by first sorting by the secondary
>     * factor (codeunit start), then sorting again by the primary factor (line
>     * index). */
>     sort lines of tResults ascending numeric by item 1 of each
>     sort lines of tResults ascending numeric by item 3 of each
>
>     /* Return the set of results. */
>     return tResults
> end FindAllAnchors
>
> Testing this on 8Mb of styled Lorem Ipsum text, with the same anchor at:
>    word 1
>    word 1000
>    the middle word
>    word -1000
>    word -1
>
> Then this handler takes slightly less time than searching for a single anchor
> at word -1 of the field using 'repeat with' loops.

Fantastic, it got the timing down to about 6ms give or take, not counting loading the
styledtext or selecting it after.
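
For anyone following along, this is roughly how the function can be called. It is only a
sketch: the handler name hiliteFirstAnchor is made up, field 1 is a placeholder, and the
select line relies on the chunk form given in the comment at the top of FindAllAnchors:

```livecode
on hiliteFirstAnchor pAnchor
   -- hypothetical handler; assumes FindAllAnchors is in the message path
   local tHits, tHit
   put FindAllAnchors(the styledText of field 1, pAnchor) into tHits
   if tHits is empty then
      exit hiliteFirstAnchor
   end if
   -- results are sorted top-to-bottom, so line 1 is the earliest match
   put line 1 of tHits into tHit
   select codeunit (item 1 of tHit) to (item 2 of tHit) \
         of line (item 3 of tHit) of field 1
end hiliteFirstAnchor
```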

>
> Whether this is helpful or not depends if you need to 'do something' when there
> is more than one matching anchor in a document :)

All I require is to scroll to the correct position in the text and briefly hilite the metadata
span to draw the user's attention to the found text. I can compare the results returned from
your function to find the earliest and latest numbered instances and work out the hiliting from
there. That's possible because the duplicate metadata instances are all grouped together rather
than scattered around.
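
A rough sketch of that comparison, assuming the input is the start,finish,line list that
FindAllAnchors returns and that duplicates on a line really are contiguous. The name
MergeAnchorsPerLine is hypothetical; it collapses the matches on each line into a single
range running from the lowest start to the highest finish:

```livecode
function MergeAnchorsPerLine pResults
   -- hypothetical helper: pResults has lines of the form start,finish,line
   local tFirstA, tLastA, tResult, tLineIndex, tMerged
   repeat for each line tResult in pResults
      put item 3 of tResult into tLineIndex
      if tLineIndex is among the keys of tFirstA then
         -- widen the range already recorded for this line
         put min(tFirstA[tLineIndex], item 1 of tResult) into tFirstA[tLineIndex]
         put max(tLastA[tLineIndex], item 2 of tResult) into tLastA[tLineIndex]
      else
         put item 1 of tResult into tFirstA[tLineIndex]
         put item 2 of tResult into tLastA[tLineIndex]
      end if
   end repeat
   repeat for each key tLineIndex in tFirstA
      put tFirstA[tLineIndex] & comma & tLastA[tLineIndex] & comma & \
            tLineIndex & return after tMerged
   end repeat
   -- keys come back in hash order, so restore top-to-bottom order
   sort lines of tMerged ascending numeric by item 3 of each
   return tMerged
end MergeAnchorsPerLine
```

Each line of the result can then be hilited in turn with the same chunk form as before.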

The only reason I have more than one instance is because there are href links inside the
metadata spans, and LC translates that into separate metadata spans if there is more than one
link, or there's a line break. If it would honor the entire span regardless of those, then each
metadata tag would be unique. Some of my metadata needs to span more than one line, and/or
contain multiple inner links.

That's also why, in my initial attempt using counters, I could exit the loop as soon as I found
a non-match after locating the initial one. When going sequentially through the text, there
won't be any other duplicates as soon as the metadata changes.
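
Roughly, that sequential version looks like the sketch below. It is not my exact script: the
name FindAnchorSpan is made up for this post, and it assumes all runs carrying the same
metadata are adjacent. It includes the extra codeunit per paragraph for the implicit return:

```livecode
function FindAnchorSpan pStyledText, pAnchor
   -- hypothetical helper: returns "start,finish" as codeunit offsets
   -- from the beginning of the field, or empty if there is no match
   local tTotal, tStart, tEnd, tRunsA, tLength, i, j
   put 0 into tTotal
   put 0 into tStart
   repeat with i = 1 to the number of elements in pStyledText
      put pStyledText[i]["runs"] into tRunsA
      repeat with j = 1 to the number of elements in tRunsA
         put the number of codeunits in tRunsA[j]["text"] into tLength
         if tRunsA[j]["metadata"] is pAnchor then
            if tStart is 0 then
               put tTotal + 1 into tStart
            end if
            put tTotal + tLength into tEnd
         else if tStart > 0 then
            -- first non-match after a match: the span is complete
            return tStart & comma & tEnd
         end if
         add tLength to tTotal
      end repeat
      -- implicit return at the end of the paragraph
      add 1 to tTotal
   end repeat
   if tStart > 0 then
      return tStart & comma & tEnd
   end if
   return empty
end FindAnchorSpan
```

Because the offsets are counted from the start of the field, a span that crosses a line break
comes back as one range, return character included.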

--
Jacqueline Landman Gay         |     [hidden email]
HyperActive Software           |     http://www.hyperactivesw.com



Re: Translate metadata to field content

matthias rebbe via use-livecode
On 2/21/20 2:10 PM, J. Landman Gay via use-livecode wrote:
> The only reason I have more than one instance is because there are href links inside the
> metadata spans, and LC translates that into separate metadata spans if there is more than one
> link, or there's a line break. If it would honor the entire span regardless of those, then each
> metadata tag would be unique. Some of my metadata needs to span more than one line, and/or
> contain multiple inner links.

Here's a mockup of the type of htmltext I'm working with. This one has three duplicate
instances of the metadata:

<p spaceabove="3"><span metadata="12345">Suspendisse nulla neque, dapibus quis sapien vitae <a
href="#fn4"><font size="20">* </font></a></span><span metadata="12345">in est metus<a
href="#fn5"> porttitor ligula augue,<font size="20">* </font></a></span><span
metadata="12345">tortor vestibulum adipiscing dignissim<a href="#en31"> nulla.&deg; </a></span></p>

--
Jacqueline Landman Gay         |     [hidden email]
HyperActive Software           |     http://www.hyperactivesw.com


Re: Translate metadata to field content

matthias rebbe via use-livecode
In reply to this post by matthias rebbe via use-livecode
This is possibly your problem if I understand correctly what
you are doing (Browser source -> LC htmltext -> LC styledText).

You are using one LC method (styledText) to work around
problems that another LC method (htmltext) has generated.

One way to solve this could be to avoid LC's htmltext altogether.
Instead apply JavaScript with its powerful regular expression
methods or, even better, the regex external "sunnYrex" of
Thierry Douez to your input-source from the browser.

That is: Browser source -> REGEX -> LC-styledText



Re: Translate metadata to field content

matthias rebbe via use-livecode
Actually I did start with a browser widget but there were too many things I
needed to do which aren't easy there. I need to get the clickchunk, color
multiple sentences differently in the same field, get user-hilited text,
etc. But the primary reason for switching to a LC field is that I need to
display other controls on top of it.

--
Jacqueline Landman Gay | [hidden email]
HyperActive Software | http://www.hyperactivesw.com
On February 22, 2020 7:42:32 AM hh via use-livecode
<[hidden email]> wrote:

> This is possibly your problem if I understand correctly what
> you are doing (Browser source -> LC htmltext -> LC styledText).
>
> You are using one LC method (styledText) to work around
> problems that another LC method (htmltext) has generated.
>
> One way to solve this could be to avoid LC's htmltext altogether.
> Instead apply JavaScript with its powerful regular expression
> methods or, even better, the regex external "sunnYrex" of
> Thierry Douez to your input-source from the browser.
>
> That is: Browser source -> REGEX -> LC-styledText



