LC7 and 8 - Non responsive processing large text files

RH
This issue was addressed before in another context, but I am running
into the same problems with all versions of 7 and 8, including the latest
rc1 (on Windows 8.1 through Windows 10).

The question is important because it relates to a planned project for
serious development: a product to be built with LiveCode 8.0 (stable
version).

Imagine text files gigabytes in size. They could even be terabytes!

NON-RESPONSIVENESS USING "UNTIL" READ

I found that opening a very large text file (my file is 26 GB), simply
reading up to 90 MB of data in each iteration, processing each read chunk
of data with the offset() function, and iterating through the file is not
a problem and performs acceptably.

But reading something into memory using "read from file <filename> UNTIL
<string>", and doing this many times over on such a large text file, makes
LC non-responsive (tested on 7 and 8).

So, what do I do?

1. Open the file in read-only mode for binary read
2. Read until <string> (which separates pieces of data, where each piece
is no bigger than 30 MB)
3. Repeat the read 10x or 100,000x or more (depending on file size)
4. Perform some additional processing on each read piece of data in
memory
5. Eventually place the data into a field, or store it away somewhere else
(max 30 MB in my case)
6. Close the file

LC eventually returns to its normal state when the user waits long enough
and the process finishes. But until then LC is completely locked in this
non-responsive state, and that is not acceptable for our project.

Maybe I have to send a wait message in each loop iteration? I tried
waiting 10 milliseconds, but nothing changed. I tried several other options.

One more note: I also find that there can be significant delays when
placing large blocks of text into a field. Are there any ideas for
improving this?
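
A minimal sketch of one mitigation sometimes suggested - locking the
screen so the field is redrawn only once after the update; the field and
variable names here are hypothetical:

lock screen
put tBigText into field "Output"
unlock screen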

Kind greetings
Roland

Re: LC7 and 8 - Non responsive processing large text files

Mark Talluto-3

> On Apr 13, 2016, at 3:03 AM, Roland Huettmann <[hidden email]> wrote:

…snip
> But reading something into memory using "read from file <filename> UNTIL
> <string>" and doing this many times over in such large text file creates
> non-responsiveness of LC (tested on 7 and 8).
>
> So, what do I do?
>
> One more note: Additionally I find that it can take significant delays when
> placing large blocks of text into a field. Is there an idea of improving
> this?

Hi Roland,

This is more of a curiosity than a solution. What happens if you run your code in LC version 6? Do you still see the same timing issues?

Version 7/8 are slower than version 6 in almost every way. Here are a couple of bug reports on the issue:

http://quality.livecode.com/show_bug.cgi?id=16387
http://quality.livecode.com/show_bug.cgi?id=15711

There are others out there. The team has been trying to make LC 8 more performant though. So our reports are being worked on.

Best regards,

Mark Talluto
livecloud.io
canelasoftware.com




Re: LC7 and 8 - Non responsive processing large text files

Monte Goulding-2

> On 14 Apr 2016, at 3:25 PM, Mark Talluto <[hidden email]> wrote:
>
>> But reading something into memory using "read from file <filename> UNTIL
>> <string>" and doing this many times over in such large text file creates
>> non-responsiveness of LC (tested on 7 and 8).

It would be interesting to know if the engine is doing lots of encoding conversions in order to test for the string. I must have missed the first email, but I'd be interested to know whether the file was opened for text or binary read, and, if it was text, whether an encoding was provided in the command. If it was binary, is performance any better if the until <string> is a variable with the same encoding as the binary file?

The other thing worth pointing out: if you are doing a text read on an 8-bit encoding, the size of your data doubles once read, because each character is represented by 16 bits. When you consider the amount of data we are talking about here, that will cause a serious slowdown.
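
A minimal sketch of the idea - the path, encoding, and sentinel are
hypothetical - encoding the search string once so the engine compares
bytes against bytes:

local tSentinel
put textEncode(return & "XYZ", "UTF-8") into tSentinel
open file "C:/data/big.txt" for binary read
repeat forever
   read from file "C:/data/big.txt" until tSentinel
   -- process the data in 'it' here
   if the result is "eof" then exit repeat
end repeat
close file "C:/data/big.txt"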

Cheers

Monte

Re: LC7 and 8 - Non responsive processing large text files

RH
Thanks a lot for the replies, Mark and Monte, and to everyone. )

I would like to stress that this is planned to be a real product for
everyone out there working with large files.

The anomaly starts immediately after calling the handler, whether or not
an individual block of data is parsed.

HERE IS THE CODE SNIPPET (shortened):

## File myFile is a very large text file containing messages including
## encoded images etc. > 20 GB
...
// Opening the large file
open file myFile for binary read

put CR & "XYZ" into tString
put 999999 into n

// Reading from file
repeat n
   read from file myFile until tString
   put the result into tResult
   add 1 to i
   if i mod 1000 is 0 then
      if the commandKey is down then exit repeat
      -- showStatus i, s -- separate handler
      wait .001 sec
   end if

   // I disabled parsing for testing purposes
   -- if tParse is true then
   --    put parseMsg(it) into a -- separate handler to parse
   --    put a["Sub"] & tab & a["Fro"] & tab & a["Da2"] & tab after gData["lst"]
   -- end if

   if tResult is "eof" then exit repeat
end repeat

close file myFile
...
function parseMsg tMsg
   --- do some processing and return parsed
end parseMsg

In LC 7 or 8 (not yet tested in 6, but why test there, since it will reach
end of life soon?), on very large files this makes LiveCode become
unresponsive.

If I set n = 3, or 10, or 20, or some other small number, it also turns LC
into a non-responsive state, but after processing LC becomes responsive again.

Setting n to 1000 or more, for example, it becomes unresponsive for such a
long time that I kill the application using Task Manager.

Also, there is no way to stop the process after 1000 iterations using the
command key as defined in the script, since LC is unresponsive.

So, in my opinion, the problem has nothing to do with parsing; it has to do
with reading the file. Any parsing and further processing will slow things
down, but should not result in anything unexpected.

I will try reinstalling the latest version 6 to see how that behaves.

LC should not become unresponsive in any case. Right? Or am I missing
something? Is something wrong with the script?

And a more generalized question, also discussed before: What exactly happens
when LC tries to read from a very, very large file? Maybe it is a gigabyte
or even terabyte file? It could just be too big to read, and it should then
still not return empty or become unresponsive, but return some error
message.

And what happens when very large amounts of data are read into memory,
processed there, and placed into a field? Is there anything preventing
unresponsiveness?

Thanks again for feedback... )))) VERY MUCH APPRECIATED.

Roland





Re: LC7 and 8 - Non responsive processing large text files

Mark Waddingham-2
In reply to this post by RH
Hi Roland,

On 2016-04-13 12:03, Roland Huettmann wrote:

> NON-RESPONSIVENESS USING "UNTIL" READ
>
> I found opening a very large text file (my file is 26 GB), simply
> reading
> from it up to 90 MB of data in each iteration, using an offset ()
> function
> processing the read junk of data, and iterating through the file, is
> not a
> problem and performs acceptably.
>
> But reading something into memory using "read from file <filename>
> UNTIL
> <string>" and doing this many times over in such large text file
> creates
> non-responsiveness of LC (tested on 7 and 8).

When you say 'non-responsiveness' I take it you mean that Windows thinks
that the application has 'hung'? (i.e. the windows go slightly opaque).

If that is the case then the problem here is probably that the 'read
until' command is running in a very tight loop without 'tickling' the
event loop. The first thing to try is to open the file for binary read,
and make sure you encode '<string>' in the appropriate encoding for the
text file you are reading and see if that helps.

If that doesn't improve matters then I'd guess that the string being
searched for is not that common, and so the engine is having to wade
through large sections of the file on each command invocation. This is
probably taking longer than Windows will tolerate before thinking the
app is non-responsive.

It isn't entirely clear to me at the moment how we could 'fix' this in
the engine right now with the way the file processing code currently
works. However, you might find that replacing 'read until' with a script
solution makes things work better:

------

on test
   local tFile
   bufferedFileOpen "~/Desktop/largefile.txt", "binary", tFile
   put empty into field 1
   repeat forever
      local tLine
      bufferedFileReadUntilExact tFile, numToChar(13), tLine
      if the result is "eof" then
         put tLine & return after field 1
         exit repeat
      end if
      put tLine & return after field 1
   end repeat
   bufferedFileClose tFile
end test

command bufferedFileOpen pFilename, pEncoding, @xFileHandle
   open file pFilename for binary read
   if the result is not empty then
      throw "cannot open file"
   end if

   -- The 'file' key stores the filename to use to read from.
   put pFilename into xFileHandle["file"]

   -- The 'encoding' is used to encode a string we search for appropriately
   put pEncoding into xFileHandle["encoding"]

   -- The 'buffer' contains data we have read but not yet consumed
   put empty into xFileHandle["buffer"]
end bufferedFileOpen

command bufferedFileClose @xFileHandle
   if xFileHandle["file"] is empty then
      exit bufferedFileClose
   end if
   close file xFileHandle["file"]
   put empty into xFileHandle
end bufferedFileClose

command bufferedFileReadUntilExact @xFileHandle, pString, @rRead
   -- First encode the string as binary data. If the encoding of
   -- the file is 'binary' then we assume pString is binary too.
   local tEncodedString
   if xFileHandle["encoding"] is "binary" then
      put pString into tEncodedString
   else
      put textEncode(pString, xFileHandle["encoding"]) into tEncodedString
   end if

   -- Now compute the length in bytes of the string we are searching for
   local tEncodedStringLength
   put the number of bytes in tEncodedString into tEncodedStringLength

   -- We store the last position we searched up until in the current buffer
   -- so that we aren't continually searching the same data for the string.
   local tBytesToSkip
   put 0 into tBytesToSkip

   -- We now loop, accumulating the output string, until we find the
   -- string we are searching for.
   local tIsEof
   put false into tIsEof
   repeat forever
      -- If the amount of data in the buffer is less than the string we
      -- are searching for then read in another 64kb of data.
      if the number of bytes in xFileHandle["buffer"] < tEncodedStringLength then
         read from file xFileHandle["file"] for 65536 bytes
         if the result is "eof" then
            put true into tIsEof
         else if the result is not empty then
            throw "error reading from file"
         end if
         put it after xFileHandle["buffer"]
      end if

      -- See if we can find the string in the buffer
      local tEncodedStringOffset
      put byteOffset(tEncodedString, xFileHandle["buffer"], tBytesToSkip) into tEncodedStringOffset
      if tEncodedStringOffset is not 0 then
         put byte 1 to (tBytesToSkip + tEncodedStringOffset + tEncodedStringLength - 1) of xFileHandle["buffer"] into rRead
         delete byte 1 to (tBytesToSkip + tEncodedStringOffset + tEncodedStringLength - 1) of xFileHandle["buffer"]
         exit repeat
      end if

      -- If we failed to find the string and the file is at eof, we are done.
      if tIsEof then
         put xFileHandle["buffer"] into rRead
         put empty into xFileHandle["buffer"]
         exit repeat
      end if

      -- We failed to find the string in the buffer so we need to accumulate
      -- more data. As the string was not found in the current buffer, we
      -- know we can skip all bytes up to (buffer length - tEncodedStringLength)
      put the number of bytes in xFileHandle["buffer"] - tEncodedStringLength into tBytesToSkip
   end repeat

   if tIsEof then
      return "eof"
   end if

   return empty
end bufferedFileReadUntilExact

----

I wrote the above whilst waiting for my Windows VM to spin up...
However, having since tried it on Windows and *still* observed a
'non-responsive' window after a while, I don't think the problem you are
seeing has anything to do with the processing speed of the 'read until'
command.

Windows will assume that an app is 'non responsive' if it does not
process any UI events for more than 5 seconds
(https://msdn.microsoft.com/en-gb/library/windows/desktop/dd744765(v=vs.85).aspx).
Now, the engine does periodically 'poke' the event queue to look for
'Ctrl-.' key presses, which will abort repeat loops and any long-running
process - however, this does not appear to be (any longer?) enough to
stop an app from becoming 'non-responsive'. What is even more strange is
that after an app does become 'non-responsive' (according to the OS),
the 'Ctrl-.' key press no longer appears to work. At this stage, I'm
not entirely sure what can be done to prevent Windows from marking an
app as 'non-responsive' without explicit script being used (the Ctrl-.
not working whilst 'non-responsive' is potentially fixable, although it
is a little tricky to work out what is going on).

One thing to try is to not use 'wait', but 'wait with messages' at
reasonable intervals in your top-level processing loop. This does mean
you will have to disable the UI whilst your processing is taking place,
and balance the periodic calling of 'wait with messages' with the speed
of the processing loop. (You don't want to call it too often because
otherwise it will impact performance).

Warmest Regards,

Mark.

--
Mark Waddingham ~ [hidden email] ~ http://www.livecode.com/
LiveCode: Everyone can create apps



Re: LC7 and 8 - Non responsive processing large text files

Paul Dupuis
On 4/14/2016 5:43 AM, Mark Waddingham wrote:
> When you say 'non-responsiveness' I take it you mean that Windows
> thinks that the application has 'hung'? (i.e. the windows go slightly
> opaque).

This may or may not be related, but I have some code that does lots of
data manipulation in a big loop (not from a file, though) that was
causing Windows 8.1 to think the app was unresponsive. I added:

wait 0 with messages

to the loop to allow the engine and OS time to check events, and while my
loop still takes a LONG time to complete, Windows no longer thinks the
app has hung.
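
A minimal sketch of the pattern (the handler name and counts are
hypothetical):

on processBigLoop
   repeat with i = 1 to 1000000
      -- ... one unit of work per iteration ...
      if i mod 1000 is 0 then
         -- let the engine and OS process pending events so Windows
         -- does not mark the app as "Not Responding"
         wait 0 with messages
      end if
   end repeat
end processBigLoop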


Re: LC7 and 8 - Non responsive processing large text files

Richard Gaskin
In reply to this post by Mark Waddingham-2
Mark Waddingham wrote:

 > I wrote the above whilst waiting for my Windows VM to spin up...

One more reason to consider VirtualBox.  Being free is just icing:  I
switched because I found it much faster in restoring VM sessions than
any other VM software I've used.

--
Richard Gaskin
Fourth World Systems
Software Design and Development for the Desktop, Mobile, and the Web
____________________________________________________________________
[hidden email] http://www.FourthWorld.com




Re: LC7 and 8 - Non responsive processing large text files

Richard Gaskin
In reply to this post by Mark Talluto-3
Mark Talluto wrote:

 > Version 7/8 are slower than version 6 in almost every way. Here are a
 > couple of bug reports on the issue:
 >
 > <http://quality.livecode.com/show_bug.cgi?id=16387>
 > <http://quality.livecode.com/show_bug.cgi?id=15711>
 >

You may want to add this enhancement request to that list:
<http://quality.livecode.com/show_bug.cgi?id=17210>

It deals specifically with reading/parsing large files, using two
methods: the common "read until <char>" form, and one using a custom
buffer similar to Mark's email earlier today.  Both forms use binary
mode, so encodings don't come into play.


 > There are others out there. The team has been trying to make LC 8
 > more performant though. So our reports are being worked on.

I discussed the performance concerns with Peter in our call last week.
As you noted, they've been working on them, and in some cases have had
significant success.

Some chunk expressions, esp. related to lineOffset, are now nearly as
fast as in v6.x, while also retaining all the features Unicode now offers.

Other forms of chunk expressions may benefit from further performance
enhancement, and similarly with arrays: some performance gains have been
made in recent versions while others are still on the to-do list.

File I/O remains an area slated for review in an upcoming version, and I
haven't yet tested socket I/O, so I don't know if performance there has
changed, but it is obviously just as important.

Given that v8.0 is now in RC, it would seem extremely unlikely that any
enhancements not already close to completion will land in v8.0 Stable.

That said, the team seems eager to increase performance as much as
practical, and with the progress made thus far -- with v8 being much
faster in many operations than v7 -- I'm confident we'll see a version
after v8.0 that will continue this trend.


Re: LC7 and 8 - Non responsive processing large text files

Richard Gaskin
In reply to this post by RH
Roland Huettmann wrote:

 > And a more generalized question also discussed before: What exactly
 > happens when LC tries to read from a very very large file? Maybe it
 > is Gigabyte or even Terabyte file? It could just be too big to read
 > and it should then still not return empty or become unresponsive,
 > but return some error message.

The responsiveness has to do with the tight loop not surrendering enough
time for Windows' current liking.  Mark covered that well in his earlier post:
<http://lists.runrev.com/pipermail/use-livecode/2016-April/225896.html>

This is independent of the size of the file, and really independent of any
file I/O operations at all.  Other tasks in tight loops can also trigger
Windows to consider an app "unresponsive" even though it's running.

Hopefully they'll be able to massage the event loop to handle the
latest Win API expectations, which have apparently changed in recent
versions.

As for large files, I've had very good experiences parsing files larger
than 6 GB.  Given that this is well outside of any internal memory
addressing, I'm assuming LC would work equally well on any size of file
the local file system can handle.

This requires, of course, that we write our apps the way most apps are
written:  with very large files, rather than reading the whole thing into
RAM and expecting one giant memcopy, we read in chunks and process each
chunk separately, as your script does.

LC's internal addressing allows for a single block of memory of up to
about 1 GB IIRC (Mark, please correct me if that's not right), which is
far larger than most operations will be able to handle efficiently anyway.

Which leads us to:

 > And what happens when very large amounts of data are read into memory,
 > processed there, and placed into a field? Is there anything preventing
 > unresponsiveness?

Usually yes: an out-of-memory error should be thrown.  But low-memory
situations are very tricky:  if there isn't enough memory to complete
execution of the script, there may not be enough memory to report the error.

Mark can probably provide more details on that, but I've seen iffy
handling of low-memory situations with a wide range of applications and
even operating systems.  It's just a hard problem to solve with
consistent grace.

But fortunately, low-memory situations are rare on modern systems if we
just remain mindful of how to use memory efficiently:  read stuff in
chunks, process what you need, and if an aggregate operation results in
a large amount of data, consider flushing it to disk periodically as you
go - "open...for append" on the output file is very efficient for that
sort of thing.
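
A minimal sketch of that flush-as-you-go pattern, with hypothetical file
and variable names:

-- append the accumulated results to the output file, then release
-- the memory they occupied
open file "C:/data/output.txt" for append
write gData["lst"] to file "C:/data/output.txt"
close file "C:/data/output.txt"
put empty into gData["lst"]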


Re: LC7 and 8 - Non responsive processing large text files

RH
In reply to this post by Richard Gaskin
Hi all, great responses. Enjoyable to read, and very motivating. )))

> Mark: ... "When you say 'non-responsiveness' I take it you mean that
Windows thinks that the application has 'hung'? (i.e. the windows go
slightly opaque)."

That is correct, Mark.

As I understand it now, it is Windows marking the application as
unresponsive. To describe it more exactly: immediately after the
handler is called (why "immediately"?), the cursor changes to a spinning
blue wheel, the LC window title is suffixed with "(Not Responding)", and
the stack window appears opaque.

> Paul: ... Using "wait 0 with messages" ...

I did that. Unfortunately it does not help much. But there is a change: now
it switches between responsive and non-responsive states with every loop.
So there are responsive phases now, and unresponsive phases. ) But I still
cannot stop the loop using the command key.

Since for testing I am not doing anything but using "read until <string>"
within the loop, not touching the read data, it seems obvious to me (...?)
that it has to do with the "until <string>" inner processing.

Yes, it is a "binary read".

> Mark: ... However, you might find replacing 'read until' with a script
solution might make things work better: ..."

That is really something to test, and I will do this. Looks sophisticated! )

=== Workaround ===

There is a work-around for me:

Not using "read until <string>", but reading "at <position> for <number
of bytes>", also in this huge file.

I can read even 100 MB into memory (it does not create a big problem),
then process it using offset(), and then read the next pack of data.

Now I am creating an index of each occurrence, i.e. the position of each
block of data within the file. Well, that index alone is larger than
100,000 lines. I am not sure about using an array here, as I have read too
many times that they are very slow in LC 8... ??? Well, an array is better
for access. Or what is the best way of creating a very large index, and
where is it best stored? As a text file? Or using local SQLite?

When reading from a file position for a defined number of bytes, there
is no strange behaviour as far as I have experienced so far. It takes a
little while, but it is acceptable. Everything stays responsive.

In other words, reading a large file "from <startPosition> for
<numberOfBytes>" seems OK, even in a loop, where it is still advisable to
use "wait 0 with messages".

Well - better now to test and test and test...

Continuing to explore this. It is needed.

Cheers, Roland






Re: LC7 and 8 - Non responsive processing large text files

Richard Gaskin
Roland Huettmann wrote:

 > There is a work-around for me:
 >
 > Not using "read until <string>" but reading "at" <position> "for"
 > <number of bytes" also in this huge file.
 >
 > I can read even 100 MB into memory (it does not create a big problem)
 > and then process using offset() and then reading the next pack of
 > data.

You may find the sample stack attached to this report useful:
<http://quality.livecode.com/show_bug.cgi?id=17210>

It's similar to the buffering script Mark posted earlier, but uses a
constant conveniently located at the top of the script to govern the
buffer size.

The reason this may be useful is that when I was experimenting with
buffer-and-parse-in-memory options like you're doing now, I was surprised
to find that smaller buffers can be MUCH faster than larger ones.

On my Linux box the optimal buffer turned out to be 128k, but I'd be
very interested to learn which buffer size works best on your Windows
system.
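
A minimal sketch in the spirit of that stack (not the actual script;
names are hypothetical):

constant kBufferSize = 131072 -- 128k; adjust and re-measure

on readWholeFile pPath
   local tTotal
   put 0 into tTotal
   open file pPath for binary read
   repeat forever
      read from file pPath for kBufferSize bytes
      if it is empty then exit repeat
      add the number of bytes in it to tTotal
      -- parse 'it' here, e.g. with byteOffset()
   end repeat
   close file pPath
   return tTotal
end readWholeFile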


 > Now I am creating an index for each occurrence which is a position
 > for each block of data within the file. Well, that index alone is
 > larger than 100,000 lines. I am not sure here using an array as I
 > read too many times that they are very slow in LC8... ???

Performance differences will vary as much by algorithm as by LC version.

Whether an array is "slow" compared to an alternative will depend on
what's being done with it.

I'd test an array version and an alternative before relying on
discussions others have had about algos that may be quite different from
your own.


Re: LC7 and 8 - Non responsive processing large text files

J. Landman Gay
In reply to this post by RH
On 4/14/2016 12:27 PM, Roland Huettmann wrote:
> There is a work-around for me:
>
> Not using "read until <string>" but reading "at" <position> "for" <number
> of bytes" also in this huge file.
>
> I can read even 100 MB into memory (it does not create a big problem) and
> then process using offset() and then reading the next pack of data.

As Mark suggested, that implies that the search string is uncommon in
the text and large amounts of data need to be parsed before a match is
found. By reading for a specified number of bytes instead, you guarantee
a limited amount of parsing.

--
Jacqueline Landman Gay         |     [hidden email]
HyperActive Software           |     http://www.hyperactivesw.com


Re: LC7 and 8 - Non responsive processing large text files

RH
> Richard: "You may find the sample stack attached to this report useful:
<http://quality.livecode.com/show_bug.cgi?id=17210>"

Richard, thanks a lot for the sample stack. I downloaded and tested
"LargeFileRead.livecode". Unfortunately it also creates the same symptoms
on Windows:

The stack becomes unresponsive in Windows (using Windows 10). As I see, you
are also using both "read until..." and "read for..." in two different
loops. Here too, the problem is with the "read until..." loop.

But then I added the recommended "wait 0 with messages" to the "read
until" loop and ...

.....................THAT WORKED! ......................... )))))))

Though it does NOT work for me, as I am reading much larger chunks of data
using "read until..."

(It is still running after 60 minutes now... I think I have to force-stop it.
No... now it finished with 3,060,000. -:)

I think it is important to know how to handle "big" data with files, in
memory, and when placing it in fields. And of course, in a nearby future
8.1 version (or sooner?), those in need would love to see this taken care
of in the engine.

TESTING READING LARGE GIGABYTE FILES

To make a contribution here, I am going to test all the variants and
techniques discussed, published by you or developed by me, and create a
test case checking performance and stability using the different methods
suggested. It will take some time, during the weekend or so.

It will be interesting to see ...

Roland










Re: LC7 and 8 - Non responsive processing large text files

Mark Waddingham-2
On 2016-04-15 08:59, Roland Huettmann wrote:
> I think it is important to know how to handle "big" data with files, in
> memory and placing in fields. And of course in a future nearby 8.1
> version
> (or sooner?) those in need would love to see this taken care of in the
> engine.

I don't think the problem here is whether the engine has the facilities
to process large files - it clearly does, given the number of examples of
how to do it that have been provided ;)

The issue here is how to keep a UI responsive on the desktop whilst
running a long processing operation - this isn't a problem unique to
LiveCode; it is one pertinent to all languages on all OSes.

From what I can understand of your use-case, you are wanting to index an
*exceptionally large* file so that you can look things up in it at a
later date. Indeed, it seems to me that the problem you are trying to
solve falls into two pieces:

   1) Process the very large file and produce an index of 'points of
interest' in the file.

   2) Subsequently use the index to perform the main function of your
application.

With this in mind, I'd suggest considering separating these two parts
and running them in separate processes. The foreground (UI) application
can launch the second process to generate the index, getting a simple
progress update periodically without having any impact on the performance
of the indexing job, and then, when that operation is completed, the
foreground application can load the index and use it.

The LiveCode Installer (albeit no longer used on Mac) works like this.
It is a single application which can run in two modes. In the foreground
mode you see the UI which you interact with; when you finally click the
'install' button, it launches a background process which runs an
installation script, and that process feeds progress information back to
the foreground process at suitable intervals so that the UI can update a
progress bar. (Admittedly the installer *has* to work this way, as the
background 'slave' process has to run with administrator rights; however,
it also happily solves the UI progress bar update / cancellation problem
too.)

This is basically what you could call a 'master/slave' pattern. You have
a master process which provides the UI and the application's main
function, and a slave process which is run and controlled by the master
during indexing operations. This model has all kinds of advantages - for
example, you can rerun the slave 'indexing' process whilst the user is
still using an existing index; or indeed launch several slave processes
to index multiple files at once.

In LiveCode such a pattern can be implemented in the master using an
'open process read/write process close process' loop; whilst all the
slave has to do is read from stdin and write to stdout.
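
A minimal sketch of the master side - the slave executable name and its
one-progress-line-per-block protocol are hypothetical:

on startIndexing
   open process "indexer.exe doindex" for text update
   send "pollIndexer" to me in 500 milliseconds
end startIndexing

on pollIndexer
   read from process "indexer.exe doindex" until return
   if it is not empty then
      put it into field "Progress" -- e.g. "12500 blocks indexed"
   end if
   if the result is "eof" then
      close process "indexer.exe doindex"
   else
      send "pollIndexer" to me in 500 milliseconds
   end if
end pollIndexer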

The above might seem a little opaque - however, it might help to have a
look at:

   
https://github.com/livecode/livecode/blob/develop/builder/installer_utilities.livecodescript

Which is the main installer implementation. When run as slave, the only
handler which runs is 'runInstallerActions'; and when run as master
there is a UI which runs a 'send in time' loop 'installerMonitor' to
talk to the slave. The main entry point to the installer application is
a startup handler which decides (by looking at the command line
arguments) what it should do:

on startup
    local tAction

    if $1 is "install" then
       put "install" into tAction
    else if $1 is "doinstall" then
       put "doinstall" into tAction
    else if $1 is "uninstall" then
       put "uninstall" into tAction
    else if $1 is "douninstall" then
       put "douninstall" into tAction
    else
       set the itemDelimiter to slash
       if the last item of $0 contains "setup" then
          put "uninstall" into tAction
       else
          put "install" into tAction
       end if
    end if

    switch tAction
       case "install"
          if $2 is "noui" then
             runFacelessInstall
          else
             hide me
             send "runInstallerUI" to me in 0 millisecs
          end if
          break
       case "uninstall"
          if $2 is "noui" then
             runFacelessUninstall
          else
             hide me
             send "runUninstallerUI" to me in 0 millisecs
          end if
          break
       case "doinstall"
          runInstallerActions
          break
       case "douninstall"
          runUninstallerActions
          break
       default
          quit 1
    end switch
end startup

Here 'install' and 'uninstall' run the master UI (the default is
'install' if no arguments are specified); whilst doinstall/douninstall
run the slave.

Anyway, I'll try to find some time to distill out a skeleton application
which does this - it seems like it might be a useful thing to have as an
example :)

Warmest Regards,

Mark.

P.S. One could argue that 'multi-threading' might help here - but it is
actually absolutely no different in effect from splitting into two
processes. Indeed, for this kind of thing separate processes is a much
much better idea as it completely insulates the master from the slave
and is much more failsafe as the two processes run in their own memory
spaces with their own resources and only communicate over a very thin
connection. i.e. Problems in one cannot propagate to the other. It also
provides more options in terms of running the indexing operation - it
doesn't necessarily need to be done in response to the master
application, indexing becomes an operation in its own right which can be
managed in any way which might be required.


Re: LC7 and 8 - Non responsive processing large text files

Monte Goulding-2
In reply to this post by RH

> On 15 Apr 2016, at 4:59 PM, Roland Huettmann <[hidden email]> wrote:
>
> Though it does NOT work for me as I am reading much larger chunks of data
> using "read until..."
>
> (It is still running for 60 minutes now... I think I have to force-stop it.
> No... now it finished with 3,060,000. -:)
>
> I think it is important to know how to handle "big" data with files, in
> memory and placing in fields. And of course in a future nearby 8.1 version
> (or sooner?) those in need would love to see this taken care of in the
> engine.

Roland, are you ensuring that your sentinel string is encoded with the same encoding as the file? I'm interested to know how much difference that makes, as I think this could be the main source of any difference you are seeing between LC 6 and 7/8 on this.

One thing to remember: as much as we would like it to be, the engine is not prescient. It does not know how much data it will need to read before it finds your sentinel string. What this means in practice is that when reading until a string the engine needs to:
 - constantly resize the memory buffer when it nears overflow, because it doesn't know how big the buffer needs to be for the data that is read in
 - read one byte at a time, then check whether the last N bytes match your sentinel string

Obviously this is significantly more expensive than read for, where the buffer is allocated once and the read can fetch the number of bytes you want and give it to you straight away, because it doesn't need to check whether anything matches.
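
A hypothetical micro-benchmark makes the cost difference visible (the
path and sentinel are assumptions):

on compareReads
   local tPath, tSentinel, tStart
   put "C:/data/big.txt" into tPath
   put numToByte(13) & "XYZ" into tSentinel

   put the milliseconds into tStart
   open file tPath for binary read
   read from file tPath for 10485760 bytes -- one 10 MB block
   close file tPath
   put "read for:" && the milliseconds - tStart && "ms" & return after msg

   put the milliseconds into tStart
   open file tPath for binary read
   read from file tPath until tSentinel -- scans as it accumulates
   close file tPath
   put "read until:" && the milliseconds - tStart && "ms" & return after msg
end compareReads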

I second Mark's recommendation to move the file parsing to a helper process if possible, because no matter what, this is going to be a bottleneck in your app.

Cheers

Monte

Re: LC7 and 8 - Non responsive processing large text files

RH
> One thing to remember is as much as we would like it to be the engine
> is not prescient. It does not know how much data it will need to read
> before it will find your sentinel string. What this means in practice
> is when reading until a string the engine needs to:
>  - constantly resize the memory buffer when it nears overflow for data
> that is read in because it doesn't know how big it needs to be
>  - read one byte at a time then check if the last N bytes match your
> sentinel string

Monte and Mark: exactly, this is the point! I completely agree.

What I tried to say - maybe I am not expressing it so well - is that, in
my experience so far, READING VERY LARGE FILES WORKS WELL when just using
"READ ... FOR" on a large file and then processing later in a separate
step.

The problem I had was when testing "READ ... UNTIL" - and you commented
on this in depth.

I have already been successful building an index and then reading just
pieces of data using the index. No odd behaviour. No "non-responsiveness".

YES Mark, I was already splitting the process! Even before writing to
this list, I had done that intuitively in a similar way. It was working and
it is working. And I can only suggest that anybody interested follow your
advice and study the scripts you and others provided.

From all you said, I get a picture which I did not have in such detail
before. And I hope others get it too. It makes us understand more and more.
So, all my thanks for your kind contributions.

I will follow up here...

Greetings, Roland














Re: LC7 and 8 - Non responsive processing large text files

Richard Gaskin
In reply to this post by Mark Waddingham-2
Mark Waddingham wrote:

> P.S. One could argue that 'multi-threading' might help here - but it is
> actually absolutely no different in effect from splitting into two
> processes. Indeed, for this kind of thing separate processes is a much
> much better idea as it completely insulates the master from the slave
> and is much more failsafe as the two processes run in their own memory
> spaces with their own resources and only communicate over a very thin
> connection. i.e. Problems in one cannot propagate to the other. It also
> provides more options in terms of running the indexing operation - it
> doesn't necessarily need to be done in response to the master
> application, indexing becomes an operation in its own right which can be
> managed in any way which might be required.

The more I learn about multiprocessing the less I'm interested in
multithreading, for all the reasons you mentioned.

The only place I still crave threading doesn't even need to be exposed
to us as such, but I believe would be VERY helpful: asynchronous
playback of GIF animations.

Right now even LC's GIF-based progress indicators hiccup, since they're
dependent on explicit slices of processing time given to them in between
other actions.

If there was some quick solution for adding an option to play GIFs in a
separate thread, so many UIs (including games) would be simpler and more
satisfying to build.


Re: LC7 and 8 - Non responsive processing large text files

Richard Gaskin
In reply to this post by RH
Roland Huettmann wrote:

> < Richard: "You may find the sample stack attached to this report useful:
> <http://quality.livecode.com/show_bug.cgi?id=17210>"
>
> Richard, thanks a lot for the sample stack. I downloaded and tested
> "LargeFileRead.livecode". Unfortunately it also creates the same symptoms
> on Windows...

The responsiveness is a separate matter, well covered in Mark's replies.

Here my interest is in making sure your operation runs as efficiently
as it can.

Have you had a chance to explore measuring different execution times as
you adjust the kBufferSize constant at the top of the script?

Here on my Linux box I found, rather surprisingly, that a buffer as
small as 128k delivered optimal speeds.  In discussing this with Peter
Brett it turns out there are memory-mappings provided by the OS that
seem to play a role there.

It would be very interesting to me to learn what buffer size you find
optimal on Windows.
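
If it helps, here is a hypothetical harness for collecting those numbers
(the candidate sizes are just examples):

on timeBufferSizes pPath
   local tSize, tStart
   repeat for each item tSize in "65536,131072,262144,1048576,8388608"
      put the milliseconds into tStart
      open file pPath for binary read
      repeat forever
         read from file pPath for tSize bytes
         if it is empty then exit repeat
      end repeat
      close file pPath
      put tSize && "bytes:" && the milliseconds - tStart && "ms" & return after msg
   end repeat
end timeBufferSizes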


Re: LC7 and 8 - Non responsive processing large text files

RH
Absolutely, Richard.

Reducing the buffer size changed the execution speed dramatically. I still
need the time to set up various tests and check carefully, trying to find
the right size on my machine. Maybe 128k - we will see.

From Bernd, in an internal mail: he worked on my script and made that
happen by changing the buffer size. This is on Windows - the same as you
are reporting for Linux.

Optimization here is the goal. Yes.

But I would also like to optimize the code if that gives additional
advantages.

Also noted by Bernd: the speed also increases dramatically using an
SSD. I am still using an old-fashioned hard drive. (

Roland






Re: LC7 and 8 - Non responsive processing large text files

Richard Gaskin
Roland Huettmann wrote:
 > Also to be noted by Bernd, the speed also dramatically increases
 > using an SSD. I am still using old fashioned hard drive (.

SSDs are indeed much faster.  After I put even an old one in my laptop,
boot times dropped to under 9 seconds.

But there's no shame in using HDDs - when you need large capacity
there's no beating them on $/GB.

It looks like waiting until later this year will pay off - SSDs are
expected to come very close to current HDD prices:
<http://siliconangle.com/blog/2015/12/02/consumer-ssds-will-be-almost-as-cheap-as-hdds-by-next-year/>

Even now they're not too bad: HDDs average around $0.06/GB, and I've
seen some SSDs on sale for as low as about $0.17/GB.

And where workflows can take advantage of it, hybrid SSD/HDD 3.5" drives
are very competitively priced.
