read from file until a line begins with a certain word

Matthias Rebbe via use-livecode
Hi,

Until today I have always used put URL to read a complete file into memory. But now I have to process really large text files, 900 - 1500 MB in size.
I know I can read a file until EOF.

But how would I read a file until a line that starts with a certain keyword, e.g. mstart? I need to read up to the line before the one that starts with mstart,
and then read from that “mstart” line up to the line before the next “mstart”.
Do I have to read line by line and check whether each line starts with that keyword, or is there another way?

Maybe I am overcomplicating this.

Regards,
Matthias



Matthias Rebbe
+49 5741 310000
wirmachen.software <http://wirmachen.software/>

_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Re: read from file until a line begins with a certain word

dunbarx via use-livecode
I would just read the whole file and then let LC truncate the text following
the keyword sentence in the usual ways. I bet even with a couple of
gigabytes this would not take much time.

Craig Newman




Re: read from file until a line begins with a certain word

Bob Sneidar via use-livecode
One thought, what if you read overlapping chunks. Like, read from 0-10000, then from 9900-20000, etc. If the overlap is longer than your test string you would still be able to pick it up, in the cases where the string spanned the chunk boundary.

First test could be to read in 10000 chunks and time how long that takes compared to reading in 1000000 at a time, or 1000000000 at a time. There may be a sweet spot where reading in the text is fast enough to not bother with reading in smaller chunks.
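
A rough way to run that timing test, as a sketch only: the function name timeChunkedRead and its parameters are made up for illustration, and it simply reads the file and throws the data away. Keep in mind that the operating system's file cache can make a second pass over the same file look faster than the first.

```livecode
-- Hypothetical helper (not from the thread): returns the number of
-- milliseconds needed to read pFile from start to end in blocks of
-- pChunkSize, without processing the data.
function timeChunkedRead pFile, pChunkSize
   local tStart
   put the milliseconds into tStart
   open file pFile for binary read
   repeat
      read from file pFile for pChunkSize
      -- an empty read means the end of the file has been reached
      if it is empty then exit repeat
   end repeat
   close file pFile
   return the milliseconds - tStart
end timeChunkedRead
```

Calling timeChunkedRead(tFile, 10000) and then timeChunkedRead(tFile, 1000000) would give two figures to compare.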


> On Sep 16, 2017, at 9:11 PM, dunbarx via use-livecode <[hidden email]> wrote:
>
> I would just read the whole file and then let LC truncate the text following
> the keyword sentence in the usual ways. I bet even with a couple of
> gigabytes this would not take much time.
>
> Craig Newman

Re: read from file until a line begins with a certain word

Paul Dupuis via use-livecode
On 9/16/2017 8:50 PM, Matthias Rebbe via use-livecode wrote:

> But how would I read a file until a line that starts with a certain keyword, e.g. mstart? I need to read up to the line before the one that starts with mstart.

Why not use the

read from file tFilePath until "mstart"

form of the read command:

read from {file /pathName/ | stdin} [at /start/] {until {/string/ | end | EOF | empty} | for /amount/ [/chunkType/]} [in /time/]

If you need to back up to reread the "mstart" string, you can then use
seek to adjust the read position.



Re: read from file until a line begins with a certain word

Matthias Rebbe via use-livecode
What I forgot to mention is that the keyword also occurs within lines, but I have to read until a line that starts with mstart.

Anyway, I will play around with it a little.

Thanks so far for all your comments.

Regards,
Matthias


Matthias Rebbe
+49 5741 310000
‌wirmachen.software <http://wirmachen.software/>‌

> On 17.09.2017 at 12:55, Paul Dupuis via use-livecode <[hidden email]> wrote:
>
> Why can you not use the
> read from file tFilePath until "mstart"
> of
> read from {file /pathName/ | stdin} [at /start/] {until {/string/ | end
> | EOF | empty} | for /amount/ [/chunkType/]} [in /time/]
>
> If you need to back up to reread the "mstart" string you can then use
> Seek to adjust the read position if needed.

Re: read from file until a line begins with a certain word

Paul Dupuis via use-livecode
So a line starting with "mstart" would be cr & "mstart".

Concatenating a cr (or whatever line delimiter the file uses, if you open it
as binary) before "mstart" and reading from the file until that string will
find all instances except one on the 1st line of the file (if the file starts
with mstart).


On 9/17/2017 7:33 AM, Matthias Rebbe via use-livecode wrote:

> What I forgot to mention is that the keyword also occurs within lines, but I have to read until a line that starts with mstart.
>
> Anyway, I will play around with it a little.
>
> Thanks so far for all your comments.

Re: read from file until a line begins with a certain word

dunbarx via use-livecode
The fact that the keyword both starts lines and is also embedded within lines
makes me believe even more that the required processing should be done within LC, and
not with gadgetry in the "read" command.

A keyword that starts a sentence may not have a return in front of it. That
would only occur if it started a paragraph.

Craig




Re: read from file until a line begins with a certain word

J. Landman Gay via use-livecode
On 9/17/17 8:18 AM, Paul Dupuis via use-livecode wrote:
> concatenating a cr (or whatever line delimiter the file uses if opening
> as binary) before mStart and reading from file until that string will
> find all instances but the 1st line of the file (it the file starts with
> mstart)

I think it should find the first line as well:

on parseFile
   put "/path/to/file" into tFile
   put cr & "mstart" into tDelim
   open file tFile for read
   repeat until eof
     read from file tFile until tDelim
     -- process the text here
   end repeat
end parseFile

The only hitch is that the last word of the retrieved text will be
"mstart" preceded by a cr. But it wouldn't be hard to delete that and
add it before the next retrieval, since the delimiter is a constant.

--
Jacqueline Landman Gay         |     [hidden email]
HyperActive Software           |     http://www.hyperactivesw.com


Re: read from file until a line begins with a certain word

Matthias Rebbe via use-livecode
Jacque,

thanks for that code sample. I was not aware that I can use EOF as the condition in a repeat loop. I always thought it could only be used in a read command like

read from file MyFile until EOF


Thanks again to all who gave feedback. It helped me a lot.

Regards,

Matthias

Matthias Rebbe
+49 5741 310000
‌wirmachen.software <http://wirmachen.software/>‌

> On 17.09.2017 at 21:38, J. Landman Gay via use-livecode <[hidden email]> wrote:
>
> On 9/17/17 8:18 AM, Paul Dupuis via use-livecode wrote:
>> concatenating a cr (or whatever line delimiter the file uses if opening
>> as binary) before mStart and reading from file until that string will
>> find all instances but the 1st line of the file (it the file starts with
>> mstart)
>
> I think it should find the first line as well:
>
> on parseFile
>  put "/path/to/file" into tFile
>  put cr & "mstart" into tDelim
>  open file tFile for read
>  repeat until eof
>    read from file tFile until tDelim
>    -- process the text here
>  end repeat
> end parseFile
>
> The only hitch is that the last word of the retrieved text will be "mstart" preceded by a cr. But it wouldn't be hard to delete that and add it before the next retrieval, since the delimiter is a constant.
>

Re: read from file until a line begins with a certain word

Monte Goulding via use-livecode

> On 18 Sep 2017, at 6:13 am, Matthias Rebbe via use-livecode <[hidden email]> wrote:
>
> thanks for that code sample. I was not aware that i can use EOF in a repeat loop as condition. I always thought that it only can be used in a read command like

I was not aware of it either, and I can’t see anything in the engine that indicates it should work. Perhaps I’m missing something… `eof` is a constant. The repeat loop doesn’t know which file the `eof` might be referring to, so it would be pretty weird syntax.

As far as I know you would need to use something like:

local tResult
repeat until tResult is eof
  read from file tFile until tDelim
  put the result into tResult
  -- process the text here
end repeat

Cheers

Monte
_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: read from file until a line begins with a certain word

Matthias Rebbe via use-livecode
Thanks for mentioning this.
At the moment I stop reading the file after the tenth match, so I had not yet run into a problem.
I have adjusted my script now.

Regards,
Matthias



Matthias Rebbe
+49 5741 310000
‌wirmachen.software <http://wirmachen.software/>‌

> On 17.09.2017 at 23:15, Monte Goulding via use-livecode <[hidden email]> wrote:
>
>
>> On 18 Sep 2017, at 6:13 am, Matthias Rebbe via use-livecode <[hidden email] <mailto:[hidden email]>> wrote:
>>
>> thanks for that code sample. I was not aware that i can use EOF in a repeat loop as condition. I always thought that it only can be used in a read command like
>
> I was not aware of it either and can’t see anything in the engine that indicates it should work.. perhaps I’m missing something… `eof` is a constant. Really the repeat loop doesn’t know which file the `eof` might be referring to so it would be pretty weird syntax.
>
> As far as I know you would need to use something like:
>
> local tResult
> repeat until tResult is eof
>  read from file tFile until tDelim
>  put the result into tResult
>  -- process the text here
> end repeat
>
> Cheers
>
> Monte

Re: read from file until a line begins with a certain word

J. Landman Gay via use-livecode
You're right of course, I had a thinko.

On 9/17/17 4:15 PM, Monte Goulding via use-livecode wrote:

>
>> On 18 Sep 2017, at 6:13 am, Matthias Rebbe via use-livecode <[hidden email]> wrote:
>>
>> thanks for that code sample. I was not aware that i can use EOF in a repeat loop as condition. I always thought that it only can be used in a read command like
>
> I was not aware of it either and can’t see anything in the engine that indicates it should work.. perhaps I’m missing something… `eof` is a constant. Really the repeat loop doesn’t know which file the `eof` might be referring to so it would be pretty weird syntax.
>
> As far as I know you would need to use something like:
>
> local tResult
> repeat until tResult is eof
>    read from file tFile until tDelim
>    put the result into tResult
>    -- process the text here
> end repeat


--
Jacqueline Landman Gay         |     [hidden email]
HyperActive Software           |     http://www.hyperactivesw.com



Re: read from file until a line begins with a certain word

J. Landman Gay via use-livecode
On 9/17/17 7:16 PM, J. Landman Gay via use-livecode wrote:
> You're right of course, I had a thinko.

It should have been this, btw:

on parseFile
   put "/path/to/file" into tFile
   put cr & "mstart" into tDelim
   open file tFile for read
   repeat
     read from file tFile until tDelim
     if it = "" then exit repeat
     -- process the text here
   end repeat
   close file tFile
end parseFile
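
With a handler like this, every block except the last comes back with cr & "mstart" still attached to its end. One possible way to deal with that, as a sketch under assumptions rather than tested code: it assumes the file is opened in text mode so line endings arrive as cr, and tBlock, tCarry and tLen are names made up here. The delimiter is stripped from the end of each block and "mstart" is put back at the front of the next one.

```livecode
on parseFile
   local tFile, tDelim, tLen, tBlock, tCarry
   put "/path/to/file" into tFile
   put cr & "mstart" into tDelim
   put the length of tDelim into tLen
   open file tFile for read
   repeat
      read from file tFile until tDelim
      -- an empty read means there is nothing left in the file
      if it is empty then exit repeat
      -- restore the "mstart" that was cut off the previous block
      put tCarry & it into tBlock
      if tBlock ends with tDelim then
         -- drop the trailing cr & "mstart" and remember to
         -- prepend "mstart" to the next block
         delete char -tLen to -1 of tBlock
         put "mstart" into tCarry
      else
         -- last block of the file: no delimiter at the end
         put empty into tCarry
      end if
      -- process tBlock here
   end repeat
   close file tFile
end parseFile
```

If the file does not begin with mstart, the first block is simply whatever precedes the first mstart line.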

--
Jacqueline Landman Gay         |     [hidden email]
HyperActive Software           |     http://www.hyperactivesw.com


Re: read from file until a line begins with a certain word

Bob Sneidar via use-livecode
How about this: read for x characters, put the last line in a buffer and remove it from the text you read, process that text, then read the next block and prepend the buffered line.
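
Roughly like this, as an untested sketch: the handler name processInBlocks, its parameters and the variable names are made up for illustration, and it assumes the file is opened in text mode so that lines end in cr. The last line is only held back when the block does not end exactly on a line ending, so no cr gets lost between blocks.

```livecode
on processInBlocks pFile, pBlockSize
   local tBlock, tBuffer
   open file pFile for read
   repeat
      read from file pFile for pBlockSize
      -- an empty read means the end of the file has been reached
      if it is empty then exit repeat
      -- prepend the partial line held back from the previous block
      put tBuffer & it into tBlock
      if the last char of tBlock is cr then
         -- block ends exactly on a line ending, nothing to hold back
         put empty into tBuffer
      else
         -- the last line is probably cut off, keep it for the next block
         put line -1 of tBlock into tBuffer
         delete line -1 of tBlock
      end if
      -- process the complete lines in tBlock here, e.g. check
      -- whether a line begins with "mstart"
   end repeat
   close file pFile
   -- anything still in tBuffer is the final line of the file
end processInBlocks
```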

Bob S


> On Sep 16, 2017, at 17:50 , Matthias Rebbe via use-livecode <[hidden email]> wrote:
>
> But how would I read a file until a line that starts with a certain keyword, e.g. mstart? I need to read up to the line before the one that starts with mstart.
