Splitting up a large text file with 'seek relative'

7 messages
Splitting up a large text file with 'seek relative'

Hugh Senior
I need to split large (1 GB) log files for a client into more manageable
chunks. The routine works fine, then (apparently) starts to slow down. I
suspect that I have not implemented 'seek' correctly, and that the file is
being read from zero each time, but would appreciate anyone's insights into
optimizing this utility...

Given tFilepath, write out 1Mb files sequentially numbered...
  put the hilite of btn "Binary" into isBinary
  if isBinary then open file tFilePath for binary read
  else open file tFilePath for text read
  set the numberFormat to "####"
  seek to 0 in file tFilePath
  repeat
    set the cursor to busy
    add 1 to n
    seek relative 0 in file tFilePath
    read from file tFilePath for 1000000
    put the result="eof" into isEOF
    if (it="") then exit repeat
    if isBinary then put it into URL("binfile:"& tDir&"/" &n& ".txt")
    else put it into URL("file:"& tDir&"/" &n& ".txt")
    if (isEOF OR the result <>"") then exit repeat
  end repeat
  close file tFilePath

Many appreciations in advance.

/H

_______________________________________________
use-revolution mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

Re: Splitting up a large text file with 'seek relative'

Bernard Devlin-2
Hi Hugh,

I've never used Rev for this kind of file I/O (in fact, I was surprised to
see that Rev has something like 'seek relative'). Anyway, if you are doing
this on an OS X or Linux box, why not use the 'split' command in a terminal
window? I've successfully used it before to split up files that were
several GB long. Using the shell for something like that ought to be much
quicker and easier. Of course, I'm also interested to see how this
works out if someone who does have experience using 'seek relative' chimes
in to help  ;-)
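
For anyone wanting to try Bernard's suggestion, here is a minimal sketch. The input name `big.log`, the output prefix `chunk_`, and the 1 MB chunk size are all stand-ins; a small dummy file is generated with `dd` so the example is self-contained.

```shell
# Create a small stand-in for the 1 GB log: 3 MB of zero bytes.
dd if=/dev/zero of=big.log bs=1000000 count=3 2>/dev/null

# Cut it into 1 MB pieces named chunk_aa, chunk_ab, chunk_ac.
split -b 1000000 big.log chunk_

ls chunk_*
```

GNU coreutils `split` also accepts size suffixes like `-b 1m` and numeric output suffixes via `-d`; the BSD `split` shipped with older OS X may not support those options.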

Bernard

On Thu, Jun 19, 2008 at 7:26 PM, Hugh Senior <[hidden email]>
wrote:

> I need to split large (1 Gig) log files for a client into more managable
> chunks. The routine works fine, then (apparently) starts to slow down. I
> suspect that I have not implemented 'seek' correctly, and that the file is
> being read from zero each time, but would appreciate [...]

Re: Splitting up a large text file with 'seek relative'

masmit
In reply to this post by Hugh Senior
Hugh, it strikes me that the "seek relative 0" might be redundant, and may be slowing things down.

Best,

Mark

On 19 Jun 2008, at 19:26, Hugh Senior wrote:

> I need to split large (1 Gig) log files for a client into more  
> managable chunks. The routine works fine, then (apparently) starts  
> to slow down. I suspect that I have not implemented 'seek'  
> correctly, and that the file is being read from zero each time, but  
> would appreciate anyone's insights into optimizing this utility...
> [...]


Re: Splitting up a large text file with 'seek relative'

Hugh Senior
In reply to this post by Hugh Senior
You are right, but logging still shows a cumulative slowdown as each chunk
is 'read', and the computer slows to a crawl. Using 'read from ... for ...'
is even slower, however. (The source file is a 1 GB binary text file.)

Given tFilepath, write out 1Mb files sequentially numbered...
  put the hilite of btn "Binary" into isBinary
  if isBinary then open file tFilePath for binary read
  else open file tFilePath for text read
  set the numberFormat to "####" --| So file names have leading zeroes
  seek to 0 in file tFilePath
  repeat
    set the cursor to busy
    add 1 to n
    --| seek relative 0 in file tFilePath --| Redundant
    read from file tFilePath for 1000000
    put the result="eof" into isEOF
    if (it="") then exit repeat
    if isBinary then put it into URL("binfile:"& tDir&"/" &n& ".txt")
    else put it into URL("file:"& tDir&"/" &n& ".txt")
    if (isEOF OR the result <>"") then exit repeat
  end repeat
  close file tFilePath

Any further insights would be truly welcomed.

/H

----------------------------------------------
Hugh, it strikes me that the "seek relative 0" might be redundant -
and may be slowing things down.

Best,

Mark


Re: Splitting up a large text file with 'seek relative'

ron barber-2
In reply to this post by Hugh Senior
Hi Hugh,

Just a thought about the approach rather than the code. What if you
did the same thing you are doing, but first split the file in half, viz. read
for 5000000, or whatever. Then deal with each half, possibly even making
quarters or eighths. That way you are only dealing with the big file once?
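
Ron's two-pass idea can be sketched in shell as well. The names and sizes below are illustrative, scaled down from 1 GB; `big2.log` is a generated stand-in.

```shell
# Stand-in for the big file: 8 MB of zero bytes.
dd if=/dev/zero of=big2.log bs=1000000 count=8 2>/dev/null

# First pass: cut the file into two 4 MB halves.
split -b 4000000 big2.log half_

# Second pass: cut each half into 1 MB chunks (half_aa_aa, half_aa_ab, ...).
for f in half_aa half_ab; do
  split -b 1000000 "$f" "${f}_"
done

ls half_*_*
```

Whether the extra pass helps in practice is an open question; as the thread goes on to establish, a single sequential pass over the big file was already fast enough here.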

HTH
Ron

On Fri, Jun 20, 2008 at 3:26 AM, Hugh Senior <[hidden email]>
wrote:

> I need to split large (1 Gig) log files for a client into more managable
> chunks. The routine works fine, then (apparently) starts to slow down. I
> suspect that I have not implemented 'seek' correctly, and that the file is
> being read from zero each time, but would appreciate anyone's insights into
> optimizing this utility...
> [...]

Re: Splitting up a large text file with 'seek relative'

masmit
In reply to this post by Hugh Senior
Hugh, I just ran your handler on a 1 GB file of random binary data and
I didn't see any slowdown. I added a bit of benchmarking code:

     .....
    open file tFilePath for binary read
    set the numberformat to "####" --| So file names have leading zeroes
    put the millisecs into markerTime
    repeat
       set the cursor to busy
       add 1 to n
       read from file tFilePath for 1000000
       put the result="eof" into isEOF
       if (it="") then exit repeat
       if isBinary then put it into URL("binfile:"& tDir&"/" &n& ".txt")
       else put it into URL("file:"& tDir&"/" &n& ".txt")
       if n mod 100 = 0 or isEOF then
          put (the millisecs - markerTime) / 100  & " : " after timeList
          put the millisecs into markerTime
       end if
       if (isEOF or the result <>"") then exit repeat
    end repeat
    close file tFilePath

    put timeList

The output was: 0096 : 0094 : 0103 : 0103 : 0102 : 0104 : 0106 : 0101 : 0107 : 0103 : 0048 :

As you can see - no significant slowdown. Is the hard disk you're  
writing to very full? Maybe it gets harder to find space as the loop  
goes on.
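
Mark's free-space hypothesis is easy to check from a terminal; `.` here is a placeholder for whatever directory the chunks are being written to:

```shell
# Report free space on the filesystem holding the output directory.
df -h .
```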

Best,

Mark

On 20 Jun 2008, at 08:31, Hugh Senior wrote:

> You are right, but logging still shows a cumulative slowdown as  
> each chunk is 'read', and the computer slows to a crawl. Using  
> 'read from ... for ...' is even slower, however. (The source file  
> is a 1 GIG binary text file)
> [...]


Re: Splitting up a large text file with 'seek relative'

Hugh Senior
In reply to this post by Hugh Senior
You are absolutely correct, Mark. A stupid blunder on my part: I was
incorrectly logging my split times, so I have been chasing ghosts. No need
for recursive binary splitting on the file sizes involved. Thank you.

/H

Mark Smith wrote:
> Hugh, I just ran your handler on a 1gig file of random binary data -
> I didn't see any slowdown - I added a bit of benchmarking code:
