How to extract specific columns/line items from a CSV file?

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

How to extract specific columns/line items from a CSV file?

Keith Clarke-2
Hi folks,
I’ve a large number of sizeable CSV files from each of which I need to extract just 2 or 3 specific columns. Creating nested iterations through every item in every line seems somewhat inefficient - as does loading each full file into memory - so I feel I must be missing a trick here.

Does Livecode support any elegant way of directly manipulating or ‘querying’ (like SQL) delimited data, to extract numbered (or named) columns/items - such as 1(Id), 3(Name) & 15(Summary) - from in each line from a CSV container or direct from a file/URL?
Best,
Keith..
_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: How to extract specific columns/line items from a CSV file?

Mike Bonner
If a file you can do this..

repeat for each line tline in url "file:yourfilename.csv"
 put item 1 of tLine & comma & item 3 of tline & comma & item 15 of tLine &
return after tExtracted
end repeat
delete the last char of tExtracted -- remove the extraneous return

Or you can put them into an array or whatever. It should be VERY fast.  If
the files are very large, you can instead open the file for read, read a
line, extract the data as above, and repeat till the end of the file. I
think using repeat for each with a file url loads the whole thing at once
then goes line to line, but I'm not sure of that.

On Tue, Sep 20, 2016 at 2:16 PM, Keith Clarke <[hidden email]> wrote:

> Hi folks,
> I’ve a large number of sizeable CSV files from each of which I need to
> extract just 2 or 3 specific columns. Creating nested iterations through
> every item in every line seems somewhat inefficient - as does loading each
> full file into memory - so I feel I must be missing a trick here.
>
> Does Livecode support any elegant way of directly manipulating or
> ‘querying’ (like SQL) delimited data, to extract numbered (or named)
> columns/items - such as 1(Id), 3(Name) & 15(Summary) - from in each line
> from a CSV container or direct from a file/URL?
> Best,
> Keith..
> _______________________________________________
> use-livecode mailing list
> [hidden email]
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: How to extract specific columns/line items from a CSV file?

dunbarxx
In reply to this post by Keith Clarke-2
What Mike said. The reason he said it is this:

 I made a 1,000,000 line CSV  dataset. I extracted two random columns from it. The new 1,000,000 line dataset took 33 ticks.

Craig Newman
Reply | Threaded
Open this post in threaded view
|

Re: How to extract specific columns/line items from a CSV file?

Keith Clarke-2
In reply to this post by Mike Bonner
Thanks Mike (& Craig) for confirming that iteration of some sort is the way to go.

It’s interesting that the url container can be addressed directly, instead of loading the file - I’d not seen that before. I’m currently pulling each file into a regular variable, so will probably stick with that as only a few (existing) lines of code would be saved and the net overall effect on timing / resources is probably marginal if the file gets loaded behind the scenes anyway.

As the column names/numbers vary by CSV file name - which I have built-out in a ColumnsOfInterest field - it looks like I’ll also be safe to iterate through these items to build the row extraction line dynamically, too.

Great, thanks again - I have a direction of travel.
Best,
Keith..
   

> On 20 Sep 2016, at 21:32, Mike Bonner <[hidden email]> wrote:
>
> If a file you can do this..
>
> repeat for each line tline in url "file:yourfilename.csv"
> put item 1 of tLine & comma & item 3 of tline & comma & item 15 of tLine &
> return after tExtracted
> end repeat
> delete the last char of tExtracted -- remove the extraneous return
>
> Or you can put them into an array or whatever. It should be VERY fast.  If
> the files are very large, you can instead open the file for read, read a
> line, extract the data as above, and repeat till the end of the file. I
> think using repeat for each with a file url loads the whole thing at once
> then goes line to line, but I'm not sure of that.
>
> On Tue, Sep 20, 2016 at 2:16 PM, Keith Clarke <[hidden email]> wrote:
>
>> Hi folks,
>> I’ve a large number of sizeable CSV files from each of which I need to
>> extract just 2 or 3 specific columns. Creating nested iterations through
>> every item in every line seems somewhat inefficient - as does loading each
>> full file into memory - so I feel I must be missing a trick here.
>>
>> Does Livecode support any elegant way of directly manipulating or
>> ‘querying’ (like SQL) delimited data, to extract numbered (or named)
>> columns/items - such as 1(Id), 3(Name) & 15(Summary) - from in each line
>> from a CSV container or direct from a file/URL?
>> Best,
>> Keith..
>> _______________________________________________
>> use-livecode mailing list
>> [hidden email]
>> Please visit this url to subscribe, unsubscribe and manage your
>> subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode
> _______________________________________________
> use-livecode mailing list
> [hidden email]
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode


_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: How to extract specific columns/line items from a CSV file?

Richard Gaskin
Keith Clarke wrote:

 > It’s interesting that the url container can be addressed directly,
 > instead of loading the file

   get url ("file:"& tFile)

...does the same as:

   open file tFile for read
   read from file until EOF
   close file tFile

It's just more convenient than using the three-line method (which can be
very useful if you need for fine-grained control over the read, such as
reading in chunks or from a particular offset).

Unless your files are much bigger than 100 MBs you should find loading
them into memory well worth the effort for the time saved parsing them thee.

--
  Richard Gaskin
  Fourth World Systems
  Software Design and Development for the Desktop, Mobile, and the Web
  ____________________________________________________________________
  [hidden email]                http://www.FourthWorld.com

_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: How to extract specific columns/line items from a CSV file?

Mike Bonner
Thanks Richard, that does answer my question.  Repeating for each line in
an external url does load the whole thing at once.  Someone remind me.. I
think I read that the "for each" no longer creates a second copy of the
data, is this correct?  Or did it ever make a duplicate?

On Tue, Sep 20, 2016 at 3:42 PM, Richard Gaskin <[hidden email]>
wrote:

> Keith Clarke wrote:
>
> > It’s interesting that the url container can be addressed directly,
> > instead of loading the file
>
>   get url ("file:"& tFile)
>
> ...does the same as:
>
>   open file tFile for read
>   read from file until EOF
>   close file tFile
>
> It's just more convenient than using the three-line method (which can be
> very useful if you need for fine-grained control over the read, such as
> reading in chunks or from a particular offset).
>
> Unless your files are much bigger than 100 MBs you should find loading
> them into memory well worth the effort for the time saved parsing them thee.
>
> --
>  Richard Gaskin
>  Fourth World Systems
>  Software Design and Development for the Desktop, Mobile, and the Web
>  ____________________________________________________________________
>  [hidden email]                http://www.FourthWorld.com
>
>
> _______________________________________________
> use-livecode mailing list
> [hidden email]
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
>
_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: How to extract specific columns/line items from a CSV file?

Keith Clarke-2
In reply to this post by Richard Gaskin
Thanks Richard - the target CSVs are indeed in the 1-100MB range (containing mostly noise data for my current purpose) and it’s always useful to know, amongst the many roads available with LiveCode, which 'all lead to Rome', with the same ETA! :-)
Best,
Keith..

> On 20 Sep 2016, at 22:42, Richard Gaskin <[hidden email]> wrote:
>
> Keith Clarke wrote:
>
> > It’s interesting that the url container can be addressed directly,
> > instead of loading the file
>
>  get url ("file:"& tFile)
>
> ...does the same as:
>
>  open file tFile for read
>  read from file until EOF
>  close file tFile
>
> It's just more convenient than using the three-line method (which can be very useful if you need for fine-grained control over the read, such as reading in chunks or from a particular offset).
>
> Unless your files are much bigger than 100 MBs you should find loading them into memory well worth the effort for the time saved parsing them thee.
>
> --
> Richard Gaskin
> Fourth World Systems
> Software Design and Development for the Desktop, Mobile, and the Web
> ____________________________________________________________________
> [hidden email]                http://www.FourthWorld.com
>
> _______________________________________________
> use-livecode mailing list
> [hidden email]
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode


_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: How to extract specific columns/line items from a CSV file?

Peter TB Brett
In reply to this post by Keith Clarke-2


On 20/09/2016 21:16, Keith Clarke wrote:

> Hi folks, I’ve a large number of sizeable CSV files from each of
> which I need to extract just 2 or 3 specific columns. Creating nested
> iterations through every item in every line seems somewhat
> inefficient - as does loading each full file into memory - so I feel
> I must be missing a trick here.
>
> Does Livecode support any elegant way of directly manipulating or
> ‘querying’ (like SQL) delimited data, to extract numbered (or named)
> columns/items - such as 1(Id), 3(Name) & 15(Summary) - from in each
> line from a CSV container or direct from a file/URL?

I would probably just use Mike Kerner's libraries:
https://github.com/macMikey/csvToText

                                       Peter

--
Dr Peter Brett <[hidden email]>
LiveCode Technical Project Manager

lcb-mode for Emacs: https://github.com/peter-b/lcb-mode

_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: How to extract specific columns/line items from a CSV file?

pmbrig
In reply to this post by Keith Clarke-2
I’m late to this party (these days too busy to peruse the list more than episodically).

Don’t know if this would be faster or slower or equivalent, but it’s a modular function:

function getColumns pData, pColNbrs, pDelim
   -- Extract specified columns from a table in order
   --    pData: a tab-delimited table (delim may be overridden with pDelim)
   --    pColNbrs: A comma separated list of columns or column ranges
   --       to be returned in the order indicated, for example "2,7,5"
   --       or an ascending column range: "3-5"
   --       or a descending column range: "5-3"
   --       or a combination "7,4-5,2,11-9"
   -- based on deleteColumns() by Hugh Senior & Michael Doub et. al., use-LC list
   --     adjusted by Peter M. Brigham, [hidden email]
   -- requires howMany()
   
   if pColNbrs = empty then return pData
   if pDelim = empty then put tab into pDelim
   -- first expand column ranges
   repeat for each item pCol in pColNbrs
      put offset("-",pCol) into dashPos
      if dashPos = 0 then
         put pCol & comma after tColsToReturn
      else if dashPos = 1 then
         -- for column numbers like -1, -2, etc.
         put howMany(pDelim,line 1 of pData) + 1 into nbrItems
         put pCol + 1 + nbrItems & comma after tColsToReturn
      else
         -- a column range
         put char 1 to dashPos-1 of pCol into firstColNbr
         put char dashPos+1 to -1 of pCol into lastColNbr
         if firstColNbr < lastColNbr then
            repeat with i = firstColNbr to lastColNbr
               put i & comma after tColsToReturn
            end repeat
         else
            repeat with i = firstColNbr down to lastColNbr
               put i & comma after tColsToReturn
            end repeat
         end if  
      end if      
   end repeat
   delete char -1 of tColsToReturn
   -- then extract columns in order
   set the columnDelimiter to pDelim
   split pData by column
   repeat for each item n in tColsToReturn
      add 1 to x
      put pData[n] into rData[x]
   end repeat
   combine rData by column
   return rData
end getColumns

function howmany pStr, pContainer, pCaseSens
   -- how many times pStr occurs in pContainer
   -- note that howmany("xx","xxxxxx") returns 3, not 5
   --     ie,  overlapping matches are not counted
   -- by Peter M. Brigham, [hidden email] — freeware
   
   if pCaseSens = empty then put false into pCaseSens
   set the casesensitive to pCaseSens
   if pStr is not in pContainer then return 0
   put len(pContainer) into origLength
   replace pStr with char 2 to -1 of pStr in pContainer
   return origLength - len(pContainer)
end howmany


> On Sep 20, 2016, at 5:14 PM, Keith Clarke <[hidden email]> wrote:
>
> Thanks Mike (& Craig) for confirming that iteration of some sort is the way to go.
>
> It’s interesting that the url container can be addressed directly, instead of loading the file - I’d not seen that before. I’m currently pulling each file into a regular variable, so will probably stick with that as only a few (existing) lines of code would be saved and the net overall effect on timing / resources is probably marginal if the file gets loaded behind the scenes anyway.
>
> As the column names/numbers vary by CSV file name - which I have built-out in a ColumnsOfInterest field - it looks like I’ll also be safe to iterate through these items to build the row extraction line dynamically, too.
>
> Great, thanks again - I have a direction of travel.
> Best,
> Keith..
>
>> On 20 Sep 2016, at 21:32, Mike Bonner <[hidden email]> wrote:
>>
>> If a file you can do this..
>>
>> repeat for each line tline in url "file:yourfilename.csv"
>> put item 1 of tLine & comma & item 3 of tline & comma & item 15 of tLine &
>> return after tExtracted
>> end repeat
>> delete the last char of tExtracted -- remove the extraneous return
>>
>> Or you can put them into an array or whatever. It should be VERY fast.  If
>> the files are very large, you can instead open the file for read, read a
>> line, extract the data as above, and repeat till the end of the file. I
>> think using repeat for each with a file url loads the whole thing at once
>> then goes line to line, but I'm not sure of that.
>>
>> On Tue, Sep 20, 2016 at 2:16 PM, Keith Clarke <[hidden email]> wrote:
>>
>>> Hi folks,
>>> I’ve a large number of sizeable CSV files from each of which I need to
>>> extract just 2 or 3 specific columns. Creating nested iterations through
>>> every item in every line seems somewhat inefficient - as does loading each
>>> full file into memory - so I feel I must be missing a trick here.
>>>
>>> Does Livecode support any elegant way of directly manipulating or
>>> ‘querying’ (like SQL) delimited data, to extract numbered (or named)
>>> columns/items - such as 1(Id), 3(Name) & 15(Summary) - from in each line
>>> from a CSV container or direct from a file/URL?
>>> Best,
>>> Keith..
>>> _______________________________________________
>>> use-livecode mailing list
>>> [hidden email]
>>> Please visit this url to subscribe, unsubscribe and manage your
>>> subscription preferences:
>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>> _______________________________________________
>> use-livecode mailing list
>> [hidden email]
>> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode
>
>
> _______________________________________________
> use-livecode mailing list
> [hidden email]
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode


_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: How to extract specific columns/line items from a CSV file?

Dr. Hawkins
In reply to this post by Keith Clarke-2
On Tue, Sep 20, 2016 at 1:16 PM, Keith Clarke <[hidden email]> wrote:

> I’ve a large number of sizeable CSV files from each of which I need to
> extract just 2 or 3 specific columns. Creating nested iterations through
> every item in every line seems somewhat inefficient - as does loading each
> full file into memory - so I feel I must be missing a trick here.


I'm late to the party, but if it's a one-off, open in a spreadsheet, select
 what you want, and paste into a field in livecode.



--
Dr. Richard E. Hawkins, Esq.
(702) 508-8462
_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Reply | Threaded
Open this post in threaded view
|

Re: How to extract specific columns/line items from a CSV file?

Michael Doub
There is a function in the masterlibrary that extracts columns.

https://www.dropbox.com/s/3wpwn3hfbmpl7sk/MasterLibrary.livecode?dl=0

-= Mike

  Original Message  
From: [hidden email]
Sent: September 28, 2016 8:14 PM
To: [hidden email]
Reply-to: [hidden email]
Subject: Re: How to extract specific columns/line items from a CSV file?

On Tue, Sep 20, 2016 at 1:16 PM, Keith Clarke <[hidden email]> wrote:

> I’ve a large number of sizeable CSV files from each of which I need to
> extract just 2 or 3 specific columns. Creating nested iterations through
> every item in every line seems somewhat inefficient - as does loading each
> full file into memory - so I feel I must be missing a trick here.

I'm late to the party, but if it's a one-off, open in a spreadsheet, select
what you want, and paste into a field in livecode.

--
Dr. Richard E. Hawkins, Esq.
(702) 508-8462
_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode