A suggestion for an in-memory database, following up on Richard’s experiment

Jonathan Lynch via use-livecode
This is a different idea from the other thread, so I am starting a new thread.

Imagine the following scenario:

Each record is saved as a separate text file.

LC loads up all the text files into a single massive global array.

LC listens to a port for DB requests.

When a DB request comes in, LC makes changes in the global array and retrieves data from the global array. It sends the results back almost instantly.

LC sends the new version of the file over to Apache which is listening on a different port.

Apache spawns a new thread for each concurrent request. Each thread simply takes the record and saves it to its file.

In this way, LC operates as an in-memory database, which is supposed to be very fast, and Apache does the multithreading to back up each record in the background.
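As a rough illustration of the scenario above, here is a minimal sketch in Python rather than LiveCode. It is not a tested design: the class name `RecordStore`, the `.txt` naming scheme and the thread pool (standing in for the Apache threads) are all assumptions made for the example, and keys are assumed to be safe to use as file names.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


class RecordStore:
    """All records live in one dict; each record is also a text file on disk."""

    def __init__(self, directory, workers=4):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)
        self.records = {}
        # Load every existing record file into memory at startup.
        for name in os.listdir(directory):
            if name.endswith(".txt"):
                path = os.path.join(directory, name)
                with open(path, encoding="utf-8") as f:
                    self.records[name[:-4]] = f.read()
        # Hypothetical stand-in for the Apache threads that save records.
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def get(self, key):
        # Reads are answered from memory only.
        return self.records.get(key)

    def put(self, key, text):
        # Update memory first, then queue the disk write in the background.
        self.records[key] = text
        return self.pool.submit(self._save, key, text)

    def _save(self, key, text):
        # Write to a temp file and rename it into place, so a crash
        # does not leave a half-written record behind.
        path = os.path.join(self.directory, key + ".txt")
        fd, tmp = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
        os.replace(tmp, path)

    def close(self):
        # Wait for all queued writes to finish.
        self.pool.shutdown(wait=True)
```

One thing the sketch makes visible: two quick writes to the same key can reach the disk in either order, so a real version would need some per-record ordering, and anything not yet flushed is lost if the process dies.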

Assuming a powerful server with lots of RAM, could this allow us to handle massive concurrency while using LC as the server?

Apologies if these questions are getting tedious - they are relevant to my current project.

Sent from my iPhone
_______________________________________________
use-livecode mailing list
[hidden email]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode

Re: A suggestion for an in-memory database, following up on Richard’s experiment

Tom Glod via use-livecode
No reason why it wouldn't work... but keep 2 things in mind.

If you use the Community Edition, the number of HTTP requests you can make to
the same domain at one time is exactly 1.

For a system like this, it could be better to save to an SQL database, because
separate TXT files would mean a lot of I/O calls.

If you are on a Business License, it's less of an issue...

I'd be curious to see the performance of using LC arrays as a database.

On Thu, Mar 1, 2018 at 11:07 AM, Jonathan Lynch via use-livecode <
[hidden email]> wrote:

> This is a different idea from the other thread, so I am starting a new
> thread.
>
> Imagine the following scenario:
>
> Each record is saved as a separate text file.
>
> LC loads up all the text files into a single massive global array.
>
> LC listens to a port for DB requests.
>
> When a DB request comes in, LC makes changes in the global array and
> retrieves data from the global array. It sends the results back almost
> instantly.
>
> LC sends the new version of the file over to Apache which is listening on
> a different port.
>
> Apache spawns a new thread for each concurrent request. Each thread simply
> takes the record and saves it to its file.
>
> In this way, LC operates as an in-memory database, which is supposed to be
> very fast, and Apache does the multithreading to back up each record in the
> background.
>
> Assuming a powerful server with lots of RAM, could this allow us to handle
> massive concurrency while using LC as the server?
>
> Apologies if these questions are getting tedious - they are relevant to my
> current project.

Re: Re: A suggestion for an in-memory database, following up on Richard’s experiment

Richard Gaskin via use-livecode
Tom Glod wrote:

 > no reason why it wouldn't work... but keep 2 things in mind
 >
 > if you use community edition, the number of HTTP requests you can make
 > to the same domain at one time is exactly 1.

How many are needed from a single client?

While it's unfortunate the LiveCode Community Edition is the only open
source scripting language I know of that doesn't have CURL support
available in its community, that's a problem that can be fixed here as
it has been for others: it would be possible for someone in the
community to wrap the most relevant subset affordably.  HTTPS alone
would take care of at least 80% of real-world needs in our API-driven world.

And in the meantime, the limit for domain access is in the scripted
libURL, so a modded version could up that (though I wouldn't take it too
high, for the same reasons browsers default to a small number of
connections to a single host).


 > For a system like this, it could be better to save to an sql because
 > separate TXT files would be a lot of IO calls.

How would the method used for moving data from disk to socket affect
throughput for socket comms?

Ultimately all persistent storage needs to write to disk.  MySQL is very
efficient, but does a lot of complex B-tree traversal. It's possible
(for _very_ limited use cases) to outperform it for some forms of simple
retrieval in LC script.

You'd never match even a small fraction of the full scope of DB features
without losing that advantage many times over.  But if you knew in
advance that the only thing you ever needed to do was simple retrieval
your options are broad. Even writes aren't too bad under some circumstances.

Another concern for larger systems may be inode depletion:  if you have
a lot of records and each record is a separate file, unless you tune the
file system from its defaults you're limited to roughly total-disk-kb/4
for the number of files (Ext4 and most others these days use a default
4k block size).
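
The actual limit depends on how the file system was created, so it may be easier to ask the system than to estimate. A small sketch in Python (not LiveCode), assuming a POSIX host; the helper name `inode_usage` is made up for the example, and some file systems report 0 here because they allocate inodes dynamically:

```python
import os


def inode_usage(path="."):
    """Return (total, free) inode counts for the file system holding path."""
    st = os.statvfs(path)  # POSIX only; not available on Windows
    return st.f_files, st.f_ffree


if __name__ == "__main__":
    total, free = inode_usage("/")
    print(f"inodes: {total} total, {free} free")
```

On most Linux systems `df -i` reports the same numbers from the shell.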


 >  I'd be curious to see the performance of using LC Arrays as Database.

Poor as CGI, promising as daemon.

The ability to serialize arrays in LC is very nice, but as intensive as
one would imagine for the task:  beyond the fstat and fread, it needs
to parse the data, extracting each element from length indicators,
translating numbers (which are serialized in binary form) by type
indicators, and tucking what it finds into an array, key and element by
key and element.  Certainly faster in the engine's machine code than
trying to do it in script, but it's a lot of work no matter who's doing it.

I was doing some measurements on this the other day, exploring options
for server storage that might be reasonably performant while more
portable than MySQL (and unencumbered by GPL in case I decide to ship a
complete solution from it).

One file was plain text, with a 35-character key as item 1 of each line
(longer than we commonly find, but much shorter than the max of 255) and a
10-character integer as the value in item 2.

The second file was that same data in array form, stored on disk as LSON.

The test was for CGI, so each iteration reads the file from disk and
obtains the value associated with what happens to be the last key in the
text file.  I chose the last key specifically because it would be the
worst case for lineoffset, while of course for arrays it makes almost no
difference.

But even weighted against using lineoffset in a text file, the overhead
of arrayDecode more than ate up any benefits of using arrays for a
simple single lookup:  the LSON file took nearly 8 times longer for
100,000 keys:

Text:  21.8 ms
LSON: 167.9 ms
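
For anyone who wants to try the same kind of comparison, here is a sketch of the test shape in Python. It is only an analogue: `pickle` stands in for LSON and a line scan stands in for lineoffset, so the timings it prints say nothing about LiveCode itself and may not show the same ratio. All function names are made up for the example.

```python
import os
import pickle
import tempfile
import time


def make_files(directory, count=100_000):
    """Write the same key/value data as a text file and as a serialized dict."""
    # 35-character keys, 10-character integer values.
    data = {f"key-{i:031d}": f"{i:010d}" for i in range(count)}
    text_path = os.path.join(directory, "data.txt")
    blob_path = os.path.join(directory, "data.pickle")
    with open(text_path, "w", encoding="utf-8") as f:
        for key, value in data.items():
            f.write(f"{key},{value}\n")
    with open(blob_path, "wb") as f:
        pickle.dump(data, f)
    last_key = list(data)[-1]  # worst case for the line scan
    return text_path, blob_path, last_key


def lookup_text(path, key):
    """Read the whole text file, then scan line by line for the key."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    for line in text.splitlines():
        k, _, v = line.partition(",")
        if k == key:
            return v
    return None


def lookup_blob(path, key):
    """Read and deserialize the whole dict, then do one lookup."""
    with open(path, "rb") as f:
        return pickle.load(f).get(key)


def best_ms(fn, *args, repeat=5):
    """Best wall-clock time of several runs, in milliseconds."""
    times = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        times.append((time.perf_counter() - start) * 1000)
    return min(times)


if __name__ == "__main__":
    text_path, blob_path, key = make_files(tempfile.mkdtemp())
    print(f"Text:       {best_ms(lookup_text, text_path, key):.1f} ms")
    print(f"Serialized: {best_ms(lookup_blob, blob_path, key):.1f} ms")
```

Each call reads its file from disk again, which mimics the CGI case where every request starts cold (though the OS file cache will still help on repeat runs).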

All that said, the overhead of LSON only applies for CGIs, where each
request is effectively working from a cold boot, and any files used need
to be read and unpacked each time.

As a daemon, the array would already be in memory, completely avoiding
the overhead of deserialization.

In a broad sense that's more or less how MongoDB works: a key-value
store in which the index is RAM-bound, with data on disk found by
pointers in the index.

Using a CouchDB-like logfile method (append is among the faster disk
write options), one could get pretty good performance for storing any
arbitrary data; kinda like having one big array on disk, but with the
added benefit of built-in versioning.
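
To make the logfile idea concrete, here is a minimal sketch in Python (not LiveCode, and not CouchDB's actual file format): every write is appended as one JSON line, and an in-memory index maps each key to the byte offsets of its versions. The class name `AppendLog` and the record layout are assumptions for illustration only.

```python
import json
import os
import tempfile


class AppendLog:
    """Append-only store: data on disk, index of byte offsets in RAM."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> list of offsets, oldest version first
        open(path, "ab").close()  # create the file if it does not exist
        # Rebuild the index by scanning the log once at startup.
        with open(path, "rb") as f:
            while True:
                offset = f.tell()
                line = f.readline()
                if not line:
                    break
                key = json.loads(line)["key"]
                self.index.setdefault(key, []).append(offset)

    def put(self, key, value):
        record = json.dumps({"key": key, "value": value}).encode("utf-8") + b"\n"
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # where this record will start
            f.write(record)
        self.index.setdefault(key, []).append(offset)

    def get(self, key, version=-1):
        """Return a stored value; version -1 is the latest, 0 the oldest."""
        offsets = self.index.get(key)
        if not offsets:
            return None
        with open(self.path, "rb") as f:
            f.seek(offsets[version])
            return json.loads(f.readline())["value"]


if __name__ == "__main__":
    log = AppendLog(os.path.join(tempfile.mkdtemp(), "log.jsonl"))
    log.put("user:1", {"name": "Ann"})
    log.put("user:1", {"name": "Ann B."})
    print(log.get("user:1"))      # latest version
    print(log.get("user:1", 0))   # first version
```

Old versions are never overwritten, which is where the versioning comes from; the cost is that the file only grows, so a real system would need some form of compaction.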

But this is ideal only for limited use cases, in which both of these
conditions are met: shared hosting where you have no control over the
DBs you can use, and a preference for document-style NoSQL storage.

If you're really concerned about C10k, you're probably not on a shared
host (or you'll soon find out why you don't want to be on a shared host
for that <g>).

And if you're on a well-equipped VPS or dedicated server, there's
probably no reason why you wouldn't just use MongoDB or CouchDB if you
prefer those.  Compiled to machine code, they'll give you not only far
better performance than any scripted solution, but far more efficient
and flexible options for managing the other half of most NoSQL stores,
materialized views.

TL;DR:

I appreciate the desire for LC-based server components more than most,
but given the performance advantage of any dedicated storage option,
those are better for scalable systems.

And even as middleware, LiveCode Server is great for small low-load
systems, but the blinding speed of PHP7 makes it the clear winner among
scripting languages where performance is critical.

--
  Richard Gaskin
  Fourth World Systems
  Software Design and Development for the Desktop, Mobile, and the Web
  ____________________________________________________________________
  [hidden email]                http://www.FourthWorld.com
