MetaKit for Python

The structured database which fits in the palm of your hand


What it is - MetaKit is an embeddable database which runs on Unix, Windows, Macintosh, and other platforms. It lets you build applications which store their data efficiently, in a portable way, and which will not need a complex runtime installation. In terms of the data model, MetaKit takes the middle ground between RDBMS, OODBMS, and flat-file databases - yet it is quite different from each of them.

What it isn't - MetaKit is not: 1) an SQL database, 2) multi-user, 3) scalable to gigabytes, 4) proprietary software, 5) a toy.

Technology - Everything is stored variable-sized yet with efficient positional row access. Changing an existing datafile structure is as simple as re-opening it with that new structure. All changes are transacted. You can mix and match software written in C++, Python, and Tcl. Things can't get much more flexible...

Python - The extension for Python is called "Mk4py". It provides a lower-level API for the MetaKit C++ core than an earlier version of this interface did, and uses SCXX by Gordon McMillan as the C++ glue interface.

Mk4py 2.01 - is the latest release. The homepage points to a download area with pre-compiled shared libraries for Unix, Windows, and Macintosh. The MetaKit source distribution includes this documentation, the Mk4py C++ source code, a "MkMemoIO.py" class which provides efficient and fail-safe I/O (therefore also pickling) using MetaKit memo fields, and a few more goodies.

Changes since 2.0 - Mk4py (which at one point was called MkWrap) is now part of MetaKit 2.0. This release brings the following changes:

  • the view.store(...) call is gone, because it was dropped from the C++ core
  • a few new view operators (see http://www.equi4.com/metakit/wiki.cgi/69.html)
  • even more view operators, not yet documented - you'll have to do dir(view)
  • a fix for a free space management problem in the C++ core library
  • see the change log for further details (it's in the file called "CHANGES")

    License and support - MetaKit 2.01 is distributed under the liberal X/MIT-style open source license. Commercial support is available through an Enterprise License. See the license page for details.

    Credits - Are due to Gordon McMillan for not stopping at the original Mk4py and coming up with a more Pythonic interface, and to Christian Tismer for pushing Mk4py way beyond its design goals. Also to GvR and the Python community for taking scripting to such fascinating heights...

    Updates - The latest version of this document is at http://www.equi4.com/metakit/python.html


    Terminology

    There are several ways to say the same thing, depending on where you're coming from. For example, the terms table, list, collection, array, sequence, and vector all denote a more or less similar concept. To help avoid confusion, MetaKit uses a simple (but hopefully precise) terminology.

    The terms adopted by MetaKit can be summarized as follows:

    A few more comments about the semantics of MetaKit:


    Installation

    1. Download the latest version from http://www.equi4.com/pub/download.html
    2. On Unix, rename the appropriate compiled extension to "Mk4py.so" (on Win/Mac, use the corresponding file)
    3. Do a small test, by running "demo.py". If all is well, you should get some self-explanatory output
    4. Place the extension somewhere on Python's module search path (or just leave it in ".")


    Getting started

    Create a database:
         import Mk4py
         mk = Mk4py
         db = mk.Storage("datafile.mk",1)
    Create a view (this is the MetaKit term for "table"):
         vw = db.getas("people[first:S,last:S,shoesize:I]")
    Add two rows (this is the MetaKit term for "record"):
         vw.append(first='John',last='Lennon',shoesize=44)
         vw.append(first='Flash',last='Gordon',shoesize=42)
    Commit the changes to file:
         db.commit()
    Show a list of all people:
         for r in vw: print r.first, r.last, r.shoesize
    Show a list of all people, sorted by last name:
         for r in vw.sort(vw.last): print r.first, r.last, r.shoesize
    Show a list of all people with first name 'John':
         for r in vw.select(first='John'): print r.first, r.last, r.shoesize


    Mk4py Reference

    1. Module functions
    2. Storage objects
    3. View objects
    4. Derived views
    5. View operations
    6. Rowref objects
    7. Property objects

    1. Module functions

    These functions live at the module level. You can use them as described below after executing the following preamble:
         import Mk4py
         mk = Mk4py
         del Mk4py # tidy up

    SYNOPSIS

    db = mk.Storage()
    Create an in-memory database (can't use commit/rollback)
    db = mk.Storage(file)
    Use a specified file object to build the storage on
    db = mk.Storage(name, rwflag)
    Open the named file. If rwflag is non-zero, the file is opened read/write and exclusively, and is created if absent; if rwflag is zero, it is opened read-only and shared.
    vw = mk.View()
    Create a standalone view; not in any storage object
    pr = mk.Property(type, name)
    Create a property (a column, when associated to a view)
    vw = mk.Wrap(sequence, proplist, byPos=0)
    Wraps a Python sequence as a read-only view
    ADDITIONAL DETAILS
    Storage - When given a single argument, the file object must be a real stdio file, not a class implementing the file r/w protocol. When the storage object is destroyed (such as with 'db = None'), the associated datafile will be closed. Be sure to keep a reference to it around as long as you use it.

    Wrap - This call can be used to wrap any Python sequence. It assumes that each item is either a dictionary or an object with attribute names corresponding to the property names. Alternatively, if byPos is nonzero, each item can be a list or tuple, and its values are then accessed by position instead. This mechanism can be used for joins and other view operations.

    2. Storage objects

    SYNOPSIS
    vw = storage.getas(description)
    Locate, define, or re-define a view stored in a storage object
    vw = storage.view(viewname)
    The normal way to retrieve an existing view
    storage.rollback()
    Revert data and structure as was last committed to disk
    storage.commit()
    Permanently commit data and structure changes to disk
    ds = storage.description(viewname='')
    The description string is described under getas
    vw = storage.contents()
    Returns the View which holds the meta data for the Storage.
    storage.autocommit()
    Commit changes automatically when the storage object goes away
    storage.load(fileobj)
    Replace storage contents with data from file (or any other object supporting read)
    storage.save(fileobj)
    Serialize storage contents to file (or any other object supporting write)
    ADDITIONAL DETAILS
    contents - Advanced use only!

    description - A description of the entire storage is returned if no viewname is specified, otherwise just that of the specified top-level view.

    getas - Side-effects: the structure of the view is changed.
    Notes: Normally used to create a new View, or alter the structure of an existing one.
    A description string looks like:
         "people[name:S,addr:S,city:S,state:S,zip:S]"
    That is "<viewname>[<propertyname>:<propertytype>...]"
    Where the property type is one of:
         I    adaptable integer (becomes Python int)
         F    C float (becomes Python float)
         D    C double (is a Python float)
         S    C null terminated string (becomes Python string)
         B    C array of bytes (becomes Python string)
         M    C string (long) (becomes Python string)
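To make the description format concrete, here is a minimal pure-Python sketch that splits a flat description string into its view name and (name, type) pairs. The helper `parse_description` is hypothetical and not part of Mk4py; it assumes one level of properties only and makes no attempt to handle nested subviews.

```python
import re

VALID_TYPES = ('I', 'F', 'D', 'S', 'B', 'M')

def parse_description(description):
    """Split "view[name:T,...]" into (viewname, [(name, type), ...]).

    Hypothetical helper for illustration only; not an Mk4py function.
    """
    match = re.match(r'^(\w+)\[(.*)\]$', description.strip())
    if match is None:
        raise ValueError("not a view description: %r" % (description,))
    viewname, body = match.groups()
    props = []
    for field in body.split(','):
        # Each field is expected to look like "<propertyname>:<propertytype>"
        name, sep, proptype = field.strip().partition(':')
        if not sep or proptype not in VALID_TYPES:
            raise ValueError("bad property: %r" % (field,))
        props.append((name, proptype))
    return viewname, props

name, props = parse_description("people[name:S,addr:S,city:S,state:S,zip:S]")
```

Here `name` is the view name and `props` lists the properties in definition order, which is also the column order of the view.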

    3. View objects

    Views implement sequence (list) methods, including slicing, concatenation etc. They behave as a sequence of "rows", which in turn have "properties". Indexing (getitem) returns a reference to a row, not a copy.
         r = view[0]
         r.name = 'Julius Caesar'
         view[0].name # will yield 'Julius Caesar'
    Slices return copies. You can create an empty view with the same structure as another view with:
         v2 = v[0:0]
    Setting a slice changes the view:
         v[:] = [] # empties the view
    View supports getattr, which returns a Property (e.g. view.shoesize can be used to refer to the shoesize column). Views can be obtained from Storage objects (view = db.view('inventory')) or from other views (see select, sort, flatten, join, project...). Empty, columnless views can also be created: vw = Mk4py.View()

    SYNOPSIS

    view.insert(index, obj)
    Coerce object to a Row and insert at index in View
    ix = view.append(obj)
    Object is coerced to Row and added to end of View
    view.delete(index)
    Row at index removed from View
    lp = view.structure()
    Return a list of property objects
    cn = view.addproperty(prop)
    Define a new property, return its column position
    ADDITIONAL DETAILS
    addproperty - This adds properties which do not persist when committed. To make them persist, you should have used storage.getas() when defining the view.

    append - Also supports keyword args (colname=value...).

    insert - Coercion to a Row is driven by the View's columns, and works for:
         dictionaries (column name -> key)
         instances (column name -> attribute name)
         lists (column number -> list index) - watch out!
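The coercion rules above can be pictured with a small pure-Python stand-in. The function `coerce_to_row` and the class `Person` are hypothetical names used for illustration; this sketch requires every column to be present in the object, which may be stricter than what Mk4py actually does.

```python
def coerce_to_row(columns, obj):
    """Return one value per column, taken from a dict, a sequence or an object.

    Hypothetical stand-in for the coercion described above; not Mk4py code.
    """
    if isinstance(obj, dict):
        # column name -> key
        return [obj[name] for name in columns]
    if isinstance(obj, (list, tuple)):
        # column number -> list index: values are taken purely by position
        return list(obj[:len(columns)])
    # column name -> attribute name
    return [getattr(obj, name) for name in columns]

class Person(object):
    def __init__(self, first, last, shoesize):
        self.first = first
        self.last = last
        self.shoesize = shoesize

columns = ['first', 'last', 'shoesize']
from_dict = coerce_to_row(columns, {'first': 'John', 'last': 'Lennon', 'shoesize': 44})
from_obj = coerce_to_row(columns, Person('John', 'Lennon', 44))
from_list = coerce_to_row(columns, ['John', 'Lennon', 44])
# The same list with two values swapped is accepted without complaint
swapped = coerce_to_row(columns, ['Lennon', 'John', 44])
```

The last call shows why lists deserve the "watch out!": nothing ties a value to a column except its position, so a reordered list silently puts values in the wrong columns.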

    4. Derived views

    SYNOPSIS
    vw = view.select(criteria...)
    Return a view containing the rows whose fields match the given criteria
    vw = view.select(low, high)
    Return a view with rows in the specified range (inclusive)
    vw = view.sort()
    Sort view in "native" order, i.e. the definition order of its keys
    vw = view.sort(property...)
    Sort view in the specified order
    vw = view.sortrev((propall...), (proprev...))
    Sort view in specified order, with optionally some properties in reverse
    vw = view.project(property...)
    Returns a derived view with only the named columns
    ADDITIONAL DETAILS
    select - Example selections, returning the corresponding subsets:
         inventory.select(shoesize=44)
         inventory.select({'shoesize':40},{'shoesize':43})
         inventory.select({},{'shoesize':43})
    The derived view is "connected" to the base view. Modifications of rows in the derived view are reflected in the base view.
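The three selection forms above can be mimicked on a plain list of dictionaries. This is only a sketch of the matching rules (exact match, and an inclusive low/high range where an empty dictionary means "no bound"); `select_rows` and the sample `inventory` are made up for illustration and do not use Mk4py.

```python
def select_rows(rows, low=None, high=None, **exact):
    """Return the rows matching exact criteria or an inclusive low/high range.

    Hypothetical pure-Python illustration; rows are plain dicts.
    """
    if exact:
        # An exact match is a range whose two bounds coincide
        low = high = exact
    low = low or {}
    high = high or {}
    result = []
    for row in rows:
        above = all(row[name] >= value for name, value in low.items())
        below = all(row[name] <= value for name, value in high.items())
        if above and below:
            result.append(row)
    return result

inventory = [{'shoesize': size} for size in (38, 40, 42, 43, 44)]

exact = select_rows(inventory, shoesize=44)
in_range = select_rows(inventory, {'shoesize': 40}, {'shoesize': 43})
up_to = select_rows(inventory, {}, {'shoesize': 43})
```

Because the result lists hold the same dictionary objects as `inventory`, changing a selected row is visible in the original list. That loosely mirrors the "connected" behaviour, although it says nothing about how Mk4py implements it.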

    sort - Example, returning the sorted permutation:
         inventory.sort(inventory.shoesize)
    See notes for select concerning changes to the sorted view
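The sortrev call in the synopsis takes two groups of properties. The sketch below shows one plausible reading, stated here as an assumption based on the parameter names: the first group lists all sort keys, most significant first, and the second group names those keys that should be sorted in reverse. It works on plain dictionaries and is not Mk4py code.

```python
def sortrev(rows, propall, proprev):
    """Sort dict rows on the names in propall; names in proprev sort descending.

    Hypothetical pure-Python illustration of a mixed-direction sort.
    """
    result = list(rows)
    # Python's sort is stable, so sorting on the least significant key first
    # and the most significant key last yields a correct multi-key order.
    for name in reversed(propall):
        result.sort(key=lambda row: row[name], reverse=(name in proprev))
    return result

people = [
    {'last': 'Lennon', 'shoesize': 44},
    {'last': 'Gordon', 'shoesize': 42},
    {'last': 'Lennon', 'shoesize': 41},
]

# Ascending by last name, then descending by shoe size within each name
ordered = sortrev(people, ('last', 'shoesize'), ('shoesize',))
```

With this data, `ordered` lists Gordon first, followed by the two Lennon rows with the larger shoe size ahead of the smaller one.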

    5. View operations

    SYNOPSIS
    vw = view.flatten(subprop, outer=0)
    Produces one 'flat' view from a nested view
    vw = view.join(view, property...,outer=0)
    Both views must have a property (column) of that name and type
    ix = view.find(criteria..., start=0)
    Returns the index of the found row, or -1
    ix = view.search(criteria...)
    Binary search (native view order), returns match or insertion point
    vw = view.unique()
    Returns a new view without duplicate rows (a set)
    vw = view.union(view2)
    Returns a new view which is the set union of view and view2
    vw = view.intersect(view2)
    Returns a new view which is the set intersection of view and view2
    vw = view.different(view2)
    Returns a new view which is the set XOR of view and view2
    vw = view.minus(view2)
    Returns a new view which is (in set terms) view - view.intersect(view2)
    vw = view.rename('oldname', 'newname')
    Returns a derived view with one property renamed
    vw = view.product(view)
    Returns the cartesian product of both views
    vw = view.groupby(property..., 'subname')
    Groups on specified properties, with subviews to hold groups
    vw = view.counts(property..., 'name')
    Groups on specified properties, replacing rest with a count field
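The set-style operations in the synopsis above (unique, union, intersect, different, minus) relate to each other in the same way as Python's built-in set operators. The sketch below models rows as tuples purely for illustration; it does not use Mk4py, and because Python sets are unordered it says nothing about the row order of real views.

```python
# Rows modelled as (first, last) tuples; view contains one duplicate row
view = [('John', 'Lennon'), ('Flash', 'Gordon'), ('John', 'Lennon')]
view2 = [('Flash', 'Gordon'), ('Julius', 'Caesar')]

a, b = set(view), set(view2)

unique = a            # view.unique(): duplicates removed
union = a | b         # view.union(view2)
intersect = a & b     # view.intersect(view2)
different = a ^ b     # view.different(view2): rows in exactly one of the two
minus = a - b         # view.minus(view2): rows of view that are not in view2
```

As the synopsis states, `minus` equals `view` with `intersect` removed, and `different` is what remains of `union` once `intersect` is taken out.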
    ADDITIONAL DETAILS
    find - view[view.find(firstname='Joe')] is the same as view.select(firstname='Joe')[0], but much faster. Subsequent finds use the "start" keyword: view.find(firstname='Joe', start=3)
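The "start" keyword makes it possible to walk through all matches one at a time. The following pure-Python sketch imitates that calling pattern on a list of dictionaries; `find` and `find_all` are hypothetical stand-ins written as a simple linear scan, and make no claim about how Mk4py searches internally.

```python
def find(rows, start=0, **criteria):
    """Return the index of the first matching row at or after start, or -1."""
    for index in range(start, len(rows)):
        row = rows[index]
        if all(row[name] == value for name, value in criteria.items()):
            return index
    return -1

def find_all(rows, **criteria):
    """Collect every matching index by restarting just past the last hit."""
    matches = []
    index = find(rows, **criteria)
    while index != -1:
        matches.append(index)
        index = find(rows, start=index + 1, **criteria)
    return matches

rows = [{'firstname': 'Joe'}, {'firstname': 'Ann'}, {'firstname': 'Joe'}]

first_joe = find(rows, firstname='Joe')
all_joes = find_all(rows, firstname='Joe')
nobody = find(rows, firstname='Zed')
```

Restarting at `index + 1` rather than at `index` is what keeps the loop from finding the same row again.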

    6. Rowref objects

    RowRef allows setting and getting of attributes (columns)
    RowRef encapsulates a (view, ndx) tuple.
    Normally obtained from a view: rowref = view[33]

    7. Property objects

    Property has attributes name, id and type. Example: p = Mk4py.Property('I', 'shoesize')
    Note that a property is used to describe a column, but it is NOT the same as a column. In a given storage, the property Property('I', 'shoesize') will be unique (that is, no matter how many instances you create, they will all have the same property.id). But that one property can describe any number of columns, each one in a different view. This is how joins are done, and why "view.sort(view.firstname)" is the same as "view.sort(Mk4py.Property('S','firstname'))".
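One way to picture "many instances, one id" is a registry keyed on the (type, name) pair. The class `ToyProperty` below is a hypothetical illustration of that idea, with a single registry shared by the whole process; it is not how Mk4py is implemented.

```python
class ToyProperty(object):
    """Toy model: instances with the same type and name share one id."""

    _ids = {}  # maps (type, name) -> id, shared by all instances

    def __init__(self, type, name):
        self.type = type
        self.name = name
        key = (type, name)
        if key not in ToyProperty._ids:
            # First time this (type, name) pair is seen: hand out a new id
            ToyProperty._ids[key] = len(ToyProperty._ids)
        self.id = ToyProperty._ids[key]

p1 = ToyProperty('I', 'shoesize')
p2 = ToyProperty('I', 'shoesize')
p3 = ToyProperty('S', 'firstname')
```

Here `p1` and `p2` are distinct objects with the same id, while `p3` gets a different one. Anything that compares properties by id will therefore treat `p1` and `p2` as the same column description, whichever view they are used with.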


    © 2000 Jean-Claude Wippler <jcw@equi4.com>