Introducing Content Mirror

A few weeks ago at the Plone symposium in New Orleans, I presented some software I’ve been writing called Content Mirror, an add-on product for Plone that serializes content to a structured/relational database. It’s a tool for doing content data deployments. It’s nothing particularly new; Alan and I have been talking about the deployment story for Plone since 2002. I was building CMFDeployment at the time, for pushing out static copies of Plone sites for ultra-secure systems. Alan and Enfold Systems went on to build a data deployment solution in Entransit. Both were fairly successful deployment solutions and are still in use today by a variety of Plone vendors. But both also had some failings: they required a lot of configuration and commitment to get working for an existing Plone site. For example, Entransit required using the instance layout and additional products needed by EnSimpleStaging, while CMFDeployment has a bewildering array of configuration options. In my opinion, both are specialized consulting ware, i.e. they’re primarily deployed successfully by folks doing Plone development full time.

For most of the last year, I’ve primarily been building Zope3 applications on relational databases. A large part of the reason I’ve enjoyed Zope3 so much is that the impedance mismatch with application development is much smaller than with Plone. Paul Everitt asked at the first Plone conference in New Orleans whether Plone was a product or a framework. It’s a question still heard in the community to this day. But to me the answer is clear: Plone is a product, and frankly that’s a good thing for both the software and its users. It is, however, a bad thing when you’re building applications: there tends to be much more policy with products, which needs to be replaced or worked around when you’re building on them. As a result, products tend to have two other downsides in application development: developer inefficiencies and computational inefficiencies.

One example of a developer inefficiency is evident just in starting up Plone and serving a page. I call it the Plone tax, and over the course of a year it adds up to about a man-month of work. This becomes more stark in comparison to other web app frameworks (Pylons, Django, Rails, Zope3, etc.), which start up and serve a page in a few seconds. There’s been a lot of work recently to address this, with plone.reload, optional loading of translations, and some heroic work by Hanno, but the fact is that there is a lot of code to load, as well as data from the ZODB, to start up and serve a page in Plone.

Developer inefficiencies are also evident in the learning curve associated with becoming productive with a product. A product is typically a much bigger software stack, and Plone uses many components, with Zope2, Zope3, CMF, and Archetypes providing foundations, in addition to a growing amount of Plone-specific infrastructure. Plone is the OpenOffice of open source content management systems. We could drop Pylons into a cubby hole of a Plone tarball. Smaller systems offer much better productivity to new developers by letting them focus on the problem domain and solution, rather than on how to frame the problem domain in terms of a product’s concepts and contexts, like Plone’s.

The real key in a data deployment scenario is to keep all the many great features of Plone as part of the content management process, while also making that data accessible for use in other applications. By deploying content to an RDBMS we get language and framework neutrality for interacting with it, as well as access to a large pool of developers and tools. In a nutshell, it’s data portability.

As a bonus, when using Plone as a product and reserving customizations for applications built on top of the deployed content, the migration story for a Plone instance also becomes much easier.

In terms of computational inefficiency, Plone does a lot of work, which makes it easier to use as a product, but it’s also computationally expensive for content delivery compared to simpler solutions that fit the needs of an application/problem domain; i.e. the first rule of optimization: do less work. Replacing Plone as a content delivery mechanism is a great way to make a system more responsive and vertically scalable, while still allowing a dynamic system.

Plone is a great product, and out of the box it offers easy through-the-web customization, easy installation, and a wide range of functionality. My goal for data deployment with Plone was to make something that would enable reusing Plone as a product, as a content management system, but would allow flexibility in the usage of that data; moreover, a tool that was easy to drop into new or existing sites.

Data deployments can also bring new features to a Plone site. It’s much easier to mine business intelligence and reports out of a relational system: for example, getting graphs of content creation broken down by month and type or user, or using commercial reporting tools.
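
To make the reporting point concrete, here is a minimal sketch of a "content created per month and type" report, using only Python’s standard sqlite3 module. The `content` table and its columns (`portal_type`, `creator`, `creation_date`) are assumptions made up for illustration; they are not Content Mirror’s actual schema, and the rows are sample data.

```python
import sqlite3

# Hypothetical mirrored table: one row per content object.
# The table and column names are illustrative assumptions only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content ("
    " content_id INTEGER PRIMARY KEY,"
    " portal_type TEXT NOT NULL,"
    " creator TEXT NOT NULL,"
    " creation_date TEXT NOT NULL)"  # ISO 8601 date strings
)
conn.executemany(
    "INSERT INTO content (portal_type, creator, creation_date) VALUES (?, ?, ?)",
    [
        ("Document", "alice", "2008-05-03"),
        ("Document", "bob", "2008-05-17"),
        ("News Item", "alice", "2008-05-20"),
        ("Document", "alice", "2008-06-02"),
    ],
)


def creation_report(connection):
    """Count content items created per month and portal type."""
    query = (
        "SELECT strftime('%Y-%m', creation_date) AS month, portal_type, COUNT(*)"
        " FROM content GROUP BY month, portal_type ORDER BY month, portal_type"
    )
    return connection.execute(query).fetchall()


report = creation_report(conn)
for month, portal_type, total in report:
    print(month, portal_type, total)
```

Swapping `portal_type` for `creator` in the query gives the per-user breakdown, and the same rows could just as easily feed a charting or reporting tool.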

So, back to introducing Content Mirror. It’s basically a system for doing data deployments. It features:

- Out of the Box support for Default Plone Content Types.
- Out of the Box support for all builtin Archetypes Fields (including files and references).
- Support for 3rd Party / Custom Archetypes Content Types in one line of configuration.
- Supports Capturing Containment and Workflow in the serialized database.
- Completely Automated Mirroring, zero configuration required beyond installation.
- Easy customization via the Zope Component Architecture
- Elegant and Simple Design, less than 600 lines of code, extensive doctests
- Support for Plone 2.5, 3.0, and 3.1
- Opensource ( GPLv3 )

installation docs
http://code.google.com/p/contentmirror/wiki/Installation

technical introduction / readme
http://code.google.com/p/contentmirror/wiki/Introduction

In a nutshell, its technical architecture is a set of event observers with aggregation by object on transaction boundaries, using an operation pattern for serialization actions, along with schema transformation of Archetypes to relational database tables, using a SQLAlchemy runtime-generated ORM layer. The technical introduction goes into more detail.
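
As a rough illustration of the "aggregation by object on transaction boundaries" idea, here is a small self-contained Python sketch. It is not Content Mirror’s code: the class names (`OperationBuffer`, `RecordingSerializer`) and the collapsing rules are assumptions made up for this example, and the real system hooks into Zope events and the transaction machinery rather than being called by hand.

```python
class OperationBuffer:
    """Aggregates content events per object until the transaction commits."""

    def __init__(self, serializer):
        self.serializer = serializer
        self.pending = {}  # object id -> name of the pending operation

    def record(self, object_id, operation):
        """Event observers call this with "add", "update" or "delete"."""
        previous = self.pending.get(object_id)
        if previous == "add" and operation == "update":
            return  # a new object that was then edited is still one insert
        if previous == "add" and operation == "delete":
            del self.pending[object_id]  # created and removed in one transaction
            return
        if previous == "delete" and operation == "add":
            operation = "update"  # the row already exists, so refresh it
        self.pending[object_id] = operation

    def flush(self):
        """Run once per transaction, e.g. from a before-commit hook."""
        for object_id, operation in self.pending.items():
            getattr(self.serializer, operation)(object_id)
        self.pending.clear()


class RecordingSerializer:
    """Stand-in for the database layer; it only records what it was asked to do."""

    def __init__(self):
        self.calls = []

    def add(self, object_id):
        self.calls.append(("add", object_id))

    def update(self, object_id):
        self.calls.append(("update", object_id))

    def delete(self, object_id):
        self.calls.append(("delete", object_id))


serializer = RecordingSerializer()
buffer = OperationBuffer(serializer)
for object_id, operation in [
    ("doc-1", "add"), ("doc-1", "update"),
    ("doc-2", "update"), ("doc-2", "update"),
    ("doc-3", "add"), ("doc-3", "delete"),
]:
    buffer.record(object_id, operation)
buffer.flush()
print(serializer.calls)
```

Six events collapse into two database operations here, which is the point of aggregating per object before the commit: the database sees each object at most once per transaction.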

Filed under plone, sqlalchemy, zope

11 responses to “Introducing Content Mirror”

  1. Hi Kapil,

    This sounds like really great work! I had a look at the code, and it’s very interesting indeed.

    I wonder why you chose to put it on Google Code rather than one of the Plone repositories, though. It would be a lot easier to work with if it lived in one of the Plone repositories.

    The two other things that come to mind:

    – How tied is it to Archetypes? Non-Archetypes content and other data (e.g. stored in bespoke data structures) ought to be configurable

    – Would it be feasible to support a TTW GUI for turning serialisation of given types on or off? I think that if we have serialisers for Archetypes schemata and zope.schema schemata provided by non-Archetypes objects, for workflow state, containment, and possibly a few other bits of “standard” information, we could make it possible to turn serialisation on or off via a GUI rather than a configuration file.

    The static/simple deployment story is one that’s been lagging for a while. I’ve tried to get my head around Entransit a few times, but I must admit it scares me a bit, mainly due to the dependencies and configuration overhead. Content mirror sounds like exactly what we need, and I really hope it or something like it can be adopted as a standard part of Plone, or at least promoted as “best practice”.

    I think for that to happen, we need to have at least one standard front end that’s easy to deploy and skin, probably written in PHP. I’m hoping that wouldn’t be too hard, even. :)

    Cheers,
    Martin

  2. hi martin,

    it’s trivial to write support for z3 schema based content; alchemist already has support for transforming such schemas, as evidenced by other systems built on top of it, such as z3c.dobbin. but frankly all such layers we’re building on top of plone (like dexterity or devilsticks) are just part of the problem which content mirror is designed to reduce, namely the inane layering and deep software stack that plone uses and pushes on developers, in favor of simpler frameworks. underneath the hood it’s all pretty nicely factored out. for example, i use the same event aggregation and operation factory for indexing in ore.xapian, and it’s trivial to switch over to async processing of batches, with processing by the cloud.

    frankly its not even that tied to plone, architecturally speaking. The entire thing was developed outside of plone, mocking whatever plone bits it uses.

    things like the plone collective to me are code ghettos and graveyards, welcome to the jungle playing in the background. i’ve used google code for a few projects (getpaid, alchemist, and bungeni) and its more than sufficed, as well as offering the right amount of support infrastructure for doing documentation and issue tracking. plone.org is painful to use in comparison, ‘nough said.

    i care not for ttw configuration. the system keeps no persistent state, which makes it easy to set up and upgrade. it’s possible, should demand be strong for async batch operation, that i’ll look into it, but persistent state for configuration means, quoting tres seaver, always having to say you’re sorry. indeed imo, it’s the relentless pursuit of ttw configuration that makes most z3 plone software useless for reuse outside of plone.

    data deployments are targeted towards dynamic sites, static site deployments have a different set of domain concerns, and outside of configuration fun, cmfdeployment serves those needs well from plone 1.0 to 3.1.

    entransit differs greatly for a number of reasons, entransit is designed as intermediate middleware between a plone system and a database, meaning its at least twice as much work to configure, additionally it currently depends on laying out your plone site in a specific instance configuration, using the ensimplestaging product. as i said both cmfdeployment and entransit are comparatively, consulting ware. contentmirror is much simpler, get a db, mod a config file, and go.

    as for front ends, python, php, or ruby, the possibilities are all there. personally i was favoring appengine.

    cheers,

    kapil

  3. Hi Kapil,

    If you want to run this as a project, the same way Chris Johnson has run GetPaid, then it makes sense to use a separate hosting infrastructure. I’m just worried that this isn’t traditionally your focus, which means that if there isn’t a CJ for this project, it may become another “oh, and there’s that thing Kapil wrote, which looks interesting, but I haven’t used it” kind of thing that people keep making passing remarks about.

    Don’t take that as a criticism. I’m saying it because I have a feeling this particular product could be more important to Plone’s future than many things we’ve seen lately. For things to be part of Plone’s future, though, the community needs to have some degree of adoption and sense of ownership, no matter how good the code. Even something as trivial as the need to ask for a new svn account and remember a different repository URL is a barrier that will keep some people from making the first step to contributing to this thing. We’re all horribly lazy. :-)

    As for TTW configuration, I think that will be needed if this is to have general appeal beyond developers. Perhaps it isn’t suited for that, but if there were a simple default front end that was generally useful, then perhaps. In any case, I agree that such front end ought to live in a separate package. I just hope that it’s architecturally possible to configure this thing with some local components rather than having everything be global.

    Martin

    frankly martin, i think you’re making quite a few assumptions about my intent, and about the inability of plone developers to type a different url into the command line, esp. considering that most of them already do.

    as it is, content mirror is already being used, already documented, has 100% unit test coverage, and setting up a wiki, issue tracker, mailing list, repository, etc. took me 5 minutes. which means i had a lot more time to actually work on the software. Doing the equivalent with plone.org infrastructure is so painful, that issue followups and documentations would never have gotten done.

    re ttw configuration, i give integrators credit for being able to operate a command line. i think expecting ttw configuration while setting up a relational database, is a bit silly ;-)

    that said i’d be willing to embrace a stateless ttw setup if it helps end users install things. but i still think that using local components, in a drive for ttw customization, is a driving force that makes plone harder to use or reconfigure, as it accumulates piles of opaque persistent configuration state and policy. have you ever tried to upgrade a plone 2.5 site using local components… or seen a non-developer do it successfully ? anyways that’s a different topic.

    in fact the main reason that people haven’t heard about most of the ‘interesting’ things i’ve done is, that i haven’t publicized them, which is exactly what i’m doing in this blog post.

    i’d also like to give plone developers a bit more credit, if the software is useful, well tested, well documented, i seriously doubt a different repo url is going to be the biggest issue with adoption or contributions.

  5. Hi Kapil,

    Sorry, didn’t mean to be presumptuous. It’s not a big issue anyway, so let’s leave it. I agree that the plone.org documentation and issue tracker thing is a bit painful currently due to speed. It’s mostly the svn repository I’d hoped to be a bit more accessible, but who cares. If I want to contribute, I hope you’ll give me access.

    I’m also really glad to see the amount of test coverage and documentation here, which is all the more reason why I believe this could be a “next big thing” for Plone.

    /me returns to reading the ContentMirror source code

    Martin

  6. hi martin,

    if you want access all you have to do is send me an email/im with your preferred google account.

    cheers,

    kapil

  7. Hi,
    I’ve started to play around and read the code.

    One thing I’ve noticed is that copy/paste results
    in a traceback:

    2008-07-03 21:49:55 ERROR Zope.SiteErrorLog http://localhost:8080/plone/folder_paste
    Traceback (innermost last):
    Module ZPublisher.Publish, line 125, in publish
    Module Zope2.App.startup, line 238, in commit
    Module transaction._manager, line 96, in commit
    Module transaction._transaction, line 389, in commit
    Module transaction._transaction, line 445, in _callBeforeCommitHooks
    Module ore.contentmirror.operation, line 130, in flush
    Module ore.contentmirror.operation, line 44, in process
    Module ore.contentmirror.serializer, line 34, in add
    Module ore.contentmirror.serializer, line 54, in _copy
    Module ore.contentmirror.transform, line 52, in copy
    Module ore.contentmirror.transform, line 276, in copy
    Module Products.Archetypes.ClassGen, line 56, in generatedAccessor
    Module Products.Archetypes.Field, line 1674, in get
    Module Products.Archetypes.Referenceable, line 81, in getRefs
    Module Products.CMFCore.utils, line 123, in getToolByName
    AttributeError: reference_catalog

    I’m using 0.4.1 and sqlite3, Plone 3.1.2, Zope 2.10.5 on Leopard.

    Other than that it works like a charm!

    cheers,
    seletz

  8. hi seletz,

    a blog isn’t really the right forum for bug reports, there’s a mailing list and issue tracker on the linked google code site. glad to hear its otherwise working for you.

  9. hi seletz, thanks for filing an issue for the copy/paste problem, its been fixed ( http://code.google.com/p/contentmirror/issues/detail?id=11 )

  10. Kapil-

    If you want “average” Plone integrators to find Content Mirror, you might consider creating a pointer at http://plone.org/products so that people searching Plone.org have some hope of stumbling across it.

    Content Mirror looks like the kind of radical simplification that is going to make a lot of neat stuff a LOT easier! Thanks.

  11. Anne Bowtell

    I set this up last night on a windows pc and I was really impressed at how easy it was to configure – so speaking as an average integrator I don’t think it is necessary to have TTW configuration. I think it is excellent. Fantastic. Thanks!

    I’ll be testing it out further I’m sure. I’d love a bit of documentation at some point as to how to make a custom field transform…..
