Viewing Large Images – OpenLayers, GSIV, ModestMaps, DeepZoom, and Python

Lately I’ve been experimenting with displaying very large images in a web browser, with pan and zoom functionality. The guts of this functionality are the same regardless of implementation. On the server, a tile cutter processes a large image and constructs an image pyramid. The image pyramid is a hierarchical structure composed of n levels of the same image at different resolutions. Starting with the original image as the bottom level, each successive level halves the image’s width and height, and the process is repeated ceil(log_2(max(width, height))) times until finally an image of only 1 pixel (the average of the entire image) is generated as the top of the pyramid. Each level’s image is split into a set of fixed-size tiles. A web browser client implementation (Flash, Ajax, etc.) constructs a zoom interface that responds to zoom-in events by moving the viewport progressively further down the pyramid, showing tile images of higher resolution to give the effect of zooming into an image. A nice illustrated write up of the concept can be found here. I’ve probably made it sound more complicated than it really is.
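As a rough illustration of the pyramid arithmetic described above, here is a minimal sketch in Python. The function name pyramid_levels is made up for this example, and rounding up on each halving is an assumption; it is not taken from any of the tile cutters mentioned in this post.

```python
import math


def pyramid_levels(width, height):
    """Return the (width, height) of every pyramid level, full size first."""
    levels = [(width, height)]
    while max(width, height) > 1:
        # Halve each dimension, rounding up so neither side drops to zero.
        width = max(1, math.ceil(width / 2))
        height = max(1, math.ceil(height / 2))
        levels.append((width, height))
    return levels


if __name__ == "__main__":
    for level in pyramid_levels(4096, 3072):
        print(level)
```

The list always ends with a 1x1 level, and the number of halvings equals ceil(log_2(max(width, height))).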

The initial implementation I was working with used OpenLayers, which implements a client for accessing OpenGIS Web Feature Service (WFS) and Web Map Service (WMS) servers. Unfortunately the size of the library seems to be constantly increasing (~200K in the last year); it currently weighs in at 560K uncompressed and requires a special implementation to serve up the tile images, i.e. a WMS-compliant system, in this case TileCache. For scaling and efficiency purposes, I’d much prefer to serve these images directly off a CDN, off disk (nginx), or via varnish, and bypass any application code. Additionally, the sheer size of the OpenLayers code was unwieldy for the integration requirements I had, which did not include any GIS functionality.

Surveying the land for other non-commercial image viewers turned up a few of interest. GSIV (Giant Scalable Image Viewer) is a fairly basic but capable JavaScript-based viewer that fit my requirements: small, at 26KB uncompressed, and focused on pan and zoom functionality (demo). However, as a project it appears to be abandoned and hasn’t been touched in two years, although several patches that retrofit the implementation to use jQuery have been submitted.

I came across ModestMaps next, which is a Flash (2 & 3) based implementation with a small size (demo). One nice feature of ModestMaps is that it performs interpolation between successive levels, giving a smooth zoom experience to an end user, unlike the somewhat jerky experience that GSIV produced. Unfortunately, being Flash based meant a whole different chain of development tools. I looked around at what was available for an open source Flash compiler toolchain and found MTASC (Motion-Twin ActionScript Compiler) and haXe. In the end I decided against it, partly due to its GIS focus, and partly due to the customization/maintenance cost of developing on a proprietary platform (Flash). Despite that, I think it’s the best of the open source viewer implementations if you already have/use an Adobe Flash development stack.

I was set on using GSIV, and then I came across a blurb on Ajaxian about Seadragon Ajax and Deep Zoom from Microsoft’s Live Labs. Microsoft has done some impressive work with image manipulation over the last few years. The Photosynth TED talk is one of the most impressive technology demos I’ve seen to date. Deep Zoom is a Silverlight technology (more proprietary platform lock-in) that allows for multiscale image viewing with smooth zooming. Seadragon Ajax is a JavaScript implementation of the same functionality in a 154K library (20K minimized and gzipped). It fit the bill perfectly: standards (JavaScript) based, image zoom and pan, with a great user experience. One problem: unlike all the other tools mentioned here, which have Python-based tile cutting implementations, Deep Zoom relied on a Windows-only program to process images and cut tiles. I had a couple of hundred gigabytes of images to cut, and not a Windows system in sight. But based on this excellent blog write-up by Daniel Gasienica, I constructed a Python program using PIL that can be used as a command line tool or library for constructing Deep Zoom compatible image pyramids. It can be found here; hopefully it’s useful to others. As a bonus, it runs in a fraction of the memory (1/6 by my measurements) needed by the GSIV image tile cutter, and faster as well (100 images in 5m vs. 1.25hr). Unfortunately the Seadragon Ajax library is not open source, but non-commercial usage seems to be okay with the license, and I’ll give it over to some lawyers to figure out.
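The tile-cutting step itself can be sketched without an imaging library. The snippet below is not the tile cutter linked above; tile_boxes, the 256 pixel default tile size, and the overlap parameter are illustrative assumptions. It only computes the crop rectangle for each tile of a single pyramid level; the boxes use the (left, upper, right, lower) order that PIL’s Image.crop accepts.

```python
def tile_boxes(width, height, tile_size=256, overlap=0):
    """Yield (column, row, box) for every tile of one pyramid level.

    box is (left, upper, right, lower), clipped to the image bounds.
    tile_size and overlap are illustrative defaults, not Deep Zoom's own.
    """
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            box = (
                max(0, left - overlap),
                max(0, top - overlap),
                min(width, left + tile_size + overlap),
                min(height, top + tile_size + overlap),
            )
            yield left // tile_size, top // tile_size, box


if __name__ == "__main__":
    for column, row, box in tile_boxes(600, 300):
        print(column, row, box)
```

A full cutter would run this once per pyramid level, crop each box out of that level’s resized image, and write the tile to disk.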

To process the several hundred gigabytes of images, I used this library and wrote a batch driver using pyprocessing remote queues, a small zc.buildout, and cloudcontrol to process the images across a cluster, but that’s left as an implementation detail for the reader :-)

Python Deep Zoom TileCutter

Filed under python

Zipped Eggs on AppEngine

guido posted a workaround for the issue of making zipped eggs work on appengine. Cool! forward z3 on gae :-)

http://code.google.com/p/googleappengine/issues/detail?id=161

Filed under cloud, python, zope

Introducing Content Mirror

A few weeks ago at the Plone symposium in New Orleans, I presented on some software I was writing called Content Mirror, which is an add-on product for Plone that serializes content to a structured/relational database. It’s a tool for doing content data deployments. It’s nothing particularly new; Alan and I have been talking about the deployment story for Plone since 2002. I was building CMFDeployment at the time for pushing out static copies of Plone sites for ultra-secure systems. Alan and EnfoldSystems went on to build a data deployment solution in Entransit. Both were fairly successful deployment solutions and are still in use today by a variety of Plone vendors. But both also had some failings: they required a lot of configuration and commitment to get working for an existing Plone site. For example, Entransit required using the instance layout and additional products needed by EnSimpleStaging, while CMFDeployment has a bewildering array of configuration options. In my opinion, both are specialized consulting ware, i.e. they’re primarily deployed successfully by folks doing Plone development full time.

For most of the last year, I’ve primarily been doing Zope3 applications on relational databases. A large part of the reason why I’ve enjoyed Zope3 so much is that the impedance mismatch with application development is much less than with Plone. Paul Everitt asked at the first Plone conference in New Orleans if Plone was a Product or Framework. It’s a question still heard in the community to this day. But to me the answer is clear: Plone is a product, and frankly that’s a good thing for both the software and its users. It is, however, a bad thing when you’re building applications: there tends to be much more policy with products, which needs to be replaced or worked around when you’re building on them. As a result, products tend to have two other downsides in application development: developer inefficiencies and computational inefficiencies.

As an example of a developer inefficiency, one is evident just in starting up and serving a page from Plone. I call it the Plone tax, and over the course of a year it’s about a man-month of work. This becomes more stark in comparison to other web app frameworks (pylons, django, rails, z3, etc.) which start up and serve a page in a few seconds. There’s been a lot of work recently to address this with plone.reload, optional loading of translations, and some heroic work by hanno, but the fact is that there is a lot of code to load, as well as data from the zodb, to start up and serve a page in Plone.

Developer inefficiencies are also evident in the learning curve associated with being productive with a product. A product is typically a much bigger software stack, and Plone uses many components, with zope2, zope3, cmf, and archetypes providing foundations, in addition to a growing amount of Plone-specific infrastructure. Plone is the OpenOffice of open source content management systems. We could fit a pylons into a cubby hole of a Plone tarball. Smaller systems offer much better productivity to new developers, by giving them the ability to focus on the problem domain and solution, rather than on how to frame the problem domain in terms of product concepts and contexts like Plone’s.

The real key in a data deployment scenario is to keep all the many and great features of Plone as part of the content management process, while also making that data accessible for use in other applications. By deploying content to an rdbms we get language and framework neutrality to interact with it, as well as access to a large pool of developers and tools. In a nutshell, it’s data portability.

As a bonus, when using Plone as a product, and reserving customizations for applications on top of the content of a data deployment, the migration story for a Plone instance also becomes much easier.

In terms of computational inefficiency, Plone does a lot of work, which makes it easier to use as a product, but it’s also computationally expensive for content delivery compared to simpler solutions that fit the needs of an application/problem domain, i.e. the first rule of optimization: do less work. Replacing Plone as a content delivery mechanism is a great way to make a system more responsive and vertically scalable, while still allowing a dynamic system.

Plone is a great product, and out of the box it offers easy through-the-web customization, installation, and a wide range of functionality. My goal for data deployment with Plone was to make something that would enable reusing Plone as a product, as a content management system, but would allow flexibility in usage of that data. Moreover, I wanted a tool that was easy to drop into new or existing sites.

Data deployments can also bring new features to a Plone site. It’s much easier to mine business intelligence and reports out of a relational system, for example getting graphs of content creation broken down by month and type or user, or using commercial reporting tools.
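To make the reporting idea concrete, here is a small sketch using Python’s built-in sqlite3 module. The content table and its portal_type and created columns are hypothetical stand-ins for this example, not Content Mirror’s actual schema, and a real deployment would point at whatever database the content was mirrored into.

```python
import sqlite3


def content_by_month_and_type(conn):
    """Count content items per (month, type) in a hypothetical 'content' table."""
    query = """
        SELECT strftime('%Y-%m', created) AS month,
               portal_type,
               COUNT(*) AS total
        FROM content
        GROUP BY month, portal_type
        ORDER BY month, portal_type
    """
    return conn.execute(query).fetchall()


# Build a throwaway in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content (id INTEGER PRIMARY KEY, portal_type TEXT, created TEXT)"
)
conn.executemany(
    "INSERT INTO content (portal_type, created) VALUES (?, ?)",
    [
        ("Document", "2008-03-02 10:00:00"),
        ("Document", "2008-03-15 09:30:00"),
        ("News Item", "2008-03-20 14:00:00"),
        ("Document", "2008-04-01 08:00:00"),
    ],
)
report = content_by_month_and_type(conn)
for row in report:
    print(row)
```

The same query shape works for a per-user breakdown by grouping on a creator column instead of the type.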

So back to introducing Content Mirror. It’s basically a system for doing data deployments. It features:

- Out of the Box support for Default Plone Content Types.
- Out of the Box support for all builtin Archetypes Fields (including files and references).
- Support for 3rd Party / Custom Archetypes Content Types in one line of configuration.
- Supports Capturing Containment and Workflow in the serialized database.
- Completely Automated Mirroring, zero configuration required beyond installation.
- Easy customization via the Zope Component Architecture
- Elegant and Simple Design, less than 600 lines of code, extensive doctests
- Support for Plone 2.5, 3.0, and 3.1
- Opensource ( GPLv3 )

installation docs
http://code.google.com/p/contentmirror/wiki/Installation

technical introduction / readme
http://code.google.com/p/contentmirror/wiki/Introduction

in a nutshell, its technical architecture is a set of event observers with aggregation by object on txn boundaries, using an operation pattern for serialization actions, along with schema transformation of archetypes to relational database tables, using a sqlalchemy runtime-generated orm layer. the technical introduction goes into more detail.
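The "aggregation by object on txn boundaries" idea can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not Content Mirror’s code: OperationBuffer, record, and flush are made-up names, and the real system hooks into Zope events and the transaction machinery rather than being called by hand.

```python
class OperationBuffer:
    """Collect content events and keep one pending operation per object."""

    def __init__(self, serializer):
        self.serializer = serializer  # callable(kind, object_id)
        self.pending = {}             # object_id -> "add" | "update" | "delete"

    def record(self, object_id, kind):
        previous = self.pending.get(object_id)
        if previous == "add" and kind == "update":
            return  # an add followed by edits is still a single add
        if previous == "add" and kind == "delete":
            # created and removed in the same transaction: nothing to write
            del self.pending[object_id]
            return
        self.pending[object_id] = kind

    def flush(self):
        """Run once at the transaction boundary, then reset the buffer."""
        for object_id, kind in self.pending.items():
            self.serializer(kind, object_id)
        self.pending.clear()


written = []
buffer = OperationBuffer(lambda kind, object_id: written.append((kind, object_id)))
buffer.record("doc1", "add")
buffer.record("doc1", "update")
buffer.record("doc2", "update")
buffer.record("doc2", "update")
buffer.record("doc3", "add")
buffer.record("doc3", "delete")
buffer.flush()
print(written)
```

However many events fire for an object during a transaction, the database sees at most one write for it at commit time.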

Filed under plone, sqlalchemy, zope

Zipped Packages on App Engine :-(

as part of an ongoing project to get zope3 running well on google app engine, i worked on loading python code from zipped egg files. I started by following guido’s hint in the corresponding appengine issue. As google appengine doesn’t contain the zipimport builtin extension, i relied on the python svn sandbox code that implements import in pure python, and uses the zipfile module on top of that to provide a zip import facility. I checked out the import-in-py code, and added the following to get it working in the google app engine dev server.

import os.path, imp, types
# pure python zipimport, from the python svn sandbox import-in-py code
from zipimport_ import zipimport

# attributes the pure python importer uses, patched in for app engine
imp.new_module = types.ModuleType
imp.PY_SOURCE = 1
imp.PY_COMPILED = 2

def load_zipegg( egg_path ):
    # e.g. zope.schema-3.4.0-py2.5.egg -> zope.schema -> zope
    egg_file = os.path.basename( egg_path )
    egg_name = egg_file.split('-',1)[0]
    head = egg_name.split('.')[0]
    importer = zipimport.zipimporter( egg_path )
    # we load the head, the importer loads any contained modules
    importer.load_module( head )

unfortunately, while it works ok on the dev server, it doesn't work on appengine :-(, because marshal.dumps isn't implemented there. i've appended a comment to that effect on the issue.

in a nutshell, you hand it the path to an egg, and it loads the egg code. It's not 100% perfect; it has some minor issues with namespace'd eggs (stomps on __path__, extraneous sys.modules entry created in subpackages). Hopefully it's useful to those wanting to build larger applications or use frameworks other than the builtin django.
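For comparison, on a stock CPython where the standard zipimport machinery is available, the same "hand it a path, get the code loaded" behaviour needs no custom importer. The sketch below is an assumption-laden illustration, not the app engine workaround: load_from_zip and demo_pkg are made-up names, and it builds a throwaway egg-like zip so the example is self-contained.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def load_from_zip(archive_path, module_name):
    """Import module_name from a zip archive by placing the archive on sys.path."""
    if archive_path not in sys.path:
        sys.path.insert(0, archive_path)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Build a tiny egg-like archive containing one hypothetical package.
workdir = tempfile.mkdtemp()
egg_path = os.path.join(workdir, "demo_pkg-0.1-py2.5.egg")
with zipfile.ZipFile(egg_path, "w") as archive:
    archive.writestr("demo_pkg/__init__.py", "GREETING = 'hello from a zipped egg'\n")

module = load_from_zip(egg_path, "demo_pkg")
print(module.GREETING)
```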

It does add considerably to the startup time for an app, which went from .3-.5s to .9-1s. After this initial load, that particular app server instance has a cached sys.modules to work with, and startup time is negligible.

The porting of zope3 to appengine, still needs some support for zip contained resources, such as configuration, presentation templates, and browser resources. Even just zipping the packages without those, gets a simple zope3 application down to ~350 files.

Currently i'm just zipping individual directory eggs, with some filtering for pyc, so, text, and test files. Loading would likely be considerably faster if addons were packaged in a single egg.
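A rough sketch of that zipping step, using only the standard library. The names zip_egg and SKIP_EXTENSIONS are made up for illustration, and treating "test files" as anything under a directory named tests is an assumption; this is not the script actually used here.

```python
import os
import zipfile

# File types left out of the archive (an assumed reading of the filter above).
SKIP_EXTENSIONS = (".pyc", ".pyo", ".so", ".txt")


def zip_egg(egg_dir, archive_path):
    """Zip a directory egg, skipping compiled, text, and test files.

    archive_path should live outside egg_dir.
    """
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, dirs, files in os.walk(egg_dir):
            # Prune test directories in place so os.walk does not descend into them.
            dirs[:] = [d for d in dirs if d != "tests"]
            for name in files:
                if name.endswith(SKIP_EXTENSIONS):
                    continue
                full_path = os.path.join(root, name)
                archive.write(full_path, os.path.relpath(full_path, egg_dir))
    return archive_path
```

Note that dropping every .txt file also drops doctest files and some egg metadata, which may or may not be acceptable for a given package.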

Filed under cloud, python, zope

Zope3 on App Engine – Redux

I published a minimal zope3 app on app engine, along the lines of my post from yesterday. you can check it out at http://zope3.appspot.com

it’s basically a minimal zope3 application, using a custom publication, and bootstrapping some components via zcml.

also a simple test runner for verifying packages on the google app engine
http://zope3.appspot.com/tests

the minimal egg working set used by the app.

zope.deprecation-3.4.0-py2.5.egg
zope.publisher-3.5.2-py2.5.egg
appengine_monkey-0.1dev_r28-py2.5.egg
zope.dottedname-3.4.2-py2.5.egg
zope.schema-3.4.0-py2.5.egg
zope.event-3.4.0-py2.5.egg
zope.tal-3.4.1-py2.5.egg
zope.exceptions-3.5.2-py2.5.egg
zope.tales-3.4.0-py2.5.egg
zope.i18n-3.4.0-py2.5.egg
zope.testing-3.5.1-py2.5.egg
transaction-1.0a1-py2.5.egg
zope.i18nmessageid-3.4.3-py2.5-macosx-10.5-i386.egg
zope.thread-3.4-py2.5.egg
zope.component-3.4.0-py2.5.egg
zope.interface-3.4.1-py2.5-macosx-10.5-i386.egg
zope.traversing-3.5.0a3-py2.5.egg
zope.configuration-3.4.0-py2.5.egg
zope.location-3.4.0-py2.5.egg
zope.deferredimport-3.4.0-py2.5.egg
zope.pagetemplate-3.4.0-py2.5.egg

several modifications to the eggs were necessary to remove security.proxy references, and remove some BBB/deprecated code.

This set of eggs just fits in under the 1000 file limit (after manually removing extra locales in zope.i18n and pytz, it’s about 980). we could slim the eggs down a bit, ditch the docs and the tests for a minimal egg, and maybe have a ceiling of 200-300 to play with. but clearly for most zope/grok applications, which will be using quite a few more eggs, zip based imports are probably the only realistic option.

Loading zope takes about half a second, but once it’s initialized, subsequent executions run fast. from a timer around the import statements:

Initial Request      Zope Load Time -0.588495016098
Subsequent Request   Zope Load Time -0.000102996826172
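A timer of this kind can be as simple as the sketch below. It is not the code behind the numbers above: timed_import is a made-up helper, and the standard library json module stands in for the zope imports. The second call is fast for the same reason the subsequent request is: the module is already in sys.modules.

```python
import importlib
import time


def timed_import(module_name):
    """Return the seconds spent importing module_name."""
    start = time.time()
    importlib.import_module(module_name)
    return time.time() - start


first = timed_import("json")   # may do real import work
second = timed_import("json")  # served from the sys.modules cache
print("Initial Request      Load Time", first)
print("Subsequent Request   Load Time", second)
```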

in terms of exploring zip based imports, the suggestion guido pointed out in GAE issue 161, was using something like importlib/zipimport to manually load zip archives/eggs, for another day.

Filed under cloud, python, zope

Zope3 on Google App Engine

a few weeks ago, i did some exploration, of getting zope3 up and running on app engine, with some discussion in a grok thread. there’s been some interest in the topic, so i wrote it up for a wider audience.

there’s a number of issues with getting zope3 up and running.

- no c extensions ( no proxy, speedups, persistent, etc)
- 1000 file limit
- restricted python language

C Extensions

the lack of c extensions keeps a large portion of zope3 from being able to run without modification. this is immediately visible upon trying to import basic zope components (such as zope.component). zope.deferredimport is used throughout the codebase to speed startup time, and breaks as its implementation uses the proxy c extension.

i went ahead and replaced the usage there with a pure python implementation. it passes all but one of the unit tests. that test in particular uses an isinstance check of the proxy against types.ModuleType, and is the reason the c extension is required. i went ahead and tested the implementation with my existing zope3 development instances (all rdb storage, no zodb present), and this particular check was never an issue. the modified zope.deferredimport egg and source diff against trunk are here.

the removal of other extensions causes numerous other packages to not be usable. a rule of thumb i found in executing unit tests so far on app engine is that zope.* packages work ok (zope.schema, zope.interface, zope.configuration, zope.pagetemplate), but zope.app.* packages tend to have some use of security proxies, location proxies, or persistence, any of which cause an import error.

functionally, it isn’t a typical zope environment in any sense; it’s an application built from a collection of zope egg components. there isn’t any zodb, but that’s not really an issue for most of the zope core components. potentially, though, existing components could be used with some sort of modification to use null/dummy implementations, as was done for zope.deferredimport.

larger legacy frameworks like zope2 and plone are too intimately wedded to implementation choices of c extensions that are incompatible with appengine to work without major rewriting efforts.

1000 file limit

Google app engine maintains a hard limit on the number of files in an application. See Issue 161 for details ( to vote for it click the star ).

Zope can quickly run into this file limit. checking the file count in a zope3 app’s buildout eggs directory:

  $ find . -type f  | wc -l
  4980

across this many eggs

  $ ls -al | wc -l
  139

crucial to getting zope3 running well as an appserver on gae will be running on zipped eggs, to minimize the file resources we need. in order to do that, there are a couple of facilities that need to support opening files via pkg_resources, so that zipped eggs work in the core: the zcml include directive, so that we can load and register components via zcml and bootstrap the system, and new view/page directives that allow for page template files to be loaded from resources (not zope.app.pagetemplate, which introduces security proxies during traversal). things like browser resources are best left served as static files from the app engine environment.

Deployment

Ian did a nice write up and introduction of getting pylons working on gae, using virtualenv. the same directions work well in setting up a zope3 wsgi app.

Startup Time

this is a bit speculative in terms of application to gae, but typical startup time for a zope3 app is around 3.5-4s (on my laptop, a 2.16ghz core duo) with some trimming of excess zcml. The notion is that app caching will allow us to initialize a python cgi process once at startup and reuse it for multiple requests, relying on the import caching and global registry at module scope, by defining a main function entry point for request processing separate from initialization. http://code.google.com/appengine/docs/python/appcaching.html

Status

You can do simple publisher applications and use the component architecture (without c extension optimizations), and simple page template files.

Filed under cloud, python, zope

ore.xapian – indexing and searching in zope3

i released the ore.xapian package to pypi a few weeks back, and after a few iterations i’ve got it in production on a few small applications. it’s a thin layer on top of xappy that gives an indexing framework for zope3 based applications.

it’s pretty xapian agnostic. it’s designed as an async indexing framework, with abstractions for content indexers, content storage/resolution, transactional flush into the indexing queue, management of reopening search connections, etc.
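The queueing part of such a design can be sketched with the standard library alone. This is an illustration of the pattern, not ore.xapian’s API: IndexQueue, flush, and wait are made-up names, and a plain callable stands in for a real xapian indexer.

```python
import queue
import threading


class IndexQueue:
    """Hand index operations to a background worker thread."""

    def __init__(self, indexer):
        self.indexer = indexer  # callable(operation)
        self.queue = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def flush(self, operations):
        """Called at commit time with the operations gathered in a transaction."""
        for operation in operations:
            self.queue.put(operation)

    def _run(self):
        while True:
            operation = self.queue.get()
            try:
                self.indexer(operation)
            finally:
                self.queue.task_done()

    def wait(self):
        """Block until everything queued so far has been indexed."""
        self.queue.join()


indexed = []
index_queue = IndexQueue(indexed.append)
index_queue.flush([("add", "doc1"), ("delete", "doc2")])
index_queue.wait()
print(indexed)
```

Request threads return as soon as the operations are queued; the single worker applies them in order.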

the pypi page goes into a bit more detail (doctest style).
http://pypi.python.org/pypi/ore.xapian

i’m using it successfully to index content from relational databases and subversion with a zope3 front end. the only real todo is to make the index queue persistent for remote indexers, but to be useful that would need corresponding support for remote search connections in xappy. unfortunately i don’t have the bandwidth for the latter atm, but the details are here.

http://groups.google.com/group/xappy-discuss/browse_thread/thread/7ae9fb8d212529b2

Filed under python, zope