cyn.in Buildout: Low bandwidth? Try these settings!

June 25, 2009

A big part of being a cyn.in developer is being one with the fine art of building out… your buildout.

Geek / Tech warning! This post is best read if you're a software developer who is interested in working with cyn.in (and / or Plone) and in improving your development productivity.

OK, with that dire warning out of the way, let's get down to it, shall we? 🙂

I restate my assumptions:

You downloaded the cyn.in source snapshot.
You actually opened the README file and went through it.
You figured out the dependencies and got through the buildout procedure
OR you're coming from the plone.org community and have a Plone buildout running (yes, this applies to you too!)
You'd like to reduce the bandwidth / time it takes to build a buildout

So why does buildout take so long?

Well, because cyn.in depends on a whole lot of Python eggs (and even a few old-style Zope products) that are downloaded during buildout. The biggest of these is Plone itself, because it depends on more code than you'll ever end up going through (or at least that's what it looks like when you're starting out).

So how do we optimize our Internet usage?

Well, several ways. Read on:

download-cache:

This matters most if you build a lot of new buildouts from scratch. By "from scratch" I mean on fresh operating system installs. If you're into virtualization and appliance building, this one's for you! You can point this setting at a path outside your main buildout directory. Instead of downloading directly from the Internet, buildout will then first check your download-cache directory to see if the file is already there. Buildout will still spend time looking on the net for the correct version of each egg. But once it has figured out which one it needs, it will usually find the file already present in your cache.

The idea, of course, is that your download-cache directory is a network share somewhere that's mounted on a local path, so you always have a pre-primed cache.

You should keep this setting in your own personal override config (user.cfg in the cyn.in buildout is a great example of this!). That's the minimal per-buildout override where you set your personal settings, like the port and network IP to bind to, the effective user to run under and so on. If you put your download-cache setting in the main buildout.cfg, then all developers are forced to use or override it, which is not a good thing to do.
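Here is a minimal sketch of such a personal override, written from the shell. The mount point /mnt/buildout-cache is hypothetical, so use wherever your share is actually mounted. A real cyn.in user.cfg would also carry your other personal settings. This sketch only shows where download-cache goes and that user.cfg extends buildout.cfg.

```shell
#!/bin/sh
# Hypothetical mount point of the shared network download cache;
# override it with CACHE_DIR=/your/path when running the script.
CACHE_DIR="${CACHE_DIR:-/mnt/buildout-cache}"

# buildout complains if the download-cache directory doesn't exist,
# so warn early if the share isn't mounted yet.
if [ ! -d "$CACHE_DIR" ]; then
    echo "warning: $CACHE_DIR not found; mount the share first" >&2
fi

# Personal override: extend the project's buildout.cfg and change only
# what is personal to you. Never clobber an existing user.cfg.
if [ -e user.cfg ]; then
    echo "user.cfg already exists; add download-cache to it by hand" >&2
else
    cat > user.cfg <<EOF
[buildout]
extends = buildout.cfg
download-cache = $CACHE_DIR
EOF
fi
```

Because the setting lives in user.cfg, developers who don't have the share simply don't set it, and buildout.cfg stays untouched.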

download-directory and eggs-directory:

Most developers should already be aware of this one. When developing, you often have to create new buildouts repeatedly on the same computer with different file states. This typically happens in a multi-developer team where everyone is working on different branches in Subversion and so on. For this kind of scenario, buildout provides these 2 settings to speed things up somewhat. In buildout, packages can be pinned to particular versions, or they default to the latest (more on this under the newest = false setting). The idea with these 2 settings is that you set them in your user-account-wide defaults file. Where? On Linux, usually in this file:

/home/username/.buildout/default.cfg

That's:

/home is where your user homes are
/username is yours (just use cd ~ to land there; the pwd command will then show you the full path)
.buildout is a folder you create in it
default.cfg is a file that contains your defaults for buildout

In this default.cfg file, put these 2 settings under a [buildout] section header:

[buildout]
download-directory = /location/to/where/you/want/all/downloads
eggs-directory = /location/to/where/you/want/all/eggs
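Here is a minimal sketch that creates the two target directories and writes this file in one go. The downloads and eggs locations under ~/.buildout are just an example choice, not something buildout requires, so adjust them to taste. The script refuses to overwrite an existing default.cfg.

```shell
#!/bin/sh
# Example locations (an assumption, not a buildout requirement).
BUILDOUT_DIR="$HOME/.buildout"
DOWNLOADS="$BUILDOUT_DIR/downloads"
EGGS="$BUILDOUT_DIR/eggs"

# Create the target folders up front; buildout expects them to exist.
mkdir -p "$DOWNLOADS" "$EGGS"

# Account-wide defaults, picked up by every buildout run under this account.
if [ -e "$BUILDOUT_DIR/default.cfg" ]; then
    echo "$BUILDOUT_DIR/default.cfg already exists; edit it by hand" >&2
else
    cat > "$BUILDOUT_DIR/default.cfg" <<EOF
[buildout]
download-directory = $DOWNLOADS
eggs-directory = $EGGS
EOF
fi
```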

If you put these settings in the correct home folder location (and then actually create the target folder paths; buildout will error if they're not already present!), buildout saves downloaded files in these locations by default. The advantage of doing this account-wide is that every buildout you run will look there first and reuse the files already present. This works best in combination with the newest = false setting described below.

Note that the files in these directories are used directly by every buildout run under your account. That's different from the download-cache, which is only a source that files get downloaded from. Why is this important? Well, in the very, very rare case that something in your normal download or egg directory is causing a problem, you can override it in your user.cfg and fix the problem.

newest = false:

This should be present by default in your main buildout.cfg itself. That way, when any developer runs the buildout, it will only download the dependencies that are missing. If it finds matching dependencies already met, it will not go looking for a better one. Combined with the account-wide defaults above, this setting is like magic: you can re-run buildout any number of times after the first one, and do it without any Internet consumption at all!

When you actually want buildout to look for better alternatives, simply add -n to your buildout command line and it will go hunting for the latest and the best. That's a small n, not a capital one. Capital N does the exact opposite, the same as the setting above!
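You may want to confirm that a project's buildout.cfg really does set newest = false before relying on an offline re-buildout. A crude check is enough for that. The helper below is hypothetical (has_newest_false is my own name, not part of buildout). It is a deliberately simple grep-based sketch. It doesn't look at which section the line sits in, and it doesn't follow extends, so a setting inherited from another file won't be seen.

```shell
#!/bin/sh
# has_newest_false FILE: exit status 0 if FILE contains a line setting
# "newest = false" (spaces around "=" optional), non-zero otherwise.
# Only unindented lines count: in buildout config, an indented line is
# a continuation of the previous value, not a new option.
has_newest_false() {
    grep -Eq '^newest[[:space:]]*=[[:space:]]*false[[:space:]]*$' "$1"
}
```

For example, `has_newest_false buildout.cfg && ./bin/buildout -c user.cfg` only re-runs buildout when the check passes.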

So what's your buildout command line looking like now?

First time or force-check for newer dependencies is like this:

./bin/buildout -c user.cfg -n -vvvvv

And “normal” re-buildout is like this:

./bin/buildout -c user.cfg
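If you switch between these two often, a small shell function can save some typing. This is a hypothetical helper: the name rebuildout and the DRY_RUN variable are my own, not part of buildout or cyn.in. It simply runs one of the two command lines above. It assumes you're in the buildout directory and that your personal config is called user.cfg.

```shell
#!/bin/sh
# rebuildout: hypothetical wrapper around the two command lines above.
#   rebuildout          -> normal re-buildout (honours newest = false)
#   rebuildout --fresh  -> first run / look for newer eggs, very verbose
# Set DRY_RUN=1 to print the command instead of running it.
rebuildout() {
    if [ "$1" = "--fresh" ]; then
        set -- ./bin/buildout -c user.cfg -n -vvvvv
    else
        set -- ./bin/buildout -c user.cfg
    fi
    if [ -n "$DRY_RUN" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Usage: run `rebuildout` day to day, and `rebuildout --fresh` on a first build or when you deliberately want fresher eggs.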

The -c user.cfg part tells buildout to use user.cfg as its configuration. user.cfg extends buildout.cfg, overriding just the personal settings that you want to change from the defaults.

-n tells buildout to go hunting for fresher eggs.

-vvvvv is a way of getting as much verbose log output as possible: the more v characters, the more verbose the output. You want verbosity when you're doing this the first time, so you can diagnose error messages correctly if they happen.

In a normal re-buildout you don't need these options. You want the default, newest = false, and verbosity is just pointless scrolling.

Hopefully that speeds things up some, especially if you're struggling. There are more advanced techniques, of course, like Ingeniweb's eggproxy product. It is the perfect way to create a fast, pass-through caching mirror of PyPI on your LAN, holding only the eggs you actually need and reuse. But that's out of the scope of this post; perhaps some other time.

More?

If you want to read more about this sort of thing, do let me know in the comments. If you have other tips for improving buildout efficiency, please share reference links or whatever else you can tell us about them. Thanks!