In modern web development I find that we end up trying many different ways to deploy code. While at work we use Python as our primary programming language, I’ve come to enjoy the node.js philosophy, especially the practice of Small Kernels of Functionality and Loosely Coupled Components.
From the article…
“…why package two modules together if you can simply break them apart into two kernels of functionality which are codependent?”
One of the big sore points for me right now is the existence of “common” libraries in our work. It’s common to have a piece of code that is needed in the current project but doesn’t particularly belong there. The approach I often see is to create said “common” library and deploy it with every project that needs the code. The major resistance to putting this code in an individual package is probably the overhead of maintaining a separate repository for it, along with the pull/commit/push/tag/release cycle that comes with making changes to a still-developing module. So in the end, we end up with the “common” library.
The problem with this is many-fold, though:
- dependency chains are not explicit,
- the “common” library grows over time,
- the same library becomes disorganized,
- it’s not clear later on how to break things out because it’s not clear what projects are using what parts of the library,
- the library, with all these different pieces of functionality, breaks the rule of single responsibility.
Back to the node.js philosophy: if you’ve ever used npm before, you know that there are tons and tons of modules available for node (as an interesting sidenote, npmjs module counts are growing by 94 modules/day at the time of writing [link]). The recommended approach is to keep modules small and publish them independently so they can be used explicitly across applications. James Halliday writes about this approach on his blog.
Python has been criticized for having painful package management. At work, we currently use setuptools for installing packages from GitHub, and it does a pretty decent job. As I’ve written before, you can specify `dependency_links` in the `setup.py` file to pull tarballs from any source control system that will provide them. Like I said, this works pretty well.
I’ve also recently set up a mypi private package index for our work, so we can start moving towards small, reusable Python packages. I’ve also looked at djangopypi and djangopypi2, the latter being a Bootstrap-converted fork of the former. Both of these projects seem to add a little more functionality around user management, and of course they’re built on Django, which means you get the nice Django admin at the same time. I haven’t had time to do a full comparison; that will have to come later. For the time being, mypi seems to do the trick nicely.
It turns out that with pip you can just specify a custom index in your `~/.pip/pip.conf`, then `pip install <packagename>` and you’re good to go. That’s fine for installing one-off modules; however, automating the entire dependency installation process wasn’t obvious at first.
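For reference, the custom-index configuration itself is tiny. A minimal `~/.pip/pip.conf` might look like this (the index URL is a placeholder for wherever your mypi instance lives):

```ini
# ~/.pip/pip.conf -- the URL below is a placeholder for your own index
[global]
index-url = http://pypi.example.internal/simple/
```

If you still want to fall back to the public PyPI for everything else, `extra-index-url` can be used alongside (or instead of) replacing `index-url`.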
My scenario had two projects, Project A and Project B. Project A relies on custom packages in my mypi index and is published to that index as well. Project B has a single dependency on Project A. Using setuptools, `python setup.py install` would find Project A in the private package index (via `dependency_links`), but none of Project A’s custom-index dependencies were being found, despite having specified `dependency_links` in that project.
The answer just turned out to require a little more understanding of the evolution of Python package management, specifically this little tidbit about pip:
“Internally, pip uses the setuptools package, and the pkg_resources module, which are available from the project, Setuptools.”
It turns out pip spits out the setuptools configuration (whatever you have in your `setup.py`) into a `/<project-name>.egg-info/` folder, including the `dependency_links`.
To get the pip equivalent of `python setup.py develop`, just run `pip install -e .` from the project root.
To get the same for `python setup.py install`, run `pip install .`.
The super-cool thing about this is that `dependency_links` no longer need to be set in the `setup.py` files, as pip will use the custom index set up in `~/.pip/pip.conf`.
I think this solution will solve some of the problems of having all the git/GitHub overhead involved in releases. With a simple `fab` setup, release candidates and formal releases can be incremented and deployed in a way that feels a little cleaner and independent of the git workflow, while still maintaining source control. I’m hoping it will encourage users to push modules early, in a ‘sharable’ way, to the private index so they can be easily installed by others. All in all, it feels cleaner to do it this way.
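As a rough illustration of the version-increment side, here’s a hypothetical helper (not part of fab or any existing tool) of the kind a release task might call to bump release-candidate versions before publishing to the index:

```python
import re


def bump_rc(version: str) -> str:
    """Return the next release-candidate version (hypothetical scheme).

    "0.3.0"    -> "0.3.1rc1"  (start an rc series for the next patch)
    "0.3.1rc1" -> "0.3.1rc2"  (bump the rc number)
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version}")
    major, minor, patch, rc = match.groups()
    if rc is None:
        # No rc suffix: begin a release-candidate series for the next patch.
        return f"{major}.{minor}.{int(patch) + 1}rc1"
    # Already a release candidate: just increment the rc number.
    return f"{major}.{minor}.{patch}rc{int(rc) + 1}"
```

A fab task could then write the bumped version into `setup.py`, tag it, and run `python setup.py sdist` plus an upload to the private index.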
Hope that helps someone else down the road. Now we have a nice private registry for our python packages, and an easy way to automate their installation.
Note: it appears that djangopypi is actually maintained by Disqus, which may be a good reason to use it, as it will probably be maintained for a longer period. I will explore that option and write up a comparison later.