Greenfield Research vs. Greenfield Development


So-called "greenfield" development is the holy grail of programming projects. Everyone wants to be able to work on a brand new project, to start fresh with a clean slate. You choose the language and libraries. You can use the latest and greatest tools. You control the whole architecture. No legacy code to get in your way. No dependency hell from a bloated code base. You can start with good practices from the ground up. Yes, this time you're going to do it right!

Starting fresh with a new project is a great feeling, and a great way to try out different architectures and get a broad set of experiences. Some web dev shops get to do this all the time. They develop a new website for a client, take it to the first release, then hand it over to the client for maintenance. They have the constant opportunity to start over and try again, working in different domains, and interacting with different technologies. This is a very attractive proposition! After all, who likes being pigeonholed?

This contrasts starkly with research projects, in my present experience. Getting a research project started is a lot of hard work. You have to work for months to just set up the infrastructure that will support your experiments. Those are months in which you aren't working directly toward a publication, which also means you aren't advancing your career. Sometimes you learn along the way, which is a big plus, but in many cases you are just putting in frustrating hours of work as a prerequisite.

Robotics is a particularly severe case. Let's say you want to test a single algorithm: navigating among moving obstacles, for instance. This requires a fully functional robot, software to control the robot (which usually needs to be very precise), working odometry, well-calibrated vision sensors, a perception pipeline that integrates those sensors, communication between multiple modules, and an interface to control it all. A huge amount of development goes into enabling just one experiment.

But then, once all of that is up and running, and you can maintain it, oh what sweet joy! Now all you have to do is tweak your system. Want to try a new idea and run a brand new experiment? All you need to do is swap out the navigation algorithm. Want to enter a brand new field of study in the perception area? You can do that too! Just keep the standard navigation module, and swap out the perception module instead.

The time you invest initially in a research project pays off like compound interest, and once you gain that momentum you're really rolling. But it's a lot of work to get there. So a surprising insight I've had recently is that brownfield development is actually much more attractive in research than greenfield.

Who would've thought?

Nanodegrees

I was very skeptical at first, but now I think Udacity might really be on to something with their new "Nanodegrees". Check out the Data Analyst Nanodegree, for instance. I mean, I'm no hiring manager for data analysts, but if I were, I'd definitely be willing to pick up a person who had completed all these courses:

  • Intro to Data Science
  • Data Wrangling with MongoDB
  • Data Analysis with R
  • Intro to Machine Learning
  • Data Visualization

Now, there are some caveats to my hire. Mainly, I'd be willing to hire such a person because people with the above experience are highly improbable unicorns: the demand for them appeared practically overnight, and now pretty much every single business wants one. For someone like that, you're much more willing to hire a go-getter with little experience but eagerness to learn. Further, I'd be a lot more attracted to someone who was already employed as a programmer, and picked up this degree on the side. And I would definitely, absolutely only hire this person if there was an expert data scientist to oversee and mentor them. I'd be very reluctant to hire a "data analyst" based on some 400-hour program where they learned how to run some basic statistical models but have no idea how to analyze the variance on their predictions or identify when a biased sample is corrupting their results (I don't know how to do that yet so I'd be very reluctant to hire myself as a data scientist).1


Hansel is hot right now. Could there be a Hansel Nanodegree?

So maybe this model only works for professions that are hot right now. But if it can prove itself beyond that, we could be looking at a return to vocational and trade certifications in the United States.

Many professions have experienced extreme academic inflation over the past few decades, like nursing, which in my parents' generation was a 2-year part-time program and is now a 4-year full-time bachelor's degree at the least. In recent years, there has been an explosion of master's degrees, from new inventions like the OMSCS and more MBAs than you can count on your fingers, to professional master's programs like UBC's MSS and U of T's MScAC, to minute specializations like Cybersecurity, Data Warehousing & Business Intelligence, and yes, Data Science. (I just googled for it and there's a whole website devoted to the concept of MS in Data Science! Geez!)

At most U.S. institutions, these types of programs now cost $30,000–$40,000 per year!! I've never seen a bubble that was so ready to burst. This inflation needs to be reined in, and the entire post-secondary education system needs to be split into smaller, more manageable chunks. Maybe these nanodegrees are just the thing. Let's just hope they stay at a reasonable price.2


  1. If you're considering hiring me for a data scientist position, please ignore this statement.
  2. The price is already rather unreasonable in many countries ($2400 per year), but at least in the U.S. it's competitive.

OpenCV in a Virtualenv

In an earlier post I outlined how to get set up for computer vision in Python. There, I skipped over one important component: installing OpenCV.

Partly, I've split this into its own post because it's large enough to be a topic of its own. But mainly it's because you can actually get quite far without ever needing OpenCV. However, as I found out this weekend, if you want to do any work with video, you will pretty much be forced to use OpenCV.1 OpenCV makes it really easy to both extract individual frames of the video and draw visualizations on top of them.
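
To give a feel for the frame-extraction use case, here is a minimal sketch, assuming the cv2 module is importable. The names frame_filename and extract_frames, the default step of 30 frames, and the output naming scheme are hypothetical choices for illustration, not anything OpenCV prescribes.

```python
def frame_filename(index, prefix="frame", ext="png"):
    """Build a zero-padded file name such as frame_000012.png (naming is an assumption)."""
    return "%s_%06d.%s" % (prefix, index, ext)


def extract_frames(video_path, step=30):
    """Save every `step`-th frame of a video with a box and a label drawn on top."""
    import cv2  # imported lazily so frame_filename works without OpenCV installed

    capture = cv2.VideoCapture(video_path)
    index = 0
    saved = []
    while True:
        ok, frame = capture.read()  # ok becomes False once no more frames can be read
        if not ok:
            break
        if index % step == 0:
            height, width = frame.shape[:2]
            # Draw a green rectangle just inside the frame border.
            cv2.rectangle(frame, (10, 10), (width - 10, height - 10), (0, 255, 0), 2)
            # Label the frame with its index.
            cv2.putText(frame, "frame %d" % index, (20, 40),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
            name = frame_filename(index)
            cv2.imwrite(name, frame)
            saved.append(name)
        index += 1
    capture.release()
    return saved
```

Calling extract_frames("clip.avi") (the file name is a placeholder) would write the annotated frames into the current directory and return their names.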

Installing OpenCV is highly system-dependent, so here I will focus on OS X (as usual). The official documentation covers Windows and Linux well enough, anyway.

Two caveats here:

  1. You must install NumPy globally in order to install OpenCV with Homebrew.
  2. It's 2014 and OpenCV still doesn't work with Python 3!!2

Homebrew may warn you that you need NumPy installed first. Unfortunately you will be forced to install NumPy globally. Most scientists who hack together vision systems probably don't give a hoot about this, but I like to keep my system clean with virtualenvs. It's not a big deal though, because admittedly it would be useful to have NumPy available on the global level. Plus, I can always uninstall NumPy later; we just need it for the duration of the build process.

I tried installing while under a virtualenv, but for some reason the build did not create the necessary shared object (.so) files in my site-packages. So, we'll install both NumPy and OpenCV globally, and then copy cv to wherever we need it. Unfortunately we'll have to do this every time we create a new virtualenv for OpenCV. :-/ 3

So, make sure you are not inside a virtualenv, then issue the following commands:

# You should use Homebrew's Python if you're not already:
$ brew install python
$ pip install numpy
$ brew install opencv

# Or, you might want to include some of the optional items such as:
$ brew install --with-eigen --with-ffmpeg --with-openni opencv

# Now let's copy the cv files to our virtualenv:
$ cp /usr/local/lib/python2.7/site-packages/cv* <path-to-venv>/lib/python2.7/site-packages

Let's see if it works:

$ workon <venv>
$ python
>>> import cv2
>>> print cv2.__version__

If that didn't elicit any errors, then you're golden!
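
Since the copy step has to be repeated for every new virtualenv, it could be scripted. This is only a sketch: copy_cv_files is a hypothetical helper name, and both directories are passed in explicitly rather than guessed.

```python
import glob
import os
import shutil


def copy_cv_files(global_site, venv_site):
    """Copy every cv* file from global_site into venv_site; return the names copied."""
    copied = []
    for source in sorted(glob.glob(os.path.join(global_site, "cv*"))):
        if os.path.isfile(source):  # skip directories that happen to match
            shutil.copy2(source, venv_site)
            copied.append(os.path.basename(source))
    return copied
```

With the paths used above, the call would look like copy_cv_files("/usr/local/lib/python2.7/site-packages", "<path-to-venv>/lib/python2.7/site-packages"), with the placeholder filled in for your own virtualenv.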


  1. Your other two options are to call out to ffmpeg manually on the command line, or to do the video parsing yourself (which is actually pretty easy if it's an .avi file). There used to be Python bindings for ffmpeg, but those are now defunct.
  2. But it's coming in version 3.0—which at the time of writing is already a month and a half late.
  3. Someone on StackOverflow told me that they had no problems quarantining OpenCV to a virtualenv. So maybe you should try it yourself and see if it works?

Eclipse with PyDev and Virtualenv

These are instructions for someone who may have already dabbled with some Python programming and is now looking for a more professional setup for productive development. I'll get you started with Python package management and IDE configuration. Justification first; skip to the procedure if you're already sold.

Why PyDev

If you don't already have a favorite development environment for Python, I highly recommend using PyDev. A lot of people are still in the dark ages, using things like IDLE. Frankly, this is an outrage. If you are one of these people, please install PyDev.

Just the use of the Eclipse editor alone will make for a much nicer programming experience. I get mad when I'm working outside of a proper editor (vim and emacs are not proper editors, and neither is IDLE). Managing your application launch configurations is another convenience that seems so minor you don't appreciate how useful it really is. But most of all, the biggest win in using PyDev is the debugger. The debugger is absolutely invaluable and if you haven't been using it, you are de facto terrible at debugging. Sorry to break it to you.

So please do take the time to set up a proper IDE. The only one better than PyDev is PyCharm. The only reason I don't use PyCharm is it costs money (until now!).1 Another possible alternative (if you do one-off, experimental scripts for science or research) is the IPython Notebook. I have no experience with either of these so I can't talk too much about them.

Why Virtualenv

You should also take the time to properly quarantine the dependencies for different projects. Chances are, if you've been using Python already then you're already familiar with the pip package manager. You may or may not be using virtualenv, however.

Here's the short version: pip lets you install packages (Python libraries). Usually, you do not want to install packages globally, for the entire system, because they may conflict with each other. Instead, you want each Python project you create to have its own isolated ecosystem. Virtualenv provides this isolation. Virtualenvwrapper makes virtualenv nicer to use.

Even if you're not worried about conflicts, virtualenv can help you make sure your demo still works years from now (especially important if you care about reproducible research). The fact is that libraries aren't always perfectly backward-compatible or bug-free. You may upgrade a package and find that it breaks your project. The more time passes since you last ran a piece of code, the more likely it is to be broken. Using a virtualenv to freeze the dependencies is a safeguard against this problem.
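
To make the "freeze the dependencies" idea concrete, here is a small sketch that compares a saved list of pinned versions against what is installed now. It assumes both inputs are text in the name==version format that pip freeze prints; parse_pins and find_drift are hypothetical helper names.

```python
def parse_pins(freeze_text):
    """Turn `pip freeze`-style text into a {name: version} dict (names lower-cased)."""
    pins = {}
    for line in freeze_text.splitlines():
        line = line.strip()
        # Ignore blanks, comments, and anything that is not a simple name==version pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_drift(pinned, installed):
    """Return {name: (pinned_version, installed_version_or_None)} for each mismatch."""
    drift = {}
    for name, wanted in pinned.items():
        found = installed.get(name)
        if found != wanted:
            drift[name] = (wanted, found)
    return drift
```

One way to use it: save the output of pip freeze to a file when the demo works, then later feed that file and a fresh pip freeze through parse_pins and pass both dicts to find_drift. An empty result means the environment still matches the frozen one.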

For a more detailed introduction to these tools, I found this blog post useful.

The Procedure

  1. First, install pip. The best way is with the get-pip.py script from the instructions provided here. If you use Homebrew on OS X, it might even come already installed—I'm not sure—you can use $ which pip on the command line to check (if you get no output, it's not installed).

  2. Install virtualenv and virtualenvwrapper in one go. It's as easy as:
    $ sudo pip install virtualenvwrapper

    See here for more details.

  3. Install Eclipse (any version; I recommend Eclipse IDE for C/C++ Developers or Eclipse IDE for Java Developers). This is straightforward, unless you're on Linux, in which case it's needlessly painful.

    Linux users: If you install through a package manager, you'll probably get a version that's way too old. You can simply download the binary, but then it doesn't get properly installed on your system. If you're on Ubuntu, you can fix this by following the instructions here or by using this handy little script.

  4. Now install PyDev from this Eclipse Update site: http://pydev.org/updates. More detailed instructions can be found here.

  5. Now you need to configure PyDev to point to your new virtualenv. This is done by adding an interpreter under Preferences... > PyDev > Interpreters > Python Interpreter. You should also set up interpreters for your base installation of Python. This can be done automatically using the Auto-Config buttons. To add an interpreter for your virtualenv, you will instead need to click the New... button and Browse... for the Python executable. Under a typical setup, the location would be ~/.virtualenvs/<venv-name>/bin/python. In both cases, the appropriate libraries should be selected automatically, so leave them as they are.

    OS X Users: If you follow those instructions you'll get a big, fat warning message, like this:

    [Screenshot: pydev-lib-error]

    In my experience, it runs fine anyway. However, the in-editor parsing will be missing all your system libraries, so it will show you errors where in reality there are none. To fix this, you should select all libraries when you set up the interpreter:

    [Screenshot: pydev-libs-correct]

    The only problem with this is that I'm not sure how that affects your PYTHONPATH at runtime. If you have some libraries installed globally that conflict with the ones in your virtualenv, you may run into problems. So far I haven't had any issues. Let me know if you have more info on this.

  6. After setting up your interpreters, you should see something like this:

    [Screenshot: pydev-interpreters]

  7. If you already have a PyDev project you can now configure the project to use this interpreter. Or, you can create a new project:

    [Screenshot: new-pydev-project]

  8. If you later install additional libraries, you will need to go back to the interpreter definitions, click "Apply", and tell PyDev which interpreters it should scan again. Until you do that, PyDev might not notice your new libraries. For more information, see this post.
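
For step 5, finding the right executable to browse to can be tedious if you have several virtualenvs. The sketch below lists them, assuming the typical virtualenvwrapper layout (environments under $WORKON_HOME, falling back to ~/.virtualenvs, each with a bin/python). The function name list_venv_interpreters is hypothetical.

```python
import os


def list_venv_interpreters(workon_home=None):
    """Return {venv_name: path_to_python} for each virtualenv found under workon_home."""
    if workon_home is None:
        # Assumption: virtualenvwrapper's WORKON_HOME, defaulting to ~/.virtualenvs.
        workon_home = os.environ.get("WORKON_HOME",
                                     os.path.expanduser("~/.virtualenvs"))
    interpreters = {}
    if not os.path.isdir(workon_home):
        return interpreters
    for name in sorted(os.listdir(workon_home)):
        python = os.path.join(workon_home, name, "bin", "python")
        if os.path.exists(python):  # directories without bin/python are skipped
            interpreters[name] = python
    return interpreters
```

Printing the result of list_venv_interpreters() gives paths you can paste into the Browse... dialog.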

...And you're locked and loaded for Python development! Go get 'em!


  1. While writing this post I discovered that PyCharm now has a free version! You can bet that I will be switching over very soon! I have extremely high regard for all the JetBrains IDEs. In my opinion, these guys can do no wrong.

    The only thing that may make me hesitate to switch is the fact that I may sometimes need to develop partly in C++, and there is not yet a JetBrains C++ IDE. They're working on it (which makes me very excited!) but they've still got a lot to do (which makes me very sad-face).

Python for Computer Vision

This is a quick installation guide that will show you how to set up a programming environment for writing computer vision algorithms in Python. You'll install Python, an IDE, and some supporting libraries. This guide is mostly cross-platform, with some emphasis on OS X.

You will need:

  • Python 3.x (3.3 at time of writing)1
  • Python libraries for common vision & scientific computing tasks
  • OpenCV (optional)
  • Eclipse with PyDev (optional but recommended)

Here are the Python libraries that you will use:

  • Python Imaging Library (PIL)
  • NumPy
  • matplotlib

And here are a couple of additional ones that are optional, but you'll probably find them useful sooner or later:

  • SciPy
  • scikit-image
  • ipython

To install them you will use pip and virtualenv.2

Python and Assorted Libraries

You likely already have Python on your computer. But if you are on a Mac, I recommend using Homebrew to manage your Python installations.

$ brew install python3 # Using Python 3, but you can also use Python 2.

If you don't already have pip, install it now (if you're using Homebrew, this should already have been done for you):

$ curl -O https://bootstrap.pypa.io/get-pip.py
$ sudo python3 get-pip.py # Use 'python' for Python 2, 'python3' for Python 3

If you don't already have virtualenv, install it now:

$ sudo pip3 install virtualenv # Use 'pip' for Python 2, 'pip3' for Python 3
$ sudo pip3 install virtualenvwrapper
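
To confirm that the tools ended up on your PATH, the same check that which performs on the command line can be done from Python 3. A minimal sketch; find_tools is a hypothetical helper name, and the tool names in the usage note are just the ones installed above.

```python
import shutil


def find_tools(names):
    """Map each command name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}
```

For example, find_tools(["pip3", "virtualenv"]) returns a dict in which any missing tool maps to None.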

You could at this point try installing your Python packages, but you may have some missing dependencies.

On OS X, I needed to perform the following installations first (note that freetype may already exist somewhere, but needs to be symlinked to the correct location):3

$ brew install freetype # required by PIL
$ ln -s /usr/local/include/freetype2 /usr/local/include/freetype # only on OS X; see footnote 3
$ brew install swig # required by scipy

On Linux, I needed to perform the following installations first (note that there are alternative choices for all of these dependencies; you just need some version of BLAS and LAPACK and a Fortran compiler):

$ sudo apt-get install libblas-dev # required by scipy
$ sudo apt-get install liblapack-dev # required by scipy
$ sudo apt-get install gfortran # required by scipy

Now you should be ready to install your cool Python tools!

Linux users: You may be able to skip part of the following step, because the major packages are often shipped with Linux distributions. It can't hurt to install the latest version, but you don't need to if you don't want to. Find out what's already installed with pip list. Find out if newer versions are available with pip list --outdated.

# This will automatically switch you into the new virtualenv so you can start installing packages.
# Your new virtualenv will be called "vision".
# You can exclude the "-p `which python3`" if you don't want to use Python 3.
$ mkvirtualenv -p `which python3` vision
$ pip install Pillow # see footnote 4
$ pip install numpy
$ pip install matplotlib

# And the optional packages:
$ pip install scipy
$ pip install ipython
$ pip install cython
$ pip install scikit-image

# confirm that everything worked
$ pip list
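
Beyond pip list, a quick way to confirm the packages actually import is to try each one from inside the virtualenv. A minimal sketch; check_imports is a hypothetical helper, and note that some packages import under a different name than they install under (Pillow as PIL, scikit-image as skimage, ipython as IPython).

```python
import importlib


def check_imports(module_names):
    """Try importing each module; map name to its version, "unknown", or None if missing."""
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None  # not installed in this environment
            continue
        # Not every module exposes __version__, so fall back to "unknown".
        report[name] = getattr(module, "__version__", "unknown")
    return report
```

Running check_imports(["PIL", "numpy", "matplotlib", "scipy", "skimage", "IPython"]) and printing the result shows at a glance which libraries are missing.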

That's it! You're all ready to go with your next-generation Python algorithms for computer vision! If you additionally want to install OpenCV, see my separate post about that. If you don't yet have a Python development environment, do read my post on PyDev and virtualenv.


  1. I'm using Python 3 here. If you know anything about Python, you'll have heard how much confusion there is around Python 2 vs. Python 3. You can also use Python 2, but the entire NumPy/SciPy ecosystem has supported Python 3 for a couple years now, so you should be safe to prefer 3. Homebrew manages 2 and 3 as completely separate packages. You can have both simultaneously installed on your Mac, and 'python' will always refer to Python 2, while 'python3' will always refer to Python 3. The only hitch is you will have to remember to specify python3 for your virtualenv, and use pip3 to install global libraries for Python 3. If you don't understand what that means, just forget I even said it; I've written my instructions to do things the Python 3 way.
  2. If you need an introduction to Python's packaging system, see this page. TL;DR: pip lets you install packages (Python libraries). Usually, you do not want to install packages globally, for the entire system, because they may conflict with each other. Instead, you want each Python project you create to have its own isolated ecosystem. Virtualenv provides this isolation. Virtualenvwrapper makes virtualenv nicer to use.
  3. I'm not sure why Freetype is in a different location on OS X than on Linux, but I guess this is the location that Xcode decided upon and Homebrew follows suit. So we just create a quick symlink and hopefully never have to worry about it again.
  4. While writing this post I discovered Pillow, a newer package for the Python Imaging Library. It seems that support for the original PIL is waning, and it is not available via pip by default. It might someday regain favor, but I find Pillow to be better supported at the moment.