I was very skeptical at first, but now I think Udacity might really be on to something with their new "Nanodegrees". Check out the Data Analyst Nanodegree, for instance. I mean, I'm no hiring manager for data analysts, but if I were, I'd definitely be willing to pick up a person who had completed all these courses:

  • Intro to Data Science
  • Data Wrangling with MongoDB
  • Data Analysis with R
  • Intro to Machine Learning
  • Data Visualization

Now, there are some caveats to my hire. Mainly, I'd be willing to hire such a person because people with the above experience are highly improbable unicorns for whom the demand appeared practically overnight and now pretty much every single business wants one. For someone like that, you're much more willing to hire a go-getter with little experience but eagerness to learn. Further, I'd be a lot more attracted to someone who was already employed as a programmer, and picked up this degree on the side. And I would definitely, absolutely only hire this person if there was an expert data scientist to oversee and mentor them. I'd be very reluctant to hire a "data analyst" based on some 400-hour program where they learned how to run some basic statistical models but have no idea how to analyze the variance on their predictions or identify when a biased sample is corrupting their results (I don't know how to do that yet so I'd be very reluctant to hire myself as a data scientist).1


Hansel is hot right now. Could there be a Hansel Nanodegree?

So maybe this model only works for professions that are hot right now. But if it can prove itself beyond that, we could be looking at a return to vocational and trade certifications in the United States.

Many professions have experienced extreme academic inflation over the past few decades, like nursing, which in my parents' generation was a 2-year part-time program and is now a 4-year full-time bachelor's degree at the least. In recent years, there has been an explosion of master's degrees, from new inventions like the OMSCS and more MBAs than you can count on your fingers, to professional master's degrees like UBC's MSS and U of T's MScAC, to minute specializations like Cybersecurity, Data Warehousing & Business Intelligence, and yes, Data Science. (I just googled for it and there's a whole website devoted to the concept of MS in Data Science! Geez!)

At most U.S. institutions, these types of programs now cost $30,000–$40,000 per year!! I've never seen a bubble that was so ready to burst. This inflation needs to be reined in, and the entire post-secondary education system needs to be split into smaller, more manageable chunks. Maybe these nanodegrees are just the thing. Let's just hope they stay at a reasonable price.2

  1. If you're considering hiring me for a data scientist position, please ignore this statement.
  2. The price is already rather unreasonable in many countries ($2400 per year), but at least in the U.S. it's competitive.

OpenCV in a Virtualenv

In an earlier post I outlined how to get set up for computer vision in Python. There, I skipped over one important component: installing OpenCV.

Partly, I've separated this to its own post because it's large enough to be a topic of its own. But mainly it's because you can actually get quite far without ever needing OpenCV. However, as I found out this weekend, if you want to do any work with video, you will pretty much be forced to use OpenCV.1 OpenCV makes it really easy to both extract individual frames of the video and draw visualizations on top of them.
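
To give a rough idea of what that looks like once OpenCV is installed, here is a minimal sketch of pulling frames out of a video and drawing on each one. The function names (`frame_filename`, `extract_frames`), the output naming scheme, and the choice to stamp each frame with its index are my own assumptions for illustration. The `cv2` calls are the standard `VideoCapture`, `putText`, and `imwrite`. It avoids Python 3-only syntax so it should run on Python 2.7 as well.

```python
import os


def frame_filename(out_dir, index):
    """Build the output path for frame number `index` (hypothetical naming scheme)."""
    return os.path.join(out_dir, "frame_{:05d}.png".format(index))


def extract_frames(video_path, out_dir, every=1):
    """Save every `every`-th frame of a video as a PNG, stamped with its index.

    Returns the number of frames written.
    """
    import cv2  # imported here so frame_filename() works without OpenCV

    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)

    capture = cv2.VideoCapture(video_path)
    if not capture.isOpened():
        raise IOError("could not open video: {}".format(video_path))

    written = 0
    index = 0
    try:
        while True:
            ok, frame = capture.read()  # ok becomes False when no frame is left
            if not ok:
                break
            if index % every == 0:
                # Draw the frame index in green near the top-left corner.
                cv2.putText(frame, str(index), (10, 30),
                            cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
                cv2.imwrite(frame_filename(out_dir, index), frame)
                written += 1
            index += 1
    finally:
        capture.release()
    return written
```

Something like `extract_frames("clip.avi", "frames", every=10)` would then write every tenth frame into a `frames` directory; both names are placeholders.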

Installing OpenCV is highly system-dependent, so here I will focus on OS X (as usual). The official documentation covers Windows and Linux well enough, anyway.

Two caveats here:

  1. You must install NumPy globally in order to install OpenCV with Homebrew.
  2. It's 2014 and OpenCV still doesn't work with Python 3!!2

Homebrew may warn you that you need NumPy installed first. Unfortunately you will be forced to install NumPy globally. Most scientists who hack together vision systems probably don't give a hoot about this, but I like to keep my system clean with virtualenvs. It's not a big deal though, because admittedly it would be useful to have NumPy available on the global level. Plus, I can always uninstall NumPy later; we just need it for the duration of the build process.

I tried installing while under a virtualenv, but for some reason the build did not create the necessary shared object (.so) files in my site-packages. So, we'll install both NumPy and OpenCV globally, and then copy cv to wherever we need it. Unfortunately we'll have to do this every time we create a new virtualenv for OpenCV. :-/ 3

So, make sure you are not inside a virtualenv, then issue the following commands:

# You should use Homebrew's Python if you're not already:
$ brew install python
$ pip install numpy
$ brew install opencv

# Or, you might want to include some of the optional items such as:
$ brew install --with-eigen --with-ffmpeg --with-openni opencv

# Now let's copy the cv files to our virtualenv:
$ cp /usr/local/lib/python2.7/site-packages/cv* <path-to-venv>/lib/python2.7/site-packages

Let's see if it works:

$ workon <venv>
$ python
>>> import cv2
>>> print cv2.__version__

If that didn't elicit any errors, then you're golden!
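
Since the copy step has to be repeated for every new virtualenv, it may be worth scripting. The following is a minimal sketch using only the standard library. The function names `in_virtualenv` and `copy_cv_files` are my own, and the two directories are passed in by the caller rather than guessed. Like the `cp` command above, it copies only plain files whose names start with `cv`.

```python
import glob
import os
import shutil
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtualenv."""
    # Older virtualenv releases set sys.real_prefix; newer ones (and venv)
    # make sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


def copy_cv_files(global_site, venv_site):
    """Copy every file matching cv* from global_site into venv_site.

    Returns the sorted list of file names that were copied.
    """
    copied = []
    for path in glob.glob(os.path.join(global_site, "cv*")):
        if os.path.isfile(path):  # skip directories, as plain `cp` would
            shutil.copy2(path, venv_site)
            copied.append(os.path.basename(path))
    return sorted(copied)
```

You would call it with the same two paths as the `cp` command, for example `copy_cv_files("/usr/local/lib/python2.7/site-packages", "<path-to-venv>/lib/python2.7/site-packages")`. `in_virtualenv()` is a quick way to check that you are outside a virtualenv before running the global install.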

  1. Your other two options are calling out to ffmpeg on the command line, or doing the video parsing yourself (which is actually pretty easy if it's an .avi file). There used to be Python bindings for ffmpeg, but those are now defunct.
  2. But it's coming in version 3.0—which at the time of writing is already a month and a half late.
  3. Someone on StackOverflow told me that they had no problems quarantining OpenCV to a virtualenv. So maybe you should try it yourself and see if it works?

Eclipse with PyDev and Virtualenv

These are instructions for someone who may have already dabbled with some Python programming and is now looking for a more professional setup for productive development. I'll get you started with Python package management and IDE configuration. Justification first; skip to the procedure if you're already sold.

Why PyDev

If you don't already have a favorite development environment for Python, I highly recommend using PyDev. A lot of people are still in the dark ages, using things like IDLE. Frankly, this is an outrage. If you are one of these people, please install PyDev.

Just the use of the Eclipse editor alone will make for a much nicer programming experience. I get mad when I'm working outside of a proper editor (vim and emacs are not proper editors, and neither is IDLE). Managing your application launch configurations is another convenience that seems so minor you don't appreciate how useful it really is. But most of all, the biggest win in using PyDev is the debugger. The debugger is absolutely invaluable and if you haven't been using it, you are de facto terrible at debugging. Sorry to break it to you.

So please do take the time to set up a proper IDE. The only one better than PyDev is PyCharm. The only reason I don't use PyCharm is it costs money (until now!).1 Another possible alternative (if you do one-off, experimental scripts for science or research) is the IPython Notebook. I have no experience with either of these so I can't talk too much about them.

Why Virtualenv

You should also take the time to properly quarantine the dependencies for different projects. Chances are, if you've been using Python already then you're already familiar with the pip package manager. You may or may not be using virtualenv, however.

Here's the short version: pip lets you install packages (Python libraries). Usually, you do not want to install packages globally, for the entire system, because they may conflict with each other. Instead, you want each Python project you create to have its own isolated ecosystem. Virtualenv provides this isolation. Virtualenvwrapper makes virtualenv nicer to use.

Even if you're not worried about conflicts, virtualenv can help you make sure your demo still works years from now (especially important if you care about reproducible research). The fact is that libraries aren't always perfectly backward-compatible or bug-free. You may upgrade a package and find that it breaks your project. The more time passes since you last ran a piece of code, the more likely it is to be broken. Using a virtualenv to freeze the dependencies is a safeguard against this problem.
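
The usual way to do the freezing is `pip freeze > requirements.txt`, followed later by `pip install -r requirements.txt`. To show what that pinned list amounts to, here is a rough approximation using only the standard library. It assumes Python 3.8 or newer (for `importlib.metadata`), and the function names are my own. It is a sketch of the idea, not a replacement for `pip freeze`.

```python
from importlib import metadata  # standard library since Python 3.8


def frozen_requirements():
    """Return sorted 'name==version' lines for every installed distribution."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip any distribution whose metadata has no name
            lines.add("{}=={}".format(name, dist.version))
    return sorted(lines, key=str.lower)


def write_requirements(path):
    """Write the pinned list to `path` and return the number of entries."""
    lines = frozen_requirements()
    with open(path, "w") as handle:
        for line in lines:
            handle.write(line + "\n")
    return len(lines)
```

Run it with the virtualenv's own interpreter so that it reports that environment's packages rather than the global ones.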

For a more detailed introduction to these tools, I found this blog post useful.

The Procedure

  1. First, install pip. The best way is with the script from the instructions provided here. If you use Homebrew on OS X, it might even come already installed—I'm not sure—you can use $ which pip on the command line to check (if you get no output, it's not installed).

  2. Install virtualenv and virtualenvwrapper in one go. It's as easy as:
    $ sudo pip install virtualenvwrapper

    See here for more details.

  3. Install Eclipse (any version; I recommend Eclipse IDE for C/C++ Developers or Eclipse IDE for Java Developers). This is straightforward, unless you're on Linux, in which case it's needlessly painful.

    Linux users: If you install through a package manager, you'll probably get a version that's way too old. You can simply download the binary, but then it doesn't get properly installed on your system. If you're on Ubuntu, you can fix this by following the instructions here or by using this handy little script.

  4. Now install PyDev from this Eclipse Update site: More detailed instructions can be found here.

  5. Now you need to configure PyDev to point to your new virtualenv. This is done by adding an interpreter under Preferences... > PyDev > Interpreters > Python Interpreter. You should also set up interpreters for your base installation of Python. This can be done automatically using the Auto-Config buttons. To add an interpreter for your virtualenv, you will instead need to click the New... button and Browse... for the Python executable. Under a typical setup, the location would be ~/.virtualenvs/<venv-name>/bin/python. In both cases, the appropriate libraries should be selected automatically, so leave them as they are.

    OS X Users: If you follow those instructions you'll get a big, fat warning message, like this:


    In my experience, it runs fine anyway. However, the in-editor parsing will be missing all your system libraries, so it will show you errors where in reality there are none. To fix this, you should select all libraries when you set up the interpreter:


    The only problem with this is that I'm not sure how that affects your PYTHONPATH at runtime. If you have some libraries installed globally that conflict with the ones in your virtualenv, you may run into problems. So far I haven't had any issues. Let me know if you have more info on this.

  6. After setting up your interpreters, you should see something like this:


  7. If you already have a PyDev project you can now configure the project to use this interpreter. Or, you can create a new project:


  8. If you later install additional libraries, you will need to go back to the interpreter definitions, click "Apply", and tell PyDev which interpreters it should scan again. Until you do that, PyDev might not notice your new libraries. For more information, see this post.

...And you're locked and loaded for Python development! Go get 'em!
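
If you're unsure which path to browse to in step 5, or what actually ends up on the import path at runtime, a small script can tell you. This is a minimal sketch with function names of my own choosing. It assumes the virtualenvwrapper layout, where environments live under `$WORKON_HOME` (defaulting to `~/.virtualenvs`) with the interpreter at `bin/python`.

```python
import os
import sys


def venv_python(venv_name, workon_home=None):
    """Return the expected path of the Python executable inside a virtualenv.

    Assumes the virtualenvwrapper layout: $WORKON_HOME/<venv-name>/bin/python.
    """
    if workon_home is None:
        workon_home = os.environ.get("WORKON_HOME", "~/.virtualenvs")
    return os.path.join(os.path.expanduser(workon_home),
                        venv_name, "bin", "python")


def describe_interpreter():
    """Report which interpreter is running and where it looks for imports."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "path": list(sys.path),  # searched in this order at import time
    }
```

Running `describe_interpreter()` from a PyDev launch configuration should show whether any global site-packages directory appears on the path ahead of the virtualenv's own.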

  1. While writing this post I discovered that PyCharm now has a free version! You can bet that I will be switching over very soon! I have extremely high regard for all the JetBrains IDEs. In my opinion, these guys can do no wrong.

    The only thing that may make me hesitate to switch is the fact that I may sometimes need to develop partly in C++, and there is not yet a JetBrains C++ IDE. They're working on it (which makes me very excited!) but they've still got a lot to do (which makes me very sad-face).

Python for Computer Vision

This is a quick installation guide that will show you how to set up a programming environment for writing computer vision algorithms in Python. You'll install Python, an IDE, and some supporting libraries. This guide is mostly cross-platform, with some emphasis on OS X.

You will need:

  • Python 3.x (3.3 at time of writing)1
  • Python libraries for common vision & scientific computing tasks
  • OpenCV (optional)
  • Eclipse with PyDev (optional but recommended)

Here are the Python libraries that you will use:

  • Python Imaging Library (PIL)
  • NumPy
  • matplotlib

And here are a couple additional ones which are optional, but you'll probably find them useful sooner or later:

  • SciPy
  • scikit-image
  • ipython

To install them you will use pip and virtualenv.2

Python and Assorted Libraries

You likely already have Python on your computer. But if you are on a Mac, I recommend using Homebrew to manage your Python installations.

$ brew install python3 # Using Python 3, but you can also use Python 2.

If you don't already have pip, install it now (if you're using Homebrew, this should already have been done for you):

$ curl -O
$ sudo python

If you don't already have virtualenv, install it now:

$ sudo pip3 install virtualenv # Use 'pip' for Python 2, 'pip3' for Python 3
$ sudo pip3 install virtualenvwrapper

You could at this point try installing your Python packages, but you may have some missing dependencies.

On OS X, I needed to perform the following installations first (note that freetype may already exist somewhere, but needs to be symlinked to the correct location):3

$ brew install freetype # required by PIL
$ ln -s /usr/local/include/freetype2 /usr/local/include/freetype # only on OS X; see footnote 3
$ brew install swig # required by scipy

On Linux, I needed to perform the following installations first (note that there are alternative choices for all of these dependencies; you just need some version of BLAS and LAPACK and a Fortran compiler):

$ sudo apt-get install libblas-dev # required by scipy
$ sudo apt-get install liblapack-dev # required by scipy
$ sudo apt-get install gfortran # required by scipy

Now you should be ready to install your cool Python tools!

Linux users: You may be able to skip part of the following step, because the major packages are often shipped with Linux distributions. It can't hurt to install the latest version, but you don't need to if you don't want to. Find out what's already installed with pip list. Find out if newer versions are available with pip list --outdated.

# This will automatically switch you into the new virtualenv so you can start installing packages.
# Your new virtualenv will be called "vision".
# You can exclude the "-p `which python3`" if you don't want to use Python 3.
$ mkvirtualenv -p `which python3` vision
$ pip install Pillow # see footnote 4
$ pip install numpy
$ pip install matplotlib

# And the optional packages:
$ pip install scipy
$ pip install ipython
$ pip install cython
$ pip install scikit-image

# confirm that everything worked
$ pip list
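
`pip list` only shows what is installed, not whether it imports cleanly. As an extra check, something like the following sketch tries each import from inside the virtualenv. The `REQUIRED` and `OPTIONAL` lists and the `check_modules` name are my own. Note that Pillow is imported as `PIL` and scikit-image as `skimage`.

```python
import importlib

# Import names, which differ from the pip package names in two cases:
# Pillow is imported as PIL, and scikit-image as skimage.
REQUIRED = ["PIL", "numpy", "matplotlib"]
OPTIONAL = ["scipy", "IPython", "skimage"]


def check_modules(names):
    """Try to import each module by name.

    Maps each name to its version string, to "unknown" if the module has
    no __version__ attribute, or to None if the import failed.
    """
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report
```

Calling `check_modules(REQUIRED + OPTIONAL)` in the `vision` virtualenv should give a version for each required library; a `None` entry points at a package that is missing or failed to build.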

That's it! You're all ready to go with your next-generation Python algorithms for computer vision! If you additionally want to install OpenCV, see my separate post about that. If you don't yet have a Python development environment, do read my post on PyDev and virtualenv.

  1. I'm using Python 3 here. If you know anything about Python, you'll have heard how much confusion there is around Python 2 vs. Python 3. You can also use Python 2, but the entire NumPy/SciPy ecosystem has supported Python 3 for a couple years now, so you should be safe to prefer 3. Homebrew manages 2 and 3 as completely separate packages. You can have both simultaneously installed on your Mac, and 'python' will always refer to Python 2, while 'python3' will always refer to Python 3. The only hitch is you will have to remember to specify python3 for your virtualenv, and use pip3 to install global libraries for Python 3. If you don't understand what that means, just forget I even said it; I've written my instructions to do things the Python 3 way.
  2. If you need an introduction to Python's packaging system, see this page. TL;DR: pip lets you install packages (Python libraries). Usually, you do not want to install packages globally, for the entire system, because they may conflict with each other. Instead, you want each Python project you create to have its own isolated ecosystem. Virtualenv provides this isolation. Virtualenvwrapper makes virtualenv nicer to use.
  3. I'm not sure why Freetype is in a different location on OS X than on Linux, but I guess this is the location that Xcode decided upon and Homebrew follows suit. So we just create a quick symlink and hopefully never have to worry about it again.
  4. While writing this post I discovered Pillow, a newer package for the Python Imaging Library. It seems that support for the original PIL is waning, and it is not available via pip by default. PIL might someday regain favor, but I find Pillow to be better supported at the moment.

Why PhD?

This is a repost of my answer to the Quora question: What are some strong motivations for earning a PhD?

You want to earn a PhD because...

  • You want to surround yourself with the best, brightest, and most motivated people on Earth, and in so doing, push yourself to become one of them.
  • You want to get paid to (mostly) have free rein to explore your own ideas, to fail without consequences, or to try something that doesn't have to work, as long as it expands human knowledge.
  • This video looks like heaven to you: rocks, bands, logic (2012)
  • You relish being able to say "I just got home from the lab" instead of "I just got home from the office".
  • You want your life to be science fiction.
  • You want to live on the very precipice of human knowledge, and witness new discoveries firsthand, years before the general public catches wind of it.
  • You've looked everywhere and you just can't find the stimulating intellectual atmosphere like that of a university. Your coworkers just don't seem to think about or care about the same things as you.
  • No matter what high-tech goliath or trendy startup or progressive non-profit you work for, it's just not fulfilling. It appears to you that fulfilling industry jobs do exist, but they almost all require a PhD and involve at least part-time research.
  • You constantly challenge and push yourself. You relish being around people who do the same.
  • You want to continue growing and learning at a breakneck pace. (It is possible to do this in the workplace, but far from automatic. Being in school forces you to grow due to the incredibly steep learning curve. See here about the importance of maintaining a steep learning curve: Edmond Lau's answer to Career Advice: How do you know when it's time to leave your current company and move on?)
  • You tried briefly to follow research on your own, but it's just too hard to navigate. You need someone to guide you into the field. You need a set of strong mentors to teach you how to read a paper and how to identify the important results in the field (be it science, history, philosophy... even literature).
  • You want to work at pretty much the only place outside the army that will demand everything you can give and force you to "Be all you can be".
  • You want to own all of your work. You decide what you're going to work on and whether to give it away for free or start a company out of it. Unlike the army, you're not a cog in someone else's machine here.
  • You want a job that will have you (1) travel all over the world and (2) meet people who are the best in the world at what they do.
  • You want to be part of a global community (science transcends borders; same for other scholarly disciplines).
  • You want a job that will train you in (or force you to learn) all kinds of invaluable life skills: technical writing and communication, delivering presentations and speeches, teaching and mentoring, etc. (It's pretty hard to get such a wide range of leadership skills in any other entry-level job!)
  • You want your contribution to the world to be in the form of knowledge. In their careers, most people render a product or service to the world. Your job is to render knowledge, and publications are the medium.
  • And of course, you want a job where you can answer questions on Quora instead of working for a day. ;-)

This might be a slightly romanticized version, but you asked for some strong reasons, and these mostly hold true. They may not all apply to you, but the biggest advantage of a PhD is the flexibility. It really is what you make of it; you can become more involved and practice more leadership skills, or you can put your head down and bury yourself in your research. You choose how often you skip out on work and go to the beach. You decide who to collaborate with, and when to go on vacation.

Whichever combination of these reasons appeals to you, they should obey this one overriding rule: You do a PhD for the experience of doing a PhD. You don't do a PhD for the job that comes afterward. Some people go on into very lucrative jobs, some start companies out of their PhD, and some struggle with low wages for the rest of their lives. Whatever comes after the PhD is whatever comes after the PhD. You do the degree for the degree itself.

People often ask me what I want out of my PhD, what my end goal is, why am I doing it, what job title am I looking for? Well, of course I have some ideas about that—too many ideas, in fact—but for now, for right now, the answer to their question is actually: I want the job title of PhD Candidate. Seriously. There is nothing better I can imagine doing with my life right now than being paid to dig deep, being paid to learn, to take classes, to take on outlandish projects, to soak up knowledge like a sponge. I have to live on a very meagre salary and work very hard, but in return I get this incredible job with flexible hours and amazing colleagues from all over the world. If I'm lucky, I'll get to continue doing that after I'm done, and I'll work in a place with the same invigorating environment and the same clever, diverse, interesting colleagues. I can only hope.

Disclaimer: I'm currently only in the first year of my Master's, but already grad school has been the most rewarding and transforming experience of my entire life, and I don't expect it to change anytime soon.