2

I often need to create two versions of an IPython notebook: one contains tasks to be carried out (usually including some Python code and output), the other contains the same text plus solutions. Let's call them the assignment and the solution.

It is easy to generate the solution document first, then strip the answers to generate the assignment (or vice versa). But if I subsequently need to make changes (and I always do), I need to repeat the stripping process. Is there a reasonable workflow that will allow changes in the assignment to be propagated to the solutions document?

Partial self-answer: I have experimented with Mercurial's hg copy, which lets two files with different names share history. But I can only get this to work if the assignment and the solution are in different directories, in two linked hg repositories. I would much prefer a simpler set-up. I've also noticed that diff gets very confused when one JSON file has more sections than the other, which makes a VCS-based solution even less attractive. (To be clear: ordinary use of a VCS with notebooks is fine; it's the parallel versions that stumble.)

This question covers similar ground, but does not solve my problem. In fact an answer to my question would solve the OP's second remaining problem, "pulling changes" (see the Update section).

2 Answers

1

It sounds like you are maintaining an assignment and an answer key of some kind and want to be able to distribute the assignments (without solutions) to students, and still have the answers for yourself or a TA.

For something like this, I would create two branches, "unsolved" and "solved". First write the questions on the "unsolved" branch. Then create the "solved" branch from there and add the solutions. If you ever need to update a question, switch back to the "unsolved" branch, make the change there, merge it into "solved", and then fix the solution.

You could try going the other way, but my hunch is that going "backwards" from solved to unsolved might be strange to maintain.

1
  • Thanks, this makes sense from the VCS standpoint, it's pretty much what I've already tried. And indeed the unsolved version needs to be upstream, otherwise every time I modify a solution, the changes would be getting propagated downstream to unsolved (and cause merge conflicts.) Unfortunately, it's not so simple: First of all, having two directories is unnatural and causes problems with the resources that must be present (and need to be checked in as well). Second, ipython notebooks are JSON structures and they do not diff well. This probably needs a Notebook-centric solution, I suspect.
    – alexis
    Commented Jun 9, 2014 at 18:48
1

After some experimentation I concluded that it is best to tackle this by processing the notebook's JSON code. Version control systems are not the right approach, for the following reasons:

  1. JSON doesn't diff very well when cells are added or deleted. A minimal change leads to mismatched braces and a very messy diff.

  2. In my use case, the superset version of the file (containing both the assignments and their solutions) must be the source document. This is because the assignment includes example code and output that depends on earlier parts, to be written by the students. This model does not play well with version control, as pointed out by @ChrisPhillips in his answer.

I ended up filtering the JSON structure for the notebook and stripping out the solution cells; they may be recognized via special metadata (which can be set interactively using the metadata button in the interface), or by pattern-matching on the cell contents. The following snippet shows how to filter out cells whose first line starts with # SOLUTION:

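import json
import re

def stripcell(cell, pattern):
    """Return True if the first line of the cell's content matches `pattern`."""
    # Code cells keep their text under "input"; other cell types use "source".
    if cell["cell_type"] == "code":
        content = cell["input"]
    else:
        content = cell["source"]
    return len(content) > 0 and re.search(pattern, content[0]) is not None

pattern = r"^# SOLUTION:"

with open("input.ipynb") as infile:
    struct = json.load(infile)

# Keep only the cells of the first worksheet that are not marked as solutions.
cells = struct["worksheets"][0]["cells"]
struct["worksheets"][0]["cells"] = [c for c in cells if not stripcell(c, pattern)]

# Open in text mode, since json.dump writes str rather than bytes.
with open("output.ipynb", "w") as outfile:
    json.dump(struct, outfile, indent=1)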
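
The metadata-based variant mentioned above could be sketched roughly as follows. The metadata key "solution" is a hypothetical name chosen for illustration (any key set through the cell metadata editor would do), and the notebook is assumed to use the same worksheet layout as the snippet above. A small in-memory structure stands in for the loaded file.

```python
import json

def is_solution(cell, key="solution"):
    """Return True if the cell's metadata flags it as a solution.

    The key name "solution" is an assumption made for this sketch,
    not an established notebook convention.
    """
    return bool(cell.get("metadata", {}).get(key, False))

def strip_solutions(struct, key="solution"):
    """Remove flagged cells from every worksheet, in place, and return struct."""
    for worksheet in struct.get("worksheets", []):
        worksheet["cells"] = [
            c for c in worksheet["cells"] if not is_solution(c, key)
        ]
    return struct

# Small in-memory notebook standing in for json.load(...) of a real file.
demo = {
    "worksheets": [{
        "cells": [
            {"cell_type": "markdown", "source": ["Task 1\n"], "metadata": {}},
            {"cell_type": "code", "input": ["x = 42\n"],
             "metadata": {"solution": True}},
        ]
    }]
}

assignment = strip_solutions(demo)
print(json.dumps(assignment, indent=1))
```

Unlike the pattern-based filter, this leaves the cell text itself untouched, so no marker comment has to be kept in the solution code.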

I used the generic json library rather than the notebook API. If there's a better way to go about it, please let me know.
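
For notebooks saved in the newer format 4 layout, where cells sit in a top-level "cells" list and code cells store their text under "source" instead of "input", the same filter might be adapted as below. This is only a sketch, still using the generic json-style dictionaries and assuming the file really is format 4; the function names are illustrative. The "source" field is handled both as a list of lines and as a single string, since the JSON may contain either.

```python
import re

def first_line(cell):
    """Return the first line of a cell's source, or "" for an empty cell."""
    source = cell.get("source", "")
    if isinstance(source, list):
        source = "".join(source)
    return source.split("\n", 1)[0]

def strip_solutions_v4(struct, pattern=r"^# SOLUTION:"):
    """Drop cells whose first line matches `pattern` (format 4 layout)."""
    struct["cells"] = [
        c for c in struct["cells"] if not re.search(pattern, first_line(c))
    ]
    return struct

# In-memory stand-in for a format 4 notebook loaded with json.load(...).
demo_v4 = {
    "nbformat": 4,
    "cells": [
        {"cell_type": "markdown", "source": "Write a function `f`.",
         "metadata": {}},
        {"cell_type": "code", "source": ["# SOLUTION:\n", "def f(): pass\n"],
         "metadata": {}},
    ],
}

result = strip_solutions_v4(demo_v4)
```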
