Sunday, March 24, 2013

Setting up matplotlib Python Package on a Mac

Thanks to this post, I've managed to get python and a few packages working on a Hackintosh running Snow Leopard. However, I found matplotlib to be particularly challenging, even with the instructions. Hence, for myself and others, I have decided to record my insights.

Once you have homebrew, python, and numpy installed as indicated in the post, above, do as follows in a terminal:

brew install freetype
brew install libpng

chmod +w /usr/local/lib
brew link freetype --force --overwrite

The last line should fail, but list a couple of files. Find the files (it tells you where they are) and delete them (or move them into a new folder for safe keeping). Retype the command and the link should occur. Then:

chmod +w /usr/local/include
brew link libpng --force --overwrite

As in the previous case, this will fail and list a few files. Proceed as in the previous case. Retype the command.

Finally type:

pip install matplotlib
or
pip install git+git://github.com/matplotlib/matplotlib.git#egg=matplotlib-dev

You'll have to use the latter if you're using the Mountain Lion OS. It was the latter that I concluded with, personally.

Monday, March 18, 2013

Kruskal's MST Algorithm in Python

I needed this to create an algorithm that calculates base cycles in a graph. The code requires the disjoint set class from a previous post.

def kruskal_mst(vertices, edges):
  """Kruskal's algorithm that returns edge list of minimum spanning tree.

  For details see:
  http://en.wikipedia.org/wiki/Kruskal's_algorithm

  vertices - a list of the vertex names
  edges - a list of paired tuples representing edges e.g., ('v1', 'v2')

  """
  span = []
  spanAdd = span.append
  data = DisjointDataStruct(vertices)
  for v1, v2 in edges:
    if data.find(v1) != data.find(v2):
      spanAdd((v1, v2))
      data.union(v1, v2)
  return span


Note: This is not a weighted graph implementation. To make it such, have the edges come in with weights as the first value in a tuple triplet. Call edges.sort() after data assignment and make the for statement read:

for w, v1, v2 in edges:
  ...

Disjoint-set Data Structure in Python

I needed this code to write Kruskal's minimum spanning tree algorithm.

class DisjointDataStruct(object): 
  """Disjoint-set data structure. 

  For details see:
  http://en.wikipedia.org/wiki/Disjoint-set_data_structure

  Public functions:
  union(x, y)
  find(x)
  """

  def __init__(self, vertices):
    """My personal representation of the spaghetti stack.

    vertices - a list of vertex names from a graph
    """
    self.struct = {v: [v, 0] for v in vertices}

  def union(self, x, y):
    """Joins subsets x and y into a single subset.

    Uses union by rank optimization.

    x, y - vertex names corresponding to keys in self.struct
    """
    xRoot = self.find(x)
    yRoot = self.find(y)
    if xRoot == yRoot:
      return

    xV = self.struct[xRoot]
    yV = self.struct[yRoot]
    if xV[1] < yV[1]:
      xV[0] = yRoot
    elif xV[1] > yV[1]:
      yV[0] = xRoot
    else:
      yV[0] = xRoot
      xV[1] += 1

  def find(self, x):
    """Determines which subset a particular element x is in.

    Uses path compression optimization.

    x - vertex name corresponding to key in self.struct
    """
    if self.struct[x][0] != x:
      self.struct[x][0] = self.find(self.struct[x][0])
    return self.struct[x][0]

The timings are as follows:
__init__ is 17.5ms per loop in 100 loops with 100000 vertices (using timeit)
union has a mean of 1.0ms, standard deviation of 1.10e-7, min of ~0.9ms, and max of 1.0ms.
The union data was computed over the 224 union time values that did not equal 0 over a total of 100000 random unions of 100000 vertices (using time.time end - start).

On a single run of cProfile with a vertex set of 100000, and 100000 random unions where the two elements are not equal (I increment one of them if they are), I get the following values:
0.042s for __init__
0.197s for union in total (100000 calls) so per call is ~1.97e-06s
0.130s for find in total (299990 total / 200000 primitive) so per call is ~4.33e-07s

All of this was clocked running an i5-2500K @ 3.3GHz

Sunday, March 10, 2013

2D Wraparound World

I figured I would share a finding I had while working on a project. For those of you who have attempted to code a representation of a wraparound map (2D torus) using a 1 or 2D array, you may have found the experience quite challenging to get operating smoothly. I was trying to find a more elegant solution when I stumbled upon the following formula:


sqrt(min(|x1 - x2|, w - |x1 - x2|)^2 + min(|y1 - y2|, h - |y1-y2|)^2)

Where the two points are (x1, y1) and (x2, y2);
w and h are the width and height;
min and sqrt are minimization and square root functions, respectively.

With that the process is absurdly easy.

Note: If you just need relative distance for a comparison, don't bother using the square root.


Function courtesy of:
http://stackoverflow.com/questions/2123947/calculate-distance-between-two-x-y-coordinates/2123977#2123977

Saturday, March 9, 2013

Python Nested List

In a previous post, I outlined a python function for creating nested lists. In retrospect, it was a naive, if functional, first attempt. As the size of the various dimensions increases, the function would become quite slow largely as a result of the deepcopy calls and recursion. A more efficient method would use the following syntax:

nestedList = [[[baseValue for x in range(axisLength)] 
                for y in range(axisLength)] for z in ...]

Now, a naive attempt to turn this into a function might do something like the following:

def nestedList(axisLength, degree, base):
  for x in range(degree):
    base = [base for y in range(axisLength)]
  return base

But this will actually just create shallow copies. The problem is that base is not recreated at each iteration. However, this is fixable with generators. For example:

def baseGen(base, axisLength):   
  while True:
    try: 
      yield [base.next() for x in range(axisLength)]
    except AttributeError:
      yield [base for x in range(axisLength)]

def nestedList(base, axisLength, degree):
  for d in range(degree):
    base = baseGen(base, axisLength)
  return base

Then, you would just call nList = nestedList(None, 5, 3) and then ls = nList.next() for a 5x5x5 cube with None as the base value. An added advantage of using generators is that you can just keep calling nList.next() every time you want a new cube.