import matplotlib
# make figures better:
font = {'weight':'normal','size':20}
matplotlib.rc('font', **font)
matplotlib.rc('figure', figsize=(9.0, 6.0))
matplotlib.rc('xtick.major', pad=10) # xticks too close to border!
import warnings
warnings.filterwarnings('ignore')
import random, math
import numpy as np
import scipy, scipy.stats
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
nicered = "#E6072A"
niceblu = "#424FA4"
nicegrn = "#6DC048"
James Bagrow, james.bagrow@uvm.edu, http://bagrow.com
Given a small number of low-level drawing commands (draw point, draw line segment, draw filled region) you can build up to pretty much any more advanced drawing level. For example, to draw a circle of radius \(r\) centered at \(x_0,y_0\) you only need to draw a number of line segments:
x_prev = x0 + r*cos(0) # first point of circle
y_prev = y0 + r*sin(0)
for th in [th1,th2, ..., 2*pi]:
    x = x0 + r*cos(th)
    y = y0 + r*sin(th)
    LINE(x_prev, y_prev, x, y)
    x_prev = x
    y_prev = y
However, almost all drawing packages/toolboxes/frameworks/etc. also provide high-level convenience functions for typical tasks. Drawing a circle is very common so in practice you almost never have to write something like the above loop. Instead:
CIRCLE(x0, y0, r) # much shorter!
The specific function equivalents of LINE and CIRCLE will of course depend on your particular drawing library.
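As a rough, runnable version of the segment loop above, here is a sketch that only computes the segments instead of drawing them. The function name `circle_segments` and the parameter `n` (number of segments) are assumptions made for illustration; a real drawing library would take the place of the list.

```python
import math

def circle_segments(x0, y0, r, n=64):
    """Return n line segments (x_prev, y_prev, x, y) approximating a circle."""
    segments = []
    # first point of the circle (angle 0)
    x_prev = x0 + r * math.cos(0.0)
    y_prev = y0 + r * math.sin(0.0)
    for k in range(1, n + 1):
        th = 2.0 * math.pi * k / n  # evenly spaced angles up to 2*pi
        x = x0 + r * math.cos(th)
        y = y0 + r * math.sin(th)
        segments.append((x_prev, y_prev, x, y))
        x_prev, y_prev = x, y
    return segments

segs = circle_segments(0.0, 0.0, 1.0, n=8)
print(len(segs))
```

Increasing `n` makes the polygon look rounder, at the cost of more segments to draw.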
You can always visualize any curve by drawing a number of short line segments, with increasing accuracy as the number of segments increases. However, most drawing systems have a different way of encoding curves (or "splines").
Consider this function:
\[ S(t) = \begin{cases} (t+1)^2-1 & -2 \le t < 0\\ 1-(t-1)^2 & 0 \le t \le 2 \end{cases} \]
Let's plot \(S(t)\):
def S(t):
    if -2 <= t < 0:
        return (t+1)**2 - 1
    elif 0 <= t <= 2:
        return 1 - (t-1)**2
    else:
        return float("nan")
x = []
y = []
for t in np.linspace(-2,2,100):
    x.append(t)
    y.append( S(t) )
plt.plot(x,y, '.-')
plt.xlabel("t")
plt.ylabel("S(t)")
plt.show()
The function S(t) encodes a nice, smooth curve to "infinite" precision. When we go to draw it, we choose the smoothness of the drawn curve by the number of values of t we pick.
A more extreme example:
\[ S(t)= \begin{cases} \frac{1}{4}(t+2)^3 & -2\le t \le -1\\ \frac{1}{4}\left(3\left|t\right|^3 - 6t^2 +4 \right)& -1\le t \le 1\\ \frac{1}{4}(2-t)^3 & 1\le t \le 2 \end{cases} \]
And to draw it:
def S(t):
    if -2 <= t < -1:
        return 0.25 * (t+2)**3
    elif -1 <= t < 1:
        return 0.25 * (3*abs(t)**3 - 6*t**2 + 4)
    elif 1 <= t <= 2:
        return 0.25 * (2 - t)**3
    else:
        return float("nan")
x = []
y = []
for t in np.linspace(-2,2,100):
    x.append(t)
    y.append( S(t) )
plt.plot(x,y, '.-')
plt.xlabel("t")
plt.ylabel("S(t)")
plt.show()
"OK," you may be thinking, "big deal: those are just a sin function and a gaussian. Why do I need to build these crazy piece-wise functions?"
These functions are piecewise polynomials. This makes them very easy to work with and the computer can work with them very efficiently.
These piecewise, smooth polynomials are often called splines. They are so easy to work with and well understood that computer scientists decided to use them to encode curves in early graphical software. It's been the standard ever since.
Of course, looking at the previous \(S(t)\)'s, they don't look so easy. Where do the equations come from?
A Bézier curve is a way of writing a polynomial as a parametric curve that is particularly convenient for drawing.
The simplest Bézier curve is a straight line. Imagine a straight line connecting two points \(\vec{P}_{0} = (x_0,y_0)\) and \(\vec{P}_{1} = (x_1,y_1)\). We can write down this line as
\[ B(t) = \vec{P}_{0} + t \left(\vec{P}_{1} - \vec{P}_{0} \right) \]
where \(t\) is in \([0,1]\). This \(t\) parameterizes the curve, telling us how far along the line we are.
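As a small sketch of this formula, the linear interpolation can be written directly on coordinate pairs. The helper name `lerp_point` is an assumption for illustration, not part of any library.

```python
def lerp_point(p0, p1, t):
    """Evaluate B(t) = P0 + t*(P1 - P0) for 2D points given as (x, y) tuples."""
    x0, y0 = p0
    x1, y1 = p1
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))

# t = 0 gives P0, t = 1 gives P1, and t = 0.5 gives the midpoint
midpoint = lerp_point((0.0, 0.0), (2.0, 4.0), 0.5)
print(midpoint)
```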
Cool animation:
OK, so what??
* We can combine such linear interpolations to draw much more complex curves:
First, take the linear interpolation between \(P_0\) and \(P_1\) and the interpolation between \(P_1\) and \(P_2\). As \(t\) increases, a point (\(Q_0\), \(Q_1\)) will move along each line. Draw a line (shown in green) between those two points. A point moving along this moving line segment traces out a quadratic Bézier curve.
To make more involved shapes keep repeating this process:
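In code, this repeated interpolation can be sketched as follows. The approach is usually called de Casteljau's algorithm; the function name `de_casteljau` and the sample control points are assumptions chosen for illustration.

```python
def de_casteljau(points, t):
    """Evaluate a Bezier curve at parameter t by repeated linear interpolation.

    points: list of (x, y) control points; 3 points give a quadratic curve,
    4 points give a cubic curve, and so on.
    """
    pts = [tuple(p) for p in points]
    while len(pts) > 1:
        # interpolate between each pair of neighboring points
        pts = [((1 - t) * xa + t * xb, (1 - t) * ya + t * yb)
               for (xa, ya), (xb, yb) in zip(pts[:-1], pts[1:])]
    return pts[0]

control = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]  # hypothetical P0, P1, P2
print(de_casteljau(control, 0.5))
```

Each pass through the loop removes one point, so a curve with four control points needs three rounds of interpolation.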
By the way, if you've ever used a vector drawing program like Adobe Illustrator, you may have seen these before:
The pen tool lays down a Bézier curve (or spline) and the little "handles" that you drag around are the control points that guide the shape of the curve.
Here's an interactive version of the previous animations:
In principle that seems OK, but in practice isn't it going to be a lot of math to compute the curves?
In matplotlib, for example, you can draw a Bézier curve using a Path:
from matplotlib.path import Path
# define the geometry of a path:
verts = [(0.0, 0.0), # P0
(0.2, 1.0), # P1
(1.0, 0.8), # P2
(0.8, 0.0), # P3
]
codes = [Path.MOVETO, # put the "pen" at P0
         Path.CURVE4, # P1: first control point of the cubic curve
         Path.CURVE4, # P2: second control point
         Path.CURVE4, # P3: end point of the curve
        ]
path = Path(verts, codes) # build path from vertices & command codes
This defines a path object that encodes the geometry of a curve. Matplotlib (and many other packages) let you define "paths" geometrically and then use a separate object for drawing/visualization, typically called a "patch".
Now draw the path using a patch:
import matplotlib.patches as patches
# build stylizeable "patch" to visualize geometric "path":
patch = patches.PathPatch(path, facecolor='none',
edgecolor=niceblu, lw=4)
ax = plt.gca() # gca = get current plot's "axes"
ax.add_patch(patch)
# draw control points:
xs, ys = zip(*verts)
ax.plot(xs, ys, 'o--', lw=2, color='black', ms=10)
# label the control points:
ax.text( 0.05, -0.05, 'P0')
ax.text( 0.15, 1.05, 'P1')
ax.text( 1.05, 0.85, 'P2')
ax.text( 0.85, -0.05, 'P3')
# resize the plot:
ax.set_xlim(-0.1, 1.15)
ax.set_ylim(-0.1, 1.15)
ax.set_aspect("equal")
plt.show()
We're programming the curve!
Here's a fun little example. Suppose we want to draw a flow chart:
from matplotlib.lines import Line2D
from matplotlib.patches import Circle
ax = plt.gca()
r = 0.08
# draw top circle
circ = Circle((0.5,0.75),r, color="red", zorder=2)
ax.add_patch(circ)
for i in range(5):
    # line from top circle to bottom
    line = Line2D( [0.5, i/4.0], [0.75,0.25], linewidth=5, zorder=1 )
    ax.add_line(line)
    # draw bottom circle
    circ = Circle((i/4.0,0.25),r, color="red", zorder=2)
    ax.add_patch(circ)
ax.set_aspect('equal', "datalim")
ax.axis('off')
plt.show()
Instead of straight line connectors we can program in some curves!
ax = plt.gca()
r = 0.08
x_top = 0.5
y_top = 0.75
y_bot = 0.25
# draw top circle
circ = Circle((x_top,y_top),r, color=nicered, zorder=2)
ax.add_patch(circ)
for i in range(5):
    x_bot = i/4.0
    # set up the path:
    verts = [(x_top, y_top), # P0
             (x_top, y_bot ), # P1
             (x_bot, y_bot+0.6*(y_top-y_bot) ), # P2 ***
             (x_bot, y_bot), # P3
            ]
    codes = [Path.MOVETO,
             Path.CURVE4,
             Path.CURVE4,
             Path.CURVE4,
            ]
    path = Path(verts, codes)
    patch = patches.PathPatch(path, facecolor='none', zorder=1,
                              edgecolor=niceblu, lw=4)
    ax.add_patch(patch)
    # draw bottom circle
    circ = Circle((x_bot,y_bot),r, color=nicered, zorder=2)
    ax.add_patch(circ)
ax.set_aspect('equal', "datalim")
ax.axis('off')
plt.show()
You may have seen commands like this:
plt.plot(X,Y, '-', color="red")
and thought, "Oh ok, a red line." But what about this:
plt.plot(X,Y, 'o-', color='#FF00AA', markerfacecolor="#00FF00")
Those strings, which you've likely seen before if you do any web design, are hexadecimal numbers representing RGB (red, green, blue) colors. Such a string is known as a hex triplet. The leading "#" is a standard convention for denoting a triplet.
A hex number is base-16. Each digit ranges from 0 to 9 and then from A to F. Base-16 is convenient on the computer when working with bytes, and it lets you represent a number between 0 and 255 with two digits, whereas base-10 can only represent numbers between 0 and 99 with two digits.
The six digits in a hex triplet let us define the color channels for red, green, and blue:
#RRGGBB
So the color pure red, sometimes denoted RGB(1.0,0,0), is #FF0000. The first FF is the largest value possible, while the other two channels are 00 since there is no green or blue in the color.
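To make the #RRGGBB layout concrete, here is a minimal sketch that splits a hex triplet into its three channels using Python's built-in base-16 parsing. The helper name `hex_to_channels` is an assumption for illustration.

```python
def hex_to_channels(hex_triplet):
    """Split '#RRGGBB' into three integers in the range 0..255."""
    digits = hex_triplet.lstrip('#')
    # each channel is two hex digits; int(..., 16) parses base-16
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

print(hex_to_channels("#FF0000"))  # pure red
print(hex_to_channels("#FF00AA"))
```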
Since each channel in a hex triplet is a two-digit base-16 number, people often write the color scale going from 0 to 255 instead of 0 to 1, so pure red would then be RGB(255,0,0) with the same hex representation.
This can sometimes cause problems. If a function wants the color channels to be in [0,1] and you pass a channel value of 128 (which represents about 50%), the function may clamp that down to 1, the largest value it assumes can exist.
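The pitfall can be illustrated with a short sketch. Both helpers here, `clamp01` and `rgb256_to_unit`, are hypothetical names written for this example; they stand in for whatever a given plotting function does internally.

```python
def clamp01(x):
    """Force a channel value into [0, 1], as a [0,1]-based function might."""
    return max(0.0, min(1.0, float(x)))

def rgb256_to_unit(rgb):
    """Convert channels on the 0..255 scale to the 0..1 scale."""
    for c in rgb:
        if not 0 <= c <= 255:
            raise ValueError("channel out of range 0..255: %r" % (c,))
    return tuple(c / 255.0 for c in rgb)

wrong = clamp01(128)                     # silently becomes full intensity
right = rgb256_to_unit((128, 0, 0))[0]   # about half intensity
print(wrong, right)
```

Converting explicitly before calling a function that expects [0,1] avoids the silent clamp.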
The way to represent color on a computer is non-trivial and there are lots of different color systems beside RGB.
RGB is convenient because media that emit light (such as TVs) use red, green, and blue pixels. Let's see how a modern computer display actually works. It's cool!
from IPython.display import YouTubeVideo
YouTubeVideo("jiejNAUwcQ8", width=600, height=600*0.8235)
So the computer display is just an array of red, green, and blue pixels in close proximity. Colored light gets mixed. This is called additive mixing:
This is different from what paint and pigment does, which is called subtractive mixing:
Notice how the circles are "cyan", "magenta", and "yellow" and their intersections are red, green, and blue? This is why high-end graphic design doesn't use RGB colors but instead uses CMYK: it more accurately models the ink in a printing press.
Our brains automatically mix light additively:
There are only three colors in that image!
Additive mixing is easy, it's just addition. Let's mix two colors. All we do is sum up the three color channels elementwise and then round down any numbers that are too big:
c1 = (1.0, 0.0, 0.0) # pure red in RGB
c2 = (0.0, 0.0, 1.0) # pure blue
cS = ( c1[0]+c2[0], c1[1]+c2[1], c1[2]+c2[2] )
# round down to 1.0:
cS = ( min([cS[0],1]),
min([cS[1],1]),
min([cS[2],1]) )
print(cS) # should be 100% red and 100% blue
Now to convert that tuple to a hex triple string is a little weird. Here's a function:
def rgb_to_rgb256(rgb):
    """Map [0,1] rgb to [0,255] rgb."""
    r,g,b = rgb
    return ( int(255*r), int(255*g), int(255*b) )
def rgb256_to_hex(rgb):
    """Make hex triple from rgb"""
    return '#%02X%02X%02X' % rgb
print(cS)
hex_triple = rgb256_to_hex( rgb_to_rgb256(cS) )
print(hex_triple)
Let's see what we've got:
h1 = rgb256_to_hex( rgb_to_rgb256(c1) )
h2 = rgb256_to_hex( rgb_to_rgb256(c2) )
# Circle((x0, y0 ), r , )
circle1=plt.Circle((0.25,0.25),0.25, color=h1)
circle3=plt.Circle((0.75,0.75),0.25, color=h2)
circle2=plt.Circle((0.50,0.50),0.25, color=hex_triple)
plt.clf()
fig = plt.gcf()
fig.gca().add_artist(circle1)
fig.gca().add_artist(circle2)
fig.gca().add_artist(circle3)
fig.gca().set_aspect('equal')
plt.show()
There is another convenient way to blend RGB colors mathematically: just take the averages of the RGB channels.
color = (255,0,0) # pure red
h = rgb256_to_hex(color)
print(0, h, color)
ax = plt.gca()
r_circ = 0.1
# use facecolor (not color) so the edge color is not overridden
circ = Circle((0/10.0,0.25),r_circ, facecolor=h, ec='black')
ax.add_patch(circ)
circ = Circle((0/10.0,0.75),r_circ, facecolor=h, ec='none')
ax.add_patch(circ)
for i in range(7): # draw seven circles
    r,g,b = color
    r = (r + 0)/2.0 # average with blue, rgb(0,0,255)
    g = (g + 0)/2.0
    b = (b + 255)/2.0
    color = (int(r),int(g),int(b))
    h = rgb256_to_hex(color)
    print(i+1, h, color)
    # draw two circles
    x_circ = (i+1)/7.0
    circ = Circle( (x_circ, 0.25), r_circ,
                   facecolor=h, ec="black", lw=1.0)
    ax.add_patch(circ)
    circ = Circle( (x_circ, 0.75), r_circ,
                   facecolor=h, ec="none", lw=1.0)
    ax.add_patch(circ)
ax.set_aspect('equal')
plt.show()
This repeated averaging takes us from the first color towards the second, but the scale is clearly nonlinear. This is because each step through the loop mixes 50% blue with 50% of the current color, so after \(n\) steps only \(1/2^n\) of the original red remains.
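A quick way to see the nonlinearity is to track only the red channel while repeatedly averaging with blue. This sketch keeps floating-point values for clarity, whereas the loop above truncates to integers at each step; the variable names are assumptions for illustration.

```python
red = 255.0
history = [red]
for step in range(7):
    red = (red + 0) / 2.0   # average with blue's red channel, which is 0
    history.append(red)

# fraction of the original red left after each step: 1, 1/2, 1/4, ...
fractions = [value / 255.0 for value in history]
for n, frac in enumerate(fractions):
    print(n, frac)
```

The red contribution halves at every step, so most of the visible change happens in the first two or three circles.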
If you want to uniformly change from red to blue, you just need to linearly interpolate between the two colors:
r1,g1,b1 = (255,0,0)
r2,g2,b2 = (0,0,255)
ax = plt.gca()
for a in [0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]:
    # linear interpolation:
    r = (1-a)*r1 + a*r2
    g = (1-a)*g1 + a*g2
    b = (1-a)*b1 + a*b2
    color = (int(r),int(g),int(b))
    h = rgb256_to_hex(color)
    print(a, h, color)
    # use facecolor (not color) so the edge color is not overridden
    circ = Circle( ( a, 0.25), 0.1,
                   facecolor=h, ec="black", lw=1.0)
    ax.add_patch(circ)
    circ = Circle( ( a, 0.75), 0.1,
                   facecolor=h, ec="none", lw=1.0)
    ax.add_patch(circ)
ax.set_aspect('equal')
plt.show()
The (R,G,B) tuples can be thought of as defining a space:
The XYZ Euclidean dimensions map to RGB.
Euclidean coordinates are not the only way to describe 3D space. There are also spherical and cylindrical coordinates:
We can use cylindrical coordinates for colors. This is known as HSV (also called HSB; HSL is a closely related variant).
A saturation of 0 corresponds to a shade of gray (white at full value), while a value of 0 corresponds to black.
For a fixed saturation, we get a 2D color space:
Python provides a nice module, colorsys, for converting between color systems. Here's an example converting some HSV colors to RGB:
import colorsys # h in [0,1] for this module, not [0,360]
print(colorsys.hsv_to_rgb(1.0, 0.0,0.0 )) # black?
print(colorsys.hsv_to_rgb(0.5, 1.0,1.0 ))
print(colorsys.hsv_to_rgb(1.0, 1.0,1.0 ))
OK cylinders are great. So what?
Unlike RGB, HSV separates color and brightness/lightness. This lets us do certain operations more conveniently.
Suppose we want five colors that are as distinct as possible. In RGB we need to carefully pick (r,g,b) values very far apart. But in HSV all we need to do is pick five values of H that are evenly spaced between 0 and 360 degrees:
num_colors = 5
sat, val = 1.0, 1.0
for i in range(num_colors):
    hue = i / float(num_colors) # evenly spaced hues in [0,1)
    rgb = list( colorsys.hsv_to_rgb(hue, sat, val) )
    hex_str = rgb256_to_hex(rgb_to_rgb256(rgb)) # avoid shadowing builtin hex()
    print(hue, "-->", hex_str, rgb)
These colors are pretty much guaranteed to be as distinct as possible for a given number of colors. Of course, if you have hundreds of colors they will be forced to be very close to one another.
HSV is also nice for darkening or lightening a color without changing its saturation: just change V.
Here are some useful functions you may want to use.
def distinguishable_colors(num, sat=1.0, val=1.0):
    """Generate a list of `num' rgb hexadecimal color strings. The strings are
    linearly spaced along hue values from 0 to 1, leading to `num' colors with
    maximally different hues.
    Example:
    >>> print(distinguishable_colors(5))
    ['#ff0000', '#ccff00', '#00ff66', '#0066ff', '#cc00ff']
    """
    list_colors = []
    for i in range(num):
        hue = i / float(num) # evenly spaced hues in [0,1)
        rgb = list( colorsys.hsv_to_rgb(hue, sat, val) )
        list_colors.append( rgb_to_hex(rgb) )
    return list_colors
def rgb_to_hex(rgb):
    """Convert an rgb 3-tuple to a hexadecimal color string.
    Example:
    >>> print(rgb_to_hex((0.50,0.2,0.8)))
    #8033cc
    """
    # int(...) keeps the %x format valid whether round() returns int or float
    return '#%02x%02x%02x' % tuple([int(round(x*255)) for x in rgb])
def hex_to_rgb(hexrgb):
    """ Convert a hexadecimal color string to an rgb 3-tuple.
    Example:
    >>> print(hex_to_rgb("#8033CC"))
    (0.502, 0.2, 0.8)
    """
    hexrgb = hexrgb.lstrip('#')
    lv = len(hexrgb)
    # integer division so the slice indices stay integers
    return tuple(round(int(hexrgb[i:i+lv//3], 16)/255.0,4) for i in range(0, lv, lv//3))
def darken_hex(hexrgb, factor=0.5):
    """Take an rgb color of the form #RRGGBB and darken it by `factor' without
    changing the color. Specifically the RGB is converted to HSV and V ->
    V*factor.
    Example:
    >>> print(darken_hex("#8033CC"))
    '#401966'
    """
    rgb = hex_to_rgb(hexrgb)
    hsv = list(colorsys.rgb_to_hsv(*rgb))
    hsv[2] = hsv[2]*factor
    rgb = colorsys.hsv_to_rgb(*hsv)
    return rgb_to_hex(rgb)
def darken_rgb(rgb, factor=0.5):
    """Take an rgb 3-tuple and darken it by `factor', approximately
    preserving the hue.
    Example:
    >>> print(darken_rgb((0.5,0.2,0.7)))
    (0.251, 0.098, 0.3529)
    """
    hexrgb = darken_hex(rgb_to_hex(rgb), factor=factor)
    return hex_to_rgb(hexrgb)
print(darken_rgb((0.5,0.2,0.7)))
print(darken_hex("#8033CC"))
print(hex_to_rgb("#8033CC"))
print(rgb_to_hex((0.50,0.2,0.8)))
print(distinguishable_colors(5))
Colormaps are a concept specific to plotting. Given a \(t\), say between 0 and 1, we want a color to demonstrate the value of \(t\). In other words we want a function \(f(t)\) that takes a value \(t\) and returns three values \((r,g,b)\). Alternatively we can think of this as three functions \(\left(f_R(t), f_G(t), f_B(t)\right)\). These functions define a colormap.
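As a minimal sketch of such a function, here is a two-color map that linearly interpolates each channel between a "low" and a "high" color. The name `two_color_map` and the default blue-to-red endpoints are assumptions for illustration; real colormaps are usually built from more than two anchor colors.

```python
def two_color_map(t, c_low=(0.0, 0.0, 1.0), c_high=(1.0, 0.0, 0.0)):
    """Map t in [0, 1] to an (r, g, b) tuple with channels in [0, 1]."""
    t = max(0.0, min(1.0, t))  # clamp t so out-of-range values stay valid
    # f_R(t), f_G(t), f_B(t) are each a linear interpolation
    return tuple((1 - t) * lo + t * hi for lo, hi in zip(c_low, c_high))

for t in (0.0, 0.5, 1.0):
    print(t, two_color_map(t))
```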
A colormap lets us make a plot like this:
(We've been doing these for a while.)
Or something like this:
The colormap is then drawn for the viewer with the colorbar on the right side of these plots.
Matplotlib has lots of builtin colormaps: jet is very common, as is hot.
image = np.random.rand(10,10)
fig, axs = plt.subplots(1,2, figsize=(8,6))
im = axs[0].imshow(image, interpolation='none', cmap=plt.cm.jet)
plt.colorbar(im,ax=axs[0])
im = axs[1].imshow(image, interpolation='none', cmap=plt.cm.hot)
plt.colorbar(im,ax=axs[1])
plt.show()
Matplotlib also provides facilities to design your own colormaps!
How do we choose combinations of colors that are visually pleasing and also represent aspects of the data?
You've probably seen things about complementary colors and the color wheel before. That is, color schemes:
There are lots of tools online for designing color schemes, usually for designing logos, web pages, etc.
But there is another component to choosing a color scheme when visualizing data. You want to capture the theme of the data.
These are especially useful for maps:
Here's a great online tool especially for choosing map color schemes:
Remember that some fraction of your audience may not be able to distinguish all colors equally well. For people with the most common forms of color blindness, red and green can be hard to tell apart.
There are tools for this as well:
(I skipped over some important details with colors, in particular how the human eye reacts nonlinearly to red vs. green vs. blue; how Macs and Windows machines show colors differently, and how displays (and printers!) need calibration. It's a mess!)