Posts technical---or quite simplistic » math

Gradient transformations

dnquark — Mon, 04 Jun 2012 11:58:12 +0000

From the "I keep having to re-derive this because the Wikipedia explanation sucks" department, here are the transformation rules for differentials and for the gradient in terms of the Jacobian (you can think of this as an adaptation of the simple multivariate chain rule). Of course, the real story is quite a bit more interesting and nuanced, and revolves around the fact that gradient components live in the dual space -- hence the strange inverse transpose Jacobian transformation law. For an elementary, yet rather insightful and detailed discussion I would recommend Koks' Explorations in Mathematical Physics (Chapter 8).
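In matrix form, the rules read $dx' = J\,dx$ for differentials and $\nabla' f = J^{-T}\,\nabla f$ for the gradient, where $J_{ij} = \partial x'_i/\partial x_j$. Here is a minimal numerical sketch of the gradient rule, assuming a change from Cartesian $(x, y)$ to polar $(r, \phi)$ coordinates; the test function `f` and all helper names are made up for illustration. It compares the inverse-transpose-Jacobian result against a brute-force finite difference taken directly in polar coordinates.

```python
import math

def f(x, y):
    # Arbitrary smooth test function (an assumption, chosen only for illustration).
    return x * x * y + math.sin(y)

def grad_cartesian(x, y):
    # Analytic gradient of f with respect to (x, y).
    return [2 * x * y, x * x + math.cos(y)]

def jacobian_polar(x, y):
    # J[i][j] = d(new_i)/d(old_j), with new = (r, phi) and old = (x, y).
    r = math.hypot(x, y)
    return [[x / r, y / r],
            [-y / (r * r), x / (r * r)]]

def inverse_transpose_2x2(m):
    # For m = [[a, b], [c, d]], the inverse is (1/det) [[d, -b], [-c, a]];
    # transposing that gives (1/det) [[d, -c], [-b, a]].
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -c / det],
            [-b / det, a / det]]

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def grad_polar_via_jacobian(x, y):
    # Gradient components transform with the inverse transpose of J.
    return matvec(inverse_transpose_2x2(jacobian_polar(x, y)),
                  grad_cartesian(x, y))

def grad_polar_finite_difference(x, y, h=1e-6):
    # Central differences of f expressed directly as a function of (r, phi).
    r, phi = math.hypot(x, y), math.atan2(y, x)
    g = lambda r_, phi_: f(r_ * math.cos(phi_), r_ * math.sin(phi_))
    return [(g(r + h, phi) - g(r - h, phi)) / (2 * h),
            (g(r, phi + h) - g(r, phi - h)) / (2 * h)]

print(grad_polar_via_jacobian(1.2, 0.7))
print(grad_polar_finite_difference(1.2, 0.7))
```

The two printed vectors should agree to within the finite-difference error. Multiplying by $J$ itself instead of $J^{-T}$ gives something different, which is the dual-space story in miniature.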

Integrating Dirichlet distributions

dnquark — Tue, 29 May 2012 08:06:06 +0000

I've been learning a whole lot about PGMs and machine learning lately. I don't consider it straying too far from my physics roots, in light of the fact that many juicy bits of contemporary AI, such as Markov random fields or Metropolis-Hastings sampling, originated in physics. This connection notwithstanding, my background doesn't give me that great of an advantage -- most of the time. A few days ago, however, I was able to apply one delicious trick I knew in order to work out the integral of a Dirichlet distribution, and I can't help sharing it here. This story has it all -- Fourier representation of the Dirac delta, Gamma functions, Laplace transforms, sandwiches, Bromwiches -- and yet it all fits into a pretty simple narrative.

Our story starts on a stormy summer night, with our protagonist grappling with the following question: how on Earth do you take this integral?

$\int_{\theta_1}\cdots\int_{\theta_k} \prod_{j=1}^k\theta_j^{\alpha-1} d\theta_1\cdots d\theta_k$

Variables $\theta_j$ here are multinomial parameters, and thus must be non-negative and sum to 1. In a valiant (but ultimately futile) effort to avoid doing this integral by myself, I found this blog post -- which was a good start, and would allow me to casually drop terms like "integration over a simplex" or "k-fold Laplace convolution". But it turns out that we can get away with something much simpler than this simplex business. When faced with some constraint in an integrand, a physicist's instinct is to express it as a Dirac delta and integrate right over it. For the problem above, this approach works like magic! Without giving too much away, here's the punchline:
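As a sanity check on any derivation of this integral, the well-known closed form for a single shared $\alpha$ is $\Gamma(\alpha)^k/\Gamma(k\alpha)$, the normalization constant of a symmetric Dirichlet. It can be compared against brute-force quadrature. Below is a rough sketch for $k = 3$, assuming a plain midpoint rule with $\theta_3 = 1 - \theta_1 - \theta_2$ eliminated by the constraint; the function names and the grid size are my own choices, not part of the derivation.

```python
import math

def dirichlet_integral_closed_form(alpha, k):
    # Gamma(alpha)^k / Gamma(k * alpha): symmetric Dirichlet normalization constant.
    return math.gamma(alpha) ** k / math.gamma(k * alpha)

def dirichlet_integral_k3_midpoint(alpha, n=400):
    # Midpoint rule on an n-by-n grid over (theta_1, theta_2);
    # theta_3 is fixed by the sum-to-one constraint.
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t1 = (i + 0.5) * h
        for j in range(n):
            t2 = (j + 0.5) * h
            t3 = 1.0 - t1 - t2
            if t3 <= 0.0:
                continue  # outside the simplex
            total += (t1 * t2 * t3) ** (alpha - 1)
    return total * h * h

print(dirichlet_integral_closed_form(2.0, 3))   # Gamma(2)^3 / Gamma(6) = 1/120
print(dirichlet_integral_k3_midpoint(2.0))
```

For $\alpha < 1$ the integrand blows up at the edges of the simplex, so this naive grid is only trustworthy for $\alpha \geq 1$.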

Want to know more? Because I've been on a Xournal/youtube binge lately, I narrated this derivation and put it up for the world to see. Here are the "slides" and below is the actual video. Enjoy!

Deriving the volume of an n-dimensional hypersphere in 3 minutes

dnquark — Mon, 20 Feb 2012 18:00:00 +0000

What's bigger, the unit circle or the unit sphere?

This is a trick question, because $L^2$ and $L^3$ are incompatible units, so the fact that the area of one is less than the volume of the other (i.e. $\pi < 4\pi/3$) doesn't tell you much. So let's ask a different question: if we inscribe an n-sphere inside an n-cube, is a unit circle bigger or smaller, relative to its bounding square, than the unit sphere relative to its bounding cube? And in general, what will be the ratio of their volumes as a function of $n$?

Suppose we fix the radius of the n-sphere to be 1. The edge length of the n-cube, then, is 2, and its volume is $2^n$. So, that means that the ratio of the volumes is $(1/2)^n$, times some prefactors. Right?

Right. But the form of the prefactor here is quite fascinating. Here's what the volume of the n-sphere is:

$\frac{\pi^{n/2} r^n}{\Gamma(1+n/2)}.$

See that Gamma function in the denominator? It grows like $(n/2)!$, meaning that our original estimate of the hypersphere/hypercube volume ratio as $(1/2)^n$ is quite a few orders of magnitude off for even moderate $n$. As $n$ grows, the volume of a unit n-sphere goes to zero super-exponentially.
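To put numbers on this, here is a small sketch that evaluates the volume formula above and compares the true sphere-to-cube ratio with the naive $(1/2)^n$ guess. The function names are mine; `math.gamma` is the standard-library Gamma function.

```python
import math

def ball_volume(n, r=1.0):
    # pi^(n/2) * r^n / Gamma(1 + n/2), the formula quoted above.
    return math.pi ** (n / 2) * r ** n / math.gamma(1 + n / 2)

def ball_to_cube_ratio(n):
    # The bounding cube of a unit n-sphere has edge length 2, hence volume 2^n.
    return ball_volume(n) / 2.0 ** n

for n in (1, 2, 3, 5, 10, 20, 30):
    print(f"n={n:2d}  volume={ball_volume(n):.3e}  "
          f"ratio={ball_to_cube_ratio(n):.3e}  naive={0.5 ** n:.3e}")
```

For $n = 2$ and $n = 3$ this reproduces the familiar $\pi$ and $4\pi/3$. The unit-sphere volume actually rises for the first few dimensions before the Gamma function takes over and drives it toward zero.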

This has an important implication when you want to cluster high-dimensional data. Intuitively, many clustering algorithms (e.g. k-means) amount to drawing a boundary around a set of points that lie within a certain radius of their mean. But what we just found is that in n dimensions, for n larger than about 30, the region enclosed by a spherical boundary takes up a vanishingly small fraction of the cube that bounds it, so if the data are spread throughout the cube, the sphere is highly unlikely to contain any points! In that sense, everything is far apart in high dimensions. This is known as the curse of dimensionality.
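The same effect shows up empirically. The sketch below assumes points drawn uniformly from the cube $[-1, 1]^n$ (real data are rarely that uniform, so treat this as a toy model) and counts how many land inside the inscribed unit sphere. The function name, sample count, and seed are arbitrary choices of mine.

```python
import random

def fraction_inside_unit_ball(n, samples=20000, seed=0):
    # Draw points uniformly from [-1, 1]^n and count those with |x| <= 1.
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        squared_norm = sum(rng.uniform(-1.0, 1.0) ** 2 for _ in range(n))
        if squared_norm <= 1.0:
            inside += 1
    return inside / samples

for n in (2, 3, 5, 10, 30):
    print(n, fraction_inside_unit_ball(n))
```

In two dimensions the fraction comes out close to $\pi/4$. By $n = 30$ you should not expect a single hit out of 20,000 samples.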

Volumes of n-spheres are useful in other contexts (for instance, in statistical mechanics). In fact, back in my physics days I learned a very cool and easy derivation of the n-sphere volume formula above. I like it so much that I made not one but two videos about it. Below you can see me doing the derivation in 3 minutes flat. There is also the longer version where I do the same steps, but a little more slowly and methodically. Enjoy!
