Monday 21 February 2011

Tensor products of vector spaces, part 1

A little while back on this blog, we defined the free k-vector space over a type b:

newtype Vect k b = V [(b,k)] deriving (Eq,Ord)
Elements of Vect k b are k-linear combinations of elements of b.

Whenever we have a mathematical structure like this, we want to know about building blocks and new-from-old constructions.

We already looked at one new-from-old construction: given free k-vector spaces A = Vect k a and B = Vect k b, we can construct their direct sum A⊕B = Vect k (Either a b).

We saw that the direct sum is both the product and the coproduct in the category of free vector spaces - which means that it is the object which satisfies the universal properties implied by the following two diagrams:


So we have injections i1, i2 : Vect k a, Vect k b -> Vect k (Either a b), to put elements of A and B into the direct sum A⊕B, and projections p1, p2 : Vect k (Either a b) -> Vect k a, Vect k b to take them back out again.


However, there is another obvious new-from-old construction: Vect k (a,b). What does this represent?

In order to answer that question, we need to look at bilinear functions. The basic idea of a bilinear function is that it is a function of two arguments, which is linear in each argument. So we might start by looking at functions f :: Vect k a -> Vect k b -> Vect k t.

However, functions of two arguments don't really sit very well in category theory, where arrows are meant to have a single source. (We can handle functions of two arguments in multicategories, but I don't want to go there just yet.) In order to stay within category theory, we need to combine the two arguments into a single argument, using the direct sum construction. So instead of looking at functions f :: Vect k a -> Vect k b -> Vect k t, we will look at functions f :: Vect k (Either a b) -> Vect k t.

To see that they are equivalent, recall from last time that Vect k (Either a b) is isomorphic to (Vect k a, Vect k b), via the isomorphisms:

to :: (Vect k a, Vect k b) -> Vect k (Either a b)
to = \(u,v) -> i1 u <+> i2 v
from :: Vect k (Either a b) -> (Vect k a, Vect k b)
from = \uv -> (p1 uv, p2 uv)

So in going from f :: Vect k a -> Vect k b -> Vect k t to f :: Vect k (Either a b) -> Vect k t, we're really just uncurrying.

Ok, so suppose we are given f :: Vect k (Either a b) -> Vect k t. It helps to still think of this as a function of two arguments, even though we've wrapped them up together in either side of a direct sum. Then we say that f is bilinear, if it is linear in each side of the direct sum. That is:
- for any fixed a0 in A, the function f_a0 :: Vect k b -> Vect k t, f_a0 = \b -> f (i1 a0 <+> i2 b) is linear
- for any fixed b0 in B, the function f_b0 :: Vect k a -> Vect k t, f_b0 = \a -> f (i1 a <+> i2 b0) is linear


Here's a QuickCheck property to test whether a function is bilinear:


prop_Bilinear :: (Num k, Ord a, Ord b, Ord t) =>
     (Vect k (Either a b) -> Vect k t) -> (k, Vect k a, Vect k a, Vect k b, Vect k b) -> Bool
prop_Bilinear f (k,a1,a2,b1,b2) =
    prop_Linear (\b -> f (i1 a1 <+> i1 b)) (k,b1,b2) &&
    prop_Linear (\a -> f (i1 a <+> i1 b1)) (k,a1,a2)

prop_BilinearQn f (a,u1,u2,v1,v2) = prop_Bilinear f (a,u1,u2,v1,v2)
    where types = (a,u1,u2,v1,v2) :: (Q, Vect Q EBasis, Vect Q EBasis, Vect Q EBasis, Vect Q EBasis)

What are some examples of bilinear functions?

Well, perhaps the most straightforward is the dot product of vectors. If our vector spaces A and B are the same, then we can define the dot product:


dot0 uv = sum [ if a == b then x*y else 0 | (a,x) <- u, (b,y) <- v]
    where V u = p1 uv
          V v = p2 uv

However, as it stands, this won't pass our QuickCheck property - because it has the wrong type! This has the type dot0 :: Vect k (Either a b) -> k, whereas we need something of type Vect k (Either a b) -> Vect k t.

Now, it is of course true that k is a k-vector space. However, as it stands, it's not a free k-vector space over some basis type t. Luckily, this is only a technicality, which is easily fixed. When we want to consider k as itself a (free) vector space, we will take t = (), the unit type, and equate k with Vect k (). Since the type () has only a single inhabitant, the value (), then Vect k () consists of scalar multiples of () - so it is basically just a single copy of k itself. The isomorphism between k and Vect k () is \k -> k *> return ().

Okay, so now that we know how to represent k as a free k-vector space, we can define dot product again:


dot1 uv = nf $ V [( (), if a == b then x*y else 0) | (a,x) <- u, (b,y) <- v]
    where V u = p1 uv
          V v = p2 uv

This now has the type dot1 :: Vect k (Either a b) -> Vect k (). Here's how you use it:

> dot1 ( i1 (e1 <+> 2 *> e2) <+> i2 (3 *> e1 <+> e2) )
5()

(So thinking of our function as a function of two arguments, what we do is use i1 to inject the first argument into the left hand side of the direct sum, and i2 to inject the second argument into the right hand side.)

So we can now use the QuickCheck property:

> quickCheck (prop_BilinearQn dot1)
+++ OK, passed 100 tests.


Another example of a bilinear function is polynomial multiplication. Polynomials of course form a vector space, with basis {x^i | i <- [0..] }. So we could define a type to represent the monomials x^i, and then form the polynomials as the free vector space in the monomials. In a few weeks we will do that, but for the moment, to save time, let's just use our existing EBasis type, and take E i to represent x^i. Then polynomial multiplication is the following function:


polymult1 uv = nf $ V [(E (i+j) , x*y) | (E i,x) <- u, (E j,y) <- v]
    where V u = p1 uv
          V v = p2 uv

Let's just convince ourselves that this is polynomial multiplication:

> polymult1 (i1 (e 0 <+> e 1) <+> i2 (e 0 <+> e 1))
e0+2e1+e2

So this is just our way of saying that (1+x)*(1+x) = 1+2x+x^2.

Again, let's verify that this is bilinear:

> quickCheck (prop_BilinearQn polymult1)
+++ OK, passed 100 tests.


So what's all this got to do with Vect k (a,b)? Well, here's another bilinear function:


tensor :: (Num k, Ord a, Ord b) => Vect k (Either a b) -> Vect k (a, b)
tensor uv = nf $ V [( (a,b), x*y) | (a,x) <- u, (b,y) <- v]
    where V u = p1 uv; V v = p2 uv

> quickCheck (prop_BilinearQn tensor)
+++ OK, passed 100 tests.

So this "tensor" function takes each pair of basis elements a, b in the input to a basis element (a,b) in the output. The thing that is interesting about this bilinear function is that it is in some sense "the mother of all bilinear functions". Specifically, you can specify a bilinear function completely by specifying what happens to each pair (a,b) of basis elements. It follows that any bilinear function f :: Vect k (Either a b) -> Vect k t can be factored as f = f' . tensor, where f' :: Vect k (a,b) -> Vect k t is the linear function having the required action on the basis elements (a,b) of Vect k (a,b).


For example:


bilinear :: (Num k, Ord a, Ord b, Ord c) =>
    ((a, b) -> Vect k c) -> Vect k (Either a b) -> Vect k c
bilinear f = linear f . tensor

dot = bilinear (\(a,b) -> if a == b then return () else zero)

polymult = bilinear (\(E i, E j) -> return (E (i+j)))


We can check that these are indeed the same functions as we were looking at before:

> quickCheck (\x -> dot1 x == dot x)
+++ OK, passed 100 tests.
> quickCheck (\x -> polymult1 x == polymult x)
+++ OK, passed 100 tests.

So Vect k (a,b) has a special role in the theory of bilinear functions. If A = Vect k a, B = Vect k b, then we write A⊗B = Vect k (a,b) (pronounced "A tensor B").



[By the way, it's possible that this diagram might upset category theorists - because the arrows in the diagram are not all arrows in the category of vector spaces. Specifically, note that bilinear maps are not, in general, linear. We'll come back to this in a moment.]

So a bilinear map can be specified by its action on the tensor basis (a,b). This corresponds to writing out matrices. To specify any bilinear map Vect k (Either a b) -> Vect k t, you write out a matrix with rows indexed by a, columns indexed by b, and entries in Vect k t.


      b1  b2 ...
 a1 (t11 t12 ...)
 a2 (t21 t22 ...)
... (...        )

So this says that (ai,bj) is taken to tij. Then given an element of A⊕B = Vect k (Either a b), which we can think of as a vector (x1 a1 + x2 a2 + ...) in A = Vect k a together with a vector (y1 b1 + y2 b2 + ...) in B = Vect k b, then we can calculate its image under the bilinear map by doing matrix multiplication as follows:


 a1 a2 ...        b1  b2 ...
(x1 x2 ...)  a1 (t11 t12 ...)  b1 (y1)
             a2 (t21 t22 ...)  b2 (y2)
            ... (...        ) ... (...)

(Sorry, this diagram might be a bit confusing. The ai, bj are labeling the rows and columns. The xi are the entries in a row vector in A, the yj are the entries in a column vector in B, and the tij are the entries in the matrix.)

So xi ai <+> yj bj goes to xi yj tij.

For example, dot product corresponds to the matrix:


(1 0 0)
(0 1 0)
(0 0 1)

Polynomial multiplication corresponds to the matrix:


    e0 e1 e2 ...
e0 (e0 e1 e2 ...)
e1 (e1 e2 e3 ...)
e2 (e2 e3 e4 ...)
...

A matrix with entries in T = Vect k t is just a convenient way of specifying a linear map from A⊗B = Vect k (a,b) to T.

Indeed, any matrix, provided that all the entries are in the same T, defines a bilinear function. So bilinear functions are ten-a-penny.


Now, I stated above that bilinear functions are not in general linear. For example:

> quickCheck (prop_Linear polymult)
*** Failed! Falsifiable (after 2 tests and 2 shrinks):
(0,Right e1,Left e1)

What went wrong? Well:

> polymult (Right e1)
0
> polymult (Left e1)
0
> polymult (Left e1 <+> Right e1)
e2

So we fail to have f (a <+> b) = f a <+> f b, which is one of the requirements of a linear function.


Conversely, it's also important to realise that linear functions (on Vect k (Either a b)) are not in general bilinear. For example:

> quickCheck (prop_BilinearQn id)
*** Failed! Falsifiable (after 2 tests):
(1,0,0,e1,0)

The problem here is:

> id $ i1 (zero <+> zero) <+> i2 e1
Right e1
> id $ (i1 zero <+> i2 e1) <+> (i1 zero <+> i2 e1)
2Right e1

So we fail to have linearity in the left hand side (or the right for that matter).

Indeed we can kind of see that linearity and bilinearity are in conflict.
- Linearity requires that f (a1 <+> a2 <+> b) = f a1 <+> f a2 <+> f b
- Bilinearity requires that f (a1 <+> a2 <+> b) = f (a1 <+> b) <+> f (a2 <+> b)

Exercise: Find a function which is both linear and bilinear.

No comments:

Post a Comment

Followers