The Involution.

a web-log about computers and math and hacking.

by Youwen Wu

An assortment of preliminaries on linear algebra

and also a test for pandoc

2025-02-15

This document was written entirely in Typst and translated directly to this page by Pandoc. It serves as a proof of concept for doing static site generation from Typst files instead of Markdown.


I figured I should write this stuff down before I forgot it.

Basic Notions

Vector spaces

Before we can understand vectors, we need to first discuss vector spaces. Thus far, you have likely encountered vectors primarily in physics classes, generally in the two-dimensional plane. You may conceptualize them as arrows in space. For vectors of size $> 3$, a hand-waving argument is made that they are essentially just arrows in higher-dimensional spaces.

It is helpful to take a step back from this primitive geometric understanding of the vector. Let us build up a rigorous idea of vectors from first principles.

Vector axioms

The so-called axioms of a vector space (which we’ll call the vector space $V$) are as follows:

  1. Commutativity: $u + v = v + u,\ \forall u, v \in V$

  2. Associativity: $(u + v) + w = u + (v + w),\ \forall u, v, w \in V$

  3. Zero vector: there exists a special vector, denoted $0$, such that $v + 0 = v,\ \forall v \in V$

  4. Additive inverse: $\forall v \in V,\ \exists w \in V$ such that $v + w = 0$. Such an additive inverse is generally denoted $-v$

  5. Multiplicative identity: $1v = v,\ \forall v \in V$

  6. Multiplicative associativity: $(\alpha \beta) v = \alpha (\beta v),\ \forall v \in V$ and scalars $\alpha, \beta$

  7. Distributive property for vectors: $\alpha (u + v) = \alpha u + \alpha v,\ \forall u, v \in V$ and scalars $\alpha$

  8. Distributive property for scalars: $(\alpha + \beta) v = \alpha v + \beta v,\ \forall v \in V$ and scalars $\alpha, \beta$

It is easy to show that the zero vector $0$ and the additive inverse $-v$ are unique. We leave the proof of this fact as an exercise.

These may seem difficult to memorize, but they are essentially the same familiar algebraic properties of numbers you know from high school. The important thing to remember is which operations are valid for which objects. For example, you cannot add a vector and a scalar, as it does not make sense.

Remark. For those of you versed in computer science, you may recognize this as essentially saying that you must ensure your operations are type-safe. Adding a vector and a scalar is not just “wrong” in the same sense that $1 + 1 = 3$ is wrong; it is an invalid question entirely, because vectors and scalars are different types of mathematical objects. See [@chen2024digression] for more.
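If you want to convince yourself of these axioms numerically, here is a minimal sketch (using numpy, my own choice of tool for illustration, not something this post depends on) that spot-checks them on random vectors in $\mathbb{R}^3$:

```python
import numpy as np

rng = np.random.default_rng(0)
u, v, w = rng.random(3), rng.random(3), rng.random(3)
alpha, beta = 2.0, -0.5

assert np.allclose(u + v, v + u)                              # 1. commutativity
assert np.allclose((u + v) + w, u + (v + w))                  # 2. associativity
assert np.allclose(v + np.zeros(3), v)                        # 3. zero vector
assert np.allclose(v + (-v), np.zeros(3))                     # 4. additive inverse
assert np.allclose(1 * v, v)                                  # 5. multiplicative identity
assert np.allclose((alpha * beta) * v, alpha * (beta * v))    # 6. multiplicative associativity
assert np.allclose(alpha * (u + v), alpha * u + alpha * v)    # 7. distributivity over vectors
assert np.allclose((alpha + beta) * v, alpha * v + beta * v)  # 8. distributivity over scalars
```

Of course, passing on random samples is not a proof, but it is a useful sanity check on the statements above.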

Vectors big and small

In order to begin your descent into what mathematicians colloquially recognize as abstract vapid nonsense, let’s discuss the fields over which vector spaces can be defined. We have the familiar field $\mathbb{R}$, where all scalars are real numbers, with corresponding vector spaces $\mathbb{R}^n$, where $n$ is the length of the vector. We generally discuss 2D or 3D vectors, corresponding to vectors of length 2 or 3; in our case, $\mathbb{R}^2$ and $\mathbb{R}^3$.

However, vectors in $\mathbb{R}^n$ can really be of any length. Vectors can be viewed as arbitrary-length lists of numbers (for the computer science folk: think C++ std::vector).

Example. $$\begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \\ 6 \\ 7 \\ 8 \\ 9 \end{pmatrix} \in \mathbb{R}^{9}$$
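To make the std::vector analogy concrete, here is a tiny sketch (again assuming numpy purely for illustration) of that length-9 vector:

```python
import numpy as np

# The example vector (1, 2, ..., 9), stored as an arbitrary-length array.
v = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
print(v.shape)  # (9,) -- a vector of length 9, i.e. an element of R^9
```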

Keep in mind that vectors need not be in $\mathbb{R}^n$ at all. Recall that a vector space need only satisfy the aforementioned axioms of a vector space.

Example. The vector space $\mathbb{C}^n$ is similar to $\mathbb{R}^n$, except it includes complex numbers. Every complex vector space is also a real vector space (as you can simply restrict the scalars to the real numbers), but not the other way around.

From now on, let us refer to the vector spaces $\mathbb{R}^n$ and $\mathbb{C}^n$ collectively as $\mathbb{F}^n$.

In general, we can have a vector space where the scalars are in an arbitrary field, as long as the axioms are satisfied.

Example. The vector space of all polynomials of degree at most 3, denoted $\mathbb{P}^3$. It is not yet clear what vectors in this space look like. We shall return to this example once we discuss bases.

Vector addition and multiplication

Vector addition, represented by $+$, can be done entrywise. The same holds for entrywise multiplication.

Example.

$$\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix} = \begin{pmatrix} 1 + 4 \\ 2 + 5 \\ 3 + 6 \end{pmatrix} = \begin{pmatrix} 5 \\ 7 \\ 9 \end{pmatrix}$$

$$\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \cdot \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix} = \begin{pmatrix} 1 \cdot 4 \\ 2 \cdot 5 \\ 3 \cdot 6 \end{pmatrix} = \begin{pmatrix} 4 \\ 10 \\ 18 \end{pmatrix}$$

This is simple enough to understand. Again, the difficulty is simply ensuring that you always perform operations with the correct types. For example, once we introduce matrices, it doesn’t make sense to multiply or add vectors and matrices in this fashion.
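The entrywise operations translate directly to code; a minimal sketch (numpy assumed for illustration):

```python
import numpy as np

v = np.array([1, 2, 3])
w = np.array([4, 5, 6])

print(v + w)  # [5 7 9]    -- entrywise addition
print(v * w)  # [ 4 10 18] -- entrywise multiplication
```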

Vector-scalar multiplication

Multiplying a vector by a scalar simply results in each entry of the vector being multiplied by the scalar.

Example.

$$\beta \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} \beta \cdot a \\ \beta \cdot b \\ \beta \cdot c \end{pmatrix}$$

Linear combinations

Given a vector space $V$, vectors $v, w \in V$, and scalars $\alpha, \beta$, the vector $\alpha v + \beta w$ is a linear combination of $v$ and $w$.
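A linear combination is just scalar multiplication followed by addition; a quick sketch (numpy again my own choice):

```python
import numpy as np

v = np.array([1.0, 0.0, 2.0])
w = np.array([0.0, 3.0, 1.0])
alpha, beta = 2.0, -1.0

# The linear combination alpha*v + beta*w, computed entrywise.
print(alpha * v + beta * w)  # [ 2. -3.  3.]
```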

Spanning systems

We say that a set of vectors $v_1, v_2, \ldots, v_n \in V$ spans $V$ if any arbitrary vector $v \in V$ can be represented as a linear combination of those vectors.

Precisely, for every $v \in V$ there exist scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$ such that

$$\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n = v$$

Note that any scalar $\alpha_k$ could be 0. Therefore, it is possible for a subset of a spanning system to also be a spanning system. The proof of this fact is left as an exercise.
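One way to check spanning computationally is to stack the vectors as the columns of a matrix and compare its rank to $n$; a sketch under that approach (numpy assumed for illustration):

```python
import numpy as np

def spans_Rn(vectors):
    """Return True if the given vectors span R^n, where n is the length of each vector."""
    A = np.column_stack(vectors)          # each vector becomes a column of A
    n = A.shape[0]
    return np.linalg.matrix_rank(A) == n  # the columns span R^n iff A has full row rank

print(spans_Rn([np.array([1, 0]), np.array([0, 1]), np.array([1, 1])]))  # True
print(spans_Rn([np.array([1, 2]), np.array([2, 4])]))                    # False
```

Note how the first example has three vectors in $\mathbb{R}^2$ and still spans; dropping one of them gives a spanning subset, as remarked above.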

Intuition for linear independence and dependence

We say that $v$ and $w$ are linearly independent if $v$ cannot be obtained by scaling $w$, and $w$ cannot be obtained by scaling $v$. Otherwise, they are linearly dependent.

You may intuitively visualize linear dependence in the 2D plane as two vectors both pointing in the same direction. Clearly, scaling one vector will allow us to reach the other vector. Linear independence is therefore two vectors pointing in different directions.

Of course, this definition applies to vectors in any $\mathbb{F}^n$.

Formal definition of linear dependence and independence

Let us formally define linear independence for arbitrary vectors in $\mathbb{F}^n$. Given a set of vectors

$$v_1, v_2, \ldots, v_n \in V$$

we say they are linearly independent iff. the equation

$$\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n = 0$$

has only the trivial solution, that is, $\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$.

Equivalently,

$$\left| \alpha_1 \right| + \left| \alpha_2 \right| + \cdots + \left| \alpha_n \right| = 0$$

More precisely,

$$\sum_{i = 1}^{n} \left| \alpha_i \right| = 0$$

Therefore, a set of vectors $v_1, v_2, \ldots, v_m$ is linearly dependent if the opposite is true, that is, there exist scalars $\alpha_1, \alpha_2, \ldots, \alpha_m$ satisfying the equation

$$\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_m v_m = 0$$

such that

$$\sum_{i = 1}^{m} \left| \alpha_i \right| \neq 0$$
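The same rank idea from the spanning sketch gives a computational test for independence: the vectors $v_1, \ldots, v_m$ are linearly independent exactly when the matrix with those vectors as columns has rank $m$. A minimal sketch (numpy assumed for illustration):

```python
import numpy as np

def linearly_independent(vectors):
    """True iff the only solution of a1*v1 + ... + am*vm = 0 is a1 = ... = am = 0."""
    A = np.column_stack(vectors)                     # columns are the vectors
    return np.linalg.matrix_rank(A) == len(vectors)  # independent iff rank equals m

print(linearly_independent([np.array([1, 0]), np.array([0, 1])]))  # True
print(linearly_independent([np.array([1, 2]), np.array([2, 4])]))  # False: one is a scaling of the other
```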

Basis

We say a system of vectors $v_1, v_2, \ldots, v_n \in V$ is a basis in $V$ if the system is both linearly independent and spanning. That is, the system must be able to represent any vector in $V$ as well as satisfy our requirements for linear independence.

Equivalently, we may say that a system of vectors in $V$ is a basis in $V$ if any vector $v \in V$ admits a unique representation as a linear combination of vectors in the system. This is equivalent to our previous statement, that the system must be spanning and linearly independent.

Standard basis

We may define a standard basis for a vector space. By convention, the standard basis in $\mathbb{R}^2$ is

$$\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

Verify that the above is in fact a basis (that is, linearly independent and generating).

Recalling the definition of a basis, we can represent any vector in $\mathbb{R}^2$ as a linear combination of the standard basis.

Therefore, for any arbitrary vector $v \in \mathbb{R}^2$, we can represent it as

$$v = \alpha_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \alpha_2 \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

Let us call $\alpha_1$ and $\alpha_2$ the coordinates of the vector. Then, we can write $v$ as

$$v = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}$$

For example, the vector

$$\begin{pmatrix} 1 \\ 2 \end{pmatrix}$$

represents

$$1 \cdot \begin{pmatrix} 1 \\ 0 \end{pmatrix} + 2 \cdot \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

Verify that this aligns with your previous intuition of vectors.

You may recognize the standard basis in $\mathbb{R}^2$ as the familiar unit vectors

$$\hat{i}, \hat{j}$$

This aligns with the fact that

$$\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \alpha \hat{i} + \beta \hat{j}$$

However, we may define a standard basis in any arbitrary vector space. So, let

$$e_1, e_2, \ldots, e_n$$

be a standard basis in $\mathbb{F}^n$. Then, the coordinates $\alpha_1, \alpha_2, \ldots, \alpha_n$ of a vector $v \in \mathbb{F}^n$ represent the following:

$$\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix} = \alpha_1 e_1 + \alpha_2 e_2 + \cdots + \alpha_n e_n$$

Using our new notation, the standard basis in $\mathbb{R}^2$ is

$$e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$
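Finding the coordinates of a vector with respect to a basis amounts to solving a linear system. Here is a sketch with a non-standard basis of my own choosing (and numpy assumed for illustration):

```python
import numpy as np

# Columns of B are the basis vectors b1 = (1, 1) and b2 = (1, -1) of R^2.
B = np.array([[1.0,  1.0],
              [1.0, -1.0]])
v = np.array([3.0, 1.0])

# Solve B @ coords = v, i.e. coords[0]*b1 + coords[1]*b2 = v.
coords = np.linalg.solve(B, v)
print(coords)                      # [2. 1.]
print(np.allclose(B @ coords, v))  # True -- the representation is exact
```

The fact that the system has exactly one solution is the uniqueness of representation we used to characterize a basis above.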

Matrices

Before discussing any properties of matrices, let’s simply reiterate what we learned in class about their notation. We say a matrix with $m$ rows and $n$ columns (in less precise terms, a matrix where each row has length $n$ and each column has height $m$) is an $m \times n$ matrix.

Given a matrix

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}$$

we refer to the entry in row $j$ and column $k$ as $A_{j,k}$.

Matrix transpose

A formalism that is useful later on is called the transpose, and we obtain it from a matrix $A$ by switching all the rows and columns. More precisely, each row becomes a column instead. We use the notation $A^T$ to represent the transpose of $A$.

$$\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}^{T} = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}$$

Formally, we can say $\left( A^T \right)_{j,k} = A_{k,j}$.
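A quick sketch of indexing and the transpose (numpy, with its zero-based indices, is my own choice here):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

print(A[0, 2])               # 3 -- the entry in row j=0, column k=2 (numpy is zero-indexed)
print(A.T)                   # the 3 x 2 transpose shown above
print(A.T[2, 0] == A[0, 2])  # True: (A^T)_{j,k} = A_{k,j}
```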

Linear transformations

A linear transformation $T : V \rightarrow W$ is a mapping between two vector spaces $V$ and $W$ such that the following axioms are satisfied:

  1. $T(v + w) = T(v) + T(w),\ \forall v, w \in V$

  2. $T(\alpha v) + T(\beta w) = \alpha T(v) + \beta T(w),\ \forall v, w \in V$, for all scalars $\alpha, \beta$

Definition. $T$ is a linear transformation iff.

$$T(\alpha v + \beta w) = \alpha T(v) + \beta T(w)$$

Abuse of notation. From now on, we may elide the parentheses and say that $T(v) = Tv,\ \forall v \in V$.

Remark. A phrase that you may commonly hear is that linear transformations preserve linearity. Essentially, straight lines remain straight, parallel lines remain parallel, and the origin remains fixed at 0. Take a moment to think about why this is true (at least, in lower dimensional spaces you can visualize).

Examples.

  1. Rotation for $V = W = \mathbb{R}^2$ (i.e. rotation in 2 dimensions). Given $v, w \in \mathbb{R}^2$ and their linear combination $v + w$, a rotation of $\gamma$ radians of $v + w$ is equivalent to first rotating $v$ and $w$ individually by $\gamma$ and then taking their linear combination.

  2. Differentiation of polynomials. In this case $V = \mathbb{P}^n$ and $W = \mathbb{P}^{n-1}$, where $\mathbb{P}^n$ is the vector space of all polynomials of degree at most $n$. A small coefficient-vector sketch follows this list.

    $$\frac{d}{dx}(\alpha v + \beta w) = \alpha \frac{d}{dx} v + \beta \frac{d}{dx} w,\quad \forall v, w \in V,\ \forall \text{ scalars } \alpha, \beta$$
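Here is the promised sketch of the differentiation example, under the convention (my own choice for illustration) that a polynomial $a_0 + a_1 x + a_2 x^2 + a_3 x^3 \in \mathbb{P}^3$ is stored as the coefficient vector $(a_0, a_1, a_2, a_3)$; differentiation is then a $3 \times 4$ matrix, foreshadowing the next section:

```python
import numpy as np

# Represent a0 + a1*x + a2*x^2 + a3*x^3 by its coefficient vector (a0, a1, a2, a3).
# Differentiation P^3 -> P^2 is then the 3 x 4 matrix D below.
D = np.array([[0, 1, 0, 0],
              [0, 0, 2, 0],
              [0, 0, 0, 3]], dtype=float)

p = np.array([5.0, 0.0, 1.0, 2.0])   # 5 + x^2 + 2x^3
q = np.array([1.0, 4.0, 0.0, 0.0])   # 1 + 4x
alpha, beta = 3.0, -2.0

# Linearity: differentiating a linear combination equals combining the derivatives.
print(np.allclose(D @ (alpha * p + beta * q), alpha * (D @ p) + beta * (D @ q)))  # True
print(D @ p)  # [0. 2. 6.] -- i.e. 2x + 6x^2, the derivative of 5 + x^2 + 2x^3
```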

Matrices represent linear transformations

Suppose we wanted to represent a linear transformation $T : \mathbb{F}^n \rightarrow \mathbb{F}^m$. I propose that we need only encode how $T$ acts on the standard basis of $\mathbb{F}^n$.

Using our intuition from lower dimensional vector spaces, we know that the standard basis in $\mathbb{R}^2$ is the unit vectors $\hat{i}$ and $\hat{j}$. Because linear transformations preserve linearity (i.e. all straight lines remain straight and parallel lines remain parallel), we can encode any transformation simply by how it changes $\hat{i}$ and $\hat{j}$. And indeed, since any vector $v \in \mathbb{R}^2$ can be represented as a linear combination of $\hat{i}$ and $\hat{j}$ (this is the definition of a basis), it makes sense both symbolically and geometrically that we can represent all linear transformations as transformations of the basis vectors.

Example. To reflect all vectors $v \in \mathbb{R}^2$ across the $y$-axis, we can simply change the standard basis to

$$\begin{pmatrix} -1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

Then, any vector in $\mathbb{R}^2$ using this new basis will be reflected across the $y$-axis. Take a moment to justify this geometrically.

Writing a linear transformation as a matrix

For any linear transformation $T : \mathbb{F}^m \rightarrow \mathbb{F}^n$, we can write it as an $n \times m$ matrix $A$. That is, there is a matrix $A$ with $n$ rows and $m$ columns that can represent any linear transformation from $\mathbb{F}^m$ to $\mathbb{F}^n$.

How should we write this matrix? Naturally, from our previous discussion, we should write a matrix with each column being one of our new transformed basis vectors.

Example. Consider our $y$-axis reflection transformation from earlier. We write the transformed basis vectors as the columns of a matrix:

$$\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$$

Matrix-vector multiplication

Perhaps you now see why the so-called matrix-vector multiplication is defined the way it is. Recalling our definition of a basis, given a basis in $V$, any vector $v \in V$ can be written as a linear combination of the vectors in the basis. Then, given a linear transformation represented by the matrix containing the new basis, we simply write the linear combination with the new basis instead.

Example. Let us first write a vector in the standard basis in $\mathbb{R}^2$ and then show how our matrix-vector multiplication naturally corresponds to the definition of the linear transformation.

$$\begin{pmatrix} 1 \\ 2 \end{pmatrix} \in \mathbb{R}^2$$

is the same as

$$1 \cdot \begin{pmatrix} 1 \\ 0 \end{pmatrix} + 2 \cdot \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

Then, to perform our reflection, we need only replace the basis vector $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ with $\begin{pmatrix} -1 \\ 0 \end{pmatrix}$.

Then, the reflected vector is given by

$$1 \cdot \begin{pmatrix} -1 \\ 0 \end{pmatrix} + 2 \cdot \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -1 \\ 2 \end{pmatrix}$$

We can clearly see that this is exactly how the matrix multiplication

$$\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 2 \end{pmatrix}$$

is defined! The column-by-coordinate rule for matrix-vector multiplication says that we multiply the $n^{\text{th}}$ entry of the vector by the corresponding $n^{\text{th}}$ column of the matrix and sum them all up (take their linear combination). This algorithm intuitively follows from our definition of matrices.
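Here is the column-by-coordinate rule next to the built-in matrix-vector product (numpy assumed for illustration):

```python
import numpy as np

A = np.array([[-1.0, 0.0],
              [ 0.0, 1.0]])   # the reflection across the y-axis from above
v = np.array([1.0, 2.0])

# Column-by-coordinate rule: weight each column of A by the matching entry of v, then sum.
by_columns = v[0] * A[:, 0] + v[1] * A[:, 1]

print(by_columns)                      # [-1.  2.]
print(np.allclose(by_columns, A @ v))  # True -- agrees with the built-in matrix-vector product
```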

Matrix-matrix multiplication

As you may have noticed, a very similar natural definition arises for matrix-matrix multiplication. Multiplying two matrices $A \cdot B$ is essentially just taking each column of $B$, and applying the linear transformation defined by the matrix $A$!
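And a sketch of that columnwise view of $A \cdot B$ (numpy once more my own choice):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((2, 2))
B = rng.random((2, 3))

# Apply A to each column of B separately, then stack the results as columns.
columnwise = np.column_stack([A @ B[:, k] for k in range(B.shape[1])])

print(np.allclose(columnwise, A @ B))  # True -- matrix-matrix multiplication, column by column
```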