Menu

Growing Linearly Independent Sets

A rough draft. There might be errors here.

The dimension bound theorem is linear algebra says puts a hard limit on the size of linearly independent sets in finite dimensional vector spaces. If $\dim(V) = d$ then any set of $d + 1$ vectors in $V$ must be linearly dependent.

Let $p$ be a prime and $V$ be the vector space $\mathbb{F}_p^d$. We’re interested in the “growth rate” of linearly independent sets in $V$. What do we mean by this?

Consider the following random process for “growing” a linearly independent set. We first randomly and uniformly select a vector $v_1 \in V$. If this vector is the zero vector then the set $\{v_1\}$ is dependent and we stop the process.

If our initial vector is not the zero vector, then we add a new randomly selected vector $v_2$ to the set. If our updated set of vectors $\{v_1, v_2\}$ is linearly dependent then we stop. The process continues like this until it hits the dimension bound, at which point the set must be dependent.

We’re interest in the question: when does the growth process stop? To do so, we calculate the expected value of the stopping time $T$. The tail sum formula gives:

\[ \mathbb{E}(T) = \sum_{k=1} \operatorname{Pr}(T \geq k). \]

We calculate the individual probabilities. To start, notice that
\[ \operatorname{Pr}(T \geq 1) = 1 \] because the process always goes for at least one step: we select the vector $\{v_1\}$.

To calculate $\operatorname{Pr}(T \geq 2)$ we need the probability that the process got to the second step. If the process gets to the second step then we know that $v_1 \neq 0$. And so, there are $p^d - 1$ choices for $v_1$ because the selection process needs to avoid the zero vector. This gives: \[ \operatorname{Pr}(T \geq 2) = \operatorname{Pr}(v_1 \neq 0) = \frac{p^d - 1}{p^d}. \] Before moving to the general formula, we consider $\operatorname{Pr}(T \geq 3)$. If the process arrives at the third step, two things must have happened: (i) the vector $v_1$ must be non-zero, and (ii) the set $\{v_1, v_2\}$ must be linearly independent. Independence implies means that their span has size $|{span}\{v_1, v_2\}| = p^2$. Thee fore, the choice of the third vector must avoid $p^2$ vectors in the span, and must come from the remaining $p^d - 2$ vectors in the vector space. This gives the following formula1 for $\operatorname{Pr}(T \geq 3)$. \[ \operatorname{Pr}(T \geq 3) = \left(\frac{p^d - p^2}{p^d - 2}\right)\left( \frac{p^d - 1}{p^d} \right) \] In general, we consider what needs to happen for the process to reach stage $k$. When picking $v_{k-1}$ we must avoid the span of $\{v_1, \dots, v_{k-2} \}$. This span has size $p^{k-2}$ because the vectors in $\{v_1, \dots, v_{k-2} \}$ are linearly independent. Also, note that we are selecting $v_{k-1}$ from a total of $p^d - (k-2)$ vectors, because the vectors $\{v_1, \dots, v_{k-2}\}$ have already been selected. If this all works out, then the set $\{v_1, \dots, v_{k-2}, v_{k-1}\}$ will be independent and the process will continue to stage $k$. And so we get the following probability2 of the process continuing up to stage $k$. \[ \operatorname{Pr}(T \geq k) = \prod_{i = 1}^{k-1} \frac{p^d - p^{i-1}}{p^d - (i-1)} \]

Applying the tail sum formula to this gives:

\[ \mathbb{E}(T) = \sum_{k=1}^{d+1} \prod_{i = 1}^{k-1} \frac{p^d - p^{i-1}}{p^d - (i-1)}. \]

The product has upper bound $k = d + 1$ because the process never continues passed $d + 1$; the dimension bound forces any set of size at least $d+1$ to be dependent, so the process must stop.

To get a sense for these numbers, we do a bit of calculation. The number we care about is $\mathbb{E}(T)$ for $V = \mathbb{F}_3^4$. In the table below, we calculate that that expected value as $\mathbb{E}(T) \approx 4.435849$.

$p$ $d$ $\mathbb{E}(t)$
3 4 4.435849
3 10 10.320266
3 50 50.317846
3 100 100.317846
5 10 10.698282
5 50 50.698266
5 100 100.698266
7 10 10.809090
7 50 50.809090
7 100 100.809090
11 10 10.890840
11 50 50.890840
11 100 100.890840
13 10 10.910221
13 50 50.910221
13 100 100.910221

There are a couple other thought provoking things here.


One follow up question which is worth exploring: Suppose that the process stops at $T = d+1$. This means that the vectors $\{ v_1, \dots, v_d \}$ are linearly independent. Therefore, they form a basis of $V$. The next vector $v_{d+1}$ must be a linear combination of them, \[ v_{d+1} = c_1v_1 + \dots + c_dv_d. \] How many of the coefficients $c_i$ do we expect to be non-zero?

If the coefficient vector $(c_1, \dots, c_d)$ is also uniformly distributed over $V$ then we should get: \[ \mathbb{E}(\#\{c_i \neq 0\}) = d\left( 1 - \frac{1}{p} \right). \] In the case $(p,d) = (3,4)$, we get: $\mathbb{E} = 8/3$. This means that there will be “two and two-thirds” non-zero co-efficients. Alternatively, “one and a third” of the coefficients will be zero.


  1. We might have this wrong. It is weird that the formula for $\operatorname{Pr}(T \geq k)$ only involves stuff at time $T = k$. This might be under-counting because really $\operatorname{Pr}(T \geq k) \geq \operatorname{Pr}(T = k) + \operatorname{Pr}(T = k + 1) + \cdots$ and so on. Optimistically, for the sake of saving this calculation, I think we might want: $\displaystyle \operatorname{Pr}(T \geq k) \geq \prod_{i = 1}^{k-1} \frac{p^d - p^{i-1}}{p^d - (i-1)}$. That is, our formula undercounts by a lots, but that’s okay because we want a big lower bound on $\mathbb{E}(T)$. ↩︎

  2. I got tripped up several times by the indices in this formula. It feels weird that the upper index in the produce is $i = k - 1$, and that for the process to coninute to stage $k$ we need to think so much about avoiding sets of vectors of size $k-2$. Notice, however, that when we plug in the upper bound $i = k - 1$ we get: \[ \frac{p^d - p^{(k-1)-1}}{p^d - ((k-1)-1)} = \frac{p^d - p^{k-2}}{p^d - (k-2)} \] Another thing to keep in mind: If the process gets to stage $k$, then the choice of vector $v_{k-1}$ must have “succeeded” and the set $\{v_1, \dots, v_{k-1}\}$ is independent. The choice of $v_{k-1}$ must avoid various things of size $k-2$, which is why we see this value cropping up so much. ↩︎


Published: Feb 10, 2026 @ 20:50.
Last Modified: Feb 20, 2026 @ 14:37.

Tags

#linear algebra

Backlinks

Navigation Menu

Home / Now / Blog / Notes / Reading / Office Camera / Tags / Bookmarks / RSS Feeds / Top of Page

Thanks for reading! If you have any comments or questions about the content, please let me know. Anyone can contact me by email.