Parker Adey | Growing Linearly Independent Sets

The dimension bound theorem is linear algebra says puts a hard limit on the size of linearly independent sets in finite dimensional vector spaces. If $\dim(V) = d$ then any set of $d + 1$ vectors in $V$ must be linearly dependent. We’re interested in the “growth rate” of linearly independent sets in $V$. What do we mean by this?

Let $p$ be a prime and $V$ be the vector space $\mathbb{F}_p^d$. Consider the following random process for “growing” a linearly independent set in $V$. We first randomly and uniformly select a vector $v_1 \in V$. If this vector is the zero vector then the set $\{v_1\}$ is dependent and we stop the process.

If our initial vector is not the zero vector, then we add a new randomly selected vector $v_2$ to the set. If our updated set of vectors $\{v_1, v_2\}$ is linearly dependent then we stop. The process continues like this until it hits the dimension bound, at which point the set must be dependent.

We’re interest in the question: when does the growth process stop? To do so, we calculate the expected value of the stopping time $T$. The tail sum formula gives:

We calculate the individual probabilities. To start, notice that
\[ \operatorname{Pr}(T \geq 1) = 1 \] because the process always goes for at least one step: we select the vector $\{v_1\}$.

To calculate $\operatorname{Pr}(T \geq 2)$ we need the probability that the process got to the second step. If the process gets to the second step then we know that $v_1 \neq 0$. And so, there are $p^d - 1$ choices for $v_1$ because the selection process needs to avoid the zero vector. This gives: \[ \operatorname{Pr}(T \geq 2) = \operatorname{Pr}(v_1 \neq 0) = \frac{p^d - 1}{p^d}. \] Before moving to the general formula, we consider $\operatorname{Pr}(T \geq 3)$. If the process arrives at the third step, two things must have happened: (i) the vector $v_1$ must be non-zero, and (ii) the set $\{v_1, v_2\}$ must be linearly independent. Independence implies means that their span has size $|{span}\{v_1, v_2\}| = p^2$. Therefore, the choice of the third vector must avoid $p^2$ vectors in the span, and must come from the remaining $p^d - 2$ vectors in the vector space. This gives the following formula¹ for $\operatorname{Pr}(T \geq 3)$. \[ \operatorname{Pr}(T \geq 3) = \left(\frac{p^d - p^2}{p^d - 2}\right)\left( \frac{p^d - 1}{p^d} \right) \] In general, we consider what needs to happen for the process to reach stage $k$. When picking $v_{k-1}$ we must avoid the span of $\{v_1, \dots, v_{k-2} \}$. This span has size $p^{k-2}$ because the vectors in $\{v_1, \dots, v_{k-2} \}$ are linearly independent. Also, note that we are selecting $v_{k-1}$ from a total of $p^d - (k-2)$ vectors, because the vectors $\{v_1, \dots, v_{k-2}\}$ have already been selected. If this all works out, then the set $\{v_1, \dots, v_{k-2}, v_{k-1}\}$ will be independent and the process will continue to stage $k$. And so we get the following probability² of the process continuing up to stage $k$. \[ \operatorname{Pr}(T \geq k) = \prod_{i = 1}^{k-1} \frac{p^d - p^{i-1}}{p^d - (i-1)} \]

\[ \mathbb{E}(T) = \sum_{k=1}^{d+1} \prod_{i = 1}^{k-1} \frac{p^d - p^{i-1}}{p^d - (i-1)}. \]

The product has upper bound $k = d + 1$ because the process never continues passed $d + 1$; the dimension bound forces any set of size at least $d+1$ to be dependent, so the process must stop.

To get a sense for these numbers, we do a bit of calculation. The number we care about is $\mathbb{E}(T)$ for $V = \mathbb{F}_3^4$. In the table below, we calculate that that expected value as $\mathbb{E}(T) \approx 4.435849$.

One follow up question which is worth exploring: Suppose that the process stops at $T = d+1$. This means that the vectors $\{ v_1, \dots, v_d \}$ are linearly independent. Therefore, they form a basis of $V$. The next vector $v_{d+1}$ must be a linear combination of them, \[ v_{d+1} = c_1v_1 + \dots + c_dv_d. \] How many of the coefficients $c_i$ do we expect to be non-zero?

If the coefficient vector $(c_1, \dots, c_d)$ is also uniformly distributed over $V$ then we should get: \[ \mathbb{E}(\#\{c_i \neq 0\}) = d\left( 1 - \frac{1}{p} \right). \] In the case $(p,d) = (3,4)$, we get: $\mathbb{E} = 8/3$. This means that there will be “two and two-thirds” non-zero co-efficients. Alternatively, “one and a third” of the coefficients will be zero.

$p$	$d$	$\mathbb{E}(t)$
3	4	4.435849
3	10	10.320266
3	50	50.317846
3	100	100.317846
5	10	10.698282
5	50	50.698266
5	100	100.698266
7	10	10.809090
7	50	50.809090
7	100	100.809090
11	10	10.890840
11	50	50.890840
11	100	100.890840
13	10	10.910221
13	50	50.910221
13	100	100.910221

We might have this wrong. It is weird that the formula for $\operatorname{Pr}(T \geq k)$ only involves stuff at time $T = k$. This might be under-counting because really $\operatorname{Pr}(T \geq k) \geq \operatorname{Pr}(T = k) + \operatorname{Pr}(T = k + 1) + \cdots$ and so on. Optimistically, for the sake of saving this calculation, I think we might want: $\displaystyle \operatorname{Pr}(T \geq k) \geq \prod_{i = 1}^{k-1} \frac{p^d - p^{i-1}}{p^d - (i-1)}$. That is, our formula undercounts by a lots, but that’s okay because we want a big lower bound on $\mathbb{E}(T)$. ↩︎
I got tripped up several times by the indices in this formula. It feels weird that the upper index in the produce is $i = k - 1$, and that for the process to coninute to stage $k$ we need to think so much about avoiding sets of vectors of size $k-2$. Notice, however, that when we plug in the upper bound $i = k - 1$ we get: \[ \frac{p^d - p^{(k-1)-1}}{p^d - ((k-1)-1)} = \frac{p^d - p^{k-2}}{p^d - (k-2)} \] Another thing to keep in mind: If the process gets to stage $k$, then the choice of vector $v_{k-1}$ must have “succeeded” and the set $\{v_1, \dots, v_{k-1}\}$ is independent. The choice of $v_{k-1}$ must avoid various things of size $k-2$, which is why we see this value cropping up so much. ↩︎

Growing Linearly Independent Sets

Tags

Backlinks

Navigation Menu