Sparse Tensor Algebra via Provenance Semirings

Abstract

The connection between linear algebra and relational algebra is not new. In 1997, Paul Stodghill wrote a 400 page PhD thesis on generating sparse matrix codes using the relational perspective. More recently, the SPORES paper shows how to (super)optimize sparse linear algebra operations by passing to relational algebra and applying equality saturation. This year at PLDI 2023, a paper by Scott Kovach et al. on "Indexed Streams" generalizes the connection further. In this article, we'll explain the connection between tensor algebra and relational algebra, and show how tensor algebra computations are captured by the concept of provenance semirings.

Abstract
Background
Provenance
- Provenance Semirings
Tensor Computations using Provenance
- Examples
Conclusion
- Related Work

Background

We briefly review some background material on tensor algebra and relational algebra.

Tensor algebra

For the purposes of this article, a tensor is simply a multidimensional array of values in some underlying domain $D$. The rank of a tensor is the number of dimensions it has, e.g. a rank-2 tensor is a matrix. A rank-$n$ tensor with dimensions $d_1, \ldots, d_n$ can be indexed via indices $j_i \in \{1, \ldots, d_i\}$ for $i \in \{1, \ldots, n\}$, which we denote by $A[j_1, \ldots, j_n] \in D$.

We are interested in computations that can be expressed index-wise in terms of sums and products. If you are familiar with numpy's einsum function operating in "explicit" mode, then this is almost the same thing.¹ As an example, if $A$ and $B$ are matrices (with appropriate dimensions), then the matrix product $C = AB$ can be written as \[ C_{ij} = A_{ik} B_{kj}, \] where the reduction over $k$ is implied by its absence on the left-hand side. Unlike einsum, we also allow elementwise addition, e.g. $C_{ij} = A_{ij} + B_{ij}$.

Examples:

Batched feed-forward layer with bias: $Y_{bh} = X_{bf} W_{fh} + \beta_{h}$. Note that $\beta$ is a vector indexed by $h$, so we broadcast it over $b$.
Batched matrix multiplication: $C_{bij} = A_{bik} B_{bkj}$.
The trace of a matrix: $t = A_{ii}$.
etc.

Relational algebra

Relational algebra is a formalism for reasoning about (relational) databases and queries (such as SQL). We'll give a somewhat informal overview here.

At a high level, a table (or relation) is composed of a schema together with a set of records (or rows).² A schema is just a set of identifiers called attributes, together with a set of allowable values for each attribute. A record is then a mapping from attributes to values, or equivalently a tuple of values indexed by their corresponding attributes. Graphically, each attribute in a table corresponds to a column, and each record corresponds to a row. The cardinality of a table is the number of records it contains.

For example, a table of students might have the following schema: \[ \textsf{Student}(\textsf{name}: \textsf{string}, \textsf{age}: \textsf{int}, \textsf{gpa}: \textsf{float}). \] A record in this table might be $(\textsf{name}, \textsf{age}, \textsf{gpa}) \mapsto (\text{Alice}, 20, 4.0)$. Graphically (with some extra rows thrown in): \[ \textsf{Student} = \begin{array}{| c | c | c |} \hline \textsf{name} & \textsf{age} & \textsf{gpa}\\ \hline \text{Alice} & 20 & 4.0\\ \text{Bob} & 21 & 3.5\\ \text{Charlie} & 20 & 3.8\\ \hline \end{array}. \]

Queries

Now that we've defined tables, we can consider queries over them. Broadly speaking, a query takes in some input tables and produces an output table. For example, if we have the $\textsf{Student}$ table above, as well as a table of majors for each student, we might want to produce a combined table where each row has the student's name, age, GPA, and major: \[ \begin{array}{| c | c | c |} \hline \textsf{name} & \textsf{age} & \textsf{gpa}\\ \hline \text{Alice} & 20 & 4.0\\ \text{Bob} & 21 & 3.5\\ \text{Charlie} & 20 & 3.8\\ \hline \end{array} ~??~ \begin{array}{| c | c |} \hline \textsf{name} & \textsf{major}\\ \hline \text{Alice} & \text{CS}\\ \text{Bob} & \text{Math}\\ \text{Charlie} & \text{English}\\ \hline \end{array} = \begin{array}{| c | c | c | c |} \hline \textsf{name} & \textsf{age} & \textsf{gpa} & \textsf{major}\\ \hline \text{Alice} & 20 & 4.0 & \text{CS}\\ \text{Bob} & 21 & 3.5 & \text{Math}\\ \text{Charlie} & 20 & 3.8 & \text{English}\\ \hline \end{array}, \] where $??$ is some operation that does what we want. (In this example, the $??$ is actually the natural join.) Below, we give definitions for the core relational algebra operations.

Natural join. The natural join $A \bowtie B$ of two tables $A$ and $B$ has one record for each pair of records in $A$ and $B$ that have the same values on all shared attributes. Each output record is defined on the union of the input tables' attributes, with values coming from the appropriate tables. In the example above, the tables share the $\textsf{name}$ attribute. If $A$ and $B$ share no attributes, then the natural join is just the Cartesian product $A \times B$.

Projection. The projection $\Pi_{\textsf{attr}_1, \ldots, \textsf{attr}_k}\,A$ of a table $A$ onto attributes $\textsf{attr}_1, \ldots, \textsf{attr}_k$ is the table with the same records as $A$, but with only the projected attributes. Using the example above, we would have \[ \Pi_{\textsf{age}}\,\textsf{Student} = \begin{array}{| c |} \hline \textsf{age}\\ \hline 20\\ 21\\ \hline \end{array}. \] Notice that the output table has only 2 records, as duplicates are removed.

Selection. The selection $\sigma_p\, A$ of a table $A$ by predicate $p$ is the table formed by taking only records in $A$ for which $p$ is true. For example, we might want to select only students with a GPA above 3.5: \[ \sigma_{\textsf{gpa} > 3.5}\,\textsf{Student} = \begin{array}{| c | c | c |} \hline \textsf{name} & \textsf{age} & \textsf{gpa}\\ \hline \text{Alice} & 20 & 4.0\\ \text{Charlie} & 20 & 3.8\\ \hline \end{array}. \]

Union. The union $A \cup B$ of two tables $A$ and $B$ with the same schema is the table formed by taking the union of their records. For example, if we had another table of students: \[ \begin{array}{| c | c | c |} \hline \textsf{name} & \textsf{age} & \textsf{gpa}\\ \hline \text{Bob} & 21 & 3.5\\ \text{Eve} & 21 & 3.7\\ \hline \end{array} \cup \textsf{Student} = \begin{array}{| c | c | c |} \hline \textsf{name} & \textsf{age} & \textsf{gpa}\\ \hline \text{Alice} & 20 & 4.0\\ \text{Bob} & 21 & 3.5\\ \text{Charlie} & 20 & 3.8\\ \text{Eve} & 21 & 3.7\\ \hline \end{array}. \] (Again, no duplicates.)

There are more variants of these operations, but these suffice for our purposes.

Sparse tensors and tables

As it turns out, many tensors of interest have a property known as sparsity, meaning that most of the entries are zero. For example, a square diagonal matrix of size $n$ has sparsity (proportion of zeros to nonzeros) $1 - 1/n$. This is a very useful property, since zero entries can be ignored in many computations (as zero is the additive identity and the multiplicative annihilator). Thus, sparse tensors can be compressed by storing only the non-zero entries and their indices. There is a mountain of work on sparse tensor storage and computation, but we will not go into it here.³

In the context of relational algebra, we can think of a sparse rank-$k$ tensor as a table with $k$ columns, where each record is a tuple of indices representing a nonzero entry in the tensor.⁴ For example, the following table $\textsf{A}$ corresponds to the (boolean) matrix $M_\textsf{A}$: \[ \textsf{A} = \begin{array}{| c | c |} \hline \textsf{row}: \{0,1,2\} & \textsf{col}: \{0,1,2\}\\ \hline 0 & 0\\ 0 & 1\\ 2 & 0\\ 2 & 2\\ \hline \end{array}, \quad M_\textsf{A} = \left[ \begin{array}{c c c} 1 & 1 & 0\\ 0 & 0 & 0\\ 1 & 0 & 1 \end{array} \right]. \]

Note: we encode the dimensions of the corresponding matrix by the domain of allowed values for each attribute in the schema.

Obviously, the cardinality of $\textsf{A}$ is equal to the number of nonzeros in $M_\textsf{A}$.

Provenance

As the word provenance suggests, database researchers have been interested in capturing and explaining the sources of query results for some time. For example, if we have input tables representing public and private information, then we'd like to know if the result of an arbitrary query contains any private information. Consider the running example before, except student GPAs are stored in a separate "private" table (indexed by student name). Then the query $Q = \Pi_{\textsf{name}}\, \sigma_{\textsf{gpa} > 3.5}\,(\textsf{Student} \bowtie \textsf{GPA})$ would give us the names of students with a GPA above 3.5. Even though the output table does not contain any private information, the query reveals that some students have a GPA above 3.5, which is private information.

To track this kind of information, we can associate each record in the inputs with a tag (not an attribute!) that indicates its privacy level:

\[ \textsf{Student} = \begin{array}{| c c | c |}\hline \textsf{name} & \textsf{age} & \textsf{privacy}\\ \hline \text{Alice} & 20 & \text{public}\\ \text{Bob} & 21 & \text{public}\\ \text{Charlie} & 20 & \text{public}\\ \hline \end{array}, \quad \textsf{GPA} = \begin{array}{| c c | c |}\hline \textsf{name} & \textsf{gpa} & \textsf{privacy}\\ \hline \text{Alice} & 4.0 & \text{private}\\ \text{Bob} & 3.5 & \text{private}\\ \text{Charlie} & 4.0 & \text{public}\\ \hline \end{array}. \]

We made Charlie's GPA 4.0 and public for fun (and to illustrate how privacy information flows through the query). Intuitively, joining a public record with a private record should result in a private record, while selection should leave privacy unchanged. Projection is slightly subtle: if a record becomes duplicated after projecting, private information can actually be inferred in some cases. For example, suppose we are allowed to see public records after making an arbitrary query. Then, we may observe that the (public) cardinality of $Q$ is 1. If we were to then query $Q' = \Pi_{\textsf{age}}\, Q$, the resulting cardinality must also be 1, otherwise we would be able to infer Alice's GPA since Alice and Bob have the same age. Thus, the privacy of a projected record is public if and only if any duplicate source record is public. The same logic applies to union.

By squinting at this a bit, the privacy of a record actually maps directly to boolean logic:

$\text{public}$ is true,
$\text{private}$ is false,
$\bowtie$ is logical and,
$\sigma$ is the identity, and
$\Pi$ and $\cup$ apply logical or over duplicates.

Working out the example above, we have the resulting tables: \[ Q = \begin{array}{| c c c | c |}\hline \textsf{name} & \textsf{age} & \textsf{gpa} & \textsf{privacy}\\ \hline \text{Alice} & 20 & 4.0 & \text{true} \land \text{false} = \text{false}\\ \text{Bob} & 21 & 3.5 & \text{true} \land \text{false} = \text{false}\\ \text{Charlie} & 20 & 4.0 & \text{true} \land \text{true} = \text{true}\\ \hline \end{array},\quad Q' = \begin{array}{| c | c |}\hline \textsf{age} & \textsf{privacy}\\ \hline 20 & \text{false} \lor \text{true} = \text{true}\\ 21 & \text{false}\\ \hline \end{array}. \]

Provenance Semirings

Seeking to generalize the above, Green, Karvounarakis, and Tannen introduced the notion of provenance semirings in 2007. By associating each record in the query inputs with a fresh indeterminate provenance variable, and associating query operations with semiring operations over these indeterminates, one can recover different provenance models by evaluating the resultant polynomial expression in specific semirings.

Semiring

A semiring is a set $S$ with binary operations $+,\cdot : S\times S \to S$ and elements $0_S, 1_S \in S$ such that:

$+$ is associative and commutative,
$\cdot$ is associative,
$0_S + x = x$,
$1_S \cdot x = x \cdot 1_S = x$,
$0_S \cdot x = x \cdot 0_S = 0_S$, and
$\cdot$ distributes over $+$ (both left and right).

Generalizing the above example, we would get something like this: \[ \textsf{Student} = \begin{array}{| c c | c |}\hline \textsf{name} & \textsf{age} & \textsf{provenance}\\ \hline \text{Alice} & 20 & A_0\\ \text{Bob} & 21 & A_1\\ \text{Charlie} & 20 & A_2\\ \hline \end{array}, \quad \textsf{GPA} = \begin{array}{| c c | c |}\hline \textsf{name} & \textsf{gpa} & \textsf{provenance}\\ \hline \text{Alice} & 4.0 & B_0\\ \text{Bob} & 3.5 & B_1\\ \text{Charlie} & 4.0 & B_2\\ \hline \end{array}, \] \[ Q = \begin{array}{| c c c | c |}\hline \textsf{name} & \textsf{age} & \textsf{gpa} & \textsf{provenance}\\ \hline \text{Alice} & 20 & 4.0 & A_0 \cdot B_0\\ \text{Bob} & 21 & 3.5 & A_1 \cdot B_1\\ \text{Charlie} & 20 & 4.0 & A_2 \cdot B_2\\ \hline \end{array},\quad Q' = \begin{array}{| c | c |}\hline \textsf{age} & \textsf{provenance}\\ \hline 20 & A_0 \cdot B_0 + A_2 \cdot B_2\\ 21 & A_1 \cdot B_1\\ \hline \end{array}. \] To recover the privacy provenance, we would simply substitute $\cdot$ with $\land$ and $+$ with $\lor$, and then evaluate the expression with the correct boolean values for $A_i$ and $B_i$.

Tensor Computations using Provenance

At this point, it's more or less obvious that (sparse) tensor computations can be done relationally using provenance over the real arithmetic semiring. The main insight is that zero entries behave as we would expect them to, even when they are not explicitly stored in the table. Multiplication $A_{ij} B_{ij} = 0$ if either $A_{ij} = 0$ or $B_{ij} = 0$, which is exactly what happens in a join when either table lacks a record for the given $i$ and $j$. Summation is analogous, where a result is present in a union if either table has a record for the given $i$ and $j$.

Examples

Matrix multiply. Let $A$ and $B$ be tables defined as follows: \[ A = \begin{array}{|c c|c|} \hline i & k & \text{provenance}\\ \hline 0 & 0 & A_{00}\\ 0 & 1 & A_{01}\\ 0 & 2 & A_{02}\\ \hline \end{array}, \quad B = \begin{array}{|c c|c|} \hline k & j & \text{provenance}\\ \hline 0 & 0 & B_{00}\\ 1 & 0 & B_{10}\\ 2 & 0 & B_{20}\\ \hline \end{array}. \] Now, the unreduced matrix product $C_{ikj} = A_{ik}B_{kj}$ can be computed by \[ A \bowtie B = \begin{array}{|c c c|c|} \hline i & k & j & \text{provenance}\\ \hline 0 & 0 & 0 & A_{00} B_{00}\\ 0 & 1 & 0 & A_{01} B_{10}\\ 0 & 2 & 0 & A_{02} B_{20}\\ \hline \end{array}. \] To reduce over $k$, we simply project out $k$: \[ \Pi_{ij}\,(A \bowtie B) = \begin{array}{|c c|c|} \hline i & j & \text{provenance}\\ \hline 0 & 0 & A_{00} B_{00} + A_{01} B_{10} + A_{02} B_{20}\\ \hline \end{array}. \]

Elementwise addition. With $A$ and $B$ as above, we can compute the sum $C_{ij} = A_{ij} + B_{ij}$ as \[ C = A \cup B = \begin{array}{|c c|c|} \hline i & j & \text{provenance}\\ \hline 0 & 0 & A_{00} + B_{00}\\ 0 & 1 & A_{01}\\ 0 & 2 & A_{02}\\ 1 & 0 & B_{10}\\ 2 & 0 & B_{20}\\ \hline \end{array}. \]

Trace of a matrix. Consider the matrix $D$ defined by the table \[ D = \begin{array}{|c c|c|} \hline i & j & \text{provenance}\\ \hline 0 & 0 & D_{00}\\ 1 & 1 & D_{11}\\ 2 & 2 & D_{22}\\ 0 & 1 & D_{01}\\ 1 & 2 & D_{12}\\ \hline \end{array}. \]

Then the trace can be computed by $t = \Pi_{\emptyset}\,\sigma_{i = j}\,D$. This produces a nullary relation of cardinality 1, which is hard to draw as a table but is well-defined (containing the empty tuple, corresponding to a scalar value).

Broadcasting. Broadcasting is slightly tricky, since the dimension to broadcast to must be encoded separately. Let $\textbf{v}$ be a (column) vector of dimension $3$ defined by the table \[ \textbf{v} = \begin{array}{|c|c|} \hline i & \text{provenance}\\ \hline 0 & v_0\\ 1 & v_1\\ 2 & v_2\\ \hline \end{array}. \] Suppose we wish to broadcast $\textbf{v}$ to a matrix $A$ of size $3 \times 2$. In tensor algebra, this is simply written $A_{ij} = \textbf{v}_i$. To do this relationally, we need to first define the target schema by $J = (j: \{0, 1\})$. Then, we can build the dense range table $R_J$ consisting of one record per possible value of $j$, each with unit provenance. Finally, the broadcast is just the natural join (or Cartesian product) of $R_J$ and $\textbf{v}$: \[ A = \textbf{v} \bowtie R_J = \begin{array}{|c|c|} \hline i & \text{provenance}\\ \hline 0 & v_0\\ 1 & v_1\\ 2 & v_2\\ \hline \end{array} \bowtie \begin{array}{|c|c|} \hline j & \text{provenance}\\ \hline 0 & 1\\ 1 & 1\\ \hline \end{array} = \begin{array}{|c c|c|} \hline i & j & \text{provenance}\\ \hline 0 & 0 & v_0\\ 0 & 1 & v_0\\ 1 & 0 & v_1\\ 1 & 1 & v_1\\ 2 & 0 & v_2\\ 2 & 1 & v_2\\ \hline \end{array}. \]

Conclusion

In this post, we have shown how relational provenance can be used to compute tensor operations. This bridges the worlds of relational algebra and tensor algebra - a fruitful connection that has been applied in practice with promising results. While the high-level correspondence is relatively obvious, it's likely that many tricks and techniques in each domain remain to be exploited in a unified manner.

For further reading, we give a (probably very incomplete) list of related work here. (Forgive me for the lack of proper summaries - I'll leave that as an exercise for the reader!)

This is almost but not quite the classic Einstein summation notation, since we are being explicit about output indices as well. ↩
These are the "set-semantics" - a popular alternative is the "bag-semantics" where instead a multiset is used (so that the same record can appear multiple times). ↩
Check out the TACO project for a modern sparse tensor compiler where formats are composable. ↩
This construction ignores the actual values of the tensor, but we'll fix that in the next section! ↩

Changelog

2023-05-24: Fix confusing notation clash in feed-forward tensor algebra example.
2023-05-19: Fix typo in semiring definition. Make table style consistent.