A Tutorial on Interactive Proof Systems

Posted 2015/01/07. Last updated 2015/01/07.

Before We Get Started

This post will introduce Interactive Proof Systems and show how they relate to the traditional complexity classes of NP, PSPACE, and NEXPTIME. It is meant to give the mathematical foundation which is often assumed or ignored when discussing Zero-Knowledge Proofs and is based on a tutorial paper I wrote for Princeton's Theory of Computation class, taught at the time by Prof. Robert Tarjan. It assumes some familiarity with Turing machines and the basic complexity classes, but hopefully you will find that this post is more or less self-contained. If you are interested in this material, I recommend Sipser's Introduction to the Theory of Computation if you are OK with some informal and handwavy arguments (the second edition is as good as the third). Moreover, Arora and Barak's Computational Complexity is excellent for a deeper understanding of the subject (with the added benefit that the authors have made earlier drafts of the material freely available online). Finally, the Complexity Zoo page is fantastic if you want quick facts and references for the various complexity classes.

The Essentials

Traditionally, the NP complexity class is defined as the set (class) of languages that are decided by non-deterministic polynomial-time Turing machines, but it can be equivalently (see Sipser's book for a proof) thought of as the set of languages that have a polynomial-time verifier:

Definition 1 (Verifier, Certificate)

A verifier for a language L is a Turing machine V, such that L = {w | V accepts <w,c> for some string c}. c is called a certificate for w. We consider V to be polynomial time if its running time is polynomial in the length of w only.

We can thus think of NP as the set of languages L for which there is an all-powerful prover P that given w finds c, and then passes it to V which verifies (deterministically and in polynomial time in |w|) that c is a correct certificate (proof) of w's membership in the language L. As an example, for the language of composite natural numbers, a certificate for w would be a pair of integers (n,m) such that n,m>1 and n·m=w. Interactive proof systems extend this idea by allowing additional interaction between the prover and the verifier and by adding randomness. As we show below, the randomness of the verifier is essential, while the randomness of the prover is not.
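
To make the verifier picture concrete, here is a minimal Python sketch of such a verifier for the language of composite numbers; the function name and the encoding of the certificate are my own illustrative choices.

```python
def verify_composite(w: int, certificate) -> bool:
    """NP-style verifier for COMPOSITES: accept <w, c> iff the certificate
    c = (n, m) proves that w is composite. The check runs in time
    polynomial in the bit length of w (a single multiply-and-compare)."""
    n, m = certificate
    return n > 1 and m > 1 and n * m == w


# The all-powerful prover factors w (however expensively) and hands the
# factors to the verifier:
assert verify_composite(15, (3, 5))        # 15 = 3 * 5 is composite
assert not verify_composite(13, (1, 13))   # not a valid certificate
```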

Interactive Turing Machines And Protocols

Interactive Proof Systems were introduced by Goldwasser, Micali, and Rackoff in their seminal 1985 paper The Knowledge Complexity of Interactive Proof Systems. The definition of Turing machines is amended to allow for randomness and communication as follows:

Definition 2 (Interactive Turing Machine, Coin Flip)

An Interactive Turing Machine (ITM) is a Turing machine that has:
• A read-only input tape
• A scratch tape
• A random tape
• A read-only communication tape
• A write-only communication tape
The random tape contains an infinite sequence of random bits, which are read left-to-right, and reading the next random bit is called flipping a coin.

By combining two ITMs, we can define a general two-party interactive protocol as follows:

Definition 3 (Interactive Protocol, Prover, Verifier)

An Interactive Protocol is an ordered pair of ITMs (P,V), called the Prover and the Verifier respectively such that:
• P and V share the same input tape
• P's write-only communication tape is V's read-only communication tape and vice-versa
• P is computationally unbounded, whereas V's total internal computation time is polynomial in the length of the common input
• The two machines take turns being active, and V starts
• During an active state, the machine, after some internal computation using the input, scratch, random, and communication tapes, writes to its write-only communication tape
• Termination occurs when V either accepts or rejects the input by writing accept or reject on its write-only communication tape and halting the protocol.

This definition makes it clear that only polynomially many messages can be exchanged, in polynomially many rounds, and each message is polynomially bounded in the input size, as V's runtime would otherwise explode. In addition, V can only use polynomially many random bits. An Interactive Protocol for a language L (usually over {0,1}^*) with a high-enough probability of accepting w ∈ L and of rejecting w ∉ L is called an Interactive Proof System:

Definition 4 (Interactive Proof System)

The Interactive Protocol (P,V) is an Interactive Proof System (IPS) for language L if
w ∈ L ⇒ Pr[(P,V) accepts w] ≥ ⅔
w ∉ L ⇒ ∀P', Pr[(P',V) accepts w] ≤ ⅓
where the probabilities are taken over the randomness used by the prover and verifier.

V must reject with a high probability for all possible provers, because V has no guarantees that it is communicating with an honest party. Note that the given probabilities fix w, so by repeating the interaction m times, and accepting iff at least ⌊m/2⌋ individual runs were accepted, we can increase the probability of accepting w ∈ L to at least 1-2^{-Ω(m)} by the Chernoff bound. The case where w ∉ L is slightly more subtle, because we wish to show that for all Interactive Turing Machines P', any interaction succeeds with probability at most ⅓, regardless of what happened in the previous interactions with P' (the prover strategy may depend on these previous interactions). However, by the definition, this probability is at most ⅓ even for the provers that know everything about previous interactions, so again by the Chernoff bound we get that if w ∉ L, the accept probability is at most 2^{-Ω(m)}. In particular, taking m = O(n^s) for some s > 0 means we have polynomially many interactions, and probabilities at least 1-2^{-n^s} and at most 2^{-n^s} respectively. Thus, we can say that, intuitively, an IPS for a language L consists of an "honest" prover and an "honest" verifier such that when w ∈ L, the prover can convince the verifier of this fact with very high probability, but when w ∉ L, no prover can wrongly convince the verifier, except with very low probability.

The fractional constants are not important, and, using the same argument with Chernoff bounds, we can replace them by any constants strictly in (½, 1) and (0, ½) respectively. This same argument also shows why we can use constants instead of using the original definition where, ∀k and all sufficiently large w, the probabilities are ≥ 1-|w|^{-k} and ≤ |w|^{-k} respectively.
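
As an illustration of the amplification argument, here is a small Python sketch in which run_protocol stands in for a single execution of some IPS with completeness ⅔ and soundness ⅓ (modelled here as a biased coin purely for illustration); majority voting over m runs then drives the error down exponentially.

```python
import random

def run_protocol(w_in_language: bool) -> bool:
    """Stand-in for a single run of some IPS with completeness 2/3 and
    soundness 1/3. A real run would simulate the prover/verifier
    interaction; the biased coin below is purely a model of its outcome."""
    p_accept = 2 / 3 if w_in_language else 1 / 3
    return random.random() < p_accept

def amplified(w_in_language: bool, m: int = 101) -> bool:
    """Repeat the protocol m times and accept iff a majority of the runs
    accept; by the Chernoff bound the error drops to 2^{-Omega(m)}."""
    accepts = sum(run_protocol(w_in_language) for _ in range(m))
    return 2 * accepts > m

print(amplified(True), amplified(False))   # almost surely: True False
```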

It should be noted that we cannot replace ⅓ with 0: having perfect soundness reduces the class to NP. Indeed, a certificate for a verifier (in the NP sense) is essentially a transcript of a successful interaction with the prover together with the verifier's coin flips, and perfect soundness guarantees that no such transcript can convince the verifier of a false membership, concluding the proof. The interested reader may wish to compare this proof to the one below showing that the randomness of the verifier is essential, as the core ideas are very similar.

Surprisingly, however, replacing ⅔ with 1 does not change the complexity class, i.e. having perfect completeness is no more restrictive. This result was originally proven in 1989 using the fact that public coin and private coin proofs are essentially the same, but that argument is somewhat complicated. The proof we give below that IP = PSPACE provides a much more elegant explanation.

Finally, we should remark that fixing the verifier V but allowing a different prover P_w for each w ∈ L does not change the complexity class either. Indeed, the prover P is computationally unbounded, and can thus find and simulate each of the individual provers, which can be assumed to be deterministic (as shown below). P can also easily simulate the interaction with V, as V's randomness and the total length of the messages exchanged are bounded, so P can simply enumerate all possibilities and ensure that the completeness condition is satisfied.

With these remarks in hand, we can finally define the IP class:

Definition 5 (The IP Class)

The Interactive Polynomial time (IP) class is the class of languages that have an Interactive Proof System. For a polynomial q(n), we also denote by IP[q] the class of languages that have an IPS with q(n) rounds on inputs w with |w| = n (whether w ∈ L or not).

Coin Tosses Are Essential… But Only For The Verifier

The randomness of the verifier is essential to making the IP class non-trivial. Indeed, suppose that for the language L there is an IPS (P,V) where V does not use any randomness. Then, on input w, at each step of the communication, after P sends its i-th message m_i, it already knows V's response n_i, as V is deterministic. As a result, there exists a single-round IPS (P',V') for L, where P' simulates the interaction between P and V and sends the entire message history to V' (which is polynomial in length by definition), and V' simply simulates V on the exchanged message history and checks that the transcript is valid.

What this means is that if w ∈ L, there must be at least one c (sent to V' by P') such that V' accepts on (w, c), else the accept probability would be 0. In addition, if w ∉ L, there can't be any c such that V' accepts (w, c). Indeed, if there were, a cheating prover could always send c to V', and in that case the probability of accepting would be 1 > ⅓, contradicting the definition of an IPS. This shows that V' is a verifier for L in the NP sense, establishing that the set of languages in IP with a deterministic verifier, denoted dIP, satisfies dIP ⊆ NP. But by the introductory remarks, we also have NP ⊆ dIP, concluding the proof that dIP = NP.

However, the prover can be made deterministic. Suppose we have a different prover P_w for each w. Then, fixing w and the prover's randomness r, and simulating the interaction with V over all values of V's randomness (which we can do, as the randomness is bounded), we get for each r a probability p_r that V accepts. Among all such r, there must exist at least one r' with p_{r'} ≥ ⅔, else (P_w, V) would accept with probability < ⅔. Thus, we can use the prover P'_w, which simulates P_w with its randomness fixed to r', and is therefore deterministic. Combining the individual provers into a single prover that is still deterministic shows that the prover's randomness is truly non-essential.
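
The derandomization of the prover is just a brute-force search over its coins. The sketch below assumes we are handed a function accepts(r_p, r_v) that simulates one complete interaction on a fixed input w (such a function exists in principle because all the tapes involved are bounded); all names are illustrative.

```python
from itertools import product

def best_deterministic_prover(accepts, coin_len_p: int, coin_len_v: int):
    """Given accepts(r_p, r_v) -> bool, a simulation of one complete
    interaction of (P, V) on a fixed input w with prover coins r_p and
    verifier coins r_v, find a prover random string r' whose acceptance
    probability over the verifier's coins is maximal. Hard-wiring r'
    into P gives a deterministic prover that does at least as well."""
    best_r, best_p = None, -1.0
    for r_p in product("01", repeat=coin_len_p):
        wins = sum(accepts("".join(r_p), "".join(r_v))
                   for r_v in product("01", repeat=coin_len_v))
        p = wins / 2 ** coin_len_v
        if p > best_p:
            best_r, best_p = "".join(r_p), p
    return best_r, best_p   # best_p >= 2/3 whenever w is in L
```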

Coin Tosses Can Be Public

Up until now, neither the verifier nor the prover revealed the outcome of its coin tosses. As a matter of fact, we cannot send the prover's coin tosses across, because they might not be polynomially bounded, but we can introduce a class where the verifier sends its coin tosses to the prover instead of a message. This was done by Babai (also in 1985), and led to the complexity class AM:

Definition 6 (AM, Public Coins)

For every polynomial q(n), we define the class AM[q] as the subset of IP[q] where the verifier's messages are the random bits used. That is to say, only the coin flips are sent across and no coin flip is allowed if it is not sent across. These are called public coin or Arthur-Merlin proofs, where Arthur is the verifier and Merlin the prover. The class AM is defined to be AM[2], which means that the verifier sends the random bits, receives the prover's response and then deterministically accepts/rejects.

There is a related class MA, where the prover Merlin moves first, but we won't discuss it further: MA interactions can be reduced to AM interactions with an extra "dummy" round, so MA ⊆ AM, though it is unknown whether AM ⊆ MA. Note that, as before, we can still require perfect completeness (i.e. making Arthur accept with probability 1) without changing the two complexity classes.

As Babai also showed in his original paper, for any fixed k ≥ 2 we have AM[k] = AM[2]; it suffices to show that AM[k + 1] ⊆ AM[k], as the reverse inclusion is trivial. An even more surprising result, due to Goldwasser and Sipser, is that every private coin proof in IP has a public coin proof with just two more rounds of interaction. Because AM[poly] is by definition a subset of IP, this shows that AM[poly] = IP.

The crux of the proof lies in the fact that for every function q : ℕ → ℕ computable in polynomial time, we have IP[q] ⊆ AM[q + 2]. Though the actual proof is very technical and beyond the scope of this introductory post, the basic idea is quite elegant. As explained above, we can assume that the prover is deterministic and that the private-coin verifier's random strings are bounded in length by q(n). Convincing the verifier that w ∈ L in the private-coin version essentially comes down to showing that the set of accepting random strings makes up more than half of all possible random strings (i.e. has size strictly more than 2^{q(n)-1}). The "game" between the prover and the verifier can then be rephrased as follows: given a set S ⊆ {0,1}^m and a common number K, the prover tries to convince the verifier that |S| ≥ K, and the verifier needs to reject with high probability if |S| ≤ K/2. Goldwasser and Sipser then use pairwise independent hash functions to make the set-size protocol concrete, but there is considerable detail in establishing completeness and soundness, and even more subtlety in showing that this can be done in a way that only increases the number of rounds by 2.
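
The following toy Python sketch is meant only to illustrate the hashing idea, not the actual Goldwasser-Sipser protocol: it assumes the verifier can test membership in S directly, whereas in the real transformation membership must itself be certified by further interaction, and it ignores the careful choice of the output length k and the amplification of the constant gap. All names are illustrative.

```python
import random

def random_linear_hash(m: int, k: int):
    """A pairwise-independent hash h: {0,1}^m -> {0,1}^k over GF(2), namely
    h(x) = Ax + b for a random k x m bit matrix A and a random k-bit vector
    b (bit strings are encoded as Python ints)."""
    A = [random.getrandbits(m) for _ in range(k)]
    b = random.getrandbits(k)

    def h(x: int) -> int:
        out = 0
        for i, row in enumerate(A):
            out |= (bin(row & x).count("1") & 1) << i   # <row, x> over GF(2)
        return out ^ b

    return h

def set_size_round(S: set, m: int, k: int, prover) -> bool:
    """One round of the toy set-size game: the verifier challenges the prover
    to exhibit an element of S hashing to a random target."""
    h = random_linear_hash(m, k)
    y = random.getrandbits(k)
    x = prover(h, y)
    return x in S and h(x) == y    # membership in S checked directly here

def honest_prover(S: set):
    """The unbounded prover simply searches S for a preimage of y under h."""
    return lambda h, y: next((x for x in S if h(x) == y), None)
```

Intuitively, when |S| is close to 2^k the honest prover can usually hit the random target, while a set at most half as large can only be hit about half as often; that constant gap is what the full protocol amplifies.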

It's Just Polynomial Space

An interesting and perhaps somewhat unexpected theorem due to Shamir is that IP = PSPACE. The proof we present here in detail, however, follows the presentations in the Sipser and Arora/Barak books mentioned previously.

IP ⊆ PSPACE

First off, we prove that IP ⊆ PSPACE. The proof, although slightly technical, uses a very simple idea: because the messages exchanged are polynomial in length, the entire tree of interactions can be explored in polynomial space, and for each fixed w, we can find the maximum probability with which any prover can convince the verifier that w ∈ L (we are using the equivalent version of an IPS, where we have a distinct prover per w). Then, if this probability is ≥ ⅔, we declare that w ∈ L, and if it isn't (in which case it must be ≤ ⅓, else this verifier does not give rise to an IPS), then w ∉ L.

To formalize this argument, given a language L ∈ IP, we suppose that the verifier V exchanges exactly p = p(|w|) messages, adding dummy messages if necessary. We define Pr[V accepts w] = max_P Pr[(P, V) accepts w]. By definition, this is ≥ ⅔ if w ∈ L and ≤ ⅓ otherwise. We claim that given only V, which we can simulate as it is polynomial time, we can calculate this probability and thus decide L in polynomial space. The idea is that given the exchanged messages M_j = m_1#…#m_j, we can calculate the probability that (P,V) accepts assuming they have exchanged M_j, and thus, working backwards, we can calculate the probability of accepting, starting without any message exchanges.

We thus write (P,V)(w, r, M_j) = accept if we can extend M_j with messages m_{j+1}, …, m_p such that:
• For 0 ≤ i < p and i even, V(w, r, m_1#…#m_i) = m_{i+1}
• For j ≤ i < p and i odd, P(w, m_1#…#m_i) = m_{i+1}
• m_p = accept.

This essentially says that, given the randomness r, the verifier's messages are consistent, and we (deterministically) extend the prover's messages until we reach the end and accept. We can then define Pr[(P, V) accepts w starting at M_j] = Pr_r[(P, V)(w, r, M_j) = accept], where the probability is taken over all random strings r that are consistent with M_j, or 0 if no such strings exist. Finally, we define Pr[V accepts w starting at M_j] = max_P Pr[(P, V) accepts w starting at M_j].

For M_j, where 0 ≤ j ≤ p, we define N_{M_j} as follows:
• 1, if j = p, M_p is consistent with V's messages for some r, and m_p = accept
• 0, if j = p but the above is not true
• max_{m_{j+1}} N_{M_{j+1}}, if j < p is odd
• ∑_{m_{j+1}} Pr_r[V(w, r, M_j) = m_{j+1}] · N_{M_{j+1}}, if j < p is even.

If M_0 represents the empty message stream, we wish to show that N_{M_0} is computable in polynomial space and that it equals the probability that V accepts w. The first part is easy: knowing the values at level j + 1, calculating the values at level j requires either taking a maximum or taking a weighted average. The maximum is easy, and the weighted average simply requires going over all random strings (whose lengths are polynomially bounded), seeing which ones are consistent with M_j, and checking the fraction of the remaining ones that cause the verifier to output m_{j+1}. This does not require storing many values at a time: we can average/maximize for one level as we compute the values of the next, and the depth of the recursion is polynomial, so this only requires polynomial space, as desired.
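
The recursion can be written out directly. The toy Python sketch below is not a polynomial-space implementation: it assumes the verifier is given as an ordinary function V(w, r, transcript) returning its next message (or accept/reject as the last message), and that its random strings and the possible prover messages can be enumerated explicitly; all names are illustrative.

```python
def consistent(V, w, r, transcript) -> bool:
    """Check that V, run with coins r, would have produced exactly the
    verifier messages in the transcript (V speaks at even 0-based
    positions, since V goes first)."""
    return all(V(w, r, transcript[:i]) == transcript[i]
               for i in range(0, len(transcript), 2))

def accept_probability(V, w, random_strings, messages, p, transcript=()):
    """N_{M_j} from the text, for M_j = transcript: maximize over the
    prover's next message when it is the prover's turn, and take a
    weighted average over the consistent verifier coins when it is V's
    turn. Space usage is proportional to the recursion depth p."""
    j = len(transcript)
    if j == p:
        ok = (transcript and transcript[-1] == "accept"
              and any(consistent(V, w, r, transcript) for r in random_strings))
        return 1.0 if ok else 0.0
    if j % 2 == 1:   # m_{j+1} is the prover's message: take the best choice
        return max(accept_probability(V, w, random_strings, messages, p,
                                      transcript + (msg,))
                   for msg in messages)
    # m_{j+1} is the verifier's message: average over coins consistent with M_j
    rs = [r for r in random_strings if consistent(V, w, r, transcript)]
    if not rs:
        return 0.0
    return sum(accept_probability(V, w, random_strings, messages, p,
                                  transcript + (V(w, r, transcript),))
               for r in rs) / len(rs)
```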

The second claim is proven by backwards induction, that is to say, for all 0 ≤ j ≤ p and every M_j, we must show that N_{M_j} = Pr[V accepts w starting at M_j]. The base case j = p is correct by definition. Assuming the claim for j + 1 and all M_{j+1}, we take cases:

If j is even:

N_{M_j} =
∑_{m_{j+1}} Pr_r[V(w, r, M_j) = m_{j+1}] · N_{M_{j+1}} =
∑_{m_{j+1}} Pr_r[V(w, r, M_j) = m_{j+1}] · Pr[V accepts w starting at M_{j+1}] =
Pr[V accepts w starting at M_j],
where the first and last equalities are by definition and the second is by the induction hypothesis.

If j is odd:

N_{M_j} =
max_{m_{j+1}} N_{M_{j+1}} =
max_{m_{j+1}} Pr[V accepts w starting at M_{j+1}] =
Pr[V accepts w starting at M_j],
where the first and second equalities are as above. The third holds because Pr[V accepts w starting at M_j] ≤ max_{m_{j+1}} Pr[V accepts w starting at M_{j+1}] (whatever message an optimal prover sends next, its success probability from that point on is at most the maximum), and also ≥ (the prover may send the maximizing m_{j+1} and then continue optimally, while sending anything else can only lower the resulting probability).

This concludes the proof that IP ⊆ PSPACE.

PSPACE ⊆ IP

We now wish to show that PSPACE ⊆ IP. We know that 3-CNF TQBF is PSPACE-complete, so it suffices to show that 3-CNF TQBF has an IPS. The proof relies on a beautiful concept called arithmetization. The variables x_i can be thought of as taking the truth values 0, 1 in some finite field F. Then x_i ∧ x_j is true iff x_i·x_j = 1, and ¬x_i is true iff 1 - x_i = 1 in the field. Thus, given a clause of the form x_1 ∨ ¬x_2 ∨ ¬x_3, the polynomial 1-(1-X_1)X_2X_3 is 1 iff the {0,1}^3 assignment to X_1, X_2, X_3 makes the clause true and 0 otherwise.

Thus, if the original formula has m clauses in n variables, for the j-th clause we can define the polynomial p_j(X_1, …, X_n) as above (which only depends on three of the variables), and we can define the overall polynomial P_Φ(X_1, …, X_n) = ∏_{1 ≤ j ≤ m} p_j(X_1, …, X_n), whose value is 1 on satisfying assignments and 0 otherwise. This is the arithmetization of the boolean formula, and the crux of the IPS is that allowing this polynomial to be evaluated over F^n (as opposed to {0,1}^n) gives the prover its required power.
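
Here is a short Python sketch of the arithmetization step, a literal translation of the formulas above evaluated modulo a prime; the encoding of literals as signed indices is an ad-hoc choice of mine.

```python
def clause_value(clause, assignment, p):
    """Arithmetization of one 3-literal clause, evaluated at `assignment`
    mod p. A literal is encoded as +i for x_i and -i for (not x_i); the
    clause maps to 1 - prod(1 - literal), which is 1 exactly when some
    literal is satisfied by a {0,1} assignment."""
    prod = 1
    for lit in clause:
        x = assignment[abs(lit) - 1]
        lit_val = x if lit > 0 else (1 - x)
        prod = (prod * (1 - lit_val)) % p
    return (1 - prod) % p

def formula_value(clauses, assignment, p):
    """P_Phi evaluated at `assignment`: the product of the clause
    polynomials, equal to 1 on satisfying {0,1} assignments and 0 on
    non-satisfying ones (and some field element off the Boolean cube)."""
    val = 1
    for clause in clauses:
        val = (val * clause_value(clause, assignment, p)) % p
    return val

# (x1 or not x2 or not x3) and (x2 or x3 or not x1), over F_101:
clauses = [(1, -2, -3), (2, 3, -1)]
print(formula_value(clauses, [1, 0, 1], 101))   # 1: satisfying assignment
print(formula_value(clauses, [0, 1, 1], 101))   # 0: not satisfying
```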

Now, given a formula in alternating 3-CNF TQBF (which, again, is PSPACE-complete) of the form
Ψ = ∃x_1 ∀x_2 … ∃x_n Φ(x_1, …, x_n),
we have Ψ ∈ TQBF iff
∑_{b_1 ∈ {0,1}} ∏_{b_2 ∈ {0,1}} … ∑_{b_n ∈ {0,1}} P_Φ(b_1, …, b_n) ≠ 0.

Because the above expression is only evaluated over {0,1}, we always have x^k = x for k ≥ 1.

As a result, the linearization polynomial
L_{X_i}(p)(X_1, …, X_n) = X_i · p(X_1, …, X_{i-1}, 1, X_{i+1}, …, X_n) + (1-X_i) · p(X_1, …, X_{i-1}, 0, X_{i+1}, …, X_n)
is both linear in X_i and agrees with p when X_i ∈ {0,1}. In addition, defining the operators
∀X_i p(X_1, …, X_n) = p(X_1, …, X_{i-1}, 0, X_{i+1}, …, X_n) · p(X_1, …, X_{i-1}, 1, X_{i+1}, …, X_n), and
∃X_i p(X_1, …, X_n) = p(X_1, …, X_{i-1}, 0, X_{i+1}, …, X_n) + p(X_1, …, X_{i-1}, 1, X_{i+1}, …, X_n),
we have that the original product/sum value is non-zero iff
∃x_1 L_{X_1} ∀x_2 L_{X_1} L_{X_2} … ∃x_n L_{X_1} … L_{X_n} P_Φ(X_1, …, X_n) ≠ 0.

It is clear that this expression is only larger than the original by O(n^2) additional operators, and the original polynomial P_Φ only requires O(m) space (before we expand it), as it uses m cubic polynomials.
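
As a sanity check on these definitions, the small Python sketch below treats a polynomial as a black-box evaluation function mod p and implements L_{X_i}, ∃X_i and ∀X_i exactly as defined above (a real prover would manipulate the polynomials symbolically; the representation and names here are only illustrative).

```python
def linearize(f, i, p):
    """L_{X_i}: a function linear in argument i (0-based) that agrees with
    f whenever that argument is 0 or 1."""
    def g(*xs):
        at0 = f(*xs[:i], 0, *xs[i + 1:])
        at1 = f(*xs[:i], 1, *xs[i + 1:])
        return (xs[i] * at1 + (1 - xs[i]) * at0) % p
    return g

def exists_op(f, i, p):
    """The ∃X_i operator: sum the two Boolean restrictions of argument i."""
    def g(*xs):
        return (f(*xs[:i], 0, *xs[i:]) + f(*xs[:i], 1, *xs[i:])) % p
    return g

def forall_op(f, i, p):
    """The ∀X_i operator: multiply the two Boolean restrictions of argument i."""
    def g(*xs):
        return (f(*xs[:i], 0, *xs[i:]) * f(*xs[:i], 1, *xs[i:])) % p
    return g

# p(X1, X2) = X1^2 * X2 over F_7; linearizing X1 changes nothing on {0,1}:
MOD = 7
poly = lambda x1, x2: (x1 * x1 * x2) % MOD
lin = linearize(poly, 0, MOD)
print(poly(3, 2), lin(3, 2))   # 4 vs 6: they differ off the Boolean cube
print(poly(1, 2), lin(1, 2))   # 2 vs 2: they agree when X1 is 0 or 1
```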

Now, we describe the protocol inductively. Initially, the prover chooses a prime p ∈ (2^{n+m}, 2^{2(n+m)}] and sends it to the verifier (after a dummy "hello" message by the verifier). The verifier then checks, either deterministically or probabilistically, in polynomial time that p is prime. We suppose, as the inductive hypothesis, that for a polynomial g(X_1, …, X_k), the prover is able to convince the verifier that g(a_1, …, a_k) = C for any a_1, …, a_k, C, with probability 1 when the claim is true and with probability less than ε when the claim is false, where the equality is over F_p. We define the polynomial U(X_1, …, X_l) = O_{X_i} g(X_1, …, X_k), where O is one of ∃, ∀, L and the number of variables l is k - 1, k - 1, k respectively in these cases.

Denoting by d an upper bound on the degree of X_i in g, we show that the prover can convince the verifier that U(a_1, …, a_l) = C' with probability 1 for any a_1, …, a_l, C' when the claim is true, and with probability at most ε + d/p when it is false. By renaming if necessary, we can assume i = 1. The prover sends across a polynomial s(X_1) (of degree ≤ d), which is supposed to equal g(X_1, a_2, …, a_k), and the verifier checks the following:

• If O = ∃, the verifier checks that s(0) + s(1) = C'
• If O = ∀, the verifier checks that s(0) · s(1) = C'
• If O = L, the verifier checks that a_1·s(1) + (1 - a_1)·s(0) = C'.
If the check fails, the verifier rejects; otherwise, it randomly picks a ∈ F_p and asks the prover to prove that s(a) = g(a, a_2, …, a_k).
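
A minimal sketch of the verifier's side of one such round, with s represented by its list of coefficients; all names are illustrative, and the degree check on s, the recursive continuation, and the handling of the outermost ≠ 0 claim are omitted.

```python
import random

def eval_poly(coeffs, x, p):
    """Evaluate a polynomial given by its coefficients (constant term first)
    at x, mod p (Horner's rule)."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % p
    return result

def verify_operator_claim(op, a1, C, s_coeffs, p):
    """One round of the protocol: check the prover's claimed polynomial
    s(X_1) against the claim U(a_1, ..., a_l) = C', then pick a random
    field element a for the next, recursive claim s(a) = g(a, a_2, ..., a_k).
    `op` is one of 'exists', 'forall', 'linearize'; returns None on reject."""
    s0 = eval_poly(s_coeffs, 0, p)
    s1 = eval_poly(s_coeffs, 1, p)
    if op == "exists":
        ok = (s0 + s1) % p == C % p
    elif op == "forall":
        ok = (s0 * s1) % p == C % p
    else:  # 'linearize'
        ok = (a1 * s1 + (1 - a1) * s0) % p == C % p
    if not ok:
        return None                        # reject immediately
    a = random.randrange(p)
    return a, eval_poly(s_coeffs, a, p)    # new claim: g(a, a_2, ...) equals this
```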

Clearly, if the prover is honest, all of these claims go through, so the prover convinces the verifier with probability 1. If the claim that U(a_1, …, a_l) = C' is false, then a prover that actually returns g(X_1, a_2, …, a_k) will always be rejected, since the check above would then compute the true value of U. So the prover must lie and return some s'(X_1) ≠ s(X_1).

Because the polynomial f(X_1) = s'(X_1) - s(X_1) is non-zero and has degree at most d by assumption, it has at most d roots. So when V picks a ∈ F_p uniformly at random, Pr_a[s(a) ≠ s'(a)] ≥ 1 - d/p. If s'(a) ≠ s(a), the prover is left defending the false claim s'(a) = g(a, a_2, …, a_k), which by the inductive hypothesis the verifier rejects with probability at least 1 - ε. Thus, overall, the probability that V rejects is at least (1 - d/p)·(1 - ε) = 1 - (ε + d/p) + εd/p ≥ 1 - (ε + d/p), so V accepts with probability at most ε + d/p, as desired. It should be noted that these claims hold because p is chosen to be large enough, and would not necessarily work in an arbitrary finite field.

Thus, because we have a recursion depth of O(n^2), d is bounded above by 3m, and p is exponential in n + m, the accept probability for a false claim stays very close to 0. The recursion depth is polynomial, at each step the number of random bits required is linear in n + m, and only a polynomial amount of data needs to be evaluated and sent per stage, so this really is an IPS, concluding the proof that PSPACE ⊆ IP.

In particular, this also means that every language L ∈ IP has an IPS with perfect completeness. Indeed, since we also have L ∈ PSPACE, there is a polynomial-time computable function f such that w ∈ L iff f(w) ∈ TQBF, and thus we can run the above protocol to get the verifier to accept with probability 1 when w ∈ L, as desired.

One More Prover Gives Power… But Even More Do Not

The natural extension of the interactive proof model involves having more than one prover (the provers cannot communicate with each other) trying to convince a single verifier of a string's membership in a language. Though this idea was initially introduced for the purpose of Zero-Knowledge Proofs for NP sets, in this section we discuss yet another result by Babai et al. First we need to generalize the single-prover definition:

Definition 7 (k-Prover Interactive Protocol)

A k-Prover Interactive Protocol is a tuple (P_1, …, P_k, V) of Turing machines such that:
• P_1, …, P_k are computationally unbounded Turing machines
• V is a probabilistic, polynomial-time Turing machine
• All machines have a read-only input tape, a scratch tape, and a random tape
• All provers P_i share an infinite read-only random tape of 0s and 1s
• Each P_i has a read-only and a write-only tape to communicate with V, and vice-versa.

This definition means that even though the provers can "coordinate" through their common random tape, they can't communicate once the interaction with V has begun. Of course, for the definition to be precise, the Turing machine definition also needs to be amended, but this is done similarly to the definition of ITMs. The extension for a k-prover interactive proof system is thus natural:

Definition 8 (k-Prover Interactive Proof System)

We say that the language L ⊆ {0,1}^* has a k-prover IPS if there exists a verifier V such that:
• ∃ P_1, …, P_k such that (P_1, …, P_k, V) is a k-prover interactive protocol and ∀ w ∈ L, Pr[V accepts w] ≥ ⅔
• ∀ P_1', …, P_k' such that (P_1', …, P_k', V) is a k-prover interactive protocol and ∀ w ∉ L, Pr[V accepts w] ≤ ⅓.

We can thus define the relevant complexity class:

Definition 9 (The MIP Class)

The Multi-Prover Interactive Polynomial time (MIP) class is the class of languages that have a 2-prover Interactive Proof System. For a polynomial q(n), we also denote by IP_q the class of languages that have an interactive proof system with q(|w|) provers on input w.

The first interesting result regarding this complexity class is that adding additional provers does not benefit us, i.e. IP_q = MIP. The idea is that one prover P_1' can simulate all the individual provers P_1, …, P_q, while a second prover P_2' is used to simulate one specific prover, chosen at random by the verifier. This way, the second prover P_2' can be used to check the non-collusion property of the system, verifying the transcript provided by P_1'. Even though the original paper only proves this for fixed q(n) = k ≥ 2, repeating the sub-protocol polynomially many times yields the more general result.

The second, very interesting, result is that adding the second prover does in fact give us additional power under reasonable complexity assumptions (namely PSPACE ≠ NEXPTIME): MIP = NEXPTIME. The proof given by Babai et al. is very lengthy and technical (unlike the previous result, which can be proven in less than half a page and consists mostly of formalizing the idea above), and even showing that MIP ⊆ NEXPTIME is not as straightforward as its IP ⊆ PSPACE counterpart. This direction was first shown by Fortnow et al., and the key step is that a language L has a multi-prover IPS iff it is accepted, in the same completeness/soundness sense, by a probabilistic polynomial-time Turing machine with access to some oracle. The non-deterministic exponential-time machine then guesses the values of the oracle on all relevant strings (whose lengths are polynomially bounded) and accepts if more than half of the verifier's random strings lead to acceptance, an idea similar to the one used for transforming private-coin proofs into public-coin proofs.

The reverse direction (that NEXPTIME ⊆ MIP), proven by Babai et al., uses the fact that every L ∈ NEXPTIME can be transformed into a 3-CNF formula with exponentially many clauses and variables (also known as "SUCCINCT 3SAT"). The crux of the argument is then a modified version of the arithmetization argument we presented in the PSPACE ⊆ IP proof, but with many technicalities. For instance, it is necessary to "test whether a function in n variables over ℤ, given as an oracle, is multilinear over a large interval". As the authors put it in the paper, the problem essentially arises because the provers could "gain an advantage by having the [arithmetization] function A not multilinear", as their protocol heavily relies on that property. Even though the actual proof is complicated and relies on combinatorial arguments (eigenvalue calculations and bounds on certain graphs), the end result is that there exists a probabilistic polynomial-time Turing machine that, given access to A as an oracle, always accepts if A is a "nice" multilinear function, and rejects with high probability if A cannot be approximated by a multilinear function on a "large" fraction of the domain, which is sufficient to prove soundness of the constructed protocol.

Putting It All Together

In this post, we discussed how Interactive Proof Systems can have perfect completeness but not perfect soundness, even though we can get arbitrarily close. Moreover, we saw that we can have a different prover for each w, and that the provers can be made deterministic. The verifier, however, must use randomness, since dIP = NP, even though its coin tosses can be made public, because AM[poly] = IP. Though we only sketched that last fact, we proved in great detail that IP = PSPACE by using the ideas of simulation, arithmetization, and linearization. Finally, we gave some insight into why an extra (non-colluding) prover adds power, and why even more provers do not, so that IP_q = MIP = NEXPTIME.

Overall, Interactive Proof Systems are truly remarkable, with the resulting complexity classes being closely related to many "traditional" classes, and it is unfortunate that they are often ignored or assumed as background when discussing Zero-Knowledge Proofs, a topic for another post.