These are notes for the course CS-172 I first taught in the Fall 1986 at UC
Berkeley and subsequently at Boston University. The goal was to introduce the
undergraduates to basic concepts of Theory of Computation and to provoke their
interest in further study. Model-dependent effects were systematically ignored.
Concrete computational problems were considered only as illustrations of
general principles.
The notes are skeletal: they do have (terse) proofs, but exercises,
references, intuitive comments, examples are missing or inadequate. The notes
can be used for designing a course or by students who want to refresh the known
material or are bright and have access to an instructor for questions. Each
subsection takes about a week of the course. Versions of these notes appeared
in SIGACT News.
Acknowledgments. I am grateful to the University of California at
Berkeley, its MacKey Professorship fund and Manuel Blum who made it possible
for me to teach this course. The opportunity to attend lectures of M. Blum and
Richard Karp and many ideas of my colleagues at BU and MIT were very beneficial
for my lectures. I am also grateful to the California Institute of Technology
for a semester with light teaching load in a stimulating environment enabling
me to rewrite the students' notes. NSF grants #DCR-8304498, DCR-8607492,
CCR-9015276 also supported the work. And most of all I am grateful to the
students who not only originally wrote these notes, but also greatly influenced
the lectures with their intelligent reactions and criticism.
1 Models of Computations; Polynomial Time & Church's Thesis
1.1 Deterministic Computation
Sections 1 (Models) and 2 (Diagonal Results) study deterministic
computations. Non-deterministic aspects of computations (inputs, interaction,
errors, randomization, etc.) are crucial and challenging in advanced theory and
practice. Defining them as an extension of deterministic computations is
simple. The latter, however, while simpler conceptually, require elaborate
models for definition. These models may be sophisticated if we need a precise
measurement of all required resources. However, if we only need to define what
is computable and get a very rough magnitude of the needed resources, all
reasonable models turn out equivalent, even to the simplest ones. We will pay
significant attention to this surprising and important fact. The simplest
models are most useful for proving negative results and the strongest ones for
positive results.
We start with terminology common to all models, gradually making it more
specific to those we actually study. We represent computations
as graphs: the edges reflect various relations between nodes
(events). Nodes, edges have attributes: labels, states, colors,
parameters, etc. (affecting the computation or its analysis).
Causal edges run from each event to all events essential for
its occurrence or attributes. They form a directed acyclic graph (though cycles
may be added artificially to mark the external input parts of the computation).
We will study only synchronous computations. Their nodes
have a time parameter. It reflects logical steps, not
necessarily a precise value of any physical clock. Causal edges only span short
(typically, single-moment) time intervals. One event among the causes of a
node is called its parent. Pointer edges
connect the parent of each event to all its other possible causes and reflect
connections that allow simultaneous events to interact and have a joint effect.
Pointers with the same source have different labels. The (labeled) subgraph of
events/edges at a given time is an instant memory configuration
of the model.
Each non-terminal configuration has active nodes/edges
around which it may change. The models with only a small active area at any
step of the computation are sequential. Others are called
parallel.
Complexity
The following measures of computing resources of a machine $A$ on input $x$
will be used throughout:
Time: The greatest depth of causal chains is
the number of computation steps. The volume is the combined number
of active edges during all steps. Time is used (depending on the
context) as either depth or volume, which are close for sequential models.
Note that time complexity is robust only up to a constant factor:
a machine can be modified into a new one with a larger alphabet of labels,
representing several locations in one. It would produce identical results
in a fraction of time and space (provided that the time limits suffice
for transforming the input and output into the other alphabet).
Space: $S_A(x)$ (or $S_A$) of a synchronous computation
is the greatest (over time) size of its configurations. Sometimes excluded are
nodes/edges unchanged since the input.
Growth Rates (typically expressed as functions of the bit length $n$ of
input/output $x$/$y$):
$f=O(g)$: $f(n)\le c\,g(n)$ for some constant $c$ and all $n$; $f=o(g)$: $f(n)/g(n)\to0$.
(These are customary but somewhat misleading notations.
The clear notations would be like $f\in O(g)$.) $f=\Omega(g)$: $g=O(f)$.
$f=\Theta(g)$: ($f=O(g)$ and $g=O(f)$).
Here are a few examples of frequently appearing growth rates:
negligible: $(\log n)^{O(1)}$; moderate: $n^{\Theta(1)}$ (called polynomial
or P, like in P-time); infeasible: $2^{n^{\Omega(1)}}$, also
$n^{\log n}$, $n!$.
(A rougher estimate of $n!$ follows by computing $\int_1^n\ln x\,dx$ using
that the difference between $\sum_{i\le n}f(i)$ and $\int_1^n f(x)\,dx$ is bounded by the
total variation of $f$. So for monotone $f(x)=\ln x$
the sum $\ln n!$ is $n\ln(n/e)\pm O(\ln n)$, i.e. $n!=(n/e)^n e^{\pm O(\ln n)}$.)
The reason for ruling out exponential (and neglecting logarithmic) rates is
that the visible Universe is too small to accommodate exponents. Its radius is
about 46.5 giga-light-years $\approx 2^{204}$ Planck units.
A system of $\gg R$ Planck units of mass packed within a radius of $R$ Planck units collapses
rapidly, be it Universe-sized or a neutron star.
So the number of atoms is $<2^{268}$.
1.2 Rigid Models
Rigid computations have another node parameter:
location or cell. Combined with time, it
designates the event uniquely. Locations have structure or
proximity edges between them. They (or their short chains)
indicate all neighbors of a node to which pointers may be directed.
Cellular Automata (CA)
CA are a parallel rigid model. Its sequential restriction is the
Turing Machine (TM). The configuration of CA is a (possibly
multi-dimensional) grid with a finite, independent of the grid size, alphabet
of states to label the events. The states include, among other
values, pointers to the grid neighbors. At each step of the computation, the
state of each cell can change as prescribed by a transition
function (also called program) applied to the previous states of the cell and
its pointed-to neighbors. The initial state of the cells is the input for the
CA. All subsequent states are determined by the transition function.
An example of a possible application of CA is a VLSI (very large scale
integration) chip represented as a grid of cells connected by wires (chains of
cells) of different lengths. The propagation of signals along the wires is
simulated by changing the state of the wire cells step by step. The clock
interval can be set to the time the signals propagate through the longest wire.
This way the delays affect the simulation implicitly.
An example: the Game of Life (GL)
GL is a plane grid of cells, each holds a 1-bit state (dead/alive) and pointers
to the 8 adjacent cells. A cell keeps its state (dead or alive) if the number of its
live neighbors is exactly 2. It becomes (or stays) alive if this number is 3. In all other
cases it dies (of overcrowding or loneliness) or stays dead.
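A minimal Python sketch of one GL step (not from the notes; the grid here is made finite by wrapping it into a torus, while the notes' grid is unbounded):

def life_step(grid):
    """One Game of Life step on an n-by-n torus; grid[i][j] is 0 (dead) or 1 (alive)."""
    n = len(grid)
    new = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # count the 8 adjacent cells, wrapping around the edges
            live = sum(grid[(i + di) % n][(j + dj) % n]
                       for di in (-1, 0, 1) for dj in (-1, 0, 1)
                       if (di, dj) != (0, 0))
            # 3 live neighbors: alive; 2: state unchanged; otherwise: dead
            new[i][j] = 1 if live == 3 or (live == 2 and grid[i][j]) else 0
    return new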
A simulation of a machine $M_1$ by a machine $M_2$ is a
correspondence between memory configurations of $M_1$ and $M_2$ which is
preserved during the computation (possibly with some time dilation). Such
constructions show that the computation of $M_1$ on any input can be
performed by $M_2$ as well. GL can simulate any CA (see a sketch of an
ingenious proof in the last section of [Berlekamp,
Conway, Guy 82]) in this formal sense:
We fix space and time periods $a$, $b$. Cells of GL are mapped to cells
of the CA $M$ (compressing $a\times a$
blocks). We represent cell states of $M$ by states of $a\times a$ blocks of
GL. This correspondence is preserved after any number $t$ of steps of $M$ and
$bt$ steps of GL regardless of the starting configuration.
Turing Machines (TMs)
TM is a minimal CA. Its configuration - tape - is an
infinite to the right chain of cells. Each state of a cell has a pointer to one
of the cell's two adjacent neighbors. No adjacent cells can both point away
from each other. Only the two cells pointing at each other are
active, i.e. can change state. The cell that just turned its
pointer is the TM's moving head working on the tape symbol - its target. The
input is an array of non-blank cells (only one is rightward) followed by blanks
at the right.
Another type of CA represents a TM with several non-communicating heads.
At most a constant number of heads fit in a cell. They can vanish, split, or merge only in
the first cell (which, thus, controls the number of active cells). The input
makes an unchangeable "ink" part of each cell's state. The rest of the
cell's state is in "pencil" and can be changed by the heads. The computation halts
when all heads drop off. The output is the pencil part of the tape's
state. This model has convenient theoretical features. E.g. with a linear (in
$t$) number $O(t)$ of state changes (volume) one can solve the
$t$-Bounded Halting Problem: find out whether a machine with a given program
stops on a given input within volume $t$ of computation.
Exercise: Find a method to transform any given multi-head TM $A$ into
another one $B$ such that the value of the output of $B$ (as a binary
integer) and the volumes of computation of $A$ and of $B$ are all equal
within a constant factor (for all inputs $x$).
Hint: $B$'s cells may have a field to simulate $A$ and to maintain (in
other fields) two binary counters: $h$ (with $\Theta(1)$ density of heads) for
the number of heads of $A$, and $v$ for $A$'s volume. Their least significant
digits are at the leftmost cell. $h$ adds its most significant digit to the
same position in $v$ at each step of $A$. To move the carry a head
is borrowed from $h$. These $v$-heads move right in $v$ till they empty their
carry into its digit. Then empty $v$-heads move back to $h$, in a separate
field/track, possibly first continuing right to find a free slot in this return
track. (The heads area in $v$ extends to the $k$-th cell only by dropping the carry
there, with frequency $O(2^{-k})$. Then it shrinks back in $O(\log t)$ steps
since heads enter it slower than they move away.) Borrowed or returned heads
make low or high head-density areas in $h$ which shift left until absorbed at
the leftmost cell.
1.3 Pointer Machines
The memory configuration of a Pointer Machine (PM), called
pointer graph, is a finite directed labeled multigraph. One
node is marked as root and has directed paths to all nodes.
Nodes see and change the configuration of their
out-neighborhood of constant (2 suffices) depth. Edges
(pointers) are labeled with colors from a
finite alphabet common to all graphs handled by a given program. The pointers
coming out of a node must have different colors (which bounds the outdegree).
Some colors are designated as working and not used in
inputs/outputs. One of them is called active as also are
pointers carrying it and nodes seeing them. Active pointers must have inverses,
form a tree to the root, and can be dropped only in leaves.
All active nodes each step execute an identical program.
At its first pulling stage, a node $x$ acquires copies of all
pointers of its children using "composite" colors: e.g., for a two-pointer path
colored $a,b$, the new pointer is colored $ab$, or an
existing $c$-colored pointer is recolored $\{c,ab\}$. $x$ also spawns
a new node with pointers to and from it. Next, $x$ transforms the colors of its
set of pointers, drops the pointers left with composite colors, and vanishes if
no pointers are left. Nodes with no path from the root are forever invisible
and considered dropped. The computation is initiated by inserting an active
loop-edge into the root. When no active pointers remain, the graph, with all
working colors dropped, is the output.
Exercise: Design a PM transforming the input graph into the same one
with two extra pointers from each node: to its parent in a BFS spanning tree
and to the root. Hint: Nodes with no path to the root can never be
activated but can be copied with pointers, copies connected to the root,
the original input removed.
PM can be parallel, PPM [Barzdin',
Kalnin's 74], or sequential, SPM.
SPM differ in that only pointers to the root, their sources, and nodes
that have pointers with inverses to these sources can be active.
A Kolmogorov or Kolmogorov-Uspenskii
Machine (KM) [Kolmogorov Uspenskii 58] is a
special case of Pointer Machine
[Schoenhage 80] with the restriction that all
pointers have inverses. This implies a bounded in/out-degree of the graph,
which we further assume to be constant.
Fixed Connection Machine (FCM) is a variant of the parallel KM (PKM) with
the restriction that pointers once created cannot be removed, only
re-colored. So when the memory limits are reached, the pointer structure
freezes, and the computation can be continued only by changing the colors of the
pointers.
PPM is the most powerful model we consider: it can simulate the others in
the same space/time. E.g., cellular automata make a simple special case of a
PPM which restricts the Pointer Graph to be a grid.
Exercise: Design a machine of each model (TM, CA, KM, PPM) which
determines if an input string has the form $ww$, $w\in\{a,b\}^*$. Analyze
time (depth) and space. KM/PPM takes input in the form of colors of edges
in a chain of nodes, with the root linked to both ends. The PPM nodes also have
pointers to the root. Below are hints for TM, SPM, CA. The space is $O(\|x\|)$ in
all three cases.
Turing and Pointer Machines. TM first finds the middle of $ww$ by
capitalizing the letters at both ends one by one. Then it compares letter by
letter the two halves, lowering their case. The complexity is:
$T(x)=O(\|x\|^2)$. SPM acts similarly, except that the root keeps and updates
the pointers to the borders between the upper and lower case substrings. This
allows constant time access to these borders. So, $T(x)=O(\|x\|)$.
Cellular Automata. The computation starts with the leftmost cell
sending right two signals. Reaching the end the first signal turns back. The
second signal propagates three times slower, so they meet in the middle of $x$
and disappear. While alive, the second signal copies the input field $a$ of
each cell into a special field $b$. The $b$ symbols will try to move right
whenever the next cell's $b$ field is blank. So the chain of these symbols
alternating with blanks expands right from the middle of $x$. Upon reaching
the end they will push the blanks out and pack themselves back into a copy of
the left half of $x$ shifted right. When a $b$ symbol does not have a blank at
the right to move to, it compares itself with the $a$ field of the same cell.
If they differ, a signal is generated which halts all activity and rejects $x$.
If all comparisons are successful, the last $b$ generates the accepting signal.
The depth is $O(\|x\|)$.
1.4 Simulation
We have considered several models of computation. We will see now how the
simplest of them - Turing Machine - can simulate all others: these powerful
machines can compute no more functions than TM.
Church-Turing Thesis is a generalization of this conclusion: TMs
can compute every function computable in any thinkable physical model of
computation. This is not a math theorem because the notion of model is not
formally specified. But the long history of studying ways to design real and
ideal computing devices makes it very convincing. Moreover, this Thesis has a
stronger Polynomial Time version which bounds the volume of
computation required by that TM simulation by a polynomial of the volume used
by the other models. Both forms of the Thesis play a significant role in
foundations of Computer Science.
PKM Simulation of PPM. For convenience, we assume all PPM nodes have
pointers to the root. A PPM configuration is represented in PKM with extra colors
used in a binary tree added to each node $v$ so that all
(unlimited in number) PPM pointers to $v$ are reconnected to its leaves, and
inverses are added to all pointers. The number of pointers
increases at most 4 times. To simulate PPM, each node $v$ gets a binary
name formed by the colors on its path through the root
tree, and broadcasts it down its own tree. For the pulling stage $v$ extends its
tree to double depth and merges (with combined colors) its own pointers to
nodes with identical names. Then $v$ re-colors its pointers as the PPM program
requires and rebalances its tree. This simulation of a PPM step takes
polylogarithmic parallel time.
TM Simulation of PKM. We assume the PKM keeps a constant degree and
a roughly balanced root tree (to yield short node names as described above). The TM
tape reflects its configuration as the list of all pointers sorted by source
name, then by color. The TM's transition table reflects the PKM program. To
simulate PKM's pulling stage the TM creates a copy of each pointer and sorts the copies
by their sinks. Now each pointer, located at its source, has its copy near its
sink. So both components of 2-pointer paths are nearby: the special
double-colored pointers can be created and moved to their sources by resorting
on the source names. The re-coloring stage is straightforward, as all relevant
pointers having the same source are located together. Once the root has no
active pointers, the Turing machine stops and its tape represents the PKM
output. If a PPM computes a function in $t$ steps, using $s$
nodes, the simulating TM uses space $S=O(s\log s)$ ($O(\log s)$ bits for
each of $O(s)$ pointers) and time $O(S^2)t$, as TM sorting takes quadratic
time.
Squaring matters! TM cannot outperform Bubble Sort. Is its
quadratic overhead a big deal? In a short time all silicon gates on your PC
run an enormous combined number of clock cycles. Silicon
parameters double almost annually. Decades may bring micron-thin things that
can sail sunlight in space in clouds of great computing and physical (light
beam) power. Centuries may turn them into a Dyson Sphere enveloping the solar
system. Still, the power of such an ultimate computer is limited by the number
of photons the Sun emits per second: $\approx10^{45}$.
Giga-years may turn much of the known universe into a computer, but its might
is still limited by its total entropy.
Faster PPM Simulations. Parallel Bubble-Sort on CA or Merge-Sort on
sequential FCM take nearly linear time. Parallel FCM can do much better
[Ofman 65]. It represents and updates pointer
graphs as the above TM. All steps are straightforward to do locally in parallel
polylog time except sorting of pointers. We need to create a fixed connection
sorting network. Sophisticated networks sort arbitrary $n$-entry arrays of integers
in $O(\log n)$ parallel steps. We need only a simpler polylog method.
Merge-Sort splits an array of two or more entries in two halves and sorts each
recursively. Batcher-Merge combines two sorted lists in $O(\log n)$ steps.
Batcher Merge. A bitonic cycle is the combination of
two sorted arrays (one may be shorter), connected by max-to-max and
min-to-min entries. Entries in a contiguous half (high-half) of
the cycle are $\ge$ all entries in the other (low) half.
Each half (with its ends connected) forms a bitonic cycle itself.
A flip of an array entry is one with the highest address
bit flipped. Its shift has the highest bit of its address
cycle-shifted to the end. Linking nodes in a $2^k$-node array to their flips
and shifts forms a Shuffle Exchange graph. We merge-sort two
sorted arrays given as a bitonic cycle on such a graph as follows.
Comparing each entry with its flip (half-a-cycle away), and switching if
wrongly ordered, fits the high and low halves into respectively the first and
last halves of the array by shifting the dislocated segment of each (thus
rotating each cycle). This repeats for each half recursively (decrementing $k$
via the graph's shift edges).
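A hedged Python sketch (not from the notes) of this Batcher merge: the input is a bitonic array of length $2^k$ (an ascending run followed by a descending one, i.e. the bitonic cycle cut open); each round compares entries with their flips, and the recursion on the halves decrements $k$.

def bitonic_merge(a, ascending=True):
    """Sort a bitonic list a (length a power of 2) in ~len(a)*log2(len(a)) comparisons."""
    n = len(a)
    if n <= 1:
        return a
    half = n // 2
    for i in range(half):
        # compare each entry with its "flip" half a cycle away; swap if out of order
        if (a[i] > a[i + half]) == ascending:
            a[i], a[i + half] = a[i + half], a[i]
    return (bitonic_merge(a[:half], ascending) +
            bitonic_merge(a[half:], ascending))

# Merging two sorted lists x, y of total length 2^k:
# concatenating x with reversed y gives a bitonic array, so bitonic_merge(x + y[::-1]).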
2 Universal Algorithm; Diagonal Results
2.1 Universal Turing Machine
The first computers were hardware-programmable. To change the function
computed, one had to reconnect the wires or even build a new computer. John von
Neumann suggested using Turing's Universal Algorithm. The function computed can
be then specified by just giving its description (program) as part of the input
rather than by changing the hardware. This was a radical idea, since in the
classical mathematics universal functions do not exist (as we will see
in Sec. Uncomputability; Goedel Theorem).
Let $\cal F$ be the class of all TM-computable functions: total (defined for all
inputs) and partial (which may diverge). Surprisingly, there is a universal
function $u$ in $\cal F$. For any Turing Machine $M$ that computes $f\in\cal F$ in time $T$
and space $S$, $u$ uses a program $m$ of length $c$ listing the commands
and initial head state of $M$. Then $u(mx)$ simulates $M(x)$ in time $O(cT)$
and space $S+c$. It operates in cycles, each simulating one step of $M$.
After $i$ steps of $M$, let $s_i$ be the head's state, $l_i$ be the left
from it part of the tape, $r_i$ be the rest of the tape. After $i$ cycles $u$ has the tape configuration $l_i m s_i r_i$ and looks up $m$ to find the
command corresponding to the state $s_i$ and the first symbol of $r_i$. It
modifies the configuration accordingly. When $M$ halts, $u$ erases the (penciled) $m s_i$
from the tape and halts too. A Universal Multi-head TM works similarly but
can also determine in time $O(t)$ whether it halts in $t$ steps (given
$x$, $t$ and an appropriate program).
Exercise: Design a universal multi-head TM with a constant factor
overhead. Hint: When heads split or merge in the first cell, the room needed
for their programs creates sparse or dense content regions that propagate right
(sparse faster).
We now describe in detail a simpler but slower universal
TM $U$ due to Ikeno. It simulates any other TM $M$ that
uses only the bits 0, 1 on the tape. (Other alphabets are encoded as bit strings
for this.) $U$ lacks the blank symbol that usually marks the end of input. Thus
input needs to be given in some prefixless form, e.g. with a padding that
encodes the input length $l$ as a binary string preceded by $\|l\|$
zeros. In the cells carrying this padding, two counters are initiated
that monitor the distance to both ends of the used part of $M$'s tape
(initially the input). $M$'s head moving on the tape pulls these counters along
and keeps them updated. When the right end of the used tape is reached, any
subsequent characters are treated as blanks.
$U$ has 11 states + 6 symbols; its transition table is below. It shows the
states and tape digits only when changed, except that the prime is always
shown. The head is on the tape: lower case states look left, upper case -
right. The external choice, halt, etc. commands are special states for $M$; for
$U$ they are shown as A/B or =. $U$ works in cycles, simulating one transition
of $M$ each. The tape is infinite to the right (the leftward head in the
leftmost cell halts.) It consists of segments, each preceded with a *.
Some symbols are primed. The rightmost infinite part of one or two segments is
always a copy of $M$'s tape. The other segments describe one transition each: a
command for $M$ to change state, rewrite the scanned tape bit,
and turn left or right.
Ikeno UTM Transition Table (omitted here): an 11-state (A, B, f, F, a, b, c, d, e, D, E) by 6-symbol (1, 0, *, 1', 0', *') array.
The transition segments are sorted by state and never change,
except for priming. Each transition is represented by the bit to write, the
direction to turn, and a pointer to the next state's segment: a unary string
counting the segments to skip to the left, or marked to point to
the right. Each cycle starts with $U$'s head in state A or B, located at
the site of $M$'s head. Primed are the digits of the prior command and
all digits to their left.
$U$ first reads the bit of an $M$ cell, changing the state from A or B
to a or b, puts a * there, moves left to the primed state segment, finds
from it the command segment and moves there. It does this by repeatedly priming the
nearest unprimed * and 1s (or unpriming 0s) while alternating the lower- and
upper-case states. When the counter is exhausted, the target segment that many stars away
is reached. Then $U$ reads (changing state from c to d or e) the
rightmost symbol of the command, copies it at the * in the tape area,
goes back, reads the next symbol (the turn direction), returns to the just overwritten (and
first unprimed) cell of the tape area and turns left or right. As CA, $U$ and $M$
have in each cell three standard bits: present and previous pointer directions
and a "content" bit to store $M$'s symbol. In addition $U$ needs just one "trit"
of its own! (See its simulator at
www.cs.bu.edu/teaching/cs332/TM/simulator.)
2.2 Uncomputability; Goedel Theorem
Universal and Complete Functions
Notations: Let us choose a special mark and, after its $k$-th occurrence, break
any string into Prefix$_k$ and Suffix$_k$. We say $u$ $k$-simulates $f$ iff for some prefix $p$ and all
suffixes $s$, $u(ps)=f(s)$. The prefix can be intuitively viewed as a program which the
simulating function $u$ applies to the suffix (input). We also consider a
symmetric variant of the relation "$k$-simulate" which makes some proofs easier.
Namely, $u$ $k$-intersects $f$ iff $u(ps)=f(ps)$ for some
prefix $p$ and all $s$. E.g., length preserving functions can intersect
but cannot simulate one another.
We call $u$ universal for a class $F$, if it
$k$-simulates all functions in $F$ for a fixed $k$. When $F$ contains,
with each $f$ and each string $q$, the function $s\mapsto f(qs)$, universality is equivalent to (or implies, if only these functions are required in $F$) completeness: $u$ $k$-intersects all $f\in F$. Indeed,
$u$ $k$-simulates $f$ iff it $k$-intersects $f\circ\mathrm{Suffix}_k$; conversely, $k$-intersection
follows from $k$-simulating a suitable modification of $f$.
Exercise: Describe explicitly a function, complete
for the class of all linear (e.g., $x\mapsto ax+b$) functions.
A negation of a (partial or total) function $f$ is the total
predicate $\neg f$ which yields $1$ iff $f(x)=0$ and yields $0$ otherwise.
Obviously, no class of functions closed under negation contains a complete one.
So, there is no universal function in the class of all (computable or not)
predicates. This is the well known Cantor Theorem that the set of all sets of
strings (as well as the sets of all functions, reals etc.) is not countable.
Goedel Theorem
There is no complete function among the total computable ones,
as this class is closed under negation. So the function $u$, universal (and
thus complete) in $\cal F$, has no total computable extensions.
Formal proof systems are computable functions $A(P)$ which check if $P$ is
an acceptable proof and output the proven statement. $A\vdash S$ means $S=A(P)$
for some $P$. $A$ is rich iff it allows computable translations $s_x$
of statements "$u(x)=0$", provable whenever true, and refutable
($A\vdash\neg s_x$), whenever $u(x)=1$. $A$ is consistent iff
at most one of any such pair $s_x$, $\neg s_x$ is provable, and
complete iff at least one of them always (even when $u(x)$
diverges) is. Rich consistent and complete formal systems cannot exist,
since they would provide an obvious total extension of $u$ (by
exhaustive search for $P$ to prove or refute $s_x$). This is the famous
Goedel's Theorem -- one of the shocking surprises of the 20th century
science. ($A$ here is any extension of the formal Peano Arithmetic; we skip the
details of its formalization and proof of richness.)
(A closer look at this proof reveals another famous Goedel theorem:
Consistency $C$ of $A$ (expressible in $A$ as divergence of the search for
contradictions) is itself an example of an unprovable $s_c$. Indeed, the
proof-searching extension of $u$ intersects $u$ for some prefix $c$. $C$ implies that it indeed extends $u$
and, thus, both diverge at $c$. So, $C\Rightarrow s_c$. This
proof can be formalized in Peano Arithmetic, thus $A\vdash(C\Rightarrow s_c)$. But $A\vdash s_c$ implies that the proof search for $s_c$ converges, so $A\vdash C$
contradicts $C$: Consistency of $A$ is provable in $A$ if and only if false!)
Recursive Functions
Another byproduct is that the Halting Problem (whether $u(x)$ converges) would yield a total
extension of $u$ and, thus, is not computable. This is the source of many other
uncomputability results. Another source is an elegant Fixed Point
Theorem by S. Kleene: any total computable transformation $A$ of
programs (prefixes) maps some program into an equivalent one. Indeed, the
complete/universal $u$ intersects the computable function $ps\mapsto u(A(p)s)$: for some
prefix $p$ and all $s$, $u(ps)=u(A(p)s)$. This implies,
e.g., the Rice theorem: the only computable invariant (i.e. the same on programs
computing the same functions) properties of programs are the constant ones
(exercise).
Computable (partial and total) functions are also called
recursive (due to an alternative definition).
Their ranges (and, equivalently, domains) are called (recursively)
enumerable or r.e. sets. An r.e. set with an
r.e. complement is called recursive (as is its yes/no characteristic function)
or decidable. A function is recursive iff its graph is r.e.
An r.e. graph of a total function is recursive. Each infinite r.e. set is the
range of an injective total recursive function ("enumerating" it, hence the
name r.e.).
We can reduce the membership problem of a set $A$ to that of a set $B$ by
finding a recursive function $f$ s.t. $x\in A\iff f(x)\in B$. Then $A$ is
called m- (or many-to-1-)
reducible to $B$. A more complex Turing
reduction is given by an algorithm which, starting from input $x$, interacts
with $B$ by generating strings $s$ and receiving answers to "$s\in B$?"
questions. Eventually it stops and tells if $x\in A$. R.e. sets (like Halting
Problem) to which all r.e. sets can be m-reduced are called r.e.-complete. One
can show a set r.e.-complete (and, thus, undecidable) by reducing the Halting
Problem to it. So Ju. Matijasevich proved r.e.-completeness of the Diophantine
Equations Problem: given a multivariate polynomial of degree 4 with integer
coefficients, find if it has integer roots.
The above (and related) concepts and facts are broadly used in Theory of
Algorithms and should be learned from any standard text, e.g.,
[Rogers 67].
2.3 Intractability; Compression and Speed-up Theorems
The $t$-restriction $u_t$ of $u$ aborts and outputs a special value if $u(x)$
does not halt within $t(x)$ steps, i.e. $u_t$ computes the
$t$-Bounded Halting Problem ($t$-BHP). It remains complete for
the closed under negation class of functions computable in $o(t)$ steps.
(The overhead is absorbed by the simulation and padding of $x$.)
So, $u_t$ is not in the class, i.e. $t$-BHP cannot be computed in time $o(t(x))$ [Tseitin 56]. (And neither can be any function
agreeing with $t$-BHP on a dense (i.e. having strings with each
prefix) subset.) E.g. $2^{\|x\|}$-BHP requires exponential time.
However for some trivial input programs the BHP can obviously be answered
by a fast algorithm. The following theorem provides another function
(which can be made a predicate) for which there is only a finite number of such
trivial inputs. We state the theorem for the volume of computation of a
Multi-Head Turing Machine. It can be reformulated in terms of time of Pointer
Machine and space (or, with smaller accuracy, time) of regular Turing Machine.
Definition: A function $f(x)$ is constructible if
it can be computed in volume $O(f(x))$.
Here are two examples: $2^{\|x\|}$ is constructible, as
it can be computed in volume $O(2^{\|x\|})$. Yet, $2^{\|x\|}+h(x)$, where $h(x)$ is
$0$ or $1$, depending on whether $u(x)$ halts within $3^{\|x\|}$ steps, is not.
Compression Theorem [Rabin 59]:
For any constructible function $f$, there exists a function $F_f$ such that
for all functions $t$, the following two statements are equivalent:
(1) There exists an algorithm $A$ such that $A(x)$
computes $F_f(x)$ in volume $t(x)$ for all inputs $x$;
(2) $t$ is constructible and $t=\Omega(f)$.
Proof. Let the $t$-bounded Kolmogorov Complexity $K_t(i|x)$ of $i$ given $x$ be the length of the shortest program $p$ for the
Universal Multi-Head Turing Machine transforming $x$ into $i$ with
$\le t$ volume of computation. Let $F_f(x)$ be the smallest $i$, with $K_t(i|x)>\log t$ for all $t\le f(x)$. $F_f$ is computed in volume $O(f)$ by generating
all $i$ of low complexity, sorting them and taking the first missing. It
satisfies the Theorem, since computing it faster would violate the
complexity bound defining it. (Some extra efforts can make it boolean.) Q.E.D.
Speed-up Theorem [Blum 67]:
There exists a total computable predicate $P$ such that for
any algorithm computing $P(x)$ in volume $t(x)$, there exists
another algorithm doing it in volume $O(\log t(x))$.
Though stated here for exponential speed-up, this theorem remains true
with $\log$ replaced by any computable unbounded monotone function.
In other words, there is no even nearly optimal algorithm to compute $P$.
The general case. So, the complexity of some predicates cannot
be characterized by a single constructible function $f$, as in the Compression
Theorem. However, the Compression Theorem remains true (with a harder proof) if
the requirement that $f$ is constructible is dropped (replaced with being
computable). In this form it is general enough so that every computable
predicate (or function) satisfies the statement of the theorem with an
appropriate computable function $f$.
(The proof stands if constructibility of $f$ is weakened to being
semi-constructible, i.e. one with an algorithm that computes $f(x)$
in volume $O(v)$ whenever $v\ge f(x)$. The sets of programs
whose volumes (where finite) satisfy either (1) or (2) of the Theorem
(for computable $t$) are in $\Pi_2$ (i.e. defined with 2 quantifiers).
Both generate monotone classes of constructible functions closed under
$\min$. Then any such class is shown to be the $\{t: t=\Omega(f)\}$
for some semi-constructible $f$.)
There is no contradiction with Blum's Speed-up, since
such $f$ (not constructible itself) cannot be reached.
See a review in [Seiferas, Meyer].
3 Games; Alternation; Exhaustive Search; Time vs. Space
3.1 How to Win
In this section we consider a more interesting provably intractable
problem: playing games with full information, two players and zero sum. We will
see that even for some simple games there cannot be a much more efficient
algorithm than exhaustive search through all possible configurations.
The rules of an $n$-player game are set by families
of information and value functions and
a transition rule. Each player $i$ at each step
participates in transforming a configuration (game position) $c$ into the
new configuration $c'$ by choosing a move $m_i$ based only
on his knowledge $I_i(c)$ of $c$. The game proceeds until a
terminal configuration is reached. Then $V_i(c)$
is the loss or gain of the $i$-th player. Our games will have
zero sum, $\sum_i V_i(c)=0$, and full information: $I_i(c)=c$, with $c$ pointing to
the active player. We consider binary, two-player, no-draw
games, taking $i\in\{1,2\}$, moves $m\in\{0,1\}$, and
$V_1=-V_2\in\{\pm1\}$ at terminal configurations.
Our examples will assure termination by implicitly prepending
non-terminal configurations with a counter of remaining steps.
An example of such games is chess. Examples of games without full
information are card games, where only a part (the player's own hand) of
the position is known. Each player may have a strategy
providing a move for each position. A strategy $S$ is winning
if it guarantees victory whatever the opponent does, even if he knows $S$. We
can extend $V$ from terminal positions to all positions with a winning strategy for one
side, so that $V(c)=\pm1$ tells which. (Draws, excluded in our games, would be taken as $V=0$.)
Evaluating or solving a game means
computing $V$. This ability is close to the ability to find a good move in a
modified game. Indeed, modify a game $G$ into $G'$ by adding a preliminary
stage to it. At this stage the player $A$ offers a starting position for $G$
and her opponent $B$ chooses which side to play. Then $A$ may either start
playing or decrement the counter of unused positions and offer another one.
Obviously, $B$ wins if he can determine the winning side of every position. If
he cannot while $A$ can, $A$ wins. Also, any game can be modified into one with
binary moves: by breaking a string move into several bit-moves.
(A position of the new game consists of a position of the old one and a
prefix $m'$ of a move $m$. The active player keeps adding bits to $m'$ until $m$ is
complete and the next position is generated by the transition rule.)
Evaluating such games is obviously sufficient for choosing the right move.
Theorem. Each position of any full information game has a
winning strategy for one side.
(This theorem [Neumann, Morgenstern 44] fails
for games with partial information: either player may lose if
his strategy is known to the adversary. E.g.: 1. Blackjack (21); 2. Each player
picks a bit; their equality determines the winner.) The game can be solved by
playing all strategies against each other. There are $2^n$ positions of length
$n$, $2^{n2^n}$ strategies and $2^{n2^{n+1}}$ pairs of
them. For a 5-bit game that is $2^{320}$. The proof of this Theorem gives a
much faster (but still exponential time!) strategy.
Proof: Make the graph of all positions and moves;
set $V=\pm1$ on terminal positions; reset $V=0$ on the rest. Repeat until idle: if a yet
unmarked position has a move to a position lost for its active player, or all its moves
lead to positions won for him, set $V=\pm1$ on it accordingly. The procedure stops with
no position left unmarked, since in our games the step counters keep decreasing.
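This backward induction is easy to render in code. Below is a minimal Python sketch (an assumed interface, not from the notes): moves(c) yields the positions reachable from c, a position with no moves is lost for the player to move, and every play is assumed to terminate (e.g., a step counter decreases).

def wins(c, moves, memo=None):
    """True iff the player to move at position c has a winning strategy."""
    if memo is None:
        memo = {}
    if c not in memo:
        # c is a win iff some move leads to a position lost for the opponent;
        # a terminal position (no moves) makes any() empty, i.e. a loss
        memo[c] = any(not wins(d, moves, memo) for d in moves(c))
    return memo[c]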
Games may be categorized by the difficulty to compute their transition rule. We will consider
only rules computable in linear space $O(\|c\|)$. Then, the possible
moves can be computed in exponential time. The algorithm
tries each move in each step. Thus, its total running time is $2^{O(\|c\|)}$:
extremely slow even for a 13-byte game, but still
much faster than the previous (double exponential) algorithm.
Exercise: the Match Game. Consider 3 boxes with 3 matches each.
The players alternate turns taking any positive number of matches
from a single box. One cannot leave the table empty. Use the above
algorithm to evaluate all positions and list the evaluations after each of its
cycles.
Exercise: Modify the chess game by giving one side the right to make
(if it chooses to) an extra move out of turn during the first 10 moves. Prove
that this side has a non-losing strategy.
3.2 Exponentially Hard Games
A simple example of a full information game is Linear Chess,
played on a finite linear board. Each piece has a 1-byte type, including
loyalty to one of two sides: W (weak) or S (shy), gender M/F
and a 6-bi rank. All cells of the board are filled and all W's are
always on the left of all S's. Changes occur only at the active
border where W and S meet (and fight). The winner of a fight is determined by
the following Gender Rules:
1. If S and W are of the same sex, W (being weaker) loses.
2. If S and W are of different sexes, S gets confused and loses.
The party of a winning piece A replaces the loser's piece B by its own piece
C. The choice of C is restricted by the table of rules listing all allowed
triples (ABC). We will see that this game cannot be solved in
subexponential time. We first prove this (see
[Chandra, Kozen, Stockmeyer 1981]) for an
artificial game. Then we reduce this Halting Game to Linear
Chess, showing that any fast algorithm to solve Linear Chess could be used to
solve the Halting Game, thus requiring exponential time. For Exp-Time Completeness
of regular (but $n\times n$) Chess, Go, Checkers see:
[Fraenkel, Lichtenstein 1981],
[Robson 1983, 1984].
Exptime Complete Halting Game
We use a universal Turing Machine $u$ (defined as a 1-pointer cellular automaton)
which halts only by its head rolling off of the tape's left end, leaving a
blank. The Bounded Halting Problem BHP asks if $u(x)$ stops (i.e. the
leftmost tape cell points left) within $2^{\|x\|}$ steps. This cannot be done
in $o(2^{\|x\|})$ steps. We now convert BHP into the Halting Game.
The players are: P (the Prover), claiming $u(x)$ halts within $2^{\|x\|}$ steps (and
who should have a winning strategy iff this is true); his opponent is R (the Refuter).
The board has four parts: the diagram, the input $x$ to $u$, and positive integers $p$ (position)
and $t$ (time in the execution of $u(x)$):
The diagram shows the states of cell $p$ at time $t$ and of cells $p-1$, $p$, $p+1$ at
time $t-1$. States include the pointer direction;
those at time $t-1$ may be replaced by "?". Some board configurations are illegal:
if (1) two of the cells $p-1$, $p$, $p+1$ point away from each other, or (2) the state of
cell $p$ at time $t$ differs from the
result prescribed by the transition rules applied to the three lower cells, or (3) $t=1$, while
the diagram does not match the input. (At $t=1$, $u$ is just starting, so its tape has the input $x$
at the left, starting with the head in the initial state, followed by blanks at
the right.) Here are the Game Rules:
The game starts in the configuration shown below.
P moves first, replacing the ?s with symbols that claim to reflect the state
of cells $p-1$, $p$, $p+1$ at step $t-1$ of $u(x)$. R in his move chooses one of these three cells, copies its state
into the upper part, fills all lower cells with ?s, adjusts $p$ to the chosen cell, and decrements $t$.
[Board diagrams, not reproduced here: the start, then P's move, then R's move.]
Note that P may lie (i.e. fill in "?" distorting the actual computation of
$u(x)$), as long as he is consistent with the above "local" rules. All R
can do is to check the two consecutive board configurations. He cannot refer to
past moves or to the actual computation of $u(x)$ as evidence of P's
violation.
Strategy: If $u(x)$ does indeed halt within $2^{\|x\|}$ steps, then
the initial configuration is true to the computation of $u(x)$. Then P has an
obvious (though hard to compute) winning strategy: just tell truly (and thus
always consistently) what actually happens in the computation. R will lose
when $t=1$ and cannot decrease any more. If the initial configuration is a lie,
R can force P to lie all the way down to $t=1$. How?
If the upper box of a legal configuration is false then the lower boxes
cannot all be true, since the rules of $u$ determine the upper cell uniquely from
them. If R correctly points at the false lower cell and brings it to the top on his
move, then P is forced to keep on lying. At time $t=1$ the lie is exposed:
the configuration doesn't match the actual input string $x$, i.e. is illegal.
Solving this game amounts to deciding the correctness of the initial
configuration, i.e. the halting of $u(x)$ in $2^{\|x\|}$ steps: impossible in time
$o(2^{\|x\|})$. This Halting Game is artificial, still has a BHP flavor, though
it does not refer to exponents. We now reduce it to a nicer game (Linear Chess)
to prove that one exponentially hard, too.
3.3 Reductions; Non-Deterministic and Alternating TM; Time and Space
To reduce (see definition in
sec. Recursive Functions) Halting game to
Linear Chess we introduce a few concepts.
A non-deterministic Turing Machine (NTM) is a TM that
sometimes offers a (restricted) transition choice, made by a
driver, a function (of the TM configuration) of unrestricted
complexity. A deterministic (ordinary) TM accepts a string $x$ if its computation on $x$ outputs
"yes"; an NTM does if there exists a driver $d$ s.t. the $d$-driven computation outputs "yes". NTMs
represent single player games -- puzzles -- with a simple transition rule, e.g.,
Rubik's Cube. One can compute the winning strategy in exponential time by
exhaustive search of all $d$.
Home Work: Prove all such games have P-time winning strategies, or
show some have not. This will get you grade A for the course, a
$1,000,000 award and a senior faculty rank at a school of your choice.
The alternating TM (ATM) is a variation of the NTM driven by
two alternating drivers (players) $a$, $b$. A string is accepted if there is
$a$ such that for any $b$ the computation outputs "yes". Our games could be viewed as ATMs
returning the result of the game in linear space but possibly exponential time.
The ATM prompts $a$ and $b$ alternatingly to choose their moves (in several steps if
the move is specified by several bits) and computes the resulting position,
until a winner emerges. Accepted strings describe winning positions.
Linear Chess (LC) Simulation of TM-Games
The simulation first represents the Halting Game as an ATM computation
simulated by the Ikeno TM (using the "A/B"
command for players' input). The UTM is viewed as an array of 1-pointer
cellular automata: Weak cells as rightward, Shy as leftward. Upon termination, the
TM head is set to move to the end of the tape, eliminating all loser pieces.
This is viewed as a game of 1d-Chess (1dC), a variant of
LC, where the table, not the "Gender Rule," determines the victorious
piece, and not only the vanquished piece is replaced, but also the winning
piece may be "promoted to" another type of the same side. The types are states
of the Ikeno TM showing Loyalty (the present pointer direction), gender (the
previous direction), and ranks combining the remaining bit and trit of the cell state.
Exercise: Describe LC simulation of 1dC.
Hint: Each 1dC transition is simulated in several LC stages.
Let L,R be the two pieces active in 1dC. In odd stages L (in even stages R)
changes gender while turning pointer twice. The last stage turns pointer
only once and possibly changes gender. In the first stage L appends its rank
with R's bit. All other stages replace old 1dC rank with the new one.
R appends its old bit (only if ) to its new rank. Subsequent
stages drop both old bits, marking L instead if it is the new 1dC head.
Up to 4 more stages are used to exit any mismatch with 1dC new bits.
Space-Time Trade-off
Deterministic linear space computations are games
where any position has at most one (and easily computable) move.
We know no general superlinear lower bound or subexponential upper
bound for time required to determine their outcome. This is a big
open problem.
Recall that on a parallel machine: time is the number of
steps until the last processor halts; space is the amount of
memory used; volume is the combined number of steps of all
processors. "Small" will refer to values bounded by a
polynomial of the input length; "large" to exponential. Let us
call computations narrow if either time or
space are polynomial, and compact if both (and, thus, volume
too) are. An open question: Do all exponential volume algorithms (e.g.,
one solving Linear Chess) allow an equivalent narrow computation?
Alternatively, can every narrow computation be converted into
a compact one? This is equivalent to the existence of a P-time algorithm
for solving any fast game, i.e. a game with a P-time
transition rule and a move counter limiting the number of moves to a
polynomial. The sec. How to Win algorithm can
be implemented in parallel P-time for such games. Converse also holds,
similarly to the Halting Game.
[Stockmeyer, Meyer 1973] solve compact games
in P-space: they run depth-first search on the tree of all
plays (sequences of moves). On exiting each node it is marked as the active
player's win if some move leads to a child so marked; else as his opponent's.
Children's marks are then erased. Conversely, compact games can simulate any
P-space algorithm. Player A declares the result of the space-$k$, time-$2^k$
computation. If he lies, player B asks him to declare the memory state in
the middle of that time interval, and so by a $k$-step binary search catches A's
lie on a mismatch of states at two adjacent times. This has some flavor of
trade-offs such as saving time at the expense of space in dynamic programming.
Thus, fast games (i.e. compact alternating computations) correspond to
narrow deterministic computations; general games (i.e. narrow alternating
computations) correspond to large deterministic ones.
Part II Mysteries
We now enter Terra incognita by extending deterministic computations with tools
like random choices, non-deterministic guesses, etc., the power of which is
completely unknown. Yet many fascinating discoveries were made there in which
we will proceed to take a glimpse.
4 Nondeterminism; Inverting Functions; Reductions
4.1 An Example of a Narrow Computation: Inverting a Function
Consider a P-time function $F$. For convenience, assume $\|F(x)\|=\|x\|$
(often $\|F(x)\|=\|x\|^{\Theta(1)}$ suffices). Inverting $F$ means finding, for
a given $y$, at least one $x\in F^{-1}(y)$, i.e. such that $F(x)=y$.
We may try all possible $x$ for $F(x)=y$. Assume $F$ runs in linear time on
a Pointer Machine. What is the cost of inverting $F$? The space used
is $O(\|x\|)$. But time is $2^{\|x\|}\|x\|^{O(1)}$:
absolutely infeasible. No method is currently proven much better in the worst
case. And neither could we prove some inversion problems to require
super-linear time. This is the sad present state of Computer Science!
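As a toy illustration (not in the notes), here is the exhaustive inversion in Python for an $F$ on $n$-bit strings; its time is $2^n$ evaluations of $F$, its space linear:

from itertools import product

def invert(F, y, n):
    """Find some n-bit string x with F(x) == y by trying all 2**n candidates."""
    for bits in product('01', repeat=n):
        x = ''.join(bits)
        if F(x) == y:
            return x
    return None  # y is not in the range of F on n-bit inputs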
An Example: Factoring. Let $F(a,b)$ be the product of
integers $a\le b$. For simplicity, assume $a,b$ are primes. A fast
algorithm in sec. A Monte-Carlo Primality
Tester determines if an integer is prime. If not, no factor is given, only
its existence. To invert $F$ means to factor $ab$. The density of $n$-bit
primes is $\approx 1/n$. So, factoring by exhaustive search takes
exponential time!
In fact, even the best known algorithms for this ancient problem run in time
about $2^{\sqrt[3]{n}}$, despite centuries of efforts by most brilliant
people. The task is now commonly believed infeasible and the security of many
famous cryptographic schemes depends on this unproven faith.
One-Way Functions are those easy to compute
and hard to invert for most $x$. Even their existence is
sort of a religious belief in Computer Theory. It is unproven, though many
functions seem to be one-way. Some functions, however, are proven to
be one-way, IFF one-way functions EXIST. Many theories and applications are
based on this hypothetical existence.
Search and NP Problems
Let us compare the inversion problems with another type: the search problems,
specified by relations $P(x,w)$ computable in time $\|x\|^{O(1)}$: given $x$,
find $w$ s.t. $P(x,w)$ holds. There are two parts to a search problem: (a) decision
problem: decide if such $w$ (called witnesses) exist, and (b) a
constructive problem: actually find $w$.
Any inversion problem is a search problem and any search problem can be
restated as an inversion problem. E.g., finding a Hamiltonian cycle $C$ in a
graph $G$ can be stated as inverting a function $f(G,C)$, which outputs $G$
if $C$ is in fact a Hamiltonian cycle of $G$. Otherwise, $f(G,C)$ is, say, the empty string.
Similarly any search problem can be reduced to another one equivalent to
its decision version. For instance, factoring reduces to bounded
factoring: given $N$, $b$, find $p,q$ such that $pq=N$, $p\le b$ (where decisions
yield construction by binary search).
Exercise: Generalize the two above examples to reduce any search
problem to an inverting problem and to a decision problem.
The language of a problem is the set of all acceptable
inputs. For an inversion problem it is the range of $f$. For a search problem
it is the set of all $x$ s.t. $P(x,w)$ holds for some $w$. An NP
language is the set of all inputs acceptable by a P-time
non-deterministic Turing Machine (sec.
Reductions). All three classes of
languages - search, inversion and NP - coincide (that NP includes search is
straightforward).
Interestingly, polynomial space bounded deterministic and
non-deterministic TMs have equivalent power. It is easy to modify a TM to have a
unique accepting configuration. Any acceptable string will be accepted in time
$2^s$, where $s$ is the space bound. Then we need to check $R(u,v,2^s)$:
whether the TM can be driven from the configuration $u$ to $v$ in time $2^k$
and space $s$. For this we need, for every $w$, to check
$R(u,w,2^{k-1})$ and $R(w,v,2^{k-1})$, which takes space $O(sk)=O(s^2)$. So,
non-deterministic space $s$ is contained in deterministic space $O(s^2)$
[Savitch 1970].
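A hedged Python sketch of this recursion (a hypothetical interface, not from the notes: configs enumerates all configurations within the space bound, and step(u, v) tells whether v follows from u in one step). The recursion depth is k, with one midpoint per frame, giving the quadratic space bound:

def reach(u, v, k, configs, step):
    """Can configuration u reach configuration v in at most 2**k steps?"""
    if k == 0:
        return u == v or step(u, v)
    # try every midpoint w: u -> w in 2**(k-1) steps, then w -> v in 2**(k-1)
    return any(reach(u, w, k - 1, configs, step) and
               reach(w, v, k - 1, configs, step)
               for w in configs)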
Search problems are games with P-time transition rules and one move
duration. A great hierarchy of problems results from allowing more moves and/or
other complexity bounds for transition rules.
4.2 Complexity of NP Problems
We discussed the (equivalent) inversion, search, and NP types of problems.
Nobody knows whether all such problems are solvable in P-time (i.e.
belong to P). This question (called P=?NP) is probably the most famous one in
Theoretical Computer Science. All such problems are solvable in exponential
time but it is unknown whether any better algorithms generally exist. For many
problems the task of finding an efficient algorithm may seem hopeless, while
similar or slightly modified problems have been solved. Examples:
Linear Programming: Given an integer matrix $A$ and vector
$b$, find a rational vector $x$ with $Ax<b$. Note, if the entries of $A$ and
$b$ have $\le n$ bits and $x$ exists, then an $n^{O(1)}$-bit $x$ exists, too.
Solution: Dantzig's Simplex algorithm finds $x$ quickly
for many $A$. Some $A$, however, take exponential time. After long frustrating
efforts, a worst case P-time Ellipsoid Algorithm was finally found in
[Yudin Nemirovsky 1976].
Primality test: Determine whether a given integer $p$ has a factor.
Solution: A bad (exponential time) way is to try all possible
integer factors of $p$. More sophisticated algorithms, however, run fast (see
section Monte-Carlo Primality Tester).
Graph Isomorphism Problem: Are two given graphs $G_1$, $G_2$ isomorphic?
I.e., can the vertices of $G_1$ be re-numbered so that it becomes equal to $G_2$?
Solution: Checking all $n!$ enumerations of vertices is impractical (for
$n=100$, this exceeds the number of atoms in the known Universe).
[Luks 1980] found an $n^{O(d)}$ steps algorithm,
where $d$ is the degree. This is P-time for $d=O(1)$.
Independent Edges (Matching): Find a given number of independent
(i.e., not sharing nodes) edges in a given graph.
Solution: Max flow algorithm solves a bipartite graph case.
The general case is solved with a more sophisticated algorithm by J. Edmonds.
Many other problems have been battled for decades or centuries and no P-time
solution has been found. Even modifications of the previous four examples have
no known answers:
Linear Programming: All known solutions produce rational $x$. No
reasonable algorithm is known to find integer $x$.
Factoring:
Given an integer, find a factor. Can be done in time about
$2^{\sqrt[3]{n}}$. Seems very hard: centuries of quest for a fast algorithm were
unsuccessful.
Sub-graph isomorphism: In a more general case of finding isomorphisms
of a graph to a part of another, no P-time solution has been found, even for
$O(1)$-degree graphs.
Independent Nodes: Find $k$ independent (i.e., not sharing edges) nodes
in a given graph. No P-time solution is known.
Exercise:
Restate the above problems as inverting easy to compute functions.
We learned the proofs that Linear Chess and some other games have
exponential complexity. None of the above or any other search/inversion/NP
problems, however, has been proven to require super-P-time. When, therefore, do
we stop looking for an efficient solution?
NP-Completeness theory is an attempt to answer this question. See
results by S. Cook, R. Karp, L. Levin, and others surveyed
in [Garey, Johnson],
[Trakhtenbrot]. A P-time
function $f$ reduces one NP-predicate $p_1(x)$ to another $p_2(x)$ iff
$p_1(x)=p_2(f(x))$, for all $x$. $p_2$ is NP-complete if all NP problems can
be reduced to it. Thus, each NP-complete problem is at least as worst-case hard
as all other NP problems. This may be a good reason to give up on fast
algorithms for it. Any P-time algorithm for one NP-complete problem would yield
one for all other NP (or inversion, or search) problems. No such solution has
been discovered yet and this is left as a homework (10 years deadline).
Faced with an NP-complete problem we can sometimes restate it, find a
similar one which is easier (possibly with additional tools) but still gives
the information we really want. We will do this in Sec.
Monte-Carlo Primality Tester for factoring.
Now we proceed with an example of NP-completeness.
4.3 An NP-Complete Problem: Tiling
Tiling Problem. Invert the function which, given a tiled square, outputs
its first row and the list of tiles used. A tile is one of the finitely many possible
squares containing a Latin letter at each corner. Two tiles may be placed next
to each other if the letters on the shared side match.
We now reduce to Tiling any search problem: given $x$, find $w$
satisfying a P-time computable property $P(x,w)$.
Padding Argument. First, we need to reduce it to some "standard" NP
problem. An obvious candidate is the problem "Is there $w$: $u(v,w)$?", where
$u$ is the universal Turing Machine, simulating $P(x,w)$ for $v=px$.
But $u$ does not run in P-time, so we must restrict $u$ to $u'$ which stops
within some P-time limit. How to make this fixed degree limit sufficient to
simulate any polynomial (even of higher degree) time $P$? Let the TM
$u'(v,w)$ for $v=px0^k$ simulate $\|v\|^2$ steps of $u(px,w)$. If the
padding of 0's in $v$ is sufficiently long, $u'$ will have
enough time to simulate $P$, even though $u'$ runs in quadratic time, while
$P$'s time limit may be, say, cube (of a shorter "un-padded" string). So the NP
problem $P$ is reduced to $u'$ by mapping instances $x$ into $v=px0^k$, with $k$ determined by the time limit for $P$. Notice that the
program $p$ for $P$ is fixed. So, if some NP problem cannot be solved
in P-time then neither can be the problem "$\exists w: u'(v,w)$". Equivalently,
if the latter IS solvable in P-time then so is any search
problem. We do not know which of these alternatives is true. It remains to
reduce the search problem $u'$ to Tiling.
The Reduction. We compute $u'(v,w)$ by a
TM represented as an array of 1-pointer cellular automata that runs for $\|v\|^2$
steps and stops if $w$ does NOT solve the relation $P$. Otherwise it
enters an infinite loop. An instance $v$ has a solution iff this machine runs
forever for some $w$ and the given $v$.
In the space-time diagram of this computation we set both dimensions to the machine's
time (and space) $\|v\|^2$. Each row in it represents the configuration of the machine
in the respective moment of time. The first row is the initial configuration:
$v$ followed by "?"s. The special symbols "?" are placeholders for the
solution $w$, which is filled in at the second step below it. If a table is
filled in wrongly, i.e. doesn't reflect any actual computation, then it must
have four cells sharing a corner that couldn't possibly appear in the
computation of the machine on any input.
Proof: As the input $v$ and the guessed solution $w$ are the same in
both the right and the wrong tables, the first 2 rows agree. The actual
computation starts on the third row. Obviously, in the first mismatching row
a transition of some cell from the previous row is wrong. This is visible from
the states in both rows of this cell and the cell it points to, resulting in an
impossible combination of four adjacent squares.
For a given $v$, the existence of $w$ satisfying $P$ is equivalent to
the existence of a table with the prescribed first row, no halting state, and
permissible patterns of each four adjacent squares (cells).
Converting the table into the Tiling Problem: Cut each
cell into 4 parts by vertical and horizontal lines through its center and
copy the cell's content in each part. Combine into a tile each four parts sharing a
corner of 4 cells. If these cells are permissible in the table, then so is the
respective tile.
So, any P-time algorithm extending a given first row to the whole table of
matching tiles from a given set could be used to solve any NP problem by
converting it to Tiling as shown.
Exercise:
Find a polynomial time algorithm for Tiling Problem.
5 Probability in Computing
5.1 A Monte-Carlo Primality Tester
The factoring problem seems very hard. But to test if a number has factors
turns out to be much easier than to find them. It also helps if we supply the
computer with a coin-flipping device. See: [Rabin
1980], [Miller
1976], [Solovay, Strassen 1977].
We now consider a Monte Carlo algorithm, i.e. one that with high probability
rejects any composite number, but never a prime.
Residue Arithmetic
$b|a$ means $b$ divides $a$. $a\equiv b\pmod c$ means $c\,|\,(a-b)$. $(a\bmod c)$ denotes the residue of $a$ when divided by $c$, i.e. $a\equiv(a\bmod c)\in\{0,\ldots,c-1\}$. Residues can be added, multiplied and subtracted with the
result put back in the range $\{0,\ldots,c-1\}$ via shifting by an appropriate
multiple of $c$. E.g., $-x$ means $c-x$ for residues $\pmod c$. We use $\pm x$ to mean either $x$ or $-x$.
The Euclidean Algorithm finds $\gcd(a,b)$ - the greatest (and divisible by
any other) common divisor of $a$ and $b$: $\gcd(a,0)=a$; $\gcd(a,b)=\gcd(b,a\bmod b)$, for $b>0$. By induction, $\gcd(a,b)=ua+vb$, where
integers $u$ and $v$ are produced as a byproduct of
Euclid's Algorithm. This allows division $\pmod p$ by any $a$ coprime with $p$ (i.e. $\gcd(a,p)=1$), and operations $\pmod p$
obey all usual arithmetical laws.
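A standard Python rendering of the extended Euclidean Algorithm (a sketch, not from the notes); the returned coefficient gives the inverse used for division mod p:

def egcd(a, b):
    """Return (g, u, v) with g = gcd(a, b) and u*a + v*b == g."""
    if b == 0:
        return a, 1, 0
    g, u, v = egcd(b, a % b)
    # gcd(b, a % b) = u*b + v*(a % b) = v*a + (u - (a // b)*v)*b
    return g, v, u - (a // b) * v

# inverse of a modulo p (for gcd(a, p) == 1): egcd(a, p)[1] % p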
We will need to compute $x^a\pmod p$ in polynomial time. We cannot do
$a$ multiplications: $a$ may be exponentially large. Instead we compute all numbers
$x_i=x_{i-1}^2=x^{2^i}\pmod p$, $i<\|a\|$. Then we represent $a$
in binary, i.e. as a sum of powers of $2$, and multiply $\pmod p$ the needed
$x_i$'s.
Fermat Test. The Little Fermat Theorem for every prime
$p$ and $x\in[1,p-1]$ says: $x^{p-1}\equiv1\pmod p$.
Indeed, the sequence $(ix\bmod p)$, $i=1,\ldots,p-1$, is a permutation of $1,\ldots,p-1$.
So, $x^{p-1}\prod_i i\equiv\prod_i(ix)\equiv\prod_i i\pmod p$, and dividing by $\prod_i i$ gives $x^{p-1}\equiv1$.
This test rejects typical composite $p$.
Other composite $p$ can be actually factored by the following tests.
Square Root Test. For each $y$ and prime $p$,
$x^2\equiv y\pmod p$ has at most one pair of solutions $\pm x$.
Proof: Let $x,x'$ be two solutions: $y\equiv x^2\equiv x'^2\pmod p$. Then $x^2-x'^2=(x-x')(x+x')\equiv0\pmod p$. So, $p$ divides $(x-x')(x+x')$ and,
if prime, must divide either $x-x'$ or $x+x'$.
(Thus either $x'\equiv x$ or $x'\equiv-x\pmod p$.) Otherwise $p$ is
composite, and $\gcd(p,x-x')$ actually gives its factor.
Miller-Rabin Test
completes the Fermat Test: it factors a composite $p$, given that
$1$ kills $p$ (i.e. $x^{p-1}\equiv1\pmod p$ for all $x$ coprime with $p$)
and a random choice of $x$. For prime $p$, $x^{p-1}\equiv1\pmod p$ anyway.
Let $p-1=q2^k$, with odd $q$. The test $T$ sets $x_0=(x^q\bmod p)$,
$x_i=(x_{i-1}^2\bmod p)$, so $x_k=(x^{p-1}\bmod p)$.
If $x_k\ne1$ then $\gcd(x,p)$ factors $p$ (if $1$ killed $p$;
else Fermat test rejects $p$). If $x_0=1$, or one of the $x_i$ is $p-1$,
$T$ gives up for this $x$. Otherwise $x_{i+1}=1$ for some $i$,
while $x_i\not\equiv\pm1\pmod p$, and the Square Root Test factors $p$.
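A sketch of one round of the test under the conventions above (function names ours); it returns a verdict or, when the Square Root Test applies, a factor of $p$:

    import random
    from math import gcd

    def miller_rabin(p, x):
        """One round for odd p > 2 with base x.
        Returns 'prime?' (gives up), 'composite', or a nontrivial factor of p."""
        q, k = p - 1, 0
        while q % 2 == 0:
            q //= 2
            k += 1                           # p - 1 = q * 2^k, q odd
        xs = [pow(x, q, p)]                  # x_0 = x^q mod p
        for _ in range(k):
            xs.append(xs[-1] * xs[-1] % p)   # x_i = x_{i-1}^2 mod p
        if xs[-1] != 1:
            return 'composite'               # Fermat test rejects p
        if xs[0] == 1 or p - 1 in xs:
            return 'prime?'                  # test gives up for this x
        i = next(i for i in range(k) if xs[i] != 1 and xs[i + 1] == 1)
        # xs[i]^2 = 1 mod p but xs[i] != +-1: the Square Root Test factors p
        return gcd(p, xs[i] - 1)

    # e.g. miller_rabin(561, random.randrange(2, 560)) usually returns
    # a factor of the Carmichael number 561 = 3 * 11 * 17.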
First, for each odd composite $p$, we show that $T$ succeeds with
some $x$ coprime with $p$. If $p=a^j$, $j>1$, then $x=1+p/a$ works for the
Fermat Test: $x^{p-1}\equiv1-p/a\not\equiv1\pmod p$, since $p$ divides $(p/a)^2$. Otherwise
$p=ab$, $\gcd(a,b)=1$. Take the greatest $i$ such that $x^{q2^i}\not\equiv1\pmod p$
for some $x$ coprime with $p$. It exists: $x^q\equiv-1$
for $x=p-1$ and odd $q$. So, $i<k$. (Or $i=k$, so Fermat test works.) Then $y=x^{q2^i}\ne1$, while $y^2=x^{q2^{i+1}}\equiv1\pmod p$. Combining $x$ via the Chinese Remainder Theorem with $1$ modulo $a$ or modulo $b$ yields such a $y\not\equiv\pm1$.
So, either $\gcd(p,y-1)$ or $\gcd(p,y+1)$ is a proper factor of $p$.
Now, $T$ succeeds with most $x$, as it does with either $x$ or
$xz\bmod p$ for any fixed $z$ on which it succeeds: the function $x\mapsto xz$ is 1-1 and $T$ cannot fail with both $x$ and
$xz$. This test can be repeated for many randomly chosen $x$. Each time $T$
fails, we are twice more sure that $p$ is prime. The probability of $300$
failures on a composite $p$ is $<2^{-300}$; its inverse exceeds the number
of atoms in the known Universe.
5.2 Randomized Algorithms and Random Inputs
Las-Vegas algorithms, unlike Monte-Carlo, never give wrong
answers. Unlucky coin-flips just make them run longer than expected. Quick-Sort
is a simple example. It is about as fast as deterministic sorters, but is
popular due to its simplicity. It sorts an array of numbers
by choosing in it a random pivot, splitting the remaining array
in two by comparing with the pivot, and calling itself recursively on each
half.
For easy reference, rename the array entries with their positions $1,\ldots,n$
in the sorted output (this has no effect on the algorithm). Denote by $t(i)$ the
(random) time at which $i$ is chosen as a pivot. Then $i$ will ever be compared with
$j>i$ iff either $t(i)$ or $t(j)$ is the smallest among $t(i),t(i+1),\ldots,t(j)$.
This has $2$ out of $j-i+1$ chances. So, the expected number of comparisons
is $\sum_{i<j}2/(j-i+1)\approx2n\ln n$.
Note that the expectation of a sum of random variables is
the sum of their expectations (not true, say, for products).
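A sketch of the algorithm (the recursive rendering below is ours; a production sorter would partition in place):

    import random

    def quick_sort(a):
        """Las-Vegas sorting: always correct; the random pivot only
        affects the running time."""
        if len(a) <= 1:
            return a
        pivot = random.choice(a)
        less = [v for v in a if v < pivot]
        equal = [v for v in a if v == pivot]
        more = [v for v in a if v > pivot]
        return quick_sort(less) + equal + quick_sort(more)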
The above Monte-Carlo and Las-Vegas algorithms require choosing strings
at random with uniform distribution. We mentally picture that as flipping
a coin. (Computers use pseudo-random generators rather than
coins in hope, rarely supported by proofs, that their outputs have all the
statistical properties of truly random coin flips needed for the analysis of
the algorithm.)
Random Inputs to Deterministic Algorithms are analyzed similarly to
algorithms that flip coins themselves and the two should not be confused.
Consider an example: Someone is interested in knowing whether or not certain
graphs contain Hamiltonian Cycles. He offers graphs and pays $100 if we show
either that the graph has, or that it has no, Hamiltonian
Cycles. The Hamiltonian Cycle problem is NP-Complete, so it should be very hard for
some, but not necessarily for most, graphs. In fact, if our
patron chooses the graphs uniformly, a fast algorithm can earn us the $100
most of the time! Let all graphs with $n$ nodes and, say, $m=cn\ln n$ edges
($c<1/2$) be equally likely. Then we can use the following (deterministic)
algorithm: output "No Hamiltonian Cycles" and collect the $100, if
the graph has an isolated node. Otherwise, pass on that graph and the money.
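A sketch of this $100-collecting heuristic (the graph representation, an edge list, is our choice):

    def guess_no_hamiltonian_cycle(n, edges):
        """Deterministic heuristic: a graph with an isolated node has no
        Hamiltonian Cycle. Returns 'no HC' or 'pass' (no opinion)."""
        degree = [0] * n
        for u, v in edges:
            degree[u] += 1
            degree[v] += 1
        return 'no HC' if 0 in degree else 'pass'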
Now, how often do we get our $100? The probability that a given node of
the graph is isolated is $(1-2/n)^m\approx n^{-2c}$. Thus, the
probability that none of the $n$ nodes is isolated (and we lose our
$100) is about $(1-n^{-2c})^n\approx e^{-n^{1-2c}}$ and vanishes fast. Similar
calculations can be made whenever $c<1/2$. If $c>1/2$,
other fast algorithms can actually find a Hamiltonian Cycle. See:
[Johnson 1984],
[Karp 1976],
[Gurevich 1985]. See also
[Levin, Venkatesan] for a proof that another
graph problem is NP-complete even on average. How do this HC algorithm and the
above primality test differ?
The primality algorithm works for all instances. It
tosses the coin itself and can repeat it for a more reliable answer. The HC
algorithms only work for most instances (with isolated nodes or
generic HC).
In the HC algorithms, we must trust the customer to follow the presumed
random procedure. If he cheats and produces rare graphs often, the
analysis breaks down.
Symmetry Breaking
Randomness comes into Computer Science in many other ways besides those we
considered. Here is a simple example: avoiding conflicts for shared resources.
Dining Philosophers. They sit at a circular table. Between each pair
is either a knife or a fork, alternating. The problem is that neighboring diners
must share the utensils, so they cannot eat at the same time. How can the philosophers
complete the dinner given that all of them must act in the same way without any
central organizer? Trying to grab the knives and forks at once may turn them
into fighting philosophers. Instead they could each flip a coin, and sit still
if it comes up heads, otherwise try to grab the utensils. If two diners try to
grab the same utensil, neither succeeds. If they repeat this procedure enough
times, most likely each philosopher will eventually get both a knife and a fork
without interference.
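A toy round-based simulation of this protocol (the synchronization model below is our simplification; philosopher $i$ needs utensils $i$ and $i+1\bmod n$, with $n$ even so knives and forks alternate):

    import random

    def dinner(n, rounds=1000):
        """Each still-hungry philosopher flips a coin; actives grab both
        adjacent utensils; a contested utensil goes to no one."""
        ate = [False] * n
        for _ in range(rounds):
            if all(ate):
                return True
            hungry = [i for i in range(n) if not ate[i] and random.random() < 0.5]
            claims = {}
            for i in hungry:                 # try to grab both utensils
                for u in (i, (i + 1) % n):
                    claims.setdefault(u, []).append(i)
            for i in hungry:                 # eat only if neither grab was contested
                if all(len(claims[u]) == 1 for u in (i, (i + 1) % n)):
                    ate[i] = True
        return all(ate)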
We have no time to actually analyze this and many other scenarios where
randomness is crucial. Instead we will take a look at the concept of
Randomness itself.
5.3 Arithmetization: One-Player Games with Randomized Transition
The results of section Games can be extended to
Arthur-Merlin games, which have one player -- Merlin -- but a
randomized transition function, effectively provided by a dummy second player --
Arthur -- whose moves are just coin-flips. We will reduce generic games to
games in which any strategy of Merlin has, in any losing position, an exponentially
small chance to win.
The trick achieving this, called arithmetization, expresses
boolean functions as low-degree polynomials, and applies them to
residues modulo a prime $p$ (let us call them bytes) instead of bits. It was
proposed in Noam Nisan's article, widely distributed over email in the Fall of
1989, and quickly used in a flood of follow-ups for proving relations between
various complexity classes. We follow [Shamir 1992],
[Fortnow, Lund 1993].
Let $G$ be the (ATM-complete) game of 1d-Chess (Linear
Chess Simulation), with positions $y=y_1\ldots y_k$, and let
$t(x,y)$ be its transition rule: $t(x,y)=1$ if position $y$ may follow $x$, else $t(x,y)=0$. Configurations include $x$ and a remaining moves
counter $c$. They are terminal if $c=0$, winning to the active player iff $w(x)=1$.
Intermediate configurations have a prefix $s$ of the next position $y$ already claimed.
1d-Chess is simple, so $t$
can be expressed as a product of multilinear $O(1)$-sized terms, any
variable shared by at most two terms. Thus $t$ is a polynomial, quadratic in
each $x_i,y_i$. Let $W_c(x)$ be $1$ if the active player has a strategy to
win in the $c$ moves left, i.e. $W_0=w$ and $W_{c+1}(x)=f_x(\Lambda)$, where
$f_x(s)=f_x(s0)\vee f_x(s1)$ for $s$ shorter than $y$, or
$f_x(s)=t(x,s)W_c(s)$ for full-length $s$.
($sa$ stands for the concatenation of $s$ and $a$; $\Lambda$ is the empty string.)
$W$ will allow Merlin to prove $x$ is winning, i.e. $W_c(x)=1$.
Configurations of the Arthur-Merlin game replace bits with bytes and add a byte $v$
reflecting Merlin's claimed value of the current subexpression. The polynomial $W_c$ is
quadratic in each byte, as $t$ is. Then $t(x,s)W_c(s)$ has degree 4 in
each byte of $s$, and $f_x(s0)\vee f_x(s1)=f_x(s0)+f_x(s1)-f_x(s0)f_x(s1)$ has degree 6.
Merlin starts with choosing a $\|x\|^{O(1)}$-bit prime $p$; all values below
are residues $\pmod p$. Then at each step with a claim $v$ for prefix $s$,
Merlin gives an $O(1)$-degree polynomial $h(a)$ with
$h(0)\vee h(1)=v$ for $s$ shorter than the $k$-byte $y$, or with $h(a)=t(x,sa)W_c(sa)$ for $sa$ completing $y$. Arthur
then selects a random byte $r$, and $h(r)$ becomes the claim $v$ for
the extended prefix $sr$.
If the original claim is correct, then Merlin's obvious winning strategy is to
always provide the correct polynomial. If the original claim is wrong then either
$f_x(s0)$ or $f_x(s1)$ must be wrong, too, so they will agree only with a wrong $h$. A
wrong $h$ can agree with the correct one only on few (bounded by degree)
points. Thus it will give a correct value only for an exponentially small fraction
of random $r$. Thus the wrong claim will propagate throughout the game until it
becomes obvious in the terminal configuration.
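The engine of the protocol is that two distinct polynomials of degree $d$ over a $p$-element field agree on at most $d$ points, so a wrong claim survives a random spot-check with probability at most $d/p$. A minimal sketch of such a spot-check (names ours):

    import random

    def eval_poly(coeffs, r, p):
        """Horner evaluation of sum(coeffs[i] * r^i) mod p."""
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * r + c) % p
        return acc

    def survives_spot_check(h_claimed, h_correct, p):
        """A wrong degree-d claim passes a random evaluation point
        with probability <= d/p, since the difference polynomial
        has at most d roots."""
        r = random.randrange(p)
        return eval_poly(h_claimed, r, p) == eval_poly(h_correct, r, p)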
This reduction of the settings of section Games yields a
hierarchy of Arthur-Merlin games' powers, i.e. of the types of computations that
have reductions to winning strategies of such games and back. The one-player games with
randomized transition rule running in space linear in the size of the initial
configuration are equivalent to exponential time deterministic computations. If
instead the running time of $t$, combined over all steps, is limited by a
polynomial, then the games are equivalent to polynomial space deterministic
computations.
An interesting twist comes in one-move games with a polylog-time transition
rule $t$, too tiny to examine the initial configuration $x$ and Merlin's move $y$.
Not only is this obstacle removed, but equivalence to NP is achieved
with a little care. Namely, $y$ is set in an error-correcting code, and $t$ is
given coin-flips and random access to the digits of $x,y$.
Then the membership proof $y$ is reliably verified by the randomized $t$.
See [Holographic proof] for details and references.
6 Randomness
6.1 Randomness and Complexity
Intuitively, a random sequence is one that has the same properties as a
sequence of coin flips. But this definition leaves open the question: what
are these properties? Kolmogorov resolved these problems with a new
definition of random sequences: those with no description noticeably shorter
than their full length. See survey and history in
[Kolmogorov, Uspensky 1987],
[Li, Vitanyi 2019].
Kolmogorov Complexity $K_A(x|y)$ of the string $x$ given $y$ is the
length of the shortest program $p$ which lets algorithm $A$ transform $y$ into
$x$: $\min\{\|p\|:A(p,y)=x\}$. There exists a Universal Algorithm $U$ such
that $K_U(x|y)\le K_A(x|y)+O(1)$, for every algorithm $A$. This constant is
bounded by the length of the program $U$ needs to simulate $A$. We abbreviate
$K_U(x|y)$ as $K(x|y)$, or $K(x)$ for empty $y$.
An example: For $x=0^n$, $K(x)\le K(n)+O(1)\le\log_2 n+O(1)$, so
$K(x)\ll\|x\|=n$.
Can we compute $K(x)$ by trying all programs to
find the shortest one generating $x$? This does not work because some programs
diverge, and the halting problem is unsolvable. In fact, no algorithm can
compute $K$ or even any of its lower bounds, except $O(1)$ ones.
Consider the Berry paradox expressed in the phrase:
"The smallest integer which cannot be uniquely and clearly defined
by an English phrase of less than two hundred characters."
There are fewer than $128^{200}$ English phrases of fewer than two hundred characters.
So there must be integers not expressible by such phrases, and a
smallest one among them. But isn't it described by the above phrase?
A similar argument proves that $K$ is not computable. Suppose an algorithm
$L(x)$ computes an unbounded lower bound for $K(x)$. We could use it to compute
$f(n)$ that finds the first $x$ with $n<L(x)\le K(x)$, but $x=f(n)$, so
$K(x)\le K(n)+O(1)\le\log_2 n+O(1)$, so $n<\log_2 n+O(1)$: a
contradiction. So, $K$ and its non-constant lower bounds are not computable.
An important application of Kolmogorov Complexity measures the Mutual
Information:
$I(x:y)=K(x)+K(y)-K(x,y)$. It has many uses which we cannot consider here.
Deficiency of Randomness
Some upper bounds of complexity are close in some important cases. One such
case is that of $x$ generated at random. Define its rarity
for the uniform on $\{0,1\}^n$ distribution as $d(x)=n-K(x|n)\ge-O(1)$.
What is the probability of $d(x)>i$, for uniformly random $n$-bit $x$?
There are $2^n$ strings $x$ of length $n$. If $d(x)>i$, then $K(x|n)<n-i$. There are $<2^{n-i}$ programs of such length, generating
$<2^{n-i}$ strings. So, the probability of such strings is $<2^{n-i}/2^n=2^{-i}$ (regardless of $n$)! Even for $i=300$,
the probability of $d(x)>i$ is absolutely negligible
(provided $x$ was indeed generated by fair coin flips).
Small rarity implies all other enumerable properties of random strings.
Indeed, let such a property "$x\notin P$" have a negligible probability and
$N_n$ be the number of $n$-bit strings violating $P$, so $N_n/2^n$ is
small. To generate such an $x$, we need only the algorithm enumerating the violators and the
$\log_2N_n$-bit position of $x$ in that enumeration. Then the rarity $d(x)\ge n-\log_2N_n-O(1)$ is large. Each violating $x$ will thus also violate the
"small rarity" requirement. In particular, small rarity implies
unpredictability of bits of random strings: a short algorithm with a high
prediction rate would assure large $d(x)$. However, randomness can only
be refuted, never confirmed: we saw that $K$ and its lower bounds are not
computable.
Rectification of Distributions
We rarely have a source of randomness with a precisely known distribution. But
there are very efficient ways to convert "roughly" random sources into perfect
ones. Assume we have such a sequence with a weird unknown distribution. We only
know that its long enough ($m$-bit) segments have min-entropy $k$, i.e. each value has
probability $<2^{-k}$, given all previous bits. (Without such a $k$ we
would not know how long a segment is needed to extract even one not fully predictable bit.)
No relation is required between $m$ and $k$, but useful are small $m$ and
huge $k$. We can fold the source into an $n\times m$ matrix $X$ of such segments. We also need a
small $m\times i$ ($i\ll k$) matrix $Z$, independent of $X$ and really uniformly
random (or random Toeplitz, i.e. with the restriction $Z_{a+1,b+1}=Z_{a,b}$). Then
the product $XZ$ over the two-element field has nearly uniform, with accuracy $\sim2^{(i-k)/2}$,
distribution. This follows from [Goldreich, Levin],
which uses earlier ideas of U. and V. Vazirani.
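A sketch of one row of such a product (the bit conventions below are ours); a Toeplitz matrix is determined by its $m+i-1$ diagonal bits:

    import random

    def toeplitz_extract(x_bits, i):
        """Multiply (over GF(2)) the m raw bits by a random m-by-i Toeplitz
        matrix Z, with Z[a][b] = diag[a - b + i - 1] constant on diagonals."""
        m = len(x_bits)
        diag = [random.getrandbits(1) for _ in range(m + i - 1)]
        out = []
        for b in range(i):
            s = 0
            for a in range(m):
                s ^= x_bits[a] & diag[a - b + i - 1]
            out.append(s)
        return out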
6.2 Pseudo-randomness
The above definition of randomness is very robust, if not practical. True
random generators are rarely used in computing. The problem is not
that making a true random generator is impossible: we just saw efficient ways
to perfect the distributions of biased random sources. The reason lies in many
extra benefits provided by pseudorandom generators. E.g., when experimenting
with, debugging, or using a program one often needs to repeat the exact same
sequence. With a truly random generator, one actually has to record all its
outcomes: long and costly. The alternative is to generate
pseudo-random strings from a short seed. Such methods were
justified in [Blum
Micali], [Yao]:
First, take any one-way permutation $F_n(x)$ (see sec.
Cryptography) with a hard-core bit $B_p(x)$
(see below) which is easy to compute from $x,p$, but infeasible to
guess from $p,n,F_n(x)$ with any noticeable correlation. Then take a random
seed of three $k$-bit parts $s=(x_0,p,n)$ and Repeat:
($S_i\gets B_p(x_i)$; $x_{i+1}\gets F_n(x_i)$; $i\gets i+1$).
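A toy rendering of this loop (the stand-ins are ours and NOT proven one-way: $F$ is squaring mod a Blum number, a candidate permutation only on quadratic residues, and $B$ is the inner-product bit discussed under Hard Core below):

    def prg(x0, pad, n, bits):
        """Sketch of the generator: S_i = B_pad(x_i); x_{i+1} = F_n(x_i)."""
        out, x = [], x0
        for _ in range(bits):
            out.append(bin(x & pad).count('1') & 1)  # S_i = <x_i, pad> mod 2
            x = x * x % n                            # x_{i+1} = F_n(x_i)
        return out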
We will see how distinguishing outputs of this generator from strings of
coin flips would imply the ability to invert $F$. This is infeasible if $F$ is
one-way. But if P=NP (a famous open problem), no one-way $F$, and thus no
pseudorandom generators, could exist.
By Kolmogorov's standards, pseudo-random strings are not random: let $G$ be
the generator and $s$ be the seed, $G(s)=S$, with $\|S\|\gg k=\|s\|$. Then $K(S)\le O(1)+k\ll\|S\|$, thus violating Kolmogorov's definition. We can
distinguish between truly random and pseudo-random strings by simply trying all
short seeds. However, this takes time exponential in the seed length.
Realistically, pseudo-random strings will be as good as truly random ones if
they can't be distinguished in feasible time. Such generators we call
perfect.
Theorem: [Yao] Let $G(s)=S\in\{0,1\}^n$
run in time $t_G$. Let a probabilistic algorithm $A$ in
expected (over internal coin flips) time $t_A$ accept $G(s)$ and truly random
strings with probabilities differing by $d$. Then, for random $i$, one can use $A$
to guess $S_i$ from $S_{i+1},S_{i+2},\ldots$ in time $t_A+t_G$ with
correlation $d/O(n)$.
Proof: Let $p_i$ be the probability that $A$ accepts $S=G(s)$ modified by
replacing its first $i$ digits with truly random bits. Then $p_0$ is the
probability of accepting $G(s)$ and must differ by $d$ from the probability
$p_n$ of accepting a truly random string. Then $p_{i-1}-p_i$ averages to $d/n$, for
randomly chosen $i$. Let $P_0$ and $P_1$ be the probabilities of accepting
$r0x$ and $r1x$ for $x=S_{i+1}S_{i+2}\ldots$, and random $(i-1)$-bit $r$.
Then $(P_0+P_1)/2$ averages to $p_i$, while
$P_{S_i}$ averages to $p_{i-1}$ and $P_{1-S_i}$ to $2p_i-p_{i-1}$.
So, the bit favored by $A$'s acceptance has the stated correlation with $S_i$. Q.E.D.
If the above generator were not perfect, one could guess $S_i$ from the
subsequent sequence with a polynomial (in $1/\|s\|$) correlation.
But $S_{i+1},S_{i+2},\ldots$ can be produced from $p,n,x_{i+1}$. So, one could
guess $B_p(x_i)$ from $p,n,F_n(x_i)$ with noticeable correlation, which cannot be
done for hard-core $B$.
Hard Core
The key to constructing a pseudorandom generator is finding a hard core for a
one-way $F$. The following $B$ is hard-core for any one-way $F$, e.g., for
Rabin's OWF in sec. Cryptography.
[Knuth 1997] has more details and references.
Let $B_p(x)=(x\cdot p)=(\sum_ix_ip_i\bmod2)$.
[Goldreich Levin] converts any method $M$ of
guessing $B_p(x)$ from $p$ and $F(x)$ with correlation $\varepsilon$ into an
algorithm of finding $x$, i.e. inverting $F$ (slower than $M$ by a factor
polynomial in $1/\varepsilon$).
Proof. (Simplified with some ideas of Charles Rackoff.)
Take $k=\|x\|$, $j=\log_2(2k/\varepsilon^2)$, $m=2^j$.
Let $b_p=\pm1$ stand for $(-1)^{B_p(x)}$ and $g_p=\pm1$ for the guess of $M$ from $p,y$.
Assume, for $y=F(x)$, $M$ guesses $B_p(x)$ with
correlation $\mathrm{E}_p[b_pg_p]>\varepsilon$, where $\mathrm{E}_p$
abbreviates the average over all $p$, since $x,y$ are fixed throughout the proof.
$b_pg_p$ averaged over $m$ random
pairwise independent $p$ deviates from its mean (over all $p$) by $<\varepsilon$ (and so is positive) with probability $\ge1-1/2k$, by Chebyshev's inequality. The same for
$b_{p+e_i}g_{p+e_i}$, where $e_i$ is the $i$-th unit vector and $b_{p+e_i}=b_p(-1)^{x_i}$.
Take a random $k\times j$ binary matrix $P$. The vectors $Pr$,
$r\in\{0,1\}^j\setminus\{0^j\}$, are pairwise independent. So, for a fraction $\ge1-1/k$
of $P$, $(-1)^{x_i}$ is the sign of $\sum_rb_{Pr}g_{Pr+e_i}$. We
could thus find $x_i$ for all $i$ with probability $\ge1/2$ if we knew $z=x^TP$:
it determines all $b_{Pr}=(-1)^{(z\cdot r)}$.
But $z$ is short: we can try all its $2^j$ possible values and check $F$
for each resulting candidate $x$!
So the inverter, for a random $P$ and all $i,z$, computes
$u_i(z)=\sum_rg_{Pr+e_i}(-1)^{(z\cdot r)}$. It uses Fast Fourier on $g$ to compute
all $u_i(z)$ at once. The sign of $u_i(z)$ gives the $i$-th bit for the
$z$-th member of the output list. Q.E.D.
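For tiny $k$ the Fourier picture can be seen directly by brute force: the Walsh coefficient of the guesses $g_p$ at $z$ is their correlation with the parity $(-1)^{(z\cdot p)}$, and it peaks at $z=x$. (This toy is ours; the real inverter replaces the $2^k$-point sums by the pairwise independent sample and Fast Fourier steps above.)

    def recover_from_noisy_parities(g, k):
        """Brute-force Goldreich-Levin toy for small k: g maps each k-bit p
        to a +-1 guess of (-1)^(p.x). Returns the z whose parity function
        correlates best with g; with guesses beating noise, z = x."""
        def dot(a, b):
            return bin(a & b).count('1') & 1
        best, best_corr = 0, float('-inf')
        for z in range(2 ** k):
            corr = sum(g(p) * (-1) ** dot(p, z) for p in range(2 ** k))
            if corr > best_corr:
                best, best_corr = z, corr
        return best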
6.3 Cryptography
Rabin's One-way Function. Pick random prime numbers $p,q$
with two last bits $1$, i.e. with odd $(p-1)/2$, $(q-1)/2$. Then $n=pq$
is called a Blum number. Its length should make factoring infeasible.
Let $Q_n$ be the set of squares $(x^2\bmod n)$, i.e.
quadratic residues (all residues are assumed coprime with $n$).
Lemma. Let $n=pq$ be a Blum number, $F(x)=(x^2\bmod n)$.
Then (1) $F$ is a permutation on $Q_n$ and (2) the ability to invert $F$
on random $x\in Q_n$ is equivalent to that of factoring $n$.
Proof: (1) $t=((p-1)/2)((q-1)/2)$ is odd, so $u=(t+1)/2$ is an
integer. Let $x=(z^2\bmod n)\in Q_n$, $y=F(x)$. Since $2t$ is a multiple of both $p-1$ and
$q-1$, by the Little Fermat Theorem both $p$ and $q$ divide
$z^{2t}-1$ (and, thus, so does $n$). Then $(y^u\bmod n)=x^{t+1}=x\cdot z^{2t}\equiv x\pmod n$.
(2) The above $y\mapsto(y^u\bmod n)$ inverts $F$. Conversely, let an algorithm $A$ invert $F$ for a fraction
of $x\in Q_n$. Each $y=F(x)$ has four square roots $\pm x,\pm x'$ with
$x'\not\equiv\pm x\pmod n$, both pairs with equal chance to be chosen at random. If a random $x$
generates $y$ while $A(y)=\pm x'$, the
Square Root Test has both roots for factoring $n$. Q.E.D.
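A toy sketch of the Lemma with small primes (real moduli must be far longer to make factoring infeasible):

    import random
    from math import gcd

    def rabin_demo(p=19, q=23):
        """Blum modulus (p = q = 3 mod 4): squaring permutes the quadratic
        residues, and the secret exponent u inverts it."""
        assert p % 4 == 3 and q % 4 == 3
        n = p * q
        t = (p - 1) // 2 * ((q - 1) // 2)   # odd, so u below is an integer
        u = (t + 1) // 2
        z = random.randrange(2, n)
        while gcd(z, n) != 1:
            z = random.randrange(2, n)
        x = z * z % n                        # a random quadratic residue
        y = x * x % n                        # F(x)
        assert pow(y, u, n) == x             # y^u mod n recovers x
        return n, x, y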
Picking random primes is easy: they have density $1/O(\|p\|)$. Indeed, one
can see that $\binom{2k}{k}\le4^k$ is divisible by every prime in $[k,2k]$ but by no
prime power exceeding $2k$. So, $2k/\log_2k$ is an upper bound on the number of primes in
$[k,2k]$, and $k/\log_2(2k)$ a lower bound on that in $[2,2k]$ (and in $[k,2k]$, as a simple calculation
shows). And fast VLSI exist to multiply long numbers and check primality.
Public Key Encryption
A perfect way to encrypt a message $m$ is to add it bit by bit modulo $2$ to a
random string $S$ of the same length $\|m\|$. The resulting encryption $m\oplus S$
has the same uniform probability distribution, no matter what $m$ is. So it is
useless for the adversary who wants to learn something about $m$ without
knowing $S$. A disadvantage is that the communicating parties must share a
secret $S$ as large as all messages to be exchanged, combined. Public
Key Cryptosystems use two keys. One key is needed to encrypt the
messages and may be completely disclosed to the public. The
decryption key must still be kept secret, but need not be sent
to the encrypting party. The same keys may be used repeatedly for many
messages.
Such a cryptosystem can be obtained [Blum,
Goldwasser 1982] by replacing the above random $S$ by the pseudorandom $S_i=(x_i\cdot s)$; $x_{i+1}=(x_i^2\bmod n)$. Here a Blum number $n$ is chosen
by the Decryptor and is public, but its factors $p,q$ are kept secret. The Encryptor
chooses a random quadratic residue $x_0$ and random $s$, and sends $s$, $m\oplus S$, and $x_{\|m\|}$.
Assuming factoring is intractable for the adversary, $S$ should be
indistinguishable from random strings (even with $s$ and $x_{\|m\|}$ known). Then this
scheme is as secure as if $S$ were truly random. The Decryptor knows $p,q$ and can
compute $u$ (see above) and thus take square roots: $x_i=(x_{i+1}^u\bmod n)$. So, he can find
$x_0$, and then $S$ and $m$.
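A sketch of this scheme (the parameter passing is our choice; $u$ is computed from $p,q$ as in the Lemma):

    import random

    def bg_encrypt(m_bits, n, s):
        """Keystream bits S_i = <x_i, s> mod 2 with x_{i+1} = x_i^2 mod n;
        the final x is sent so the holder of p, q can rewind the squarings."""
        z = random.randrange(2, n)           # assume gcd(z, n) = 1
        x = z * z % n                        # start from a quadratic residue
        cipher = []
        for b in m_bits:
            cipher.append(b ^ (bin(x & s).count('1') & 1))
            x = x * x % n
        return cipher, x                     # (m XOR S, x_{|m|})

    def bg_decrypt(cipher, x_last, n, s, u):
        """With the factors of n one knows u (see the Lemma) and takes one
        square root per step to recover x_0, then replays the keystream."""
        x = x_last
        for _ in range(len(cipher)):
            x = pow(x, u, n)                 # one square root per step
        plain = []
        for b in cipher:
            plain.append(b ^ (bin(x & s).count('1') & 1))
            x = x * x % n
        return plain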
Another use of the intractability of factoring is digital signatures
[Rivest, Shamir, Adleman 1978],
[Rabin, 1979]. Strings $x$ can be released as
authorizations of $y=(x^2\bmod n)$. Verifying that $x^2\equiv y\pmod n$ is easy, but the ability
to forge a signature for a generic $y$ is equivalent to the ability to factor $n$.
Go On!
You noticed that most of our burning questions are still open. Take them on!
Start with reading recent results (FOCS/STOC is a good source). See where
you can improve them. Start writing, first notes just for your friends, then
the real papers. Here is a little writing advice:
A well written paper has clear components: skeleton, muscles, etc. The
skeleton is an acyclic digraph of basic definitions and statements, with
cross-references. The meat consists of proofs (muscles) each
separately verifiable by competent graduate students having to read no
other parts but statements and definitions cited. Intuitive comments, examples
and other comfort items are fat and skin: a lack or excess will not make the
paper pretty. Proper scholarly references constitute clothing; no paper should
ever appear in public without them! Trains of thought which led to the discovery are
blood and guts: keep them hidden. Metaphors for other vital parts, like open
problems, I skip out of modesty.
Writing Contributions.
Section 1 was originally prepared by
Elena Temin, Yong Gao, and Imre Kifor (BU), others by Berkeley students:
2.3 by Mark Sullivan,
3.1 by Eric Herrmann and Elena Eliashberg,
3.2 by Wayne Fenton and Peter Van Roy,
3.3 by
Carl Ludewig, Sean Flynn, and Francois Dumas,
4.1 by
Jeff Makaiwi, Brian Jones, and Carl Ludewig,
4.2 by David Leech and Peter Van Roy,
4.3 by Johnny and Siu-Ling Chan,
5.2 by Deborah Kordon,
5.3 by Carl Ludewig,
5.4 by Sean Flynn, Francois Dumas,
Eric Herrmann, 5.5 by Brian Jones.
References
L. Levin. Fundamentals of Computing: a Cheat-List.
SIGACT News, Education Forum. Special 100th issue, 27(3):89-110,
1996. Errata: ibid. 28(2):80. Earlier version: ibid. 22(1), 1991.
Jon Kleinberg, Eva Tardos. Algorithm design. 2006.
Pearson Education.
M. Blum. A machine-independent theory of the complexity
of recursive functions. J. ACM 14, 1967.
M. Davis, ed. The Undecidable.
Hewlett, N.Y.: Raven Press, 1965. (Reprints of the original
papers of K. Goedel, A. Turing, A. Church, E. Post and others.)
Shinichi Ikeno. A 6-symbol 10-state Universal Turing Machine.
Proceedings, Institute of Electrical Communications Tokyo, 1958.
Joel I. Seiferas, Albert R. Meyer. Characterization of Realizable
Space Complexities. Annals of Pure and Applied Logic 73:171-190,
1995.
M.O. Rabin. Speed of computation of functions and classification
of recursive sets. Third Convention of Sci.Soc. Israel, 1959, 1-2.
Abst.: Bull. of the Research Council of Israel, 8F:69-70, 1959.
G.S. Tseitin. Talk: seminar on math. logic, Moscow university,
11/14, 11/21, 1956. Also pp. 44-45 in: S.A. Yanovskaya, Math. Logic
and Foundations of Math., Math. in the USSR for 40 Years,
1:13-120, 1959, Moscow, Fizmatgiz, (in Russian).
J. v.Neumann, O. Morgenstern.
Theory of Games and Economic Behavior. Princeton Univ. Press, 1944.
L. Stockmeyer, A. Meyer.
Word problems requiring exponential time. STOC-1973, pp.1-9.
Ashok K. Chandra, Dexter C. Kozen, Larry J. Stockmeyer.
Alternation. J. ACM, 28(1):114-133, 1981.
J.M. Robson. N by N checkers is EXPTIME-complete.
SIAM J. Comput 13(2), 1984. Also: The complexity of Go.
Proc. 1983 IFIP World Computer Congress, p. 413-417.
A.S. Fraenkel and D. Lichtenstein. Computing a perfect
strategy for $n\times n$ chess requires time exponential in $n$.
J. Combin. Theory (Ser. A), 31:199-214, 1981. Also: ICALP-1981.
W.J. Savitch. Relationships between nondeterministic
and deterministic tape complexities.
J. Comput. Syst. Sci. 4:177-190, 1970.
D.B. Yudin and A.S. Nemirovsky. Informational Complexity and
Effective Methods for Solving Convex Extremum Problems. Economica i
Mat. Metody 12(2):128-142; transl. MatEcon 13:3-25, 1976.
E.M. Luks: Isomorphism of Graphs of Bounded Valence
Can Be Tested in Polynomial Time. FOCS-1980.
M.R. Garey, D.S. Johnson. Computers and Intractability.
W.H. Freeman & Co., 1979.
B.A. Trakhtenbrot. A survey of Russian approaches to
Perebor (brute-force search) algorithms.
Annals of the History of Computing, 6(4):384-400, 1984.
M.O. Rabin. Probabilistic Algorithms for Testing Primality.
J. Number Theory, 12:128-138, 1980.
G.L.Miller. Riemann's Hypothesis and tests for Primality.
J. Comp. Sys. Sci. 13(3):300-317, 1976.
R. Solovay, V. Strassen.
A fast Monte-Carlo test for primality. SIAM J. Comput., 6:84-85, 1977.
R. Karp. Combinatorics, Complexity and Randomness.
(Turing Award Lecture) Communication of the ACM, 29(2):98-109, 1986.
David S. Johnson. The NP-Completeness Column.
J. of Algorithms 5, (1984) 284-299.
R. Karp. The probabilistic analysis of some
combinatorial search algorithms. Algorithms and Complexity.
(J.F.Traub, ed.) Academic Press, NY 1976, pp. 1-19.
Y. Gurevich, Average Case Complexity.
Internat. Symp. on Information Theory, IEEE, Proc. 1985.
A.N. Kolmogorov, V.A. Uspenskii. Algorithms and Randomness.
Theoria Veroyatnostey i ee Primeneniya = Theory of
Probability and its Applications, 3(32):389-412, 1987.
M. Li, P.M.B. Vitanyi. Introduction to Kolmogorov
Complexity and its Applications. Springer Verlag, New York, 2019.
M. Blum, S. Micali. How to generate Cryptographically
Strong Sequences. SIAM J. Comp., 13, 1984.
A. C. Yao. Theory and Applications of Trapdoor Functions.
FOCS-1982.
O. Goldreich, L. Levin. A Hard-Core Predicate
for all One-Way Functions. STOC-1989, pp. 25-32.
R. Rivest, A. Shamir, L. Adleman. A Method for Obtaining Digital
Signatures and Public-Key Cryptosystems. Comm. ACM, 21:120-126, 1978.
M. Blum, S. Goldwasser. An Efficient Probabilistic
Encryption Scheme Hiding All Partial Information. Crypto-1982.
M. Rabin. Digitalized Signatures as
Intractable as Factorization. MIT/LCS/TR-212, 1979.