]>
SOME RESULTS ON RECURRENCE AND ENTROPY
DISSERTATIO
Presented in Partial Fulfillment of the Requirements for
the Degree Doctor of Philosophy in the Graduate
School of the Ohio State University
By
Ronald Lee Pavlov Jr., B.S., .
The Ohio State University
2007
Dissertation Committee:
. Approved by
Professor Vitaly Bergelson, Advisor
Professor Alexander Leibman
Advisor
Professor Manfred Einsiedler Graduate Program in
Graduate Program in Mathematics
ABSTRACT
This thesis is comprised primarily of two separate portions. In the first portion, we
exhibit, for any sparse enough increasing sequence of integers, a totally minimal,
totally uniquely ergodic, and topologically mixing system and for
which the averages fail to converge on a residual set in , answering
negatively an open question of Bergelson. We also construct here a totally minimal,
totally uniquely ergodic, and topologically mixing system and for
which .
In the second portion, we study perturbations of multidimensional shifts of finite
type. Given any shift of finite type for and any word in the language of
, denote by the set of elements of in which does not appear. If satisfies
a uniform mixing condition called strong irreducibility, we obtain exponential upper
and lower bounds on dependent only on the size of . This result
generalizes a result of Lind about shifts of finite type.
ii
To Dilip
ill
ACKNOWLEDGMENTS
First and foremost, I would like to thank my advisor Vitaly Bergelson. This the-
sis could never have been completed without his help. His infectious enthusiasm
for mathematics, along with a willingness to share his knowledge, were a constant
inspiration.
I would also like to thank my other committee members, Professors Alexander Leib-
man and Manfred Einsiedler, for agreeing to serve on my committee and for many
fruitful mathematical discussions.
Finally, I thank my friends and loved ones, without whom I would never have been
able to get as far as I have. In no particular order:
Thank you to Mom and Dad, who have always been willing to do anything to
support me and have always believed in me.
Thank you to Greg, without whose friendship and sense of humor graduate school
would have been much less bearable.
Thank you to Jasmine, who was always there for me. I can never thank you
enough for your constant love and support.
IV
VITA
2000-Present . . . . . . . . . . . . . . . . . . . . . . . . . . Graduate Teaching Associate,
The Ohio State University
1998-1999 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Student Instructional Associate,
The Ohio State University
2003 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . in Mathematics,
The Ohio State University
2000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . in Mathematics,
The Ohio State University
PUBLICATIONS
Research Papers
Some Counterexamples In Topological Dynamics (Ergodic Theory and Dynam-
ical Systems, to appear)
Perturbations of Multidimensional Shifts of Finite Type (submitted)
FIELDS OF STUDY
Major Field: Mathematics
Specialization: Ergodic Theory and Dynamical Systems
vi
TABLE OF CONTENTS
Abstract ii
Dedication .
Acknowledgments . 1V
Vita
List of Figures ix
CHAPTER PAGE
1Introduction 1
2Recurrence and convergence of ergodic averages along sparse sequences 8
2.1 Introduction 8
2.2 Some general symbolic constructions 16
2.3 Some symbolic counterexamples 31
2.4 Some general constructions on connected manifolds . 42
2.5 Some counterexamples on connected manifolds 55
2.6 A counterexample about simultaneous recurrence 61
2.7 Questions 65
3Perturbations of multidimensional shifts of finite type 67
3.1 Introduction 67
3.2 Some measure-theoretic preliminaries 84
3.3 A replacement theorem 93
3.4 The proof of the main result 114
3.5 A closer look at the main result 146
3.6 An application to an undecidability question 155
3.7 Questions 159
Vll
Bibliography
161
Vlll
LIST OF FIGURES
FIGURE PAGE
3.1 's action on a sample element of 73
3.2 A portion of a sample element of 76
3.3 77
3.4 77
3.5 85
3.6 A standard replacement of . 95
3.7 A sample element of 97
3.8 An element of associated to two standard replacements 98
3.9 The suboctants of 99
3.10 Elements , of whose difference is a multiple of tOO
3.11 An element of which contains 106
3.12 107
3.13 108
3.14 Intersecting occurrences of 116
3.15 120
3.16 Disallowed and allowed pairs of overlapping 127
3.17 A point 138
ix
3.18 The correspondence between copies of and points in
139
3.19 How a copy of is filled if 140
3.20
141
(JIAPTER 1
INTRODUCTION
This introduction will serve as a brief mathematical and historical overview of both
of the main problems that we will examine in this thesis. Due to their somewhat
disparate natures, the two portions of this thesis will each contain a more in-depth
introduction as well. For this reason, we will for the most part relegate formal defi-
nitions to their pertinent introduction.
Ergodic theory is the study of average or long-term behavior of systems which evolve
with time. For example, one can consider a particle bouncing in a box at fixed speed.
Its position and velocity at any time can be represented by a vector in . One can
then model this behavior by taking to be the space of all possible positions and
velocities for the particle, and a self-map of which, given a position and velocity,
gives the position and velocity one second later. This pairing of a space with a
self-map is called a dynamical system.
The more general setup is that of a group acting on a space with some sort
of structure by maps preserving this structure. In this thesis, is always
for some N. Measure-theoretic ergodic theory occurs when is a measure
space with a probability measure invariant under each , and so in this case we call
1
a measure-preserving dynamical system. Topological dynam-
ics occurs when is a compact topological space and each is a homeomorphism,
and so in this case we call a topological dynamical system. In ei-
ther type of system, if , then for any integer , and so we shorten
to and to . (Here )
These cases are not as disjoint as they may first appear: the famed Bogoliouboff-
Krylov theorem states that any topological dynamical system has at
least one invariant probability measure as long as is an amenable group. (An
examination of amenability is beyond the scope of this thesis, but we mention that
amenable groups are a class which contain all abelian groups. We only consider
in this thesis, so all topological dynamical systems examined here will possess
invariant measures.) This sometimes allows one to examine topological properties of
a system via properties of invariant measures. For instance, in Chapter 3, we will
prove some purely combinatorial properties of a -action by homeomorphisms of a
Cantor set by means of studying the invariant measures of this action.
Chapter 2 deals primarily with ergodic averaging. One of the fundamental results of
ergodic theory is Birkhoff's ergodic theorem:
Theorem 1.0.1. ([Bi]) For any measure-preserving dynamical system
and any , exists for -almost every .
Several results have been proven about the convergence of such averages when one
averages not along all powers of , but only along some distinguished subset of the
2
integers. ([Bou], [Bou2], [Wi]) In particular, when one averages along for a
polynomial with integer coefficients, there is the following result of Bourgain:
Theorem 2.1.12. ([Bou], p. 7, Theorem t) For any measure-preserving dynamical
system , for any polynomial , and for any with
, exists for -almost every .
Theorem 2.1.12 can be interpreted as follows: for any polynomial , any
measure-preserving system , and any measure-theoretically "nice" func-
tion , the set of points where fails to converge is of mea-
sure zero, or negligible measure-theoretically. It is then natural to wonder whether or
not there is a topological parallel to this result using topological notions of "niceness"
(continuity) and negligibility (first category), and in fact such a question was posed
by Bergelson:
Question 2.1.13. ([Be], p. 51, Question 5) Assume that a topological dynamical
system is uniquely , and let and . Is it true that for
all but a first category set of points exists?
One of our results is a (quite negative) answer to Question 2.1.13:
Theorem 2.1.15. For any increasing sequence of integers with upper Banach
density , there exists a totally minimal, totally uniquely ergodic, and topologically
1A topological dynamical system is uniquely ergodic if there exists exactly one T-
invariant measure .
set has upper Banach density zero if .
3
mixing topological dynamical system and a continuous function on with
the property that fails to converge for a residual set of .
We also examine recurrence along distinguished subsets of the integers, motivated
primarily by the following result:
Theorem 2.1.17. ( , p. 14, Corollary 1.8) For any , any topolog-
ical dynamical system and any polynomials , , ,
with for , for a residual set of there exists a sequence
of positive integers such that for , where is the
standard orthonormal basis of .
In particular, this implies that for any minimal topological dynamical system
and any polynomial with , the set of with
is residual. The following result shows that for with degree at least two, this residual
set is not necessarily all of .
Theorem 2.1.18. For any increasing sequence of integers with upper Banach
density zero, there exists a totally minimal, totally uniquely ergodic, and topologically
mixing topological dynamical system (X,T) and an uncountable set A such that
for every x , the sequence does not have x as a limit point, i.e. there is
no sequence of positive integers such that x converges to x.
Another consequence of Theorem 2.1.17 is that for any commuting minimal homeo-
morphisms and of a compact space , there exists a residual set of for which
sA topological dynamical system is minimal if there exist no nonempty proper closed
-invariant subsets of .
4
there exists a sequence of positive integers such that and .
The following theorem shows that for some systems, it is the case that this residual
set of is not all of .
Theorem 2.1.21. There exists a totally minimal topological dynamical system (X,T)
and a point x such that for any positive integers r , any sequence of
integers satisfying and is eventually zero.
The systems which we construct to prove Theorems 2.1.15 and 2.1.18 are symbolic
dynamical systems. A symbolic dynamical system is defined by first choosing a
finite set , called the alphabet. endowed with the product topology is a
compact space, and for any , we may define the shift homeomorphism by
for Q. Any closed set has a topology induced by , and
if is invariant under each , then is a topological dynamical system
which we call a symbolic dynamical system.
In Chapter 3, we examine symbolic dynamical systems exclusively. There, we consider
a specific type of symbolic dynamical system called a shift of finite type. A -shift
of finite type is a symbolic dynamical system where is defined by
specifying a finite set of finite words (a word is a function from a finite subset of
to ) and taking to be the set of all elements of in which none of the words
in appear. The shift of finite type specified by a finite set of words in this
way is denoted by . For example, the set of all biinfinite sequences of zeroes and
ones in which no two ones occur consecutively is a shift of finite type, as is the set of
5
all arrays of zeroes and ones in which no three-by-four blocks of all zeroes or all
ones appear.
We are particularly interested in the effects of forbidding a particular word from a
shift of finite type . Given any word , define to be the set of elements of in
which does not appear. Then clearly is a subset of , and so the topological
entropy of is not greater than the topological entropy of . It
is natural to wonder how much the topological entropy drops by when is removed,
as it is a sort of measure of how important is to the information-retaining capacity
of . In [L], Lind proved the following: (the condition means that
and that appears in some element of )
Theorem 3.1.16. ([L], p. 360, Theorem 3) For any topologically transitive Z-shift
of finite type with positive topological entropy , there exist constants
, , and such that for any and any word , if we denote
by the shift of finite type , then
.
Our main result is a generalization of Theorem 3.1.16 for -shifts of finite type
which satisfy a mixing condition called strong irreducibility. (We formally define
strong irreducibility in Definition 3.1.19.)
Theorem 3.1.22. For any d 1 and any strongly irreducible -shift X
of finite type with uniform filling length R and positive topological entropy ,
6
there exist constants and such that for any and any word
, if we denote by the shift of finite type , then
.
One way in which -shifts of finite type are more difficult to deal with for is
that given a finite collection of patterns, it is undecidable whether or not the shift
of finite type induced by is even nonempty. Our methods yield a situation in
which this question can be answered:
Theorem 3.6.1. For any alphabet A, there exist F, G such that for any m
and any finite set of words : satisfying
and for , .
7
(JIAPTER 2
RECURRENCE AND CONVERGENCE OF ERGODIC
AVERAGES ALONG SPARSE SEQUENCES
2.1 Introduction
In this chapter, we are concerned with the convergence of averages of the form
for an increasing sequence of integers . We begin with some
definitions.
Definition 2.1.1. A measure-preserving dynamical system
consists of a measure space , a probability measure with -algebra 8 of measurable
sets, and a group action of transformations : with
for all .
Definition 2.1.2. A measure-preserving dynamical system (X,B, is er-
godic if any set A satisfying A for all g has {0,1}.
Definition 2.1.3. A topological dynamical system (X, consists of
compact topological space X and a group action of homeomorphisms :
X .
8
Definition 2.1.4. Given a topological dynamical system (X,, a Borel prob-
ability measure on X is called ergodic if (X,, , is an ergodic
measure-preserving dynamical system, where is the Borel -algebra of X.
In this chapter, all dynamical systems have , so as already described we
will use the notations and for measure-preserving and topological
dynamical systems respectively.
Definition 2.1.5. A topological dynamical system (X,T) is minimal if for any
closed set K with , K or K . (X,T) is totally minimal if
(X, is minimal for every n O.
Definition 2.1.6. A topological dynamical system (X,T) is uniquely ergodic if
there is only one Borel measure on X such that for every Borel
set X. (X,T) is totally uniquely ergodic if (X, is uniquely ergodic for
every n .
Definition 2.1.7. A topological dynamical system (X,T) is topologically mixing
if for any open sets U, , there exists N such that for any n N,
.
Definition 2.1.8. For any set , the upper Banach density of A is defined
by
.
9
Definition 2.1.9. For any set , the upper density of A is defined by
.
Definition 2.1.10. For a set , the density of A is defined by
if this limit exists.
Definition 2.1.11. Given a topological dynamical system (X,T) and a T-invariant
Borel probability measure , a point x is (T,-generic if for every f ,
.
In the measure-preserving setup, there are several positive results about convergence
of averages of the form , including this theorem of Bourgain:
Theorem 2.1.12. ([Bou], p. 7, Theorem t) For any measure-preserving dynamical
system , for any polynomial , and for any with
, exists for -almost every .
The following question regarding a possible topological version of Theorem 2.1.12 was
posed by Bergelson:
Question 2.1.13. ([Be], p. 51, Question 5) Assume that a topological dynamical
system is uniquely ergodic, and let and . Is it true that for
all but a first category set of points exists?
to
Bergelson added the hypothesis of unique ergodicity because it is a classical result
that a system is uniquely ergodic with unique -invariant measure if and
only if for every and , , and so in
the topological setup this is a natural assumption to make about to achieve
the desired result.
However, Bergelson was particularly interested in the convergence of these averages
to the "correct limit," i.e. where is the unique -invariant measure on
. To have any hope for such a result, it also becomes necessary to assume ergod-
icity for all powers of in order to avoid some natural counterexamples related to
distribution (mod ) of for positive integers . For example, if , is
the permutation on defined by (mod 3), is normalized
counting measure on , and , then is obviously uniquely ergodic with
unique invariant measure , but
, .and
To avoid such examples, we would need to be totally ergodic as well as uniquely
ergodic, and so it makes sense to assume total unique ergodicity to encompass both
properties. Bergelson's revised question then looks like this:
Question 2.1.14. Assume that a topological dynamical system (X,T) is totally uniquely
11
ergodic with unique -invariant measure , and let and . Is it true
that for all but a first category set of points ?
We answer Questions 2.1.13 and 2.1.14 negatively in the case where the degree of
is at least two, and in fact prove some slightly more general results. The level of
generality depends on what hypotheses we place on the space . In particular, we
can exhibit more counterexamples in the case where is a totally disconnected space
than we can in the case where is a connected space such as .
Theorem 2.1.15. For any increasing sequence of integers with upper Banach
density zero, there exists a totally minimal, totally uniquely ergodic, and topologically
mixing topological dynamical system and a continuous function on with
the property that fails to converge for a residual set of .
Theorem 2.1.16. For any increasing sequence of integers with the property
that for some integer , for all suffiffifficiently large , there ex-
ists a totally minimal, totally uniquely ergodic, and topologically mixing topological
dynamical system and a continuous function on with the property that
fails to converge for a residual set of . In addition, the space is
a connected 9-manifold.
We note that Theorems 2.1.15 and 2.1.16 answer Question 2.1.13 negatively for
with degree at least two, since the sequence for any nonlinear
satisfies the hypotheses of both theorems.
Theorems 2.1.15 and 2.1.16 are about nonconvergence of ergodic averages along cer-
tain sequences of powers of . We also prove two similar results about nonrecurrence
12
of points. As motivation, we note that a minimal system has the property that every
point is recurrent. In other words, if is minimal, then for all , it is the
case that . If is totally minimal, then all points are recurrent
even along infinite arithmetic progressions: for any nonnegative integers , , and for
all , . It is then natural to wonder if the same is true for other
sequences of powers of , and in this vein there is the following result of Bergelson
and Leibman, which is a corollary to their Polynomial van der Waerden theorem:
Theorem 2.1.17. ( , p. 14, Corollary 1.8) For any , any minimal topolog-
ical dynamical system and any polynomials , , ,
with for , for a residual set of there exists a sequence
of positive integers such that for , where is the
standard orthonormal basis of .
In particular, Theorem 2.1.17 implies that for a minimal system and any
polynomial with , the set of such that is residual. If
is assumed to be totally minimal, then by definition if the degree of is one,
then every point is in . The following two results imply that if the
degree of is greater than one, total minimality of does not necessarily imply
that every point is in .
Theorem 2.1.18. For any increasing sequence of integers with upper Banach
density zero, there exists a totally minimal, totally uniquely ergodic, and topologically
mixing topological dynamical system (X,T) and an uncountable set A such that
13
for every , the sequence does not have as a limit point, . there is
no sequence of positive integers such that converges to .
Theorem 2.1.19. For any increasing sequence of integers with the property
that for some integer , for all suffiffifficiently large , there exists
totally minimal, totally uniquely ergodic, and topologically mixing topological dynam-
ical system and a point such that the sequence does not have
as a limit point, . there is no sequence of positive integers such that
converges to . In addition, the space is a connected 7-manifold.
The following simple lemma shows that if is topologically mixing, then The-
orems 2.1.18 and 2.1.19 cannot be improved too much, i.e. there is no increasing
sequence for which we can exhibit topologically mixing examples with a second
category set of such nonrecurrent points.
Lemma 2.1.20. If a topological dynamical system (X,T) is topologically mixing, then
for any increasing sequence , the set of x for which x is of first
category.
Proof: Define . It is clear that is closed. We
claim that contains no nonempty open set, which shows that it is nowhere dense,
implying that the set of points for which is not a limit point of
is of first category. Suppose, for a contradiction, that there is a nonempty
open set with for some . Then, there exists with diam(V such that
. By topological mixing, there exists such that . This
14
implies that there exists so that . Since diam(V , .
However, , so we have a contradiction.
Theorem 2.1.18 shows that it is possible for this set of points nonrecurrent along
to be uncountable though.
We mention that some mixing condition is necessary for a statement like Lemma 2.1.20;
as a simple example, consider an irrational circle rotation : on the cir-
cle T. There is clearly some increasing sequence of integers such that
(mod ) . Then, for any , , and so for every ,
does not have as a limit point.
We also prove a result about simultaneous recurrence. Theorem 2.1.17 implies that
for any commuting minimal homeomorphisms and of a compact space , there
exists a residual set of for which there exists a sequence of positive integers
such that and . The following theorem shows that for some
systems, it is the case that this residual set of is not all of X. (In this particular
example, and are powers of the same homeomorphism.)
Theorem 2.1.21. There exists a totally minimal topological dynamical system (X,T)
and a point x such that for any positive integers r , any sequence of
integers satisfying and is eventually zero.
Before proceeding with the proofs, we now give a brief description of the content of
this chapter. In Section 2.2, we will describe some general symbolic constructions of
15
topological dynamical systems with particular mixing properties. At the end of this
section, we will arrive at a construction of a system which is totally minimal, totally
uniquely ergodic, and topologically mixing, and which has as a parameter a sequence
of integers .
In Section 2.3, by taking this sequence to grow very quickly, we will show
that the examples constructed in Section 2.2 are sufficient to prove Theorems 2.1.15
and 2.1.18. Some interesting questions also arise and are answered in Section 2.3
pertaining to the upper Banach density of countable unions of sets of upper Banach
density zero.
In Section 2.4, we create a flow under a function with base transformation a skew
product, which acts on a connected manifold, and which is totally minimal, totally
uniquely ergodic, and topologically mixing. This transformation has as a parameter
a function . We use conditions of Fayad ([Fa]) on flows under functions
to achieve topological mixing, and some conditions of Furstenberg ([Fu]) on skew
products to prove total unique ergodicity.
In Section 2.5, by a judicious choice of , we use the examples of Section 2.4 to prove
Theorems 2.1.16 and 2.1.19.
In Section 2.6, we prove Theorem 2.1.21.
Finally, in Section 2.7 we give some open questions about strengthening our results.
2.2 Some general symbolic constructions
Definition 2.2.1. An alphabet is any finite set, whose elements are called letters.
16
Definition 2.2.2. A symbolic dynamical system with alphabet is a topological
dynamical system where for any , is the shift homeomorphism
of defined by for , and is any closed subset of
invariant under each .
Definition 2.2.3. A word on the alphabet A is any element of for some finite
set F . F is called the shape of w.
For any words of length and of length , we denote by their concatenation,
i.e. the word of length . We denote by the
word torn . . . given by the concatenation of copies of .
Definition 2.2.4. Given any symbolic dynamical system (X,, the language
of X, denoted by , is the set of all words which appear as subwords of X.
All symbolic dynamical systems appearing in this chapter have alphabet {0, 1},
Z. We also restrict ourselves in this chapter to words with shape 1, , for some
. In other words, a word can be thought of for now as a finite string of letters
. The number of letters in a word is called its length.
Definition 2.2.5. A word w of length n is a subword of a sequence u if there
exists k such that for . Analogously, w is a subword
of a word v of length m if there exists -n such that for
.
We will outline three constructions which algorithmically create such that
have certain properties. (Here the "certain properties" in question depend on which
17
construction is used.) It should also be mentioned that many ideas from these con-
structions are taken from work of Hahn and Katznelson , where they also
algorithmically constructed symbolic topological dynamical systems with certain er-
godicity and mixing properties.
Construction 1: (Minimal) We define inductively and , which are, respec-
tively, sequences of positive integers and sets of words on the alphabet {0, 1}. Each
word in is of length . (We will use the term " -word" to refer to a member of
from now on.) We define these as follows: always define and .
Then, for any , is defined to be any integer greater than or equal to
which is also a multiple of , and then is chosen to be the set of words of
length which are concatenations of -words, containing each word in the
concatenation at least once.
We then define to be the set of all which are limits of shifted words In
other words, if there exist and such that for all large enough
, . Since is compact, such exist by a standard diagonalization
argument. It is easily checkable that is closed and a-invariant. The claim is that
regardless of the choice of the integers , as long as divides , and
, is minimal. It suffices to show that for any , .
Choose any and . By the definition of , there exists and some
word such that is a subword of . is an -word, so by definition,
every -word contains , and therefore , as a subword. Finally, note that again
by the definition of Construction 1, is an biinfinite concatenation of words
18
This implies that . . . contains some word, and therefore , as a
subword, and so there exists so that begins with . Since was an
arbitrary word in , this implies that , and so is minimal.
So, we have now demonstrated a way of constructing minimal . We will now
make this construction a bit more complex in order to make totally minimal.
Construction 2: (Totally minimal) We define inductively and , which are,
respectively, sequences of positive integers and sets of words on the alphabet {0, 1}.
Each word in is of length . We define these as follows: always define
and . Then, for any , is defined to be any integer greater than
or equal to , and then is chosen to be the set of words of
length which are concatenations of -words and the word 1 with the following
properties: the word 1 does not appear at the beginning or end of , only a single
1 can be concatenated between two -words, and for every , and for every
, appears in at an (mod )-indexed place. That is, there exists
(mod ) with . . . . From now on, to refer
to this second condition, we say that every occurs in at places indexed by
all residue classes modulo .
Since this construction is a bit complicated, a few quick examples may be in or-
der. Suppose that , , and we choose . Say that
is an : each word ap-
pears at least once beginning with a letter of with an odd index, and at least
19
once beginning with a letter of with an even index. Examples of words which
would not be -words include abcdllabcddabcdababcabcd (1 is concatenated twice
between and ), abcdo lbcdldbcdaabcbbbbdc (occurrences of the word begin only
with even-indexed letters), or abcdldcbabcdabcdabcdadb (wrong number of letters.)
We can then define to again be the set of all which are limits of shifted
-words. For this definition to make sense, it suffices to show that is nonempty
for all , since if this is true then by compactness of and a diagonalization
argument just as in Construction 1. is nonempty, so it suffices to show that
implies for all . For any , assume that . Then, enumerate the
elements of by efine the words . . .
and , where (mod ). Since
, exists, and is a concatenation of -words and the
word 1 with length . In , at most a single 1 is concatenated between any two
-words and 1 does not appear at the beginning or end of . Also, since the length
of is divisible by , and since all -words are subwords of , all words
appear in at places indexed by all residue classes modulo , and so all
words appear in at places indexed by all residue classes modulo as well.
Therefore, , and so is nonempty.
We will show that is totally minimal. Fix any . We wish to show that
for any , . Choose any such , and fix any word .
By definition of , there exists and an word such that is a subword of
. Without loss of generality, we assume that . By the construction,
20
occurs in every -word, and it occurs at places indexed by every residue class
modulo . Since , in particular this implies that , and therefore , occurs
in every -word at places indexed by every residue class modulo . By definition
of Construction 2, is a biinfinite concatenation of -words and the word 1.
Therefore, . . . contains some -word as a subword, and so
occurs in . . . at places indexed by every residue class modulo .
There then exists so that begins with . Since was an arbitrary word of
, this shows that , and since was arbitrary, that is
totally minimal.
We now define one more general type of construction, again more complex than the
last, so that the system created is always totally uniquely ergodic and topologically
mixing, in addition to being totally minimal. For this last construction, we first need
a couple of definitions.
Definition 2.2.6. For any integers and , and and
, we define to be the ratio of the number of occurrences of as
concatenated -rvord at (mod )-indexed places in to the total number of
-rvords concatenated in .
We consider any positive integer to be equal to 0 (mod t) for the purposes of this
definition. An example is clearly in order: if , , (in Construc-
tions 2 and 3, is always taken to be {0, 1}, but here we deviate from this for
21
illustrative purposes) and is the word Ot (here vertical bars show
where breaks in the concatenation occur), then occurs twice out of four -words,
so . Since one of these occurrences begins at and one begins at
, . We make a quick note here that there could
be some ambiguity here if an -word could be decomposed as a concatenation of
words and ones in more than one way. For this reason, we just assume that when
computing , the definition of the word includes its representation
as a concatenation of -words and ones. (i.e. in the example given, is defined
as the concatenation Of of -words and ones, rather than the nine-letter
word 011010110.)
Definition 2.2.7. Given any words of length and w of length , and any
integers , define (w, to be the number of occurrences of w at
(mod m)-indexed places in , divided by .
Taking the previous example again, , since 01 occurs three times as
a subword of 011010110. Since two of these occurrences begin at letters of with
even indices and one begins at a letter of with odd index, and
.
Construction 3: (Totally minimal, totally uniquely ergodic, and topologi-
cally mixing) We define inductively and , which are, respectively, sequences
of positive integers and sets of words on the alphabet {0, 1}. Each word in is of
length . We define these as follows: always define and . Then,
we fix any sequence of positive reals such that and define, for each
22
, some for any integer and prime
(We may choose such a by Bertrand's postulate. ) Note that
this implies that for all . We then define to be the set of words
of length with all of the same properties as in Construction 2, along with the
property that, for any , and for any , .
is again taken to be the set of which are limits of shifted words For
this definition to make sense, it suffices to show that is nonempty for all , since
if this is true then by compactness of and a diagonalization argument
just as in Construction 1. is nonempty, so it suffices to show that
implies for all . For any , assume that . Then, enumerate the
elements of by , , , , and define the words . . .
and . Clearly for large , exists, and is a concatenation
of -words and the word 1 with length . In , at most a single 1 is concatenated
between any two -words and 1 does not appear at the beginning or end of . Since
, for every and , appears in exactly once as a
concatenated -word at an (mod )-indexed place. Therefore, appears in
exactly times as a concatenated -word at (mod )-indexed places, and
so . Since and were arbitrary, , and so .
is totally minimal by the same proof used for Construction 2. We claim that
is also totally uniquely ergodic. Take any word , and any fixed integer
. We define two sequences and as follows: is the minimum value
23
of , where and ranges over all word and is the
maximum value of , where and ranges over all -words.
Suppose that and are known, and that . We wish to show that
and are very close to each other. Let us consider any element of and,
for any fixed , see how few occurrences of there could possibly be at
(mod )-indexed places in . By the definition of Construction 3, for every ,
and , the ratio of the number of times occurs as a concatenated word
in whose first letter is a letter of whose index is equal to (mod ) to the total
number of -words concatenated in is at least . Since divides , then for
any , the ratio of the number of times that occurs as a concatenated
word at (mod )-indexed places in to the total number of -words concatenated
in is at least . Since the total number of -words concatenated in is at
least , this implies that the number of such occurrences of in is at least
for any and . For any and , the number of times that occurs at
(mod )-indexed places in as a subword of an occurrence of that occurs at an
(mod )-indexed place in is then at least .
Summing over all and , the number of occurrences of in at
(mod )-indexed places is at least
.
Since was arbitrary in and was arbitrary,
24
.
Let us now bound from above the number of occurrences of in at (mod )-
indexed places. By precisely the same reasons as above, for any , the
number of occurrences of at (mod )-indexed places which lie entirely within a
concatenated -word in is not more than
.
(The denominator of the first fraction changed because there are at most
words concatenated in ) However, it is possible that there are occurrences of
in which do not lie entirely within a concatenated -word in . The number
of such occurrences of is not more than times the number of concatenated
words in , which in turn is less than or equal to . This means that
the number of occurrences of at (mod )-indexed places in is bounded from
above by
,
and since was arbitrary, this implies that
.
This implies that
25
.
Since for every and , for large this shows that
, which clearly approaches zero as .
We now note that since
,
and since by definition for all , we see that
, and so
.
By almost completely analogous reasoning, for large
.
Therefore,
a
26
,
so . In a completely analogous fashion,
. We know that converges, and since for
all , converges as well. Therefore, we see that the sequences and
are Cauchy, and converge. Since we also showed that , we
know that they have the same limit, call it .
This implies that for very large , is very small for every and
. We claim that this, in turn, implies that for very large , is
very small for every word of length : fix any , and take such that
: for every and , and such that . Then
for any word of length at least , is a subword of a concatenation
of -words and copies of the word 1. The number of full -words appearing in
the concatenation formi iiss aatt lleeaasstt , and aatt mmoosstt . So, the number
of occurrences of at (mod )-indexed places in which are contained entirely
within a concatenated -word is at least
, and at most .
Since there are at most occurrences of not contained entirely
within a concatenated -word, this implies that is at least ( , and
at most a . Since for any , this statement is true for any long enough word
and , we see that a uniformly for .
Since was arbitrary, and since characteristic functions of cylinder sets are dense in
, approaches a uniform limit for all , and so
27
is uniquely ergodic for every N. Since an invariant measure for would be
invariant for any as well, the unique invariant measure is the same for every
.
Finally, we claim that is also topologically mixing. Consider any words ,
. By construction, there exists so that there are words , with a
subword of and a subword of . We also claim that for any , there
exists an word where . . . , and similarly
with . . . . We show only the existence of , as the proof for
is trivially similar. Consider any , and take (mod ). Then,
if we enumerate the elements of by , , , , first define the word
, and then define the word .
has the property that . . . is a concatenated -word in as
long as . . . lies in the subword of , which is
true for large and . This means that if we reorder , , in the
definition of , we may create a word where . . . .
, since for every and , appears exactly
times in as a concatenated -word at (mod )-indexed places, implying that
(This uses the fact that ( , , which was already shown.)
We create in the same way for each . Since is a subword of and is a subword
of , for every , it is easy to choose a word to be
for properly chosen so that . . . , and similarly so that
. . . . For large , this means that we can construct such
and for any .
28
We will now use these and to prove that for any , there exists a
word of length such that rvxrv' . We do this by proving a lemma:
Lemma 2.2.8. For any t , and for any , j such that there exists an
word x where . . . and . . . are
concatenated -rvords in x, and for any trvo words z and , there exists an
word where and . . .
.
Proof: We prove this by induction. First we prove the base case ; take an
word where . . . and . . .
are concatenated -words in , call them and respectively. Since is an
word, there exists an occurrence of at an ( (mod !)) (mod )-indexed
place, i.e. there exists (mod (A +1)!) such that . . .
. Similarly, there exists (mod (A +1)!) such that
2) . . . . We now create by leaving almost all of alone, but defining
. . . , . . . , . . . ,
and . . . . This new word is still a concatenation of
words and ones, and since we switched two pairs of -words which occurred at
indices with the same residue class modulo ,
for all and . Therefore, is an -word, with and
occurring at the proper places, completing our proof of the base case.
Now, let us assume that the inductive hypothesis is true for a certain value of ,
and prove it for . Consider an word where . . . and
29
. . . are concatenated -words in x, call them a and b. Call
the concatenated -word that is a subword of , and denote
by the corresponding -word for . . . . From now on, when
we speak of these words , , , , we are talking about the pertinent occurrences at
the places within already described. There are two cases; either and are the
same; i.e. the same -word in , occurring at the same place, or they are not. If
and do occur at the same place, then by the inductive hypothesis, there exists an
-word with an occurrence of at the same place as occurs in , and an
occurrence of at the same place as occurs in . If we can replace
by in , then we will be done. If and do not occur at the same place, then
since is a concatenated -word in , by the inductive hypothesis there exists
an -word such that has occurring at the same place where occurs in .
Similarly, there exists an -word such that has an occurrence of at the same
place where occurs in . If we replace by and by in , then we will be
done. So regardless of which case we are in, our goal is to replace one or two chosen
-words within with one or two other -words. We will show how to replace
two, which clearly implies that replacing one is possible. We wish to replace by
and by . We do this in exactly the same way as in the base case; say that
. . . and . . . . Since , there
exists (mod ) and (mod ) such that . . .
and . . . . As in the base case, we create by making
. . . and . . . , . . . ,
30
and . . . . Then is an word, and by construction
. . . and . . . .
Choose any sequence of -words for all . For any such , take
{ : is a concatenated word in }. Since
is a concatenation of -words and ones, if we write the elements of as
, then for any , a . For any
, and , , by Lemma 2.2.8, there exists an word with the
property that and .
This implies that there is a subword of of the form rvxrv' where the length of is
. We note that can take any integer value between
and inclusive. Therefore, the set of possible lengths of for which rvxrv'
contains
.
When is increased by one, is increased by at most . This, along with
the fact that the intervals have length , which for large exceeds
, implies that this set of possible lengths of contains [ -
, . Since this entire
argument could be made for any , we see that for any , there exists
of length so that rvxrv' . Then, for any nonempty open sets
31
, , there exist and such that and . By the above
arguments, there exists so that for any , , implying that
. This shows that is topologically mixing.
2.3 Some symbolic counterexamples
Proof of Theorem 2.1.15: We take the continuous function for all
, and first note that
: does not
: . . .
: ),
and that the latter set, call it , is clearly a . We will choose so that is dense
in . This will imply that is a dense , and since is a complete metric space,
by the Baire category theorem, that is residual, which will prove Theorem 2.1.15.
Now let us describe the construction of .
Recall that we have assumed that the sequence has upper Banach density zero.
We define . We also define the intervals of integers
for every , and take any partition of into infinitely many
disjoint infinite sets in , call them , , . . .. Define the set , and
32
then define the set . Next, choose some large enough so that
, and define , and then define .
Continuing in this way, we may inductively define , for all so that for
1ll , for some with the property that , and
. We will verify some properties of these sets. Most
importantly, we denote by the union , and claim that . We
show this by noting that has a certain structure; consists of shifted subintervals
of , separated by gaps which approach infinity. More rigorously:
Lemma 2.3.1. There exist intervals and integers such that
and such that
.
Proof: Take the set , and denote its members by . . ..
Then, for any , is a subset of some . The interval is then defined to be
, (which means and ) and is defined to be .
It is just a rewriting of the definition of the that with
these notations. All that must be checked is that
. We will show that , which
implies the desired result. Since , . So, we must
simply show that . Suppose that is a subset of . Then .
We also note that by construction, . But, since , ,
and so . Therefore, , and so , which
33
clearly shows that this quantity approaches , since is an increasing sequence
of integers.
We now prove a general lemma that implies, in particular, that .
Lemma 2.3.2. If , and if there exist intervals and integers
such that ( , then the
set has upper Banach density zero.
Proof: Fix . By the fact that , there exists such that for any interval
of integers of length at least , . Take to be any interval of integers of
length exactly . Since ,
there is some such that if has nonempty intersection with for some
, it is disjoint from for every . Therefore, for intervals
of integers of length with large enough minimum element, consists of a
subset of a shifted copy of , and so , for some interval of integers
whose length is also . This means that in this case, . We have then shown
that for every , there exist , such that for any interval of integers of length
with , . We will show that this slightly modified definition
still implies that . Again fix , and define and as was just done.
Now consider any interval of integers I with length at least . Then, partition
into subintervals: define , and then break into consecutive
subintervals of length , called , , , . There may be one last subinterval left
34
over of length less than ; call it (which may be empty.) Note that ,
or . We see that
.
Since was arbitrary, .
By combining Lemmas 2.3.1 and 2.3.2, . We will now create . We note
that this part of our construction uses only the fact that , and no other
properties. We take and . We recall that must be a sequence
of integers with the following properties: for all ,
for some positive integer and prime . We also require
to grow quickly enough so that for all , and for any interval of integers I of length
at least , . That we may choose such is a consequence of the
fact that . Using these , we define as in Construction 3. We now
prove a lemma:
Lemma 2.3.3. For any k , m , and for any u {0, , there exists an
-rvord such that -m) for all i .
Proof: This is proved by induction on . Clearly the hypothesis is true for
and for any , . Now suppose it to be true for a particular . We will show
35
that it is true for and every , . We again construct an auxiliary word
: enumerate the elements of by , , , . Then, we again define the
word . Define , where
as above. We note that for any and ,
occurs exactly times as a concatenated -word at (mod )-indexed
places in . (This uses the fact that for all , which has already
been shown.) Now, fix . We wish to construct an word such
that for all [ , a ]. We begin with the
word . Clearly it is not necessarily true that for all
. We force this condition to be true by changing some of the
-words concatenated in . We show that this is possible; for any concatenated
word in , say , the necessary condition is
that ones or zeroes (depending on ) be introduced at digits whose indices are of the
form for all [ , , a ]. To do this, we replace this
word by , which by the inductive hypothesis has the correct digits of at the
desired places. So, we may change into a concatenation of -words and ones,
call it , which has the proper digits of in all desired places. This may be done
by changing at most -words.
Therefore, since for any and , occurred in as a concatenated
-word at (mod )-indexed places exactly times, occurs in as
a concatenated -word at (mod )-indexed places between and
times. This implies that , and since
36
and were arbitrary, that is an -word. By induction, Lemma 2.3.3 is
proved.
This implies in particular that for every , there exists an -word with
for all , , ]. By a standard
diagonalization argument, there exists a sequence and such that for
all , for all large enough . Since for every ,
for all , clearly
for all . Since is a limit of shifted -words, . As mentioned above,
this entire construction could be done with any set of zero upper Banach density in
place of , which lets us state the following corollary:
Corollary 2.3.4. For any C with , there exists (X, totally mini-
mal, totally uniquely ergodic, topologically mixing, and with the property that for any
sequence u {0, , there exists such that for all i .
We use Corollary 2.3.4 to create which proves Theorem 2.1.15. Recall that
, where for all , for some
, and for all . For each , we write the elements of
in increasing order as , , . We now decompose into two disjoint subsets;
define { : for some where for odd },
and { : for some where for even }.
Since , we use Corollary 2.3.4 to create a totally minimal, totally uniquely
37
ergodic, and topologically mixing and with for and
for . Recall that we wish to show that for the continuous function
: from to {0, 1}, the set of points such that does
not converge is residual. We showed earlier that it is sufficient to show that the set
:
:
is dense in . Fix any . By the construction of ,
for some , for all , and for all . In particular,
for all in for odd and for all in for even .
But then for any odd integer , for all integers ,
and so
,
which is clearly larger than for sufficiently large . Similarly, for any even integer
, for all integers , and so
,
which is clearly less than for sufficiently large . Therefore, . Since
was arbitrary, the orbit of is a subset of . Since is minimal, is dense, which
completes the proof of Theorem 2.1.15.
38
We note that this proof in fact shows that the set
:
:
is a residual set in , and so we can also say that for a residual set of ,
and
.
Proof of Theorem 2.1.18: We use Corollary 2.3.4. Consider any set
with . Choose any set with , , ,
and . Denote the elements of by . . ..
By Corollary 2.3.4, we may construct a totally minimal, totally uniquely ergodic,
topologically mixing with the property that for every , there exists
with for all . For every , define some
by , for all , and for all C.
Then, for any such , for all , , and
for all . Since for all , , ,
whereas . It is then clear that is not a limit point of . Since
for all , , for , and so the set is
uncountable.
39
It is natural to wonder about one aspect of the proof; why is it that in our construc-
tion, we only force certain digits to occur along shifted subsets of , rather than
along entire shifted copies of ? The reason comes from a combinatorial fact which
is somewhat interesting in its own right:
Example 2.3.5. There exists a set with with the property that for
any infinite set G of integers, the set :d g has upper
Banach density one.
Proof: We begin with the sequence , where is the maximal integer
so that . So, begins 1, 3, 1, 9, 1, 3, 1, 27, 1, 3, 1, 9, 1, 3, 1, . . . Then, define
. is then an increasing sequence of integers, with the property that
the gap is for all . We first claim that . Choose any
positive integer , and any consecutive elements , , of the sequence
. There must be some integer such that . This means
that , and so that . This means that
any interval of integers of length less than can contain at most elements of
for any , which implies that since . Therefore,
if we construct a new sequence by increasing the gaps , it will still have upper
Banach density zero. We will change countably many times, never decreasing
any element. In other words, we inductively construct, for every , a sequence
, so that these sequences are nondecreasing in , i.e. for all , .
We add the additional hypothesis that for every , , in other
40
words that only a density zero subset of the elements of have been changed after
any step.
Step 1: We change for some infinite, but density zero, set of , so that for every
positive integer , there exists such that . This is clearly possible; for
example, by increasing for a density zero set of odd . Call the resulting sequence
.
Step : Assume that we have already defined , a sequence of integers
with the property that for all , and that .
Define the set by . Since { :
has density zero, there exist infinitely many intervals of integers such
that for all , and so that for all for any . Therefore,
by the construction of the sequence , each contains a subinterval of integers
of length so that for every , and for all , . We may also assume,
by passing to a subset if necessary, that the union of all has density zero. We now
take any bijection from to , and for every , if ,
and if , define , , , . After
making these changes on each , for any not in any , define . This
defines for all . Since for every , , by the inductive hypothesis we
see that for all . Also, since , and since by the
inductive hypothesis, , we see that ,
completing the inductive step.
It is a consequence of this construction that if we define
41
for every , the sets are pairwise disjoint. Therefore, the sequences have
a pointwise limit, call it . By the construction, for any , and for any k-tuple
of integers all greater than , there exists such that for
. And, since the are disjoint, this means that for .
Now, define the sequence for all . As already noted, since for
all , has upper Banach density zero. We claim that for any infinite set
of integers, contains arbitrarily long intervals of integers. Fix any infinite
, with . For any , there exist , , . . . , , so
that for every . Then, by construction, there exists
so that for . This means that
for . But then for each such , . .
This implies that is an interval of integers of
length which is a subset of . Since was arbitrary, contains arbitrarily
long intervals and so has upper Banach density one.
This answers our question: if we had, in the proof of Theorem 2.1.15, tried to force
ones to occur along infinitely many shifted copies of our set of upper Banach
density zero, it's possible that no matter what set of shifts we used, we would
be attempting to force for arbitrarily long intervals of integers and some
, which would imply that , yielding the closed invariant set
and contradicting the minimality of .
42
2.4 Some general constructions on connected manifolds
We will now construct some totally minimal, totally uniquely ergodic, and topo-
logically mixing topological dynamical systems for which is a connected
manifold, and in Section 2.5 we will use such examples to prove Theorems 2.1.16 and
2.1.19. The constructions in question will use both skew products and flows under
functions. We will be repeatedly using the topological space , which can most easily
be considered as the half-open interval with 0 and 1 identified and the operation
of addition (mod 1). (Whenever we refer to the addition of elements of , it should
be understood to be addition (mod 1).) We also note that is a metric space for
any with metric defined by , where is the
Euclidean metric in . We denote by Lebesgue measure on for any .
For any , irrational ( and continuous self-map of , we define the
homeomorphism on as follows: for any ,
, , and for every , .
Theorem 2.4.1. For any f differentiable with for all x , is
totally minimal and totally uniquely ergodic with respect to .
Proof: During the proof, since , (, and are taken to be fixed, we suppress
notational dependence and refer to simply as . Fix any rectangles
and in , where and 77: are intervals of length for every
. Now, fix any positive integer such that . Fix any
, , , and define the set
43
.
(Here and elsewhere, : is the usual projection map for )
Since , by choice of we know that is an interval
with . Now, define the set
.
We will examine the structure of by bounding from above and
below. For this, we note that , and
make the observation that for any , is equal modulo one to an increasing
function in whose slope is between and . Therefore, considered as a function
of , is equal modulo one to an increasing function from
to whose derivative is in for all . This implies that is a union of many
intervals separated by gaps of length less than , where the length of all intervals but
the first and last is greater than . For large , this implies that contains some
set a union of intervals of length for some constant , where
for some constant .
We proceed inductively: for any , assume that we are given a union of
intervals whose lengths are for some constant , where
for some constant , and such that for any , for
. We now wish to define . Define the set
44
.
For any interval I of length in , let's examine . We note that
,
where is some function of , . . . ' which does not depend on . So, as a
function of , is equal modulo one to an increasing function
from to whose derivative is between and for some constants , .
This implies that for large , is a union of many intervals separated by gaps
of length less than , where the length of all but the first and last is greater than
. For large , this implies that contains , a union of intervals of
length where for some constants , . By taking
to be the union of all , we see that is a union of intervals of length ,
where for some constants and . Since , for
any , for .
By inductively proceeding in this way, we eventually arrive at a set where
for some constant , and where for every . By
integrating over all possible , , , we see that . We have
then shown that for any with , .
Denote by the set of such . Then if we define to be the transformation
on , then , where . It is
easily checked that . This means that for any , ,
45
M,...,,
which approaches as by total unique ergodicity of with respect
to . Then, for large ,
Therefore,
. (2.1)
We have shown that Equation 2.1 holds for and arbitrary congruent cubes in .
It is clear that it also holds for and disjoint unions of congruent cubes. Suppose
that for some , there exists a Lebesgue measurable -invariant set
with . By taking complements if necessary, without loss of generality we
may assume that . By regularity of Lebesgue measure, there exists 6 and
a union of cubes with side length such that is also a union of cubes of side
length , , and . Then, for any , since
and , as well. Similarly,
. Therefore, for all
. Since , this contradicts Equation 2.1.
So, is ergodic with respect to for every . We claim that this also
implies unique ergodicity of for every , which follows from an argument of
46
Furstenberg, and rests on the fact that is a skew product over an irrational circle
rotation. The following fact is shown in the proof of Lemma 2.1 in [Fu] on p. 578:
For any minimal system which is uniquely ergodic with respect to a measure
, and any skew product which acts on by
where : is a continuous function, if is ergodic with respect to ,
then is minimal and uniquely ergodic with respect to .
Denote by the action of on its first coordinates for any . Since
they are factors of , each is ergodic with respect to . Also, for each
, is a skew product as described above with . We
may then use Furstenberg's result and the fact that is minimal and uniquely
ergodic with respect to (since it is an irrational circle rotation) to see that
is minimal and uniquely ergodic with respect to . We can continue inductively in
this fashion to arrive at the fact that is minimal and uniquely ergodic with respect
to . Since was arbitrary, is totally minimal and totally uniquely ergodic with
respect to .
We will now use these skew products to define flows under functions which have all
of the previous properties and are also topologically mixing. Define the continuous
function by
.
47
Note that for all , . We then define the space :
, , , where and
are identified for all , , . is then homeomorphic to the mapping torus of
and a continuous map, and so is a connected -manifold. For any irrational
, , we then define the continuous map " ,, : by
.
Finally, we define a -invariant Borel
probability measure on . We will prove the following:
Theorem 2.4.2. For any differentiable with for all and any
irrational ( , which are linearly independent and which satisfy and
, where and are the digits in the continued fraction expansions
of and respectively, is totally minimal, totally uniquely ergodic,
and topologically mixing.
Again, since ( , , and are fixed, for now we suppress the dependence on these
quantities in notation and denote the transformations " and by
and respectively. We also make the notation, for any integer ,
, and define . The proof of Theorem 2.4.2 rests
mostly on the following lemma, which is essentially taken from [Fa] .
Lemma 2.4.3. For any suffiffifficiently large integer , , , and
any with ,
48
.
Proof: , where
,
.
The following facts are proved in [Fa], p. 454:
For all , , . (2.2)
For all , , , . (2.3)
For all , , , . (2.4)
For any , . (2.5)
For any , . (2.6)
We will use these to prove Lemma 2.4.3. It is easy to check that
49
.
We bound the first term from above and below and the rest from above.
, and since ,
. By (2.2) and (2.5), . Since ;' ,
. Therefore, .
Next, by (2.3),
.
We similarly have
.
Also, from (2.2), we can conclude that for large
.
Finally, from (2.6) and (2.2), we see that
,
50
and so since and , this implies that
.
By combining all of these bounds,
,
and since , for large we have
.
The proof of the following fact is trivially similar:
Lemma 2.4.4. For any suffiffifficiently large integer n , x , p ,
and any with ,
.
Proof of Theorem 2.4.2: Consider any , and any cubes
and where and are intervals of length for .
Take intervals and of length central in and respectively.
51
Define . We also make the following definition: is the integer
such that . Alternately, for any ,
.
Now, fix any and . We First define the set
.
For large , for some constant , and is a union of many
intervals where all but the first and last have length . By removing those, we can
define a union of intervals of length where for some constant
. We then define the set
: , ,
.
We wish to use Lemma 2.4.3 to analyze the structure of . First we note that since
1 for all , , by definition of ,
, and so for large enough , , or
. By our assumption on , this means that for any , , ,
. Now, fix any interval I of length in , and let
us examine . By the preceding remarks and Lemma 2.4.3,
for every . Since ,
(2.7)
52
for every . This means that has the same sign for all ,
and without loss of generality we assume it to be positive. Let us define to be the
set of possible values for for . (Since is continuous for every
, is an interval of integers.) Then for any fixed , define :
:
. Since ,
by (2.7) , or for all except the
smallest and largest, for which could be smaller.
Since , this means that the number of elements in approaches infinity
as does. Now, we note that , , where is the set of
where , , and . Then
, where is the rotation on given by
and : ,
, . Since ( , and are rationally independent, , is
uniquely ergodic, and so as , .
Due to the already established bounds on for , this implies that there is
a constant so that for every interval I in . By
removing the possibly shorter first and last subintervals of , we have
a union of intervals of length greater than with for some
constant . We take to be the union of all , and then for
some constant . Finally, we define
53
.
Note that is a union of intervals , and so fix any such interval of length greater
than . By definition, for any , and
ranges monotonically from 0 to as increases
over . Since, by Lemma 2.4.3, ,
for some constant . By taking the union over all , for some
.
Consider any . We know that , and by
the proof of Theorem 2.4.1, this implies that if we define the set :
, 1 }, then for some constant
. But this means that for any ,
is in by definitions of E3 and . So, there exists a constant such
that . By integrating over all
possible , , we see that there is a constant such that
.
This argument works for any large enough for some . An analogous
argument using Lemma 2.4.4 and which involves varying instead of shows that
the same is true for any large enough . However, this implies that
54
for all sufficiently large and any congruent cubes and ,
, which implies that is totally ergodic with respect to and
topologically mixing. It remains to show that is in fact totally uniquely ergodic.
Our proof is similar to the argument of Furstenberg used earlier, however since this is
about a flow under a function and his argument was about skew products, we present
the proof in its entirety here. Fix any . Since is ergodic with respect to ,
. every point of is -generic. Since is shift-invariant, and since shifts
in the last coordinate commute with , if a point is -generic,
is as well for all . This implies that for . ,
the fiber consists of -generic points. Denote by this set
of which give rise to -generic fibers. Choose any . We
know that for large , , and that is increasing
in . Therefore, the set has positive density. The skew
product on defined by is totally uniquely
ergodic with respect to for the same reason as in the proof of Theorem 2.4.1. (The
only difference is that here the base case is the rotation on given by
instead of an irrational rotation on T. However, since ,
and are rationally independent, this rotation is also totally uniquely ergodic and
totally minimal.) This implies that the set {I : , } has density one.
Together, these facts imply that there exists such that , or
that for some and R. This means that
is -generic, and so is as well. Since was arbitary,
every point in is -generic, and so is uniquely ergodic with respect to
55
. Since for every nonempty open set , is minimal as well. Since
was arbitrary, is totally uniquely ergodic, totally minimal, and topologically
mixing.
2.5 Some counterexamples on connected manifolds
Proof of Theorem 2.1.19: Our transformation is for properly chosen
(, , , and . We always take ( to be the golden ratio because of a classical
fact from the theory of continued fractions:
Lemma 2.5.1. For any n , the distance from no to the nearest integer is greater
than .
and can be any irrational elements of satisfying the hypotheses of Theo-
rem 2.4.2. What remains is to define . Before doing so, we use our sequence to
construct another increasing sequence of integers. For any , take .
In other words, for any , .
(Here and .) As before, for large , .
We claim that for all large enough . This is because
:
( , , ,
56
,
,
Therefore, since , for large for
the same reasons as before. This implies for large that
since . We
wish to choose so that is bounded away from 0, where is the zero
vector.
We define as an infinite sum: is a function from
to , and (mod t) is a self-map of T. In this sum, with
, and will be chosen later, and the function for any
and is a function defined by
The pertinent properties of are that it is nonzero only on the interval
, it attains a maximum of at , and that its derivative is bounded from
above in absolute value by . Since each term in the definition of is a
differentiable function with derivative bounded from above in absolute value by ,
and since and the identity function has derivative one everywhere, is
a differentiable function with for all T. This shows that for any
57
choice of with , and for any choice of , , by Theorem 2.4.2, is
totally minimal, totally uniquely ergodic, and topologically mixing.
We wish to choose so that is bounded away from 0. The only quantities still
to be chosen are , and . We note that for any and any ,
. This can be proved by a quick induction, and is
left to the reader. In particular, if we make the notation for any
, then . Our goal is to choose , , and
so that for all sufficiently large . To do this, we choose for all
, and , by Lemma 2.5.1. This guarantees that
for any , . This means that each choice of that
we make changes the values of for only , and allows us to finally inductively
define .
Recall that our goal is to ensure that for all
sufficiently large . Note that since is an increasing sequence of integers,
for all . We have already shown that for all large , and so
, and so
for all large , and so converges, a fact which will be important
momentarily. We choose so that . The procedure for
defining the sequence is then as follows: for . For any ,
assume that for . Then, we choose so that . Note that
for any , for some : .
This means that taking gives . Note that
58
for large , and recall that for all
by Lemma 2.5.1. This means that , which,
by the hypothesis on the sequence , is less than , again for
sufficiently large . This means that by
definition of . We have then chosen so that for all
. For , note that , since ( ( Q. This means that
for all , .
However, by definition, for all . Therefore,
is bounded away from (0, 0, 0, 0), and so since is totally uniquely
ergodic, totally minimal, and topologically mixing, we are done.
We note that there was nothing special about the number in this proof, and so the
proof of the following corollary is trivially similar:
Corollary 2.5.2. For any increasing sequence of integers with the property that
for some integer , for all suffiffifficiently large , and for any
sequence , there exists satisfying the hypotheses of Theorem 2.4.1 so that
for all suffiffifficiently large , .
Proof of Theorem 2.1.16: Our transformation is for the same ,
and as before. We use the same strategy as we did to prove Theorem 2.1.15; in
other words, we will be forcing certain types of nonrecurrence behavior along a set
comprised of a union of infinitely many shifted subsequences of .
59
We now proceed roughly as we did in proving Theorem 2.1.15. We define a sequence
by shifting different by different amounts. First, define the intervals of integers
for every , and take any partition of into infinitely
many disjoint infinite sets , , . . .. We denote the elements of , written in
increasing order, by , - Choose some large enough so that .1
for (inf ) !, define the set , and then define for all
. Next, choose some large enough so that for (inf ) !,
and define , and then define for all . Continuing in
this way, we may inductively define for all so that for some
with the property that for (inf ) !, and then define
for all . For any , . Note that by the construction, for
any , where , it must be the case that , and therefore that
, and so that . Therefore, since for all ,
, and so is increasing. Since (inf ) !,
, and so in particular . This implies that
for all . Finally, we see that for
all large enough . Since as , this means that
for large enough .
We again define a sequence : for any , take . In other words,
for any , . (Here
and .) For exactly the same reasons as in the proof of
Theorem 2.1.19, for large .
60
Therefore, by using Corollary 2.5.2, for any sequence , we may choose
such that for all . We define for
any where for with odd , and for any where
for with even . For any not defined by these conditions, may
be anything. (We can take for such if it is convenient.) Take
such that if , if , ,
and . Now, we note that
: does not
:
: ),
and that the latter set, call it , is clearly a . We will show that is dense in .
Choose any nonempty open set . By minimality of , there is some so that
. By construction, for all . Also by construction,
, and for
and odd. But then for any odd integer , for any
, and so
,
which is clearly larger than for sufficiently large . Similarly, for any even integer
, for any , and so
61
,
which is also greater than for sufficiently large . This implies that
, and so that is nonempty. Since was arbitrary, this shows that is dense,
and so a dense . Therefore, is a residual set by the Baire category theorem, and
for every , does not converge.
We note that exactly as in the proof of Theorem 2.1.15, this in fact shows that
for a residual set of , and
.
2.6 A counterexample about simultaneous recurrence
Given a totally minimal system , any point , and any two positive integers
and , since and are minimal it must be true that
and . It is then the case that there exist sequences of positive integers
and such that and . By Theorem 2.1.17, for a
residual set of , it is possible to choose . To prove Theorem 2.1.21, we must
exhibit an example of a totally minimal system for which this residual set is not all
of .
Proof of Theorem 2.1.21: For any , we define a symbolic dynamical system
as follows: define by ( (mod 1)), where
62
is any sequence of rationally independent irrational reals, and
. Then, define a to be the left shift on , and to be the countable
product of a on : . is then defined to be the
orbit closure of under .
We claim that and prove Theorem 2.1.21. Consider any , and
choose . Consider a sequence of integers where and
. Then clearly and . Since , this implies
that for all large enough . Therefore, (mod 1),
(mod ) for all large . This implies that (mod ) (mod )
(mod t) for all large . However, , so (mod ) and
(mod ) . We then see that (mod 1) . If
(mod ) , then for some positive integers , .
Since , , it must be the case that , and then we have a contradiction since
. Therefore, (mod t) , and since
(, for all large enough .
It remains to be shown that is totally minimal. Note that for any ,
is a system defined similarly to , with the sequence of irrationals
given by ( for N. Therefore, it suffices to show that is
minimal, since the definition of requires only that the set of irrationals
is rationally independent, and if this is true, it will be true of as well. To
show that is minimal, we must show that the orbit of any
under is dense in . It suffices to show that for any finite set of words
63
with to; a subword of for , that there exists m such that
for .
Choose any . Since is a subword of , there exists such that .
This implies that if we denote by the length of , then the set contains
(mod 1), where (mod t) if and
(mod 1) if . We now claim that the fact that is nonempty
implies that it contains an open interval. Note that for each , is either a union
of half-open intervals or a union of half-open intervals with a singleton, and that due
to the irrationality of ( and the rationality of all endpoints of intervals in , the
set of endpoints for intervals in is disjoint from the set of endpoints of intervals
in , for any . Since , there exist such
that , each is either a half-open interval or a singleton, at most one
is a singleton, and the have no endpoints in common. This implies that either
contains an interval, implying that does as well, or that for some
, is the singleton { (mod 1)}. In this case, is not a singleton for
, and so , contains an interval , and { (mod 1)} lies in
the interior. Choose such that . Since is a singleton,
(mod 1). Choose such that . Then (mod ) , and by
our choice of , (mod t) I as well. Therefore,
(mod 1) contains an interval, and since for and
(mod ) , we see that contains an interval as well.
For each , denote by some interval contained in Choose
64
some . We now note that since the sequence is rationally
independent, the set (mod 1), (mod 1), . . . , (mod is dense
in . Therefore, there exists such that the set (mod 1),
(mod 1), . . . , ( (mod 1)) is -dense, meaning that every point in is
within a distance of less than from some point in . This means that
(mod 1), (mod 1), . . . , ( (mod 1)) , . . . , (mod t)
is also -dense for any . We note that then for any 1 and any
, (mod 1) implies that (mod 1) , and so that
. . . , or that . Define .
Now, choose any . Since , there exists such that
. . . . . . for . Then, by the above
comments, there exists such that . . .
for . Therefore, (rn ) (rn ) . . . (rn )
1) . . . for , and since for , for
. Since were arbitrary, we have shown that the orbit of under
is dense in , and since was arbitrary, that is minimal. We already
showed that this in fact implies that is totally minimal, and so are done.
2.7 Questions
There are some natural questions motivated by these results. For any totally minimal
and totally uniquely ergodic system , any nonempty open set with
65
, and any , take the set {a : }. Since is
uniquely ergodic, the density of equals , and
so the sequence of the elements of written in increasing order does not satisfy
the hypotheses of any of our theorems. However, since , and since
for all , is bounded away from .
For a similar example, take to be a totally minimal and totally uniquely ergodic
isometry of a complete metric space with diameter greater than 2. Take ,
with , and define where for all and
for all . Then, take the sets and
. Since is uniquely ergodic, and
. For any ,
, and so . Also for any ,
, and so
. This means that by making a sequence by alternately choosing
longer and longer subsequences of and , we could create such with
(so does not satisfy the hypotheses of any of our theorems) where
IIIII fails to converge for any in the second category set .
The point of these examples is to show that our hypotheses are certainly not the only
ones under which examples of the types constructed in this paper exist. This brings
up the following questions:
Question 2.7.1. For what increasing sequences of integers does there exist
66
totally minimal and totally uniquely ergodic system and a point such
that is bounded away from ?
Question 2.7.2. For what increasing sequences of integers does there exist
totally minimal and totally uniquely ergodic system and a function
such that fails to converge for a set of of second category ?
Also, it is interesting that we could create examples for a wider class of sequences
when was not connected. We would like to know whether or not this is
necessary, i.e.
Question 2.7.3. Given an increasing sequence of integers and a totally minimal
totally uniquely ergodic topological dynamical system (X,T) and x such that
is bounded away from x, must there exist a system with the same properties where
is a connected space?
Question 2.7.4. Given an increasing sequence of integers and a totally minimal
totally uniquely ergodic topological dynamical system and such that
fails to converge for a set of of second category, must
there exist a system with the same properties where is a connected space?
67
(JIAPTER 3
PERTURBATIONS OF MULTIDIMENSIONAL SHIFTS
OF FINITE TYPE
3.1 Introduction
Consider a symbolic dynamical system with positive topological entropy . (A
rigorous definition of topological entropy appears below, but informally it is the expo-
nential growth rate of the number of different -letter words appearing as subwords
in elements of ) Fix any word for which appears as a subword
of some element of . Then, define to be the set of all elements of in
which is not a subword. Clearly , and so the topological entropy of
is not more than that of . We wish to estimate the size of the drop in topological
entropy , and how this quantity behaves as . To more
rigorously state and approach this problem, we begin with some definitions. Some of
these terms have already been defined in Chapter 2, but we repeat the definitions for
the more general setup of Chapter 3.
We denote by the . . . parallelepiped in ,
and use the shorthand notation for the cube . , . . . , are the sizes of
, and is called the size of . We use to denote the metric on : for
68
, , . For each , we denote by
the element of the standard basis for , and for , , the distance in
the -direction between and is defined to be .
Definition 3.1.1. An alphabet is a finite set. For any alphabet , we define the
shift action on as follows: for any and ,
for all .
Definition 3.1.2. -subshift on an alphabet is a set with the following
trvo properties:
(i) is shift-invariant, meaning that for any and , .
(ii) is closed in the product topology on Q.
For any -subshift , is a symbolic dynamical system.
Definition 3.1.3. A Borel probability measure of supported on a subshift is er-
godic if the measure-preserving dynamical system , , is ergodic,
where is the Borel -algebra of .
Definition 3.1.4. A word on the alphabet is any mapping from a non-empty
subset of to , where is called the shape of . For words and , where
has shape , we say that is a subword of if there exists such that
for all . In this case, we say that occurs in at , or
that .
Definition 3.1.5. A word with shape is periodic with respect to if
and for all . We also say that is
a period of . A word is aperiodic if it has no periods.
69
Definition 3.1.6. For any , we say that trvo words u and with shape
S agree on T if , i.e. u and have the same letters on T.
Definition 3.1.7. The boundary of thickness k of a subset S of , which is
denoted by , is the set of points p of S for which there exists a point q
with . Whenever we refer to the boundary of a shape S, we mean the
boundary of thickness one.
Definition 3.1.8. T is called a copy of a shape S if T for some
v . This v is then called the difference of S and T, denoted by T-S. We say
that a point p corresponds to a point q if p -T).
Definition 3.1.9. The language of -subshift X, denoted by , is the set of
words which appear as subwords of elements of X. The set of words with a particular
shape S which are in the language of X is denoted by .
Definition 3.1.10. For -subshift X, and for any finite shape , the number
of words in is denoted by , and the natural logarithm of this quantity
is denoted by .
Definition 3.1.11. -shift of finite type is -subshift defined by specifying
a finite collection of words on A, call this collection , and then taking X
to be the set of elements of which do not contain any member of as subwords.
Different collections could induce the same shift of finite type; for any fixed X,
the type of X is the minimum nonnegative integer t such that for some consisting
entirely of words with shape , X . A shift of type trvo is called a Markov
shift.
70
Definition 3.1.12. A word v with shape S forces a word w with shape T in a shift
of finite type X if any x with also has .
Definition 3.1.13. A subshift is topologically transitive if there exists
for which is dense in , . if there is which contains every word
in as a subword.
Definition 3.1.14. The topological entropy of a subshift X, denoted by ,
is defined by
, '
. .
Definition 3.1.15. The measure-theoretic entropy of a subshift with respect
to a shift-invariant probability measure on X, which is denoted by , is defined
by
1
,
. .
, '
where for and .
Both of these limits exist by subadditivity.
We now restate the question from the beginning of Chapter 3. Suppose that a
subshift on an alphabet is given, and consider any -letter word .
One can then define the subshift as consisting of all elements of which do not
contain as a subword. Since , , but it is natural to
wonder how much the entropy decreases by. For example, is it necessarily the case
71
that as ? It is not hard to check that this is not always
the case. For example, if is minimal, then every contains every as
a subword. This means that for any word , , and so .
This means that minimality is possibly too globally defined for our purposes. It is
then natural to move to the completely local definition of a shift of finite type. In
this setup, as was already mentioned in Chapter 1, there is the following result of
Douglas Lind:
Theorem 3.1.16. ([L], p. 360, Theorem 3) For any topologically transitive Z-shift
of finite type with positive topological entropy , there exist constants
, , and such that for any and any word , if we denote
by the shift of finite type , then
.
Lind was apparently examining this question in order to prove some results about
the automorphism group of a shift of finite type; he needed to know that
as the size of the word approaches infinity. ([BoyLR]). Some
work on bounds of this type in the case where was also done by Guibas and
Odylzko. ([GO1], [GO2])
We wish to prove a version of Theorem 3.1.16 for multidimensional shifts of finite
type, i.e. shifts of finite type on with . In the remainder of this section, we
will discuss two relatively significant changes that we make in the statement of the
multidimensional extension. This discussion will have the dual purposes of motivating
72
our result and illustrating some of the issues that arise in higher-dimensional symbolic
dynamics. Our methods are similar in places to those used in [QT], where the effects
on entropy of removing words from shifts of finite type are also examined.
The first problem in giving amultidimensional version of Theorem 3.1.16 comes in
changing the exponents in the denominators. The most reasonable guess is that
instead of using , in the extension wwee should have However, we
will show that this is a bit too much to hope for. Consider the following examples:
take . is clearly a shift of finite type, with the set of forbidden words .
For every , we define a shift of finite type as follows: define the alphabet
, and define : by
for every and . This map is usually called the higher block presentation
of Y. (see [LM]) In other words, for every , is defined to be the
subword of which lies in (1, , 1), or a copy of whose least vertex in
the usual lexicographic order on is . Figure 3.1 illustrates the action of on a
typical element of for : (the lower-leftmost entry of each array is at the point
In Figure 3.1, the horizontal and vertical lines separate letters of the alphabet. So,
the left half shows a word with shape whose letters are elements of , namely
zeroes and ones, and the right half shows a word with shape on the alphabet ,
which are themselves words with shape whose letters are zeroes and ones.
With so defined, is a subshift with alphabet . In fact it is easily checkable
that is a Markov shift; is an element of if and only if for
73
1101000111
00 01 11 11 00 01 01 11 11 11
1 | 1 | 0 | 1 |
0 | 0 | 1 | 1 |
0 | 1 | 1 | 1 |
1 | 0 | 0 | 0 |
1 1 0 0 | 1 0 0 1 | 0 1 1 1 |
0 0 0 1 | 0 1 1 1 | 1 1 1 1 |
0 1 1 0 | 1 1 0 0 | 1 1 0 0 |
Figure 3.1: 's action on a sample element of
every and , ,
where the is in the subindex in both cases. Since is a shift-commuting
bicontinuous bijection between and , is topologically conjugate to
for every O.
Since and are such canonical examples of shifts of finite type (the full shift
is in some sense the simplest shift of finite type, and every is conjugate to it),
any meaningful extension of Lind's result should certainly apply to them. can be
considered as a map between words just as easily as a function between subshifts: in
the diagram above, if one disregards the dots surrounding the words, then can be
thought of as sending to . For this reason, and since
is a topological conjugacy for all , if we consider the shift of finite type resulting
74
from removing some word with shape from , and the shift of
finite type resulting from removing from , then and
are topologically conjugate as well, and therefore have the same topological entropy.
In other words, the removal of a word from results in the same drop in
topological entropy as the removal of a word from . Now, let us
suppose that Lind's result can be extended by simply changing the in the exponent
to an . This would result in something that looks like this:
Possible Theorem 3.1.17. For any d 1 and any topologically transitive shift
X of finite type with positive topological entropy , there exist constants
, , and such that for any n and any word w , if we denote
by the shift of finite type , then
.
We can now show that this possible extension cannot be true; suppose that there
exist such constants , , and , which satisfy Possible Theorem 3.1.17 for ,
and for any , , , and , which satisfy Possible Theorem 3.1.17
for . Then, for any word , where ,
.
By the previous argument, this means that for any ,
. (3. 1)
75
And by the definitions of , , and , and since ,
as long as , we know that
. (3.2)
However, since , for large enough , since ,
grows much more quickly than Therefore (3.1) and (3.2) contradict
each other. The problem that must be addressed is that due to examples such as
these, there must be some sort of a "tolerance range" around in any meaningful
multidimensional version of Theorem 3.1.16. In other words, we should amend our
possible extension to the following:
Possible Theorem 3.1.18. For any d 1 and any topologically transitive shift
X of finite type with positive topological entropy , there exist constants
, , , , and such that for any n and any word w
(X), if we denote by the shift of finite type , then
.
Since could be arbitrarily large in the definition of , it must be the case that
and depend on rather than being absolute constants. In fact, such constants
and could be thought of as being present in Theorem 3.1.16 as well, but
hidden inside the constants and . This new version is still not true, however.
76
Figure 3.2: A portion of a sample element of
This time, the problem is not with the conclusion, but the hypotheses. Topological
transitivity is not nearly a strong enough hypothesis in several dimensions to get
the type of result that we are after, and in fact we will see that topological mixing
is not nearly good enough either. To show this, we consider a subshift called the
checkerboard island shift, which we call , defined by Quas and §ahin in [QS]. is
a -Markov shift, on an alphabet with thirteen letters. is also topologically
mixing; for a proof see [QS].
All letters of and elements of are displayed in Figure 3.2. In other words,
a two-by-two word of symbols is in if and only if it appears as a subword of the
word in Figure 3.2. Consider the word with shape in Figure 3.3, which we call .
It will not be shown here (details are in [QS]), but an occurrence of in any element
of forces the arrows and diagonal lines surrounding in Figure 3.2, along with
a boundary of thickness one filled with empty spaces. In other words, wherever ,
77
which has shape , appears, it forces a word, which we call , of shape .
appears in Figure 3.4.
Figure 3.3:
Figure 3.4:
In fact, for every , there is an analogous word (a checkerboard of size
with a boundary filled with arrows and diagonal lines in an analogous way to that
78
of ) which forces a word of size around it. We claim that for any , the
shift obtained by removing from and the shift obtained by removing
from are equal. Since is a subword of , it is obvious that .
Suppose that . Then there exists which has an occurrence of
, but no occurrence of , which, as has already been stated, is not possible. So,
. This implies that . However, since
and , we have a contradiction to Possible Theorem 3.1.18
for the same reason as we did before. Since this is true for all , if we wish to adjust
the conditions of the conclusion again, we will have to change our "tolerance range"
from to something like , which would weaken the result
considerably. To obtain a conclusion of the strength we want, we need a stronger
mixing property than topological mixing. The notion which we use is that of strong
irreducibility.
Definition 3.1.19. -subshift X is strongly irreducible with uniform filling
length R if for any trvo subsets S, T of , and for all p such that
R (that is, for all q and r , d ( q, ), and for any elements x, ,
there exists y such that and .
For the sake of comparing, we also include a definition of topological mixing whose
formulation looks similar to that of strong irreducibility:
Definition 3.1.20. -subshift X is topologically mixing if for any trvo subsets
S and T of , there exists such that for all p with the property that
79
, and for any elements , , there exists such that
and .
The difference in the two definitions is subtle but important: for strong irreducibility,
the distance that must be present between two shapes in order to make interpolation
possible is independent of the shapes involved, whereas in topological mixing, this
distance can depend on the shapes. This distinction is irrelevant in one dimension,
since strong irreducibility and topological mixing are the same concept for -subshifts.
(To see this, note that for a -shift of finite type, a distance sufficient for interpolating
between two letters of the alphabet is sufficient to interpolate between any two words.)
However, strong irreducibility is a strictly stronger notion in more than one dimension.
Pertinent for our purposes is the fact that if for some in a stongly
irreducible shift of finite type , then the occurrence of at can only "force"
letters to appear in the region of wrttnn distance of . This is because for any
with and any , by the definition of strong irreducibility
there is some with and . For this reason, the checkerboard
island system is topologically mixing but not strongly irreducible. Armed with this
stronger hypothesis, we can now refine Possible Theorem 3.1.18 a bit more:
Possible Theorem 3.1.21. For any d and any strongly irreducible shift X
of finite type with uniform filling length R and positive topological entropy ,
there exist constants , , , , and such that for any n
and any word w , if we denote by the shift of finite type , then
80
.
In fact the above is now provable with correct choices for the pertinent constants,
and so we state our main result:
Theorem 3.1.22. For any d and any strongly irreducible shift X of finite
type with uniform filling length R and positive topological entropy , there exist
constants and such that for any n and any word w ,
if we denote by the shift of finite type , then
.
Before beginning with the preliminaries of a proof, we make an observation: our
example shifts of finite type used above to show the necessity of the constants
and in this statement did not actually show that and are not the
same; it is possible that , for all . However, we may now introduce
another example to show that in fact and must be distinct for certain .
Consider, again for any , and any dimension , the shift of finite type on
the alphabet {0, 1} defined by the set of forbidden words :
has at least two ones}. In other words, consists of all infinite words of zeroes
and ones on such that any two ones are a -distance of at least from each other.
By definition, is clearly a shift of finite type. We also claim that it is strongly
irreducible with : consider any words and such
that . Take defined by , , and
81
for any . We claim that : suppose, on the contrary, that .
Then there exist , such that , and . If , ,
then contains two ones a -distance less than apart and cannot be in , a
contradiction. We arrive at a similar contradiction if , . Clearly we cannot
have and since , and similarly it cannot be the case that
and . But since the only ones in are in or , we have exhausted
all possible cases. Therefore, . By definition, is strongly irreducible with
. We claim that in the language of Possible Theorem 3.1.21, it must be the
case that , and in fact that .
To prove this, we first make a definition: for any , we define the word with
shape by if and only if all coordinates of are equal to one modulo
. since it contains no two ones a -distance of less than apart. Let
us also for any define the word with shape where has as
a subword occupying the central , and has zeroes on all of . By the
definition of , any occurrence of in an element of forces an occurrence of
containing it. Therefore, as argued for , for any . However,
the size of the shape of is bigger than that of for any , and so for any
, , , and which satisfy Possible Theorem 3.1.21, it must be the case
that - for any . Although this does not show that the
bounds in Theorem 3.1.22 are optimal, it does show that must be at least
linear in , and so the value of for - attained in Theorem 3.1.22
can be improved by at most additive and multiplicative constants.
82
Before beginning to prove our main result, we now briefly outline the content of this
paper. The main idea in the proof of Theorem 3.1.22 is to use strong irreducibility
to find a word where and agree on , and then define a map which
replaces all occurrences of by in large words . In order to know how
many-to-one this map is, it is important to know how many times and appear in
a "typical" . In Section 3.2, we use ergodic measures of maximal entropy
to show that for fixed , any , and large , there are "many" words
in for which we have good bounds from above and below on the number of
occurrences of in . Namely, for a large set of , the number of occurrences of is
roughly , where is an ergodic measure of maximal entropy on . We then
use a result of Burton and Steif ([BuSl], ) to give upper and lower bounds on
of the type
There is a slight problem though; it is entirely possible that a replacement of
by could create a new occurrence of , which could make this replacement map
extremely unwieldy and hard to deal with. What we would like is a word such
that replacing by can never create a new occurrence of . Unfortunately, this
is impossible for some choices of . In Section 3.3, we prove a slightly weaker fact
that is still sufficient for our purposes; for any , there exists a subword
with not much smaller than for which there exists where
and agree on and where a replacement of by can never create a new
occurrence of .
Section 3.4 contains the proof of Theorem 3.1.22. The proof is done by considering
83
maps between and which either introduce or destroy occurrences of
(which is done by using the result of Section 3.3 to perform various replacements) , and
estimating the sizes of images or preimages under these maps. We can make certain
helpful assumptions about the words in which these replacements take place (such as
the approximate number of occurrences of ) by using the results of Section 3.2.
In Section 3.5, we deal with a simplified version of our main result where the word in
question is already chosen to have a nice property. (namely, that there exists such
that replacing by can neither accidentally create new or or destroy existing
or ) In this case, we prove that (In )
for any an ergodic measure of maximal entropy on and an ergodic measure
of maximal entropy on . We then show that it is not possible to prove a lower
bound on of the form for constant .
In Section 3.6, we prove a corollary using our methods: for any alphabet , and
given any set of words , . . . , where the size of is much greater than that
of for , , where is the full shift. This can be seen
to be related to some of the so-called undecidability questions in shifts of finite
type; given a set of words, it is algorithmically undecidable whether or not
is nonempty. The corollary described gives a condition under which this question is
easily answered.
Finally, in Section 3.7, we give some open questions that arise in the course of proving
our results.
84
3.2 Some measure-theoretic preliminaries
We will now move towards proving Theorem 3.1.22. We first need a lemma about
strongly irreducible systems:
Lemma 3.2.1. For any -subshift which is strongly irreducible with uniform fill-
ing length and for any rectangular prism , , '
.
Proof: We first prove the first inequality: take any rectangular prism . For
any , since can be partitioned into disjoint copies of ,
, ' , ' Take natural logarithms of both sides and divide by
:
,
and by letting approach infinity, we see that , which implies
that , ' .
We prove the second inequality in almost the same way: again consider any rect-
angular prism . For any positive integer , can be
broken into copies of , and we can choose copies of inside each
so that the -distance between any pair of these is greater than
. This is illustrated in Figure 3.5.
(A note on figures: all figures in this paper are drawn with , but this is in some
sense without loss of generality; all of the ideas that we use can be illustrated via
85
Figure 3.5:
two-dimensional pictures. All constructions and descriptions are carried out in full
generality. )
By strong irreducibility, for any way of filling these copies of with words in
, the remainder can be filled to make a word in , ' . This
implies that , ' , ' . Take natural logarithms of
both sides and divide by :
,
and by letting approach infinity, we see that , or that
, ' .
86
To prove Theorem 3.1.22, we will be repeatedly using the concept of a measure of
maximal entropy.
Definition 3.2.2. A shift-invariant probability measure on a subshift X is called
a measure of maximal entropy if .
Such measures are said to have maximal entropy because of the following Variational
Principle:
Theorem 3.2.3. For any -subshift X, , where ranges over
all shift-invariant Borel probability measures on X. This supremum is achieved for
some .
See [M] for a proof. A useful observation is that since the extreme points of the
set of measures of maximal entropy are ergodic, and since the set of measures of
maximal entropy is nonempty for subshifts, any subshift also has an ergodic measure
of maximal entropy. (See [Wal] for details.) We need the following proposition:
Proposition 3.2.4. ( , p. 281, Proposition 1.20) Let be a measure of maximal
entropy for -shift of finite type . Then for any shape , the conditional
distribution of on given a fixed word is uniform over all words
such that the word rvith shape (& ) defined by and
is in .
We have the following convenient restatement:
Proposition 3.2.5. For any -shift X of finite type t, and for any measure of
maximal entropy on X, and for any word w , and for any shape
87
, . In other words, any trvo words with the same shape which
agree on have the same measure under .
Proof: Consider any two words , with . We claim that
. Consider By definition,
and there exists such that . Suppose that . Then,
since and since , by definition, , a contradiction. Therefore,
. Since , this implies that . Therefore,
. We define and . Since , and since
, . Therefore, if we de-
fine , then since either or could fill while fills
, we may take and use Proposition 3.2.4 to see that
.
Lemma 3.2.6. For any strongly irreducible shift X of finite type t with uniform
filling length R, and for any measure of maximal entropy on X, and for any word
u with shape S a subset of ,
.
Proof: Before we begin the proof, we recall that for any positive integer , is
the boundary of thickness of , and that is the boundary of thickness of
88
the complement of , i.e. the set of all points for which there exists with
.
Fix . We begin by bounding from below. Choose any word
. We claim that . Consider any
and . By definition, there exists such that
. Also by definition, . Therefore, by the triangle inequality,
, as claimed. This means that by strong irreducibility, there exists a
word such that and .
By Proposition 3.2.5, .
The inequality comes from the fact that the number of words in with shape
with the fixed word on is clearly less than or equal
to the number of words in with shape For
clarity, we rewrite:
, (3.3)
If we sum (3.3) over all possible choices for , then we get
(3.4)
Note that all are distinct, and for all , . Therefore,
,
89
and so . Since ,
. By combining these facts with (3.4), we see that
,
We now bound from above. Choose any word with .
We wish to use Proposition 3.2.5 with as our . To do so, we need to know that
. The second containment is trivial. To prove the
first, suppose that . By definition, there then exists
such that . Since , in particular , or . Therefore,
since , by definition. We now apply Proposition 3.2.5:
(3.5)
We claim that ; choose any and .
Clearly . If , then , a contradiction. Thus, . This
implies that for any , and for any , there exists with
and . Therefore,
. By using this fact and summing (3.5) over all possible , we get
. (3.6)
Note that since all are distinct, , and so
90
. Also note that since all are distinct, but all have
, it must be the case that all are distinct, and so
.
Combining these facts with (3.6), we see that
,
which, along with the lower bound on already achieved, completes the proof.
To avoid confusion, from now on we use to denote an ergodic measure of maximal
entropy on a subshift , and to denote a shift-invariant probability measure which
may or may not be of maximal entropy.
Lemma 3.2.7. For any -subshift , and for any shift-invariant ergodic probability
measure on , and for any finite set of words , , ,
and , if we denote by the set of words in which
have be rween and occurrences of for all ,
then .
Proof: (For brevity, we use the shorthand notation for the mea-
sure of the union of all cylinder sets corresponding to words in Fix
any . Firstly, we notice that it is a simple consequence of Birkhoff's ergodic
theorem that : for each ,
91
for almost every ,
where a represents the by shifts on . Convergence almost everywhere
implies convergence in measure, meaning that
({ : for } . (3.7)
Denote by the set whose measure is taken on the lefthand side of
(3.7). Then whether or not lies in depends only on its values
on , where is the minimum integer such that all are subsets of some copy
of . Therefore, is a union of cylinder sets of words in .
It is not hard to see that for large , any word whose cylinder set is a subset of
is an element of . Combining this with the fact
that (3.7) is true for any , we see that .
We now write out the formula for the measure-theoretic entropy of :
,
where for and . Since is a measure-theoretic
generator for the a-algebra of Borel sets in , we know that . (For
more information on measure-theoretic and topological generators, see now
partition into the two pieces , , , , , and , , , , , :
92
, '
, ' .
To estimate each summand, we use the easily checkable fact that for any set of
nonnegative reals , , . . ., ( whose sum is , the maximum value of
occurs when all terms are equal, and is In . Using this, we see that the above sum
is bounded from above by
.
Since, as , approaches 1 and approaches
, the second term approaches zero as . By replacing by
1 in the limit in the first term, we see that
In ,
which was exactly what needed to be shown.
Corollary 3.2.8. For any strongly irreducible -shift X of finite type t with uniform
filling length R, for any , and for any positive integers k and M, denote by
the set of words u in (X) with be rween
93
and occurrences of
(X) for every word with shape . Then,
.
Proof: Using a combination of Lemmas 3.2.6 and 3.2.7 for any fixed an ergodic
measure of maximal entropy on shows that
.
And by the definition of topological entropy,
.
3.3 A replacement theorem
We must prove an auxiliary theorem about replacements which is mostly combinato-
rial in nature, and which will be integral in the proof of the upper bound portion of
Theorem 3.1.22.
Theorem 3.3.1. For any strongly irreducible -Markov shift with uniform filling
length , there exists depending on such that for any choice of
with , there exists a subword of , where , and
so that and agree on the boundary, and with the property that
94
replacing an occurrence of by in any element of cannot possibly create a nerv
occurrence of .
Proof: First let us make sure that the statement of Theorem 3.3.1 is totally clear.
What this means is that we find , such that is a subword of ,
and agree on the boundary, and such that the following is true: for any
with for some a copy of , if one defines by replacing
by , then for any a copy of such that , it must be the case that
1
1
. We define or
, whichever is
odd. Then, , (X) is, by Lemma 3.2.1, at least . This is
greater than the number of words with shape which occur as subwords of , and
so therefore there exists a word , (X) which is not a subword of . This allows
us to make a definition:
Definition 3.3.2. Given a subword u of w, choose a subword of u of shape
which does not contain any portion of the boundary of u, and use strong irreducibil-
ity to replace this subword by any word of shape which has a as the subword
occupying its central . The word that u has become after this replacement is called
a standard replacement of u, and so is any word created in this way.
Choosing the to be replaced to be disjoint from the boundary of ensures that
any standard replacement of agrees on the boundary with . This means that since
is a Markov shift, any occurrence of in a point can be replaced by a
standard replacement , and the resulting element of is still in .
Definition 3.3.3. A subword u of w has Property A if for every standard
95
Figure 3.6: A standard replacement of
replacement of , replacing by in some could possibly create a nerv
occurrence of . In other words, for every standard replacement of , there exists
and , copies of such that , , and if we define
by replacing by , then .
Clearly, if we can find a subword of which is large enough and does not have
Property , we will be done.
The organization of this induction is a bit odd, so we give a quick overview. We will
construct a sequence of subwords of , where and each is a subword of
the previous. Each has shape a cube. We will be able to show that one of these
does not have Property , and that it is large enough in the sense of Theorem 3.3.1.
However, the construction of these depends on a fact about periodicity of subwords
of which have Property , which we must prove first. We will set up and prove
96
this periodicity fact for any even though has not yet been defined for .
This is not a problem though; this proof depends only on being a subword of ,
which will be clearly true once the definitions of are able to be made.
Let's assume that has Property A for some . For the purpose of the following
constructions, it is helpful to picture as fixed in space, so say that has shape
, i.e. the cube with size which lies entirely within the positive octant of
and has a vertex at (1, 1, , 1). To make sure to distinguish it from other copies of
, we call the shape that this fixed occurrence of occupies . We take a copy
of or (call it , and its size ), where the size
is chosen to have the same parity as , and which is central in , and partition
it into or disjoint copies of . Consider only
the interior copies of in this partition, i.e. the ones which are disjoint from the
boundary of . For large , there are more than of these, and to each one
we may associate a standard replacement of , and since has Property , also
a copy of which overlaps which could somehow be filled with at the same
time that is filled with the standard replacement in question. We use to denote
the set (possibly with repetitions) of these copies of .
Recall that each of these represents a possible new occurrence of created by per-
forming a standard replacement. Since any of the standard replacements that we
would perform is done by changing only letters in a particular in , it must
be the case that every element of contains some portion of its associated .
(Otherwise, the supposed "new" occurrence of would have already existed before
97
Figure 3.7: A sample element of
the replacement.) Also, since every standard replacement results in some occurrence
of in , and since is not a subword of , none of the elements in contains
all of its associated . In Figure 3.7, the element of shown, which we call
, is associated to the standard replacement given by changing the letters in the
highlighted .
We first make the claim that contains no repeated elements, i.e. that two distinct
copies of must be associated to distinct elements of .
In Figure 3.8, suppose that is associated to the standard replacements correspond-
ing to each of and , the two highlighted copies of . This means that there
98
Figure 3.8: An element of associated to two standard replacements
exists a word with and , where is a standard
replacement of made by changing only the letters in . The letters in must
be changed from their values in in order for the occurrence of in to be "new."
Therefore, , and . Similarly, there exists
a word with , , and
. Since and are disjoint, this implies that and
, and so that . But ,
and since we have a contradiction. Now we know that consists of more
than 2 distinct copies of .
99
Definition 3.3.4. A suboctant of is a subcube of which shares vertices with
both and , is co ntained in , and whose intersection with contains only the
shared vertex. (See Figure 3.9.)
Figure 3.9: The suboctants of
It is fairly clear upon observation that each element of must contain some suboc-
tant of , since each element of contains some portion of . Since has more
than distinct elements, there is ssoomm suboctant which is contained in more
than of the copies of in .
We now prove a fact about the periodicity of a set which is contained in two elements
of . Suppose that two distinct elements and of are given such that ,
which we denote by , is a multiple of for some . We make the additional
too
assumption that both S and contain some suboctant O of , and assume without
loss of generality that is the least suboctant lexicographically of , i.e. has the
point (1, , 1) as a vertex, and that is a negative multiple of . (This is without
loss of generality because one can make these assumptions simply by renaming and
reflecting several times, which does not affect our proof.) We will show
that certain portions of must be periodic based just on these hypotheses.
Figure 3.10: Elements , of whose difference is a multiple of
In Figure 3.10, is any point of such that is also in , is the suboctant
of opposite , is the vertex shared by and , is the vertex of opposite
101
, is the vertex of corresponding to in , and . and are
subcubes of and , respectively, which share vertices with and whose size is
, or the size of minus the size of . (The dotted line portions are copies
of to remind us of the definition of and , but are themselves irrelevant and
are not named.) We define to be , giving us two new points and .
Since , , they may each be filled with when is filled with the correct
standard replacement of . Call the copy of on which is altered to make
the standard replacement which allows to be filled with , and the copy of
on which is altered to make the standard replacement which allows to be
filled with . (A quick note: in several figures in this paper, we represent a point
by the vector which points from 0 to )
We first claim that if , , then , . Suppose that
, . Since contains some portion of , but does not contain all of , the
same can be said of ; contains some portion of , but not all of . Therefore,
some coordinate of is between and . (We always use the word
"between" in the inclusive sense.) The fact that has nonempty intersection with
also implies that all coordinates of are between and , since all
coordinates of all elements of are at lleuausutt . Since , every coordinate
of is between and . This implies that every coordinate of is nonpositive
and at least , and also that one coordinate of is at least . Then,
is in , and has one coordinate at least , and therefore does not
lie in , and so is not in or . Since , the same argument shows that
102
is in , but does not lie in or . Clearly, since and are disjoint from
the boundary of by their construction, , are not in or either.
Suppose that , , , . Then by noting that can be filled with
when is filled with a standard replacement of which agrees with outside
of , we can infer that (this is because corresponds
to ), and by noting that can be filled with when is filled with
a standard replacement of which agrees with outside of , we infer that
and . (This is because corresponds to
, and corresponds to ) This implies that .
This conclusion was dependent only on the fact that , , , .
We have already shown that this is true for any , , and so is periodic
with respect to .
Note that in the course of this argument, we have also shown that .
We claim that any pair of points in which are separated by are and
for some choice of , . To do this, we just determine from :
if , then every coordinate of is between and . As already
mentioned, all coordinates of are between and , and one coordinate
is between and . This implies that has all coordinates
between 1 and , and so is in , and that one coordinate is between 1 and ,
which implies that either does not lie in or lies in the boundary of , and, in
either case does not lie in or . Since , the same argument shows that
is in , but does not lie in or . We have then shown that and could
103
be any pair of points in separated by , and so the fact that in
the above argument shows that is also periodic with respect to .
We will use this fact momentarily; for now let's recall that we earlier showed that
some suboctant of is contained in more than distinct elements of . As
before, denote by the vertex of shared by and , and by the vertex of
opposite . It is clear upon observation that for any element of which contains
, the vertex of which corresponds to in must be contained in . Fix any
in . can be partitioned into sets where ranges over all
points in with , and by tthllee above comments, there are more than
distinct points in which are vertices of elements of which contain . By the
pigeonhole principle, this implies that one of the sets contains more
than of these vertices. Again by the pigeonhole principle, this implies that there
are two elements of , call them and , which contain and where is a
multiple of whose length is less than . We make the notation .
This in turn implies that is periodic with respect to on the regions above
described as and , and since these regions are dependent on and , we call
them and . Since this can be done for all , and are
periodic with respect to , , . . . , where for each , is a multiple
of with length less than . This implies that their lengths are less than as
well, since .
Definition 3.3.5. A word w is purely periodic if for some positive integers , ..., ,
it is periodic with respect to for .
104
We have then shown that if has Property , then there are regions and
as described above so that and are purely periodic with all periods
less than . In particular, the preceding is true for and . For
each 1 , we can then choose a multiple of which is a period for
and and which has length less than . We now choose a subword
of . For sufficiently large , we take to be the subword of with shape
and which still has , the vertex shared by and
, as a vertex. The purpose of this is to cause the forced purely periodic portion
of to contain a purely periodic central cube within . We will choose subsequent
to be purely periodic on a central cube as well, which we will denote by , and
so will always be a purely periodic subword of . In fact, for each , we will
choose so that is a subword of , and so each will be
periodic with respect to for . Due to the construction of , we can
take to have shape in . We also move in space to lie
entirely within the positive octant of with one vertex at (1, , 1) (in other words,
, so that the phrasing of the earlier arguments still works.
This choice of was done to create : from now on we will follow a different
inductive procedure. We need one more definition:
Definition 3.3.6. A superoctant of any (j 1) is a subcube which shares
vertices with both and , is contained in , and contains .
There is then a natural one-to-one correspondence between the superoctants and
suboctants of ; for every superoctant, there is exactly one suboctant which is a
105
subset of it. For each where has Property , we now describe how to
construct . For each such , we define to be the set of superoctants of
such that is periodic with respect to for . We denote the size of
by . Since is periodic with respect to each , and since
for a superoctant of , we have then already shown that contains , and
. We assume that for all , and
in fact this will be clear once the construction is finished.
Let's now suppose that for some , has Property A. The idea is that we take
to be a suitable subword of where is smaller than , but is strictly
larger than . To show this, we have to prove a couple of claims. Firstly, we claim
that any superoctant of on which is purely periodic with periods less than
and which contains must be periodic with respect to each , i.e. must be in .
Since , clearly for all .
We now claim that if contains any pair of opposite superoctants and of ,
then neither of the suboctants and associated to them can possibly be contained
in any element of .
We again assume without loss of generality that is the least suboctant lexicograph-
ically of . In Figure 3.11, (the portion of shaded) is purely periodic
with periods , and is the which is changed to make the standard replace-
ment corresponding to . We assume that and derive a contradiction.
We have denoted by the region , i.e. in corresponds to
in S. because , and for some , it is the case that
106
Figure 3.11: An element of which contains
consists of points whose coordinate is between and . We then
choose which is a multiple of such that is a subset of and
is a subset of which is disjoint from (this can be done since has length less
than ; some multiple of it is greater than , but still less than ) and take
and &|"1 . It must be the case that filling with a standard
replacement for in which no letters of are changed and at least some letters
in are changed can be done simultaneously with filling with . Call the
word with shape with this property . We can restate our assumptions then
107
by saying , , and . Since
in corresponds to in , and since , it must be the case that
. By periodicity of with respect to , .
Since in corresponds to in , and since , . Since
and and are disjoint, . Finally, by period-
icity of with respect to , . But we have then shown that
, a contradiction. Therefore, if two opposite superoctants are
contained in , then neither of the suboctants associated to them may be contained
in any element of .
'
'
Figure 3.12:
Finally, we may make our inductive step. Consider any and with Property
A. By the earlier argument, we may then conclude that there are subcubes and
108
as defined above so that and are purely periodic with periods at most
. However, in the proof of this, we made use of the fact that the suboctant of
corresponding to is contained in many elements of . Therefore, by the fact
just shown, one of or corresponds to a superoctant of which is not in .
First, we take a subword of , call it , obtained by removing the boundary of
thickness from . In other words, we take to be the subword whose shape is
the central of , which we call .
'
Figure 3.13:
and are now subwords of occupying suboctants of . By construc-
tion, and are also purely periodic with periods at most . However,
109
one of them is associated with a superoctant which is not in , and we may as-
sume that without loss of generality it is . Also, since the size of is at
least 10(2R+l) , and since the size of is at most , the overlap
between and is a cube whose size is at least . Denote this cube by .
is defined to be a subcube of which shares the same vertex with that
shares with , with the correct size so that the overlap between and is
central. In doing this, we reduce the size of by at most . is defined to be
. Then, since is a subword of , it is also purely periodic
with respect to each , and by construction, is central in , so indeed
has the necessary properties. Each superoctant in corresponds to a superoctant
in , and now occupies a superoctant of , call it , on which
is purely periodic with periods at most . We claim that this means that
as well. For each , choose a period of which is a multiple
of whose length is less than . For every , as already mentioned,
is periodic with respect to for . Now, consider any such that
for some fixed . Since the size of is much greater
than , there exists some linear combination with , call it , such
that and . Then, since is
a sum of periods for . since 1s
periodic with respect to Finally, since is
a sum of periods for , and therefore is itself a period of . Then, by
definition, is periodic with respect to as well, and since was arbitrary,
with respect to every for . By definition, this shows that .
110
Since corresponds to a superoctant of that is not in , contains at least
one more element than , and . We again move in space so that it
lies entirely within the positive octant of and has one vertex at (1, , 1). Since
, this implies that for .
We will show that this induction never need proceed beyond , and so in the
process will justify the claim already used that for all that
we consider.
If each for has Property , this means that must have more
than elements. But, there are only superoctants of , and so we have a
contradiction. This implies that some must not have Property , which we call
. is created by at most truncations of , each of which reduces the size by at
most , and so since for large enough , the size of is at least
n-1O . Since for large , for some constant
, we can then say that there exists for which In for
and some constant . Then, take so that A for . This
implies that for , . Since does not have Property , there
exists a standard replacement of , call it , which agrees on the boundary with
by definition of standard replacement, and with the property that replacing by
in some cannot possibly create a new occurrence of . Thus, Theorem 3.3.1
is proved.
nt
We can prove as a corollary a version of Theorem 3.3.1 for any strongly irreducible
shift of finite type:
Corollary 3.3.7. For any strongly irreducible -shift of finite type with uniform
filling length , there exists depending on such that for any choice of
with , there exists a subword of , with shape , where , and
so that and agree on the boundary of thickness , and with the
property that replacing an occurrence of by in an element of cannot possibly
create a nerv occurrence of .
Proof: Fix a strongly irreducible -shift of finite type and uniform filling length
. Take the alphabet whose letters are the elements of , i.e. the words
in the language of whose shape is , and then define a map from to
where for any , is the subword of which occupies , i.e. the
copy of whose least vertex lexicographically is . Then, is a Markov shift
with alphabet . The reader can check that is still strongly irreducible, but
with a uniform filling length of . In short, this is because two shapes which
are a distance of apart in an element of correspond to shapes a distance of
apart in for C.
Since is a strongly irreducible Markov shift, by Theorem 3.3.1 there exists
such that for any with , there exist a subword of
, with , and with the same shape as and which agrees on the
boundary with such that replacing with in cannot result in the
creation of a new occurrence of .
112
Now, take any word with . Although is defined
as a map from one subshift to another, it should be fairly clear that as defined,
it can also function as a map from to . So we can define
. For ease of notation, we define , so .
Then, by Theorem 3.3.1, we can find a subword of , with
, and which agrees with on the boundary so that replacing
with in any element of cannot create a new occurrence of . and
are then subwords of with shape which agree on the boundary
of thickness . We wish to show that replacing by in some
cannot possibly result in a new occurrence of . Suppose that this is not the
case. Then there is and two copies of , call them and , such that
, , and if we define by replacing by ,
then . If we apply to all of the objects in the above description,
we arrive at a contradiction to the definition of and . It is therefore the case
that replacing with in any cannot create a new occurrrence of
. Since is a subword of , is a subword of . ,
and the size of the shape of is , and so we
are done.
Corollary 3.3.7 is the main tool in the proof of Theorem 3.1.22; very roughly, the
idea is that for each occurrences of in some large word in , we destroy this
occurrence of by replacing the occurrence of which is a subword of it by .
113
The requirement that new occurrences of the destroyed words not appear during this
process is to ensure that the process of getting rid of these occurrences of takes
as few steps as possible. It is natural to wonder why we go to the trouble of dealing
with subwords in the statement of Corollary 3.3.7; i.e., why is it not possible, instead
of dealing with the intermediate step of taking a subword of , to just find a word
such that replacing by can never result in a new occurrence of being
created? The answer is that there are examples of strongly irreducible shifts of finite
type and words for which this is impossible! Here is a quick example.
Consider, for any dimension and any , the full shift , and the word
defined by if , and if
for and for some . then has zeroes at every point of
except the least lexicographically. We claim that for any , there is some
such that , and replacing by creates a new occurrence of
. Consider any . Suppose that contains a one, i.e. there is some
such that . Since , we can assume that . Consider
defined by taking 1, 1, , and for all other . Then,
. Create a new by replacing by . Then, it is not hard to
check that , and that is the
word consisting of all zeroes, and so not equal to . This means that the replacement
involved in changing to created a new occurrence of . The only other possibility
for is that is the word consisting of all zeroes, i.e. for all . If
this is the case, then define by taking 0, 1, , and
for all other . Again, . Create by replacing
114
by . Then, it is not hard to check that , and that had two
ones, and is thus not equal to . This means that in this case also, the replacement
involved in changing to created a new occurrence of . We have then shown
that for any , it is possible that a replacement of by could create a new
occurrence of , and therefore the extra step of taking subwords in Theorem 3.3.1
and Corollary 3.3.7 is in fact necessary.
3.4 The proof of the main result
Proof of Theorem 3.1.22: We take to be a strongly irreducible shift of finite
type . For any word in , we call the shift of finite type resulting
from removing from the language of . We begin by proving an upper bound
on for sufficiently large . By Corollary 3.3.7, as long as is
sufficiently large, there exists a subword of with and
with the same shape as and which agrees with on with the property
that replacing by cannot create new occurrences of .
For any , we will define a mapping : . will
actually be defined as a composition of three maps: ( : ,
: (, and : ( . To
define these we need a definition:
Definition 3.4.1. Trvo overlapping occurrences of the word which occur in copies
S and of are said to have overlap of type B if |S .
115
We point out an important property of this definition: given a fixed occurrence of ,
there are at most copies of which could be filled with an overlapping , and
so there are at most words which are made up of a pair of whose overlap is of
type B. Call these words , , . . . , , where , and call the shape of for
any . Given any , we define ( by finding each occurrence
of any of the within , and in each one, replacing the first lexicographically of
the two occurrences of which make up by . In order to make this operation
well-defined, we must specify an order for these replacements to be done in; let us
just choose the usual lexicographic order, and search for and replace each in turn.
(This means that we first replace the lexicographically first occurrence of , then
find the new lexicographically first occurrence of and replace it, and continue until
no remain. We then perform the same procedure for , , etc.) Since replacing
by can never create a new occurrence of , the resulting -word, which
we call (, contains no , i.e. contains no pair of occurrences of with overlap
of type B.
For any (, we define ( as follows: find the first
occurrence of lexicographically in (, and replace the which is a subword of
this occurrence of by . Denote by the word that becomes when its subword
is replaced by . Repeat this procedure until there are no occurrences of left,
and call the resulting word ( . There is one fact that we must check;
namely that when performing any of these replacements, we do not accidentally create
a new occurrence of . For a contradiction, assume that a particular replacement of
by could create a new occurrence of .
116
Figure 3.14: Intersecting occurrences of
In Figure 3.14, is a copy of which is filled with , is a copy of which is
filled with , is the on which letters are changed to change the word on
from to , is a copy of which will be filled with once the replacement
is done, is the copy of in corresponding to in , which will therefore be
filled with once the replacement is done, and is the copy of or
central in in which must lie, which comes from the proof
of Theorem 3.3.1. For the same reasons as in the proof of Theorem 3.3.1, cannot
contain all of , but must contain some nonempty subset of . Since the replacement
cannot possibly create a new occurrence of , must have already been filled with
117
before the replacement occurred. This implies that before the replacement,
and were filled with overlapping occurrences of . We claim that these two
occurrences had overlap of type . Since , and since the size of is at most
, the distance in the -direction between the center of and the
center of is less than for each . is central in , so
the center of is the same as the center of . Since is a subcube of whose
size is at most shorter the distance in the -direction between the center of
and the center of is less than for each . For the same reason, the
distance in the -direction between the center of and the center of is less than
for each . Finally, since intersects the boundary of , and since
has size , there exists for which the distance in the -direction
between the center of and the center of&is between and .
Putting all of these facts together, we see that there exists so that the
distance in the -direction between the center of and the center of is between
and , which
implies that for large enough , it is between and , which
implies that before the replacement, and were indeed filled with occurrences
of with overlap of type B. However, ( contained no pair of occurrences of
with overlap of type , and so since all replacements in the definition of involve
replacing some with , no new occurrences of can be created during these
replacements. Therefore, there can never be a pair of occurrences of with overlap
of type during the application of to a member of (. We therefore have
a contradiction. Our original assumption was wrong, and none of these replacements
ns
can create a new occurrence of . So, when these replacements are finished, we have
a word with shape , which we called ( , which has no occurrences of
.
The definition of is much simpler: for any , is the subword of
occupying its central . For any , we define .
We now claim that is in , i.e. can be extended to a configuration on
which is in and contains no occurrences of . To do this, we inductively define
a sequence of words , where and each has no occurrences of
. Take and . For any for , define as
follows: since , it can be extended to an infinite configuration in . Use
this fact to create a word which has as the word occupying
its central . We then create by performing the same replacements as
in the definitions of ( and then , in other words first finding and replacing any
occurrences of for any , and then finding and replacing any occurrences
of . For the same reasons as above, has no occurrences of . Since agrees
on the boundary of thickness with , , completing the inductive step
and allowing us to define for all . We also note that for any , since contained
no occurrences of or for any , any occurrences of these words in
must have had nonempty overlap with the newly created portion, i.e. must have been
contained in the boundary of thickness of . As a result, and
must agree on their respective central , since no replacements could have
affected that portion. This means that we may define the "limit" of the : for each
119
, since , there exists with . Since and agree
on their respective central , the limit of the exists, call it . Since
is a subshift, it is closed and so . Since the central of each is equal to
, is a subword of . Finally, since each has no occurrences of and
is a limit of the , has no occurrences of . This means that as
claimed.
We have now defined on all of . For any , we can then deal
with the restriction of to . Since the shape of is , any
element of has fewer than occurrences of ,
and since each has shape a subset of , any element of has fewer
than occurrences of for each . By Lemma 3.2.1,
We now bound in a similar fashion. First
let's look at a picture of for any . Recall that is the union of
two overlapping copies of , so is the union of two overlapping copies of
concentric with the original copies of . By the definition of type overlap,
there exists for which the distance in the -direction between the centers
of these two is between and .
In Figure 3.15, we denote the directio in question by , the two copies of
by and , and the distance in the -direction between the centers of and
by . We then define to be a rectangular prism which is a subset of a distance
of away from . The sizes of are in every direction but , and
in the direction
120
Figure 3.15:
Clearly by just taking very large we can assume that this dimension is greater
than . Since and , , by strong irreducibil-
ity . , , and
are all small in relation to for large enough , and asympotically ap-
proaches as , so for large , this bound is greater than Using
these bounds, we can rewrite the above statement about a little more
easily: every element of has fewer than
occurrences of and fewer than occurrences of any
for .
121
We now prove the upper bound on . We do this by proving
upper bounds on for any , on
( for any , and on for any . The
first upper bound is fairly straightforward: consider any . Since
, the total number of letters in which are part of an occurrence of any
is at most , which for large is less than
. Now, consider any . By
the previous reasoning, any differs from on less than
letters, and so the number of possible is at most
for large values of .
The second upper bound, the one on for any ,
is slightly more difficult. Consider any . Since is in the image
of ( has no occurrences of any . Consider any pair of overlapping occurrences of
in , say at and copies of . Then these occurrences of contain occurrences
of as subwords, say at and copies of with , , and
. Also define to be the copy of in which would be changed to change
to , and to be the corresponding copy of in . Since there are no
122
occurrences of , it must be the case that |T . This implies
that contains the central of for large , and so that contains .
Similarly, contains . Therefore, if is replaced by sometime during the
replacements defining , then will be changed to something other than , and if
is replaced by at some point in these replacements, then will be changed
to something other than . In other words, in any replacement involved in changing
to , if is changed from to , then all occurrences of with nonempty
overlap with are also changed. Since new occurrences of cannot be created during
these replacements, this implies that for any such . Note that we
have also shown that the copies of where replacements occur in the application of
must be disjoint. We now wish to bound from above the number of occurrences
of in for any . By the definition of , for
any , has at most occurrences of
. We now claim that each of the replacements involved in ( and could create at
most new occurrences of , which rests on an aperiodicity property of : namely,
for large , and for any with for and copies of ,
.
To show this, recall that in the definition of , we ensure that contains a sub-
word , (X) which is not a subword of . In addition, lies in the central
in . Also recall that is asubword of whose shape has size at
least . Together, these imply that for large , contains as a subword of its
central . Since and agree outside this central , and since is not a subword
of , this implies that any occurrences of in have nonempty intersection with its
123
central , and that there is at least one such occurrence of . Now consider
such that , where . Also consider a copy of such that
. There are two possibilities: either lies inside or does not lie inside
. If lies inside , then since , must have nonempty intersection with
the central of . Again since , must have nonempty intersection with
the central of as well. This implies that . However,
then is a period of . Since for large
, there is some positive integer so that lies in , but has empty
intersection with the empty central of . But by periodicity of with respect to
, , which gives a contradiction. Therefore, it must
be the case that does not lie inside . Since has nonempty intersection with the
central of , this implies that for large .
Now consider any replacement of by in the replacements defining ( and ,
and suppose that this replacement occurs at a copy of . Specifically, suppose
that , where , , and and agree outside of .
Also assume that a copy of is a location for an occurrence of newly created
in this replacement, i.e. , . Since and agree outside , and
must have nonempty intersection, meaning that any possible location for given
a fixed lies in a copy of concentric with . However, by the aperiodicity
fact about above, any , both of which contain newly created occurrences of
must satisfy . This means that at each replacement, there are no
more than newly created occurrences of . We now bound from above the total
number of replacements performed in applying ( to . As already
124
shown, for large the total number of occurrences of any in is
smaller than , and so the total number of occurrences
of created during these replacements is less than
, which is less than for large . We now
bound from above the total number of replacements performed in applying to
(. For this, we need to bound from above the total number of occurrences
of in ( itself had at most occurrences of
by definition of . While changing to (, less than
replacements are made. Each one of these could create at
most new occurrences of by the same reasoning we used for (we
just don't have any aperiodicity facts about to use.) Therefore, ( has less than
occurrences
of , which is less than for large . This
means that the number of replacements involved in changing ( to
is less than as well. Since each of these can
create at most new occurrences of , the number of occurrences of created
during these replacements is less than .
So, the total number of occurrences of created in the process of changing to
( is less than .
Since itself contained at most occurrences of
by definition of , this means that ( contains less than
occurrences of .
We now collect the three facts we need to bound from
125
above for any . Firstly, for any , we know that for any
a copy of which is the location of a replacement during the process of changing
( to . In other words, once is changed to
during the application of to (, that occurrence of will not be changed during
future replacements. Secondly, all replacements perfromed in the application of
to ( are disjoint. This means that knowing ( , along with knowing the
locations of all copies of where replacements occur in changing ( to ,
is enough to uniquely determine (. Finally, ( contains fewer than
occurrences of . This implies that for any
, , the
total number of subsets of the locations of occurrences of in which could have
been the locations of replacements in the application of .
Finally, we give an upper bound on for any , which is straight-
forward: any must have its central filled with . This means that
for large .
Putting all of these upper bounds together, we see that
. .
for constants independent of and and independent of . This implies that
.
126
We then take natural logarithms of both sides, divide by , and let
to get
,
and by using the definition of entropy and Corollary 3.2.8, we are left with
.
was arbitrary, so we may let it approach zero, leaving
.
A different tactic must be used to prove a lower bound for . We
in a sense proceed in the opposite way from our upper bound: we will define a
map which sends any word in to a subset of such that for any
, and are disjoint. This map will create occurrences of
instead of eliminating them.
We first need another auxiliary combinatorial theorem, again about a sort of lack of
periodicity.
Theorem 3.4.2. For any and any strongly irreducible -shift of finite
type with uniform filling length , there exists a constant dependent on such
that for all suffiffifficiently large , there is a word with the following
aperiodicity condition: there cannot exist trvo overlapping occurrences of such
that one has nonempty intersection with the empty central of the other.
127
Figure 3.16: Disallowed and allowed pairs of overlapping
Proof: To make the statement of Theorem 3.4.2 more clear, we refer to Figure 3.16,
which gives an example of a pair of overlapping occurrences of which is ruled out
by the given aperiodicity condition (left) and a pair which is not. (right)
We begin by giving a concrete example, for any fixed and , of an aperiodic
word in . For any , we define if any for
is not equal to 1 or . This leaves only where each of , , . . . , are 1 or
to be defined. We think of this undefined portion as a set of one-dimensional j-
letter words: for every , we think of the yet-to-be-defined
, , ,
as the letters of a -letter word. Now, to fill these portions in,
we define -letter words, which we will call , , . . . , , such that no two
may overlap each other, i.e. if the final letters of are equal to the initial letters
128
of for some , then and . We do this in the following way: the first
four letters of any are defined to be 0001, and the final four letters are defined to
be Ottt. The letters following the initial 0001 in any are defined as follows:
concatenate d-l th -letter words, determined by -digit binary expansion:
each 0 in the binary expansion corresponds to the word 001, and each 1 in the binary
expansion corresponds to Of 1. The remaining letters of which precede
the final Ottt are alternating zeroes and ones, beginning with a zero.
An example should be helpful: suppose that and . Then we create
four 18-letter words , , , and . The initial four letters of are 0001. The
next letters are dependent on the two-digit binary expansion of the
subscript 0: since it is 00, the next six letters are 001001. The next
letters are alternating zeroes and ones beginning with a zero, i.e. 0101, and the
final four letters are Otll. This gives . Using similar
reasoning, we see that , , and
000101101101010111.
We claim that this set of words has the nonoverlapping property described earlier.
Suppose that for some and , , the initial letters of are the
same as the final letters of . cannot equal 1 or 2, because the first letters
of would then be 0 or 00, and does not end with either of those words. So,
. Then the initial letters of begin with 000, and since the only place that
000 occurs in is at the beginning, this implies that , and since
contains distinct words, that .
129
We now, for every choice of , choose an , and define
for . Assigning the to the choices
for in a one-to-one way completes the definition of . We now make
the claim that is aperiodic. Suppose not; then there exists
with for such that for all such that
. Since could serve as a period as well, we may assume without loss of
generality that . We choose a vertex of based on : for each ,
if , . If , . In this way, we ensure that , and
therefore that . Since for and ,
by construction . Therefore, . However, again due to
construction, the only zeroes in lie at points whose first -1 coordinates are
either 1 or . This implies that the first coordinates of are either 1 or .
Let's denote by the -letter word where for ,
and by the -letter word where . Due to the
supposed periodicity of , the first letters of are the same as the final
letters of . Since no two distinct words in may overlap, and
. This implies that the first -1 coordinates of are the same as the first
Z-t coordinates of , and therefore that the first Z-t coordinates of are zero,
implying that , a contradiction. Thus, is aperiodic.
We now use as a tool to create, for any , a word with
the aperiodicity property outlined in the conclusion of Theorem 3.4.2: there cannot
exist two overlapping occurrences of such that one has nonempty intersection
with the empty central of the other. is defined to be 0 if or 2,
130
and defined to be 1 if . If , then .
Suppose that is periodic with respect to . Since our supposed overlap is of
the form described it must be the case that for . We can again
assume without loss of generality that . Choose a point of as follows:
for each , take if , and - 1 if . Choose
if , and if . Finally, take .
For any all of whose coordinates are zero or one the first Z-t coordinates of
are between 1 and , and the dth coordinate is 1 or 2, implying that .
This means that a copy of with its least vertex lexicographically at is a subset of
, call it . Since is composed entirely of points whose dth coordinate is 1 or 2,
is filled with zeroes. Again, for any point all of whose coordinates are zero or
one, the first coordinates of are between 1 and , the Z-dth coordinate
is 1, 2, , or , and the dth coordinate is between 1 and . Therefore, any such
is in , rmplying that a copy of with its least vertex lexicograhically at
is a subset of as well, which is then . By periodicity, must
be filled with zeroes as well. However, by construction, for any copy of which is
filled with zeroes in , the dth coordinate of its least vertex lexicographically is 1.
Since , it must be the case that . This implies that is periodic
with period , which implies that for as well, and
so , a contradiction. Therefore, has the desired aperiodicity property. Note,
however that this does not prove Theorem 3.4.2, since is a word in the full shift,
and there is no reason for it to be in .
We now turn to our shift of finite type . First, we will be constructing a word
131
for which is nonempty. Obviously, finding a word such that
is nonempty could be done by using the already proven upper bound for
, but this may require to have shape with very large size. We need the
shape of to have a small size to prove as tight a lower bound as possible, and for
our proof, we need a lower bound on for relatively small . For large ,
Lemma 3.2.1 gives an exponential lower bound for , but doesn't really tell us
anything about small . For this reason, we prove the following lemma:
Lemma 3.4.3. For any -subshift X with , and finite S ,
|S|.
Proof: We write where comes before in the usual lex-
icographical order for , and make the notation for all
. Suppose that . Then, since , it must
be the case that for some , . Since , this
means that for every word , there is a unique way to extend it to a word
in . In other words, for any , given , is forced. By shift-
invariance of , for any , given , is forced. Since is finite,
take . Then, consists of elements of within a d-distance
of less than from 0 and lexicographically less than 0. For this reason, we make
the notation { - n) : is less than 0 lexicographically}, where
. It is then clear that , and so we note that
forces . We need a few more pieces of notation; ,
{ : for , }, and
132
. We quickly note a few useful facts about these sets which can be checked by
the reader: , and . a
now claim that if forces for all , then forces for
any . We prove this claim by induction on .
For , the claim is fairly easy to check (and is in fact a classical theorem due
to Hedlund and Morse ([HeM]) ; it amounts to showing that if any consecutive
letters of any force the next, then any consecutive letters of force the next for
any . But this is clear; if , , . . . , force , then , . . . ,
force , and we can proceed like this indefinitely. Thus, the claim is proven
for . Now suppose that it is true for a fixed , and we will prove it for .
This proof will also be by induction; since we wish to show that forces
, and since , it suffices to show that
forces for every
. Fix any such , and suppose that
is given. We will show that is forced. Consider any
. We claim that
. Since , it suffices to
show that for
. Since
for , and
for , and since for
, it suffices to show that ( - n) for
. Consider any and any . By definition of
133
, , and for . By choice of ,
for . Therefore, for , and
. There are then two cases; if
any coordinate of is nonpositive, then , and if all coordinates are
positive, then by definition, . So, indeed
n) , and so for any ,
. This means that since ,
for any , forces . But now since
, by the inductive hypothesis 1s
in fact forced by . Since was arbitrary here, as
described above this shows that forces as claimed, and so by
induction we see that for any , if forces for all , then
forces for any .
We finally return to our original example of for which . Recall that
we showed that there is then some such that forces for all . By
the claim just shown, then forces for all . This then shows that
for any . Note that for all . Therefore,
for all . Since
for large and a constant independent
of , we see that for all , and so by definition of entropy,
. Therefore, , a
contradiction to the hypotheses. Our initial assumption was therefore wrong, and so
for all finite shapes .
134
Before moving on with our proof, we must comment on Lemma 3.4.3. In the course
of the proof, we show that if for some total order on which is preserved under
addition, some finite set , and some with for all , forces
, then the topological entropy of is zero. It is natural to wonder whether the
assumption about ordering is necessary. In fact it is:
Lemma 3.4.4. For any finite and such that there is no addition-
preserved total order on with respect to which is greater than all elements of ,
there exists a subshift such that forces for all , but .
Proof: Suppose that for some finite and , it is the case that for no
addition-preserved total order on is greater than all elements of . We claim
that this implies that lies in the convex hull of . To see this, suppose that does
not lie in the convex hull of . Then clearly there is a -dimensional hyperplane
of such that and lie on opposite sides of . Define the subspace of
which equals for any . Without loss of generality, we may assume that
, since such an assumption requires only a small rotation of . Then,
define an addition-preserved total order on by arbitrarily fixing one of the open
half-spaces that splits into and defining if . Then,
since and are on opposite sides of , for all , is on the same side of ,
and so either for all or for all . By reversing if necessary, we
may assume without loss of generality that is greater than all elements of under
. This is a contradiction, and so lies in the convex hull of .
135
To make things easier, by shift-invariance, we can without loss of generality assume
that . We will, for any such that 0 lies in the convex hull of ,
construct a subshift with positive topological entropy such that forces
for any . By decomposing the convex hull of into simplices, we can, for some
, find a linearly independent subset of of cardinality such that
the convex hull of contains 0 and is contained in a -dimensional subspace of ,
call it . Denote the elements of by , , , . Then there exist positive
reals , . . . , such that . Since all elements of are in , we can
take all to be rational, and so without loss of generality integers. This implies that
. Take for all . Take the d'-dimensional
parallelepiped contained in . One may then tessellate
with copies of . (Intersecting with on both sides
leads to a similar tessellation of by copies of ) If ' then choose
linearly independent , , in such that . We
will define , a subshift with alphabet , such that forces
for any , which clearly implies that forces for any as well.
Choose any Q. We define as follows: for any and ,
define . Due to the already mentioned
tessellation of by copies of and tessellation of by copies of ,
this defines on all of . Take to be the set of all elements of constructed in
this way. is not necessarily shift-invariant, but by the construction, it is invariant
under shifts by for and for . Since the collection
is linearly independent, this means that a finite
136
union of shifts of , call it X, is a shift-invariant set. It is also clear that X is closed,
and thus it is a subshift.
We first show that forces for any . By the definition of , it is
sufficient to show that forces for any and . Since ,
there exists some and such that .
Since , for some positive reals . For any , is
in unless the coefficient of is greater than or equal to . Therefore,
if for all 1 , then it must be the case that for
all . But then since as noted earlier, and since
for all , , with all coefficients
in . This implies that . We then showed that if for all
, then . In other words, there is some such
that . Fix this . Then, .
Say that for some . Then by definition of , since
and lie in the same copy of , it must be the case that . This
means that forced Since , this shows that
forces for any , and so that forces for any .
It remains to show that . Choose , for ,
. Then,
, for any . By the definition of , there are at least
ways to fill letters of on , and so we
see that This means that
137
for all , and therefore, by letting approach infinity, we see that
.
We claim that for a strongly irreducible shift of finite type , .
We define and
. Then , and so by strong irreducibility
, which is greater than by Lemma 3.4.3.
Since ,
4 .
Therefore, . Consider any . Then, since there
are copies of contained in , there are at most different
words in which are subwords of . Since , this means
that there is such that is not a subword of . Then, again using strong
irreducibility, we create such that for any , .
(See Figure 3.17.) Then, for any a copy of , contains a , and therefore
. Thus, is not a subword of , and so contains at least one point and
1s nonempty.
138
Figure 3.17: A point
We will use and to create, for , and for any , a word
such that there cannot exist two overlapping occurrences of
where one has nonempty intersection with the empty central of the other.
To do this, we first partition into disjoint copies of . The disjoint copies of
then have an obvious bijective correspondence to tthllee points of , illustrated in
Figure 3.18.
We then use each entry of to assign entries of to the corresponding in
For , if , then the least copy lexicographically of in
the corresponding to is filled with any word in , i.e. a word without
any occurrences of . If , then the least copy lexicographically of in
the corresponding to has occurrences of placed inside it, each one sharing
a vertex with it.
139
The remainder of I is then filled to make a word in by using strong irre-
ducibility, since all of the filled portions are a distance of at least from each
other. We claim that this word has the desired aperiodicity con-
dition. Suppose not; then there exist two overlapping occurrences of such that
one has nonempty overlap with the empty central of the other, which implies
that is periodic with respect to some with for .
We then define by defining to be the closest multiple of to for .
If two are equally close, choose either. In this way, we ensure that for
each . We make the notation . Since each coordinate of is
divisible by , has integer coordinates.
Figure 3.18: The correspondence between copies of and points in
140
Figure 3.19: How a copy of is filled if
Since for all , every coordinate of has absolute value at most
. This implies that either or is the difference between two overlapping
occurrences of , one of which has nonempty intersection with the empty central
of the other. Assume that the latter is the case. Then, by the already proven
aperiodicity condition on , this implies that there exist , such that
. Without loss of generality, we assume that and
.
Let's call the central in the in corresponding to and the
central in the in corresponding ttoo . Then , and
is periodic with respect to , .
. Since every coordinate of is at most , ,
and hence , is nonempty. In fact, it is a rectangular prism where
. Therefore, contains one of
141
the copies of in which share a vertex with . Since , and since
is the central in the corresponding to , every one of these subcubes
of contains an occurrence of in . Therefore, has as a
subword. However, since , is a subword of .
Since , and since is the central of the corresponding to
, contains no occurrences of . Therefore, contains
no occurrences of either. Since , we have a
contradiction.
The only remaining case is when , i.e. for . We then simply
take an integer multiple of such that some coordinate of is greater than ,
but at most . Then, is also a period of , and we can repeat the argument
just made for the same contradiction.
Figure 3.20:
142
We now return to our proof of the lower bound on . We suppose
that our word which we are removing from has .
Take to be the smallest integer such that (0-4)W . Then, (0-4)W
, and so . We also define . By the definition of ,
. Take to be any ergodic measure of maximal
entropy on . Since
,
there must exist some word , such that . We note that
since , , , (X). By Lemma 3.2.1, ,
Therefore,
This means that by using Lemma 3.2.7, we see that if, for any , we denote
by the set of words in which have at least
occurrences of , then .
In Figure 3.20, we show a word constructed as follows: the copy of
is filled with as constructed in Theorem 3.4.2, and the central is filled
with . The remaining shaded portion is filled using strong irreducibility to create a
word .
We may now finally describe our map . Consider any . By definition,
has at least occurrences of . We choose a set of disjoint
occurrences of using a simple algorithm: choose any occurrence of to begin, call
it . At most 3 occurrences of overlap , and so we choose any occurrence
143
of which does not overlap , and call it . is any occurrence of which
does not overlap or , and we continue in this fashion. We are able to choose
at least disjoint occurrences of in in this way. We
then choose any nonempty subset of this chosen set of occurrences of , and for each
chosen occurrence of , if its shape is a copy of , we use strong irreducibility to
replace the central of by . is defined to be the set of words which
could be created by performing such replacements on . The cardinality of is
then at least for any .
We claim that for any . To show this, we
first show an auxiliary claim: that no occurrence of is ever incidentally created
in the replacement process of . In other words, when some occurrences of in
are replaced by words containing to create an element of , the
only occurrences of in the result are the ones which are subwords of each replaced
occurrence of . Suppose that this is not the case; that for some and
, contains an occurrence of which occupies a copy of , which we
call , which was not the central of a copy of which was occupied by one
of the replaced occurrences of in . Denote by the central of and by the
copy of in which is central. Then . Since , .
This implies that one of the replacements made had nonempty intersection with ,
otherwise , a contradiction. Call the where this replacement
was made , and call its central . By our hypothesis, was not one
of the replaced copies of , and so . Since all of the replacements made
were disjoint, . We know that as well. Since ,
144
. But, . Since ,
. It was part of the definition of that (0-4)W
, so . Therefore, .
We have then shown that . Since is central in and
is central in , and so as well. We make
one more notation: call the boundary of thickness of and the boundary
of thickness of . Since , . Then, since ,
. Also, , so ,
which implies that has nonempty intersection with the central of , a
contradiction to the aperiodicity property of .
We have then shown that the only occurrences of in any for some
are those which occupy the central of replaced occurrences of .
We also have a sort of converse result: for any such , , and for every a copy
which is filled with an occurrence of in which is replaced, the central of
is filled with in . This fact rests on the disjointness of the replacements made in
the definition of ; since these replacements are disjoint, each letter is changed at
most once.
Now, consider any for . Since , has at least one
occurrence of . Since the only occurrences of in come from replacements during
the application of , we know that for every a copy of with , if
we call the copy of in which is central, it must be the case that .
We also know that these are the only replacements performed in turning into ,
145
since any others would have resulted in more occurrences of . So, determines the
set of replacements which were made in . However, trivially outside of the
regions where replacements occurred, and so determines the letters of outside
regions where replacements occurred, as well. Therefore, is uniquely determined by
, and so we know that for , .
Since for
, we have shown that
for large . Take natural logarithms of both sides, divide by , and let to
see that
.
Since was arbitrary, we allow it to approach zero, and so
III 2 . ,
which for large enough , gives
,
or, by replacing by its maximum possible value ,
.
146
Combining this with the earlier upper bound on , we have
.
3.5 A closer look at the main result
We here wish to briefly review the proof of Theorem 3.1.22, and point out that most
of the complications encountered in the proof were due to possible problems with
periodicity of . If some assumptions about aperiodicity are made, the proof can
be simplified significantly. We will skip some of the details of these proofs, as they
are just simplified versions of arguments already made in the main text.
Theorem 3.5.1. For any strongly irreducible shift X of finite type with uniform
filling length R and positive topological entropy , and for any words w,
(X) such that w and agree on the boundary of thickness t and any replacement
of w by or vice versa cannot create or destroy any other occurrences of w and ,
if we denote by the shift of finite type , then for any an ergodic measure
of maximal entropy on and any an ergodic measure of maximal entropy on X,
.
Proof: To prove an upper bound on , define a mapping (similar
to from the upper bound portion of the proof of Theorem 3.1.22) on :
147
given , one replaces the occurrences of by , going in order lexico-
graphically. Since none of these replacements can make new occurrences of , this
process terminates. The word occupying the central of the resulting word is then in
for the same reasons as before, and we call this word . We then wish to
bound from above the size of 0 , , , , for any and any
ergodic measure of maximal entropy on . Recall that , , , , is the set of
words in which have between and
occurrences of for , .
Consider any , , , , . Because of the fact that occurrences of
and are not incidentally created nor destroyed during the replacements made in
the process of changing to , the word occupying a shape a copy of will be
changed from to at some point during these replacements if and only if .
If , then , since once an occurrence of is created, it cannot
be eliminated in any of the other replacements made in the process of changing
to . Therefore, any replacements made are at copies of such that
. We have then shown that if we denote by , . . . , the set of copies
of such that , then agrees with on , and for each ,
or . This shows that there are at most choices for , and
so since there are at most ways to fill with letters, that
But, , and . Therefore, , the number
of occurrences of in , is equal to , where is the number of occurrences of
in and is the number of occurrences of in . By definition of ,
and . By Proposition 3.2.5,
148
since and agree on their boundary of thickness . Therefore,
, and so we see that if 0 , , , , is nonempty,
then it has cardinality at most This implies that
.
Take natural logarithms of both sides, divide by , and let to get
,
and by using the definition of entropy and Lemma 3.2.7, we see that
.
Since is a measure of maximal entropy, and since can be arbitrarily small, we may
rearrange to arrive at
.
We could then use Lemma 3.2.6 to further bound this from above, but we leave the
bound as it is for now, to emphasize the fact that our upper bound comes from .
We can also give a similarly shortened proof of a lower bound on .
Define a map (similar to from the proof of the lower bound portion of Theo-
rem 3.1.22) from to as follows: for any which has
at least one occurrence of , take any nonempty subset of the occurrences of in
149
and replace them all by . It is possible to perform all of these replacements simulta-
neously because the hypotheses of Theorem 3.5.1 imply that the portions of any two
occurrences of which would need to be changed to turn them into are disjoint.
(If not, then a replacement of one of them by would destroy the other.) Therefore,
for any ergodic measure of maximal entropy on and for any , , , ,
. For exactly the same reasons as in the earlier lower bound
proof, for . Therefore,
.
Take natural logarithms of both sides, divide by , and let to see that
,
and by using Lemma 3.2.7 and letting approach zero since it is arbitrary,
(In ) .
Finally, we use the fact that is a measure of maximal entropy on to rewrite as
.
We recall that the upper and lower bounds in Theorem 3.1.16 (the one-dimensional
case) have a ratio bounded away from zero and infinity, but that we showed that it is
150
impossible to have such close bounds in the multidimensional case which depend only
on the size of . It is then natural to wonder whether or not it is possible, at least
under the hypotheses on from Theorem 3.5.1 and for large , to have a lower bound
of the form for . It turns out that this cannot be true if
there are multiple measures of maximal entropy. In one dimension, the uniqueness
of a measure of maximal entropy of any mixing shift of finite type is well-known,
but Burton and Steif ([BuSl], ) have shown that there are multidimensional
strongly irreducible shifts of finite type which have more than one measure of maximal
entropy. In order to show that this type of lower bound is too good to be true, we
need a lemma:
Lemma 3.5.2. If X is a strongly irreducible -shift of finite type t and if we denote
by the set of words w for which there exists where
and agree on the boundary of thickness t and such that replacing w by or vice
versa can never incidentally create nor destroy an occurrence of w or , then for
any ergodic measure of maximal entropy on X, .
Proof: We claim that as long as is sufficiently large, and has the property
that given any two disjoint copies and of contained in , ,
then . Consider any with the property just described. Then create
a standard replacement of in the same way as in the proof of Theorem 3.3.1, by
1
changing only on a central copy of . (Recall that or
and that is chosen so that there exists , (X) so that
is not a subword of . We before chose to be odd, but here we take it to have the
151
same parity as so that there is a central within ) We denote by the central
copy of on which is changed to create , and so .
We have four cases that we would like to show are impossible:
Case 1: Suppose that a replacement of by could possibly create a new occur-
rence of . More rigorously, that there exists and copies and of such
that , and if we define by the element of created by replacing by
, then , but . This means that and .
Since the central in is occupied by , this implies that , or else
would have to contain this occurrence of . However, in order for
to be a newly created occurrence of , it must be the case that has nonempty
overlap with , and so . By this upper bound on
, contains a cube of size at least , which must contain a copy
of disjoint from the very small sets and , call it .
Since , . Since is disjoint from ,
. Since , .
Therefore, . But, since , which
is greater than for large , and are disjoint copies of
, and so we have a contradiction.
Case 2: Suppose that a replacement of by could possibly destroy an existing
occurrence of . More rigorously, that there exists and copies and of
such that , and if we define by the element of created by replacing
152
by , then , but . This means that and
. Everything from here proceeds to a contradiction exactly as in Case 1.
Case 3: Suppose that a replacement of by could possibly destroy an existing
occurrence of . More rigorously, that there exists and copies and of
such that , and if we define by the element of created by replacing
by , then , but . This means that and .
However, since , it must be the case that has nonempty
intersection with . This again implies that . It is then possible
to choose a positive integer so that . Then choose
any a copy of so that , . Since , .
Since , . So, . In the same
way, we can see that , and since , and
are disjoint and so we have a contradiction.
Case 4: Suppose that a replacement of by could possibly create a new occur-
rence of . More rigorously, that there exists and copies and of such
that , and if we define by the element of created by replacing by
, then , but . This means that and .
However, since , it must be the case that has nonempty
intersection with . This again implies that . It is then pos-
sible to choose a positive integer so that . Then,
choose any a copy of so that , , and both are disjoint
from . Then, since , . Since ,
153
. So, . In the same way, we can
see that . Since and are disjoint from
, and . Therefore,
. Since , and are disjoint
and so we have a contradiction.
Therefore, as long as has for and disjoint copies of in ,
. This implies that
.
(3.8)
We can now use Lemma 3.2.6 to see that for any , any with
, and any ergodic measure of maximal entropy on ,
,
Since ,
.
Since , for
large , and so by strong irreducibility,
,
154
which is at least by Lemma 3.2.1. Combining this with shift-invariance
of and (3.8), we see that
Again by Lemma 3.2.1, Therefore,
,
which clearly approaches zero as .
Now, suppose that for any strongly irreducible -shift of finite type , there exist
constants , and an ergodic measure of maximal entropy on such that
for any with , . For a contradiction,
choose any with more than one measure of maximal entropy. Since the extreme
points of the set of measures of maximal entropy are the ergodic measures of maximal
entropy, there must exist some ergodic measure of maximal entropy on .
Then we have
(3.9)
for all with . However, since and are ergodic, they must be
mutually singular. Therefore, for any , there exists some open set such that
and . By Lemma 3.5.2, for large enough ,
155
and obviously . Since is an open set in , it is a union of cylinder
sets. This means that there exists so that for any , there exists some
a union of cylinder sets of words in with , and
again . However, we assumed that
(2 In ) for all and , and so since is a union of cylinder
sets associated to -words, for any we may sum (3.9) over to see
that ( In ) , and so ( In 2) , a contradiction
for small enough . This implies that if , then there does not exist a lower
bound on of the form which holds for all words
for sufficiently large .
3.6 An application to an undecidability question
One of the complexities of multidimensional symbolic dynamics is that it is algorith-
mically undecidable, given just the set of forbidden words, whether or not a shift
of finite type in is nonempty. (See [B], [R], [Wan] for details.) One application of
Corollary 3.3.7 is a condition under which is nonempty:
Theorem 3.6.1. For any alphabet , there exist , such that for any
and any finite set of words satisfying
and for , .
To prove this, we need the following lemma:
Lemma 3.6.2. For any strongly irreducible -shift of finite type with
uniform filling length , there exist , dependent only on such that for
156
any with , if we denote by the shift of
finite type , then there is some a subword of such that , is
nonempty and strongly irreducible with uniform filling length at most .
Proof: Suppose that such a shift is given. By Corollary 3.3.7, there exists such
that for any with , there exist , such that is a
subword of , and agree on , and replacing an occurrence of by in an
element of cannot possibly create a new occurrence of . (We are not using the
full force of Corollary 3.3.7 here, in that we do not care about the size of , only
that exists.) By reviewing the proof of Corollary 3.3.7, we see that a sufficient
condition for the existence of is that
'
for some constant depending only on . Some algebraic manipulation shows that
the above is implied if . By Lemma 3.2.1,
, so . Therefore, it is sufficient to have
In . Since and , for any a sufficient
condition is . There is a constant dependent
only on so that if , . For such , it would be enough to have
, or by dividing both sides by and then squaring
both sides, for some constant dependent only on . So, as long as
, such a exists.
Consider any two shapes , such that , and any words
157
and . Since , there exists such
that . Similarly, there exists such that . For
any , by definition there exists such that . Similarly,
for any , there exists such that . But ,
so . By the triangle inequality, . Therefore, since , were
arbitrary, , and so by strong irreducibility of there
exists such that and . We now fix any
ordering of the elements of with a least element (say lexicographically with respect
to polar coordinates) and replace each element of by in turn with respect to
this order. In this way, we eventually arrive at which has no occurrences of
and is thus an element of . Note that since and
are words in , they had no occurrences of . Therefore, any of the replaced
occurrences of had nonempty intersection with
Consider such a replaced occurrence which occupies a copy of . From the fact
just noted, there exists such that . This implies that
and . Therefore, since the size of is , is disjoint
from both and . Since is an arbitrary replaced occurrence of , this implies
that remained unchanged on and throughout the process of changing it to ,
and so and . By definition, we have shown that
, is nonempty and strongly irreducible with uniform filling length at most .
Proof of Theorem 3.6.1: We prove Theorem 3.6.1 by induction. Our inductive
158
hypothesis is that for any as described in the hypotheses, contains a strongly
irreducible shift of finite type with uniform filling length less than . First note
that by Theorem 3.1.22 and Lemma 3.6.2, there certainly exists such that for
any word with , there is a subword of such that is
nonempty and strongly irreducible with uniform filling length . We take
this to be greater than the from Lemma 3.6.2, and take our to be ,
where is from Lemma 3.6.2. Now suppose the hypothesis to be true for , and
consider any , , . . . , satisfying the hypotheses of Theorem 3.6.1.
By the inductive hypothesis, contains a strongly irreducible shift
of finite type with uniform filling length . There are two cases: either
or not. If , then , and so since
, in this case contains a strongly irreducible shift
of finite type with uniform filling length and the inductive
hypothesis is verified for . So, we suppose that . We will show
that . By the hypotheses of Theorem 3.6.1, we see
that Since the largest
size of a word in is , , and so . Finally,
. Therefore, satisfies the hypotheses of Lemma 3.6.2 and so there
is a subword of such that is nonempty and strongly irreducible
with uniform filling length less than 42 . Also, ,
and so the induction is complete. We have then shown that for any , contains
a nonempty shift and is therefore nonempty itself.
159
3.7 Questions
A few questions suggest themselves based on our results. The first, and perhaps most
obvious, is simply how the bounds in Theorem 3.1.22 could be improved. (We suspect
that at least the lower bound is far from optimal.)
Question 3.7.1. Can the constants and in the statement of Possible Theo-
rem 3.1.21 be improved from their values in Theorem 3.1.22 l.?
We know at least that the gap between and can be improved only up to
possible additive and multiplicative constants; we showed earlier that there are ex-
amples which force for strongly irreducible shifts of finite type with
arbitrarily large , and our Theorem 3.1.22 has . However, it
would be nice to achieve optimal values.
Next, recall that we showed in the introduction that no result nearly as strong as
Theorem 3.1.22 can be true for shifts of finite type which are only assumed to be
topologically mixing. However, it is probable that something can be said. So, we can
ask
Question 3.7.2. If is a topologically mixing -shift of finite type with
positive topological entropy , and if we denote by the shift of finite type
, then is it true that for every , there exists such that for all
with , ? If so, can anything be said about
the rate at which approaches zero as the size of
approaches infinity ?
160
If it is the case that approaches zero as , then it seems
likely that some new techniques would be necessary for a proof, since there exist
non-strongly irreducible shifts of finite type whose languages contain words for
which there exists no word agreeing with on its border. For example,
in the checkerboard island shift , any square word which is a finite portion of a
checkerboard is forced by its boundary.
161
BIBLIOGRAPHY
[B] ROBERT BERGER, The undecidability of the domino problem, Mem. Amer.
Math. Soc. 66 (1966)
[Be] VITALY BERGELSON, Ergodic Ramsey Theory- an Update, Ergodic Theory
of -actions, M. Pollicott and K. Schmidt, editors, London Math. Soc.
Lecture Notes Series 228, Cambridge University Press (1996), 1-61.
[BeL] VITALY BERGELSON AND ALEXANDER LEIBIVIAN, Polynomial extensions
of van der Waerden's and Szemeredi's theorems, Journal of AMS 9 (1996),
no. 3, 725-753.
[Bi] G.D. BIRKHOFF, Proof of the ergodic theorem, Proc. Nat. Acad. Sci. USA,
17 (1931) 656-660.
[Bou] JEAN BOURGAIN, Poin rwise ergodic theorems for arithmetic sets, Publ.
Math. IHES 69 (1989), no. 3, 5-45.
[Bou2] JEAN BOURGAIN, On poin rwise ergodic theorems for arithmetic sets, C. R.
Acad. Sci. Paris Sir. I Math. 305 (1987), no. 10, 397-402.
[BoyLR] MICHAEL BOYLE, DOUGLAS LIND, AND DANIEL RUDOLPH, The auto-
morphism group of a shift of finite type, Trans. Amer. Math. Soc., 306
(1988), no. 1, 71-114.
[BuSl] ROBERT BURTON AND JEFFREY STEIF, Non-uniqueness of measures of
maximal entropy, Ergodic Theory Dynam. Systems, 14 (1994), no. 2. 213-
235.
[BuS2] ROBERT BURTON AND JEFFREY STEIF, Nerv results on measures of
imal entropy, Israel J. Math. 89 (1995), nos. 1-3, 275-300.
[Fa] BASSAIVI FAYAD, Analytic mixing reparametrizations of irrational florvs, Er-
godic Theory Dynam. Systems 22 (2002), no. 2, 437-468.
162
[Fu]
HILLEL FURSTENBERG, Strict ergodicity and transformation of the torus,
Amer. J. Math. 83 (1961), 573-601.
[GO1] LEO GUIBAS AND ANDREW ODLYZKO, 6aximal prefix-synchronized codes,
SIAM J. Appl. Math. 35 (1978), 401-418.
[GO2] LEO GUIBAS AND ANDREW ODLYZKO, Periods in strings, J. Combin. The-
ory Ser. A, 30 (1981), 19-42.
[HaK] FRANK HAHN AND YITZHAK KATZNELSON, On the entropy of uniquely
ergodic transformations, Trans. Amer. Math. Soc. 126 (1967), 335-360.
[HaW] G.H. HARDY AND E.IVI. WRIGHT, An introduction to the theory of num-
bers, Clarendon Press, 1979.
[HeM] GUSTAV HEDLUND AND MARSTON Morse, Symbolic dynamics, Amer. J.
Math., 60 (1938), no. 4, 815-866.
[L] DOUGLAS LIND, Perturbations of shifts of finite type, Siam J. Discrete
Math. 2 (1989), no. 3, 350-365.
[LM DOUGLAS LIND AND BRIAN Marcus, An Introduction to Symbolic Dy-
namics and Coding, Cambridge University Press, 1995.
[M] MICHAEL MISIUREWICZ, A short proof of the variational principle for
-action on a compact space, Asterisque 40 (1975), 147-157.
[QS] ANTHONY QUAS AND A.A. §AHIN, Entropy gaps and locally maximal en-
tropy in subshifts, Ergodic Theory Dynam. Systems 23 (2003), 1227-
1245.
[QT] ANTHONY QUAS AND PAUL Trow, Subshifts of multi-dimensional shifts
of finite type, Ergodic Theory Dynam. Systems 20 (2000), no. 3, 859-874.
[R] RAPHAEL IVI. ROBINSON, Undecidability and nonperiodicity for tilings of
the plane, Invent. Math 12 (1971), 177-209.
[Wal] PETER WALTERS, An Introduction to Ergodic Theory, Springer-Verlag,
1982.
[Wan] HAO WANG, Proving theorems by pattern recognition II, AT&T Bell Labs.
Tech. J. 40 (1961), 1-41.
163
[Wi]
MAacuteTEacute WIERDL, Poin rwise ergodic theorem along the prime numbers, Israel
J. Math. 64 (1988), no. 3, 315-336.
164