]> An IDEAL Group, CLC, Project

SOME RESULTS ON RECURRENCE AND ENTROPY

DISSERTATIO N

Presented in Partial Fulfillment of the Requirements for

the Degree Doctor of Philosophy in the Graduate

School of the Ohio State University

By

Ronald Lee Pavlov Jr., B.S., M.S.

**** *

The Ohio State University

2007

Dissertation Committee:

133C1bClbl11111111lbbCC. Approved by

Professor Vitaly Bergelson, Advisor

Professor Alexander Leibman

Advisor

Professor Manfred Einsiedler Graduate Program in Mξ

Graduate Program in Mathematics

ABSTRACT

This thesis is comprised primarily of two separate portions. In the first portion, we

exhibit, for any sparse enough increasing sequence {pn} of integers, a totally minimal,

totally uniquely ergodic, and topologically mixing system (X, T) and fC(X) for

which the averages 1Nn=0N-1f(Tpnx) fail to converge on a residual set in X, answering

negatively an open question of Bergelson. We also construct here a totally minimal,

totally uniquely ergodic, and topologically mixing system (X, T) and xX for

which x{Tpnx}.

In the second portion, we study perturbations of multidimensional shifts of finite

type. Given any Zd shift of finite type X for d>1 and any word w in the language of

X, denote by Xw the set of elements of X in which w does not appear. If X satisfies

a uniform mixing condition called strong irreducibility, we obtain exponential upper

and lower bounds on htop(X)-htop(Xw) dependent only on the size of w. This result

generalizes a result of Lind about Z shifts of finite type.

ii

To Dilip

ill

ACKNOWLEDGMENTS

First and foremost, I would like to thank my advisor Vitaly Bergelson. This the-

sis could never have been completed without his help. His infectious enthusiasm

for mathematics, along with a willingness to share his knowledge, were a constant

inspiration.

I would also like to thank my other committee members, Professors Alexander Leib-

man and Manfred Einsiedler, for agreeing to serve on my committee and for many

fruitful mathematical discussions.

Finally, I thank my friends and loved ones, without whom I would never have been

able to get as far as I have. In no particular order:

Thank you to Mom and Dad, who have always been willing to do anything to

support me and have always believed in me.

Thank you to Greg, without whose friendship and sense of humor graduate school

would have been much less bearable.

Thank you to Jasmine, who was always there for me. I can never thank you

enough for your constant love and support.

IV

VITA

2000-Present . . . . . . . . . . . . . . . . . . . . . . . . . . Graduate Teaching Associate,

The Ohio State University

1998-1999 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Student Instructional Associate,

The Ohio State University

2003 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M.S. in Mathematics,

The Ohio State University

2000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B.S. in Mathematics,

The Ohio State University

v

PUBLICATIONS

Research Papers

Some Counterexamples In Topological Dynamics (Ergodic Theory and Dynam-

ical Systems, to appear)

Perturbations of Multidimensional Shifts of Finite Type (submitted)

FIELDS OF STUDY

Major Field: Mathematics

Specialization: Ergodic Theory and Dynamical Systems

vi

TABLE OF CONTENTS

Abstract ii

Dedication . iii

Acknowledgments . 1V

Vita v

List of Figures ix

CHAPTER PAGE

1Introduction 1

2Recurrence and convergence of ergodic averages along sparse sequences 8

2.1 Introduction 8

2.2 Some general symbolic constructions 16

2.3 Some symbolic counterexamples 31

2.4 Some general constructions on connected manifolds . 42

2.5 Some counterexamples on connected manifolds 55

2.6 A counterexample about simultaneous recurrence 61

2.7 Questions 65

3Perturbations of multidimensional shifts of finite type 67

3.1 Introduction 67

3.2 Some measure-theoretic preliminaries 84

3.3 A replacement theorem 93

3.4 The proof of the main result 114

3.5 A closer look at the main result 146

3.6 An application to an undecidability question 155

3.7 Questions 159

Vll

Bibliography

161

Vlll

LIST OF FIGURES

FIGURE PAGE

3.1 f2 's action on a sample element of Y 73

3.2 A portion of a sample element of Z 76

3.3 a4 77

3.4 b4 77

3.5 Rk(j1+R),k(j2+R),...,k(jd+R) 85

3.6 A standard replacement of u. 95

3.7 A sample element S of Rj 97

3.8 An element S of Rj associated to two standard replacements 98

3.9 The suboctants of Bj 99

3.10 Elements S, S of Rj whose difference is a multiple of ei tOO

3.11 An element S of Rj which contains O 106

3.12 107

3.13 Bj 108

3.14 Intersecting occurrences of w 116

3.15 SiSi(R) 120

3.16 Disallowed and allowed pairs of overlapping wj,d 127

3.17 A point xXy 138

ix

3.18 The correspondence between copies of ΓW and points in Γj(2)

139

3.19 How a copy of ΓW is filled if bj,d(p)=1 140

3.20 f

141

x

(JIAPTER 1

INTRODUCTION

This introduction will serve as a brief mathematical and historical overview of both

of the main problems that we will examine in this thesis. Due to their somewhat

disparate natures, the two portions of this thesis will each contain a more in-depth

introduction as well. For this reason, we will for the most part relegate formal defi-

nitions to their pertinent introduction.

Ergodic theory is the study of average or long-term behavior of systems which evolve

with time. For example, one can consider a particle bouncing in a box at fixed speed.

Its position and velocity at any time can be represented by a vector in R6. One can

then model this behavior by taking X to be the space of all possible positions and

velocities for the particle, and T a self-map of X which, given a position and velocity,

gives the position and velocity one second later. This pairing of a space X with a

self-map T is called a dynamical system.

The more general setup is that of a group G acting on a space X with some sort

of structure by maps {Tg}gG preserving this structure. In this thesis, G is always

Zd for some d N. Measure-theoretic ergodic theory occurs when X is a measure

space with a probability measure μ invariant under each Tg, and so in this case we call

1

(X, B, μ, {Tg}gG) a measure-preserving dynamical system. Topological dynam-

ics occurs when X is a compact topological space and each Tg is a homeomorphism,

and so in this case we call (X, {Tg}gG) a topological dynamical system. In ei-

ther type of system, if G=Z, then Tn=(T1)n for any integer n, and so we shorten

(X, B, μ, {Tn}nZ) to (X, B, μ, T) and (X, {Tn}nZ) to (X, T). (Here T=T1.)

These cases are not as disjoint as they may first appear: the famed Bogoliouboff-

Krylov theorem states that any topological dynamical system (X, {Tg}gG) has at

least one invariant probability measure μ as long as G is an amenable group. (An

examination of amenability is beyond the scope of this thesis, but we mention that

amenable groups are a class which contain all abelian groups. We only consider

G=Zd in this thesis, so all topological dynamical systems examined here will possess

invariant measures.) This sometimes allows one to examine topological properties of

a system via properties of invariant measures. For instance, in Chapter 3, we will

prove some purely combinatorial properties of a Zd-action by homeomorphisms of a

Cantor set by means of studying the invariant measures of this action.

Chapter 2 deals primarily with ergodic averaging. One of the fundamental results of

ergodic theory is Birkhoff's ergodic theorem:

Theorem 1.0.1. ([Bi]) For any measure-preserving dynamical system (X, B, μ, T)

and any fL1(X), limn1ni=0n-1f(Tix) exists for μ-almost every xX.

Several results have been proven about the convergence of such averages when one

averages not along all powers of T, but only along some distinguished subset of the

2

integers. ([Bou], [Bou2], [Wi]) In particular, when one averages along {p(n)} for a

polynomial p(n) with integer coefficients, there is the following result of Bourgain:

Theorem 2.1.12. ([Bou], p. 7, Theorem t) For any measure-preserving dynamical

system (X, B, μ, T), for any polynomial q(t)Z[t], and for any fLp(X, B, μ) with

p>1, limN1Nn=1Nf(Tq(n)x) exists for μ-almost every xX.

Theorem 2.1.12 can be interpreted as follows: for any polynomial q(t)Z[t], any

measure-preserving system (X, B, μ, T), and any measure-theoretically "nice" func-

tion f, the set of points x where limN1Nn=1Nf(Tq(n)x) fails to converge is of mea-

sure zero, or negligible measure-theoretically. It is then natural to wonder whether or

not there is a topological parallel to this result using topological notions of "niceness"

(continuity) and negligibility (first category), and in fact such a question was posed

by Bergelson:

Question 2.1.13. ([Be], p. 51, Question 5) Assume that a topological dynamical

system (X, T) is uniquely ergodic1, and let pZ[t] and fC(X). Is it true that for

all but a first category set of points limn1ni=0n-1f(Tp(i)x) exists?

One of our results is a (quite negative) answer to Question 2.1.13:

Theorem 2.1.15. For any increasing sequence {pn} of integers with upper Banach

density zero2, there exists a totally minimal, totally uniquely ergodic, and topologically

1A topological dynamical system (X, B, μ, T) is uniquely ergodic if there exists exactly one T-

invariant measure μ.

2A set AN has upper Banach density zero if limsupnsupmN|{m,m+1,...,m+n-1}A|n=0.

3

mixing topological dynamical system (X, T) and a continuous function f on X with

the property that 1Nn=0N-1f(Tpnx) fails to converge for a residual set of x.

We also examine recurrence along distinguished subsets of the integers, motivated

primarily by the following result:

Theorem 2.1.17. ( [BeL], p. 14, Corollary 1.8) For any dN, any minimal3 topolog-

ical dynamical system (X, {Tv}vZd) and any polynomials q1(t), q2(t), ..., qd(t)Z[t]

with qi(0)=0 for 1<i<d, for a residual set of xX there exists a sequence

{ni} of positive integers such that Tq(ni)eixx for 1<i<d, where {ei}i=1d is the

standard orthonormal basis of Rd.

In particular, this implies that for any minimal topological dynamical system (X, T)

and any polynomial q(t)Z[t] with q(0)=0, the set of xX with x{Tq(n)x}nN

is residual. The following result shows that for q with degree at least two, this residual

set is not necessarily all of X.

Theorem 2.1.18. For any increasing sequence {pn} of integers with upper Banach

density zero, there exists a totally minimal, totally uniquely ergodic, and topologically

mixing topological dynamical system (X, T) and an uncountable set A X such that

for every x A, the sequence {Tpnx} does not have x as a limit point, i.e. there is

no sequence of positive integers {ni} such that Tpnix converges to x.

Another consequence of Theorem 2.1.17 is that for any commuting minimal homeo-

morphisms T and S of a compact space X, there exists a residual set of x for which

sA topological dynamical system (X, B, μ, T) is minimal if there exist no nonempty proper closed

T-invariant subsets of X.

4

there exists a sequence of positive integers {ni} such that Tnixx and Snixx.

The following theorem shows that for some systems, it is the case that this residual

set of x is not all of X.

Theorem 2.1.21. There exists a totally minimal topological dynamical system (X, T)

and a point x X such that for any positive integers r s, any sequence {ni} of

integers satisfying Trnixx and Tsnixx is eventually zero.

The systems which we construct to prove Theorems 2.1.15 and 2.1.18 are symbolic

dynamical systems. A symbolic dynamical system is defined by first choosing a

finite set A, called the alphabet. Ω=AG endowed with the product topology is a

compact space, and for any gG, we may define σg the shift homeomorphism by

(σgω)(h)=ω(hg) for ω Q. Any closed set XΩ has a topology induced by Ω, and

if X is invariant under each Tg, then (X, {σg}gG) is a topological dynamical system

which we call a symbolic dynamical system.

In Chapter 3, we examine symbolic dynamical systems exclusively. There, we consider

a specific type of symbolic dynamical system called a shift of finite type. A Zd-shift

of finite type is a symbolic dynamical system (X, {σv}vZd) where X is defined by

specifying a finite set F of finite words (a word is a function from a finite subset of

Zd to A) and taking X to be the set of all elements of Ω in which none of the words

in F appear. The shift of finite type X specified by a finite set F of words in this

way is denoted by ΩF. For example, the set of all biinfinite sequences of zeroes and

ones in which no two ones occur consecutively is a shift of finite type, as is the set of

5

all Z2 arrays of zeroes and ones in which no three-by-four blocks of all zeroes or all

ones appear.

We are particularly interested in the effects of forbidding a particular word from a

shift of finite type X. Given any word w, define Xw to be the set of elements of X in

which w does not appear. Then clearly Xw is a subset of X, and so the topological

entropy htop(Xw) of Xw is not greater than the topological entropy htop(X) of X. It

is natural to wonder how much the topological entropy drops by when w is removed,

as it is a sort of measure of how important w is to the information-retaining capacity

of X. In [L], Lind proved the following: (the condition wLΓn(X) means that

wA[1,...,n]d and that w appears in some element of X.)

Theorem 3.1.16. ([L], p. 360, Theorem 3) For any topologically transitive Z-shift

of finite type X=ΩF with positive topological entropy htop(X), there exist constants

CX, DX, and NX such that for any n> NX and any word wLΓn(X), if we denote

by Xw the shift of finite type ΩFu{w}, then

CXehtop(X)n<htop(X)-htop(Xw)<DXehtop(X)n.

Our main result is a generalization of Theorem 3.1.16 for Zd-shifts of finite type

which satisfy a mixing condition called strong irreducibility. (We formally define

strong irreducibility in Definition 3.1.19.)

Theorem 3.1.22. For any d >1 and any strongly irreducible Zd-shift X =ΩF

of finite type with uniform filling length R and positive topological entropy htop(X),

6

there exist constants NXN and DXR such that for any n> NX and any word

wLΓn(X), if we denote by Xw the shift of finite type ΩFu{w}, then

1ehtop(X)(n+44R+70)d<htop(X)-htop(Xw)<DXehtop(X)(n-2R)d.

One way in which Zd-shifts of finite type are more difficult to deal with for d>1 is

that given a finite collection F of patterns, it is undecidable whether or not the shift

of finite type X induced by F is even nonempty. Our methods yield a situation in

which this question can be answered:

Theorem 3.6.1. For any alphabet A, there exist F, G N such that for any m >0

and any finite set of words Fm={wkLΓnk(X) : 1<k<m} satisfying n1>G

and nk>F(nk-1)4d2 for 1<k<m, ΩFm.

7

(JIAPTER 2

RECURRENCE AND CONVERGENCE OF ERGODIC

AVERAGES ALONG SPARSE SEQUENCES

2.1 Introduction

In this chapter, we are concerned with the convergence of averages of the form

1Nn=0N-1f(Tpnx) for an increasing sequence of integers {pn}. We begin with some

definitions.

Definition 2.1.1. A measure-preserving dynamical system (X, B, μ, {Tg}gG)

consists of a measure space X, a probability measure μ with σ-algebra 8 of measurable

sets, and a group action {Tg}gG of transformations Tg : XX with μ(Tg-1A)=

μ(A) for all AB.

Definition 2.1.2. A measure-preserving dynamical system (X, B, μ, {Tg}gG) is er-

godic if any set A satisfying TgA A for all g G has μ(A){0,1}.

Definition 2.1.3. A topological dynamical system (X, {Tg}gG) consists of a

compact topological space X and a group action {Tg}gG of homeomorphisms Tg :

X X.

8

Definition 2.1.4. Given a topological dynamical system (X, {Tg}gG), a Borel prob-

ability measure μ on X is called ergodic if (X, B(X), μ, {Tg}gG) is an ergodic

measure-preserving dynamical system, where B(X) is the Borel σ-algebra of X.

In this chapter, all dynamical systems have G=Z, so as already described we

will use the notations (X, B, μ, T) and (X, T) for measure-preserving and topological

dynamical systems respectively.

Definition 2.1.5. A topological dynamical system (X, T) is minimal if for any

closed set K with T-1KK, K = or K =X. (X, T) is totally minimal if

(X, Tn) is minimal for every n O.

Definition 2.1.6. A topological dynamical system (X, T) is uniquely ergodic if

there is only one Borel measure μ on X such that μ(A)=μ(T-1A) for every Borel

set A X. (X, T) is totally uniquely ergodic if (X, Tn) is uniquely ergodic for

every n N.

Definition 2.1.7. A topological dynamical system (X, T) is topologically mixing

if for any open sets U, VX, there exists N N such that for any n >N,

UTnV.

Definition 2.1.8. For any set AN, the upper Banach density of A is defined

by

d*(A)=limsupsup|{m,m+1,...,m+n-1}A|n.

nmN

9

Definition 2.1.9. For any set AN, the upper density of A is defined by

d¯(A)=limnsup|{1,...,n}A|n.

Definition 2.1.10. For a set AN, the density of A is defined by

d(A)=limn|{1,...,n}A|n

if this limit exists.

Definition 2.1.11. Given a topological dynamical system (X, T) and a T-invariant

Borel probability measure μ, a point x X is (T, μ)-generic if for every f C(X),

limn1ni=0n-1f(Tix)=fdμ.

In the measure-preserving setup, there are several positive results about convergence

of averages of the form 1Nn=0N-1f(Tpnx), including this theorem of Bourgain:

Theorem 2.1.12. ([Bou], p. 7, Theorem t) For any measure-preserving dynamical

system (X, B, μ, T), for any polynomial q(t)Z[t], and for any fLp(X, B, μ) with

p>1, limN1Nn=1Nf(Tq(n)x) exists for μ-almost every xX.

The following question regarding a possible topological version of Theorem 2.1.12 was

posed by Bergelson:

Question 2.1.13. ([Be], p. 51, Question 5) Assume that a topological dynamical

system (X, T) is uniquely ergodic, and let pZ[t] and fC(X). Is it true that for

all but a first category set of points limn1ni=0n-1f(Tp(i)x) exists?

to

Bergelson added the hypothesis of unique ergodicity because it is a classical result

that a system (X, T) is uniquely ergodic with unique T-invariant measure μ if and

only if for every xX and fC(X), limn1ni=0n-1f(Tix)=Xfdμ, and so in

the topological setup this is a natural assumption to make about (X, T) to achieve

the desired result.

However, Bergelson was particularly interested in the convergence of these averages

to the "correct limit," i.e. Xfdμ where μ is the unique T-invariant measure on

X. To have any hope for such a result, it also becomes necessary to assume ergod-

icity for all powers of T in order to avoid some natural counterexamples related to

distribution (mod k) of p(n) for positive integers k. For example, if p(n)=n2, T is

the permutation on X={0,1, 2} defined by Tx=x+1 (mod 3), μ is normalized

counting measure on X, and f=χ{0}, then (X, T) is obviously uniquely ergodic with

unique invariant measure μ=δ0+δ1+δ23, but

limn1ni=0n-1f(Tp(i)x)= ifx=1ifx=0iffx=2', .and

To avoid such examples, we would need T to be totally ergodic as well as uniquely

ergodic, and so it makes sense to assume total unique ergodicity to encompass both

properties. Bergelson's revised question then looks like this:

Question 2.1.14. Assume that a topological dynamical system (X, T) is totally uniquely

11

ergodic with unique T-invariant measure μ, and let pZ[t] and fC(X). Is it true

that for all but a first category set of points limn1ni=0n-1f(Tp(i)x)=Xfdμl.?

We answer Questions 2.1.13 and 2.1.14 negatively in the case where the degree of

p is at least two, and in fact prove some slightly more general results. The level of

generality depends on what hypotheses we place on the space X. In particular, we

can exhibit more counterexamples in the case where X is a totally disconnected space

than we can in the case where X is a connected space such as Tk.

Theorem 2.1.15. For any increasing sequence {pn} of integers with upper Banach

density zero, there exists a totally minimal, totally uniquely ergodic, and topologically

mixing topological dynamical system (X, T) and a continuous function f on X with

the property that 1Nn=0N-1f(Tpnx) fails to converge for a residual set of x.

Theorem 2.1.16. For any increasing sequence {pn} of integers with the property

that for some integer d, pn+1<(pn+1-pn)d for all suffiffifficiently large n, there ex-

ists a totally minimal, totally uniquely ergodic, and topologically mixing topological

dynamical system (X, T) and a continuous function f on X with the property that

1Nn=0N-1f(Tpnx) fails to converge for a residual set of x. In addition, the space X is

a connected 2d+ 9-manifold.

We note that Theorems 2.1.15 and 2.1.16 answer Question 2.1.13 negatively for p

with degree at least two, since the sequence pn=p(n) for any nonlinear p(t)Z[t]

satisfies the hypotheses of both theorems.

Theorems 2.1.15 and 2.1.16 are about nonconvergence of ergodic averages along cer-

tain sequences of powers of x. We also prove two similar results about nonrecurrence

12

of points. As motivation, we note that a minimal system has the property that every

point is recurrent. In other words, if (X, T) is minimal, then for all xX, it is the

case that x{Tnx}nN. If (X, T) is totally minimal, then all points are recurrent

even along infinite arithmetic progressions: for any nonnegative integers a, b, and for

all xX, x{Tan+bx}nN. It is then natural to wonder if the same is true for other

sequences of powers of T, and in this vein there is the following result of Bergelson

and Leibman, which is a corollary to their Polynomial van der Waerden theorem:

Theorem 2.1.17. ( [BeL], p. 14, Corollary 1.8) For any dN, any minimal topolog-

ical dynamical system (X, {Tv}vZd) and any polynomials q1(t), q2(t), ..., qd(t)Z[t]

with qi(0)=0 for 1<i<d, for a residual set of xX there exists a sequence

{ni} of positive integers such that Tq(ni)eixx for 1<i<d, where {ei}i=1d is the

standard orthonormal basis of Rd.

In particular, Theorem 2.1.17 implies that for a minimal system (X, T) and any

polynomial p(t) with p(0)=0, the set of x such that x{Tp(n)x}nN is residual. If

(X, T) is assumed to be totally minimal, then by definition if the degree of p is one,

then every point xX is in {Tp(n)x}nN. The following two results imply that if the

degree of p is greater than one, total minimality of (X, T) does not necessarily imply

that every point xX is in {Tp(n)x}nN.

Theorem 2.1.18. For any increasing sequence {pn} of integers with upper Banach

density zero, there exists a totally minimal, totally uniquely ergodic, and topologically

mixing topological dynamical system (X, T) and an uncountable set A X such that

13

for every xA, the sequence {Tpnx} does not have x as a limit point, i.e. there is

no sequence of positive integers {ni} such that Tpnix converges to x.

Theorem 2.1.19. For any increasing sequence {pn} of integers with the property

that for some integer d, pn+1<(pn+1-pn)d for all suffiffifficiently large n, there exists a

totally minimal, totally uniquely ergodic, and topologically mixing topological dynam-

ical system (X, T) and a point xX such that the sequence {Tpnx} does not have

x as a limit point, i.e. there is no sequence of positive integers {ni} such that Tpnix

converges to x. In addition, the space X is a connected 2d+ 7-manifold.

The following simple lemma shows that if (X, T) is topologically mixing, then The-

orems 2.1.18 and 2.1.19 cannot be improved too much, i.e. there is no increasing

sequence {pn} for which we can exhibit topologically mixing examples with a second

category set of such nonrecurrent points.

Lemma 2.1.20. If a topological dynamical system (X, T) is topologically mixing, then

for any increasing sequence {pn}, the set of x X for which x {Tpnx} is of first

category.

Proof: Define Cε={x : d(x, Tpnx)>nZ}. It is clear that Cε is closed. We

claim that Cε contains no nonempty open set, which shows that it is nowhere dense,

implying that C=n=1C1n the set of points x for which x is not a limit point of

{Tpnx} is of first category. Suppose, for a contradiction, that there is a nonempty

open set U with UCε for some ε. Then, there exists V with diam(V <ε such that

VUCε. By topological mixing, there exists n such that VT-pnV. This

14

implies that there exists xV so that TpnxV. Since diam(V <ε, d(x, Tpnx)<ε.

However, xVCε, so we have a contradiction.

Theorem 2.1.18 shows that it is possible for this set of points nonrecurrent along pn

to be uncountable though.

We mention that some mixing condition is necessary for a statement like Lemma 2.1.20;

as a simple example, consider an irrational circle rotation T : xx+α on the cir-

cle T. There is clearly some increasing sequence of integers {pn} such that pn(y

(mod 1) 12. Then, for any xT, Tpnxx+12, and so for every xX, {Tpnx}

does not have x as a limit point.

We also prove a result about simultaneous recurrence. Theorem 2.1.17 implies that

for any commuting minimal homeomorphisms T and S of a compact space X, there

exists a residual set of x for which there exists a sequence of positive integers {ni}

such that Tnixx and Snixx. The following theorem shows that for some

systems, it is the case that this residual set of x is not all of X. (In this particular

example, T and S are powers of the same homeomorphism.)

Theorem 2.1.21. There exists a totally minimal topological dynamical system (X, T)

and a point x X such that for any positive integers r s, any sequence {ni} of

integers satisfying Trnixx and Tsnixx is eventually zero.

Before proceeding with the proofs, we now give a brief description of the content of

this chapter. In Section 2.2, we will describe some general symbolic constructions of

15

topological dynamical systems with particular mixing properties. At the end of this

section, we will arrive at a construction of a system which is totally minimal, totally

uniquely ergodic, and topologically mixing, and which has as a parameter a sequence

of integers {nk}.

In Section 2.3, by taking this sequence {nk} to grow very quickly, we will show

that the examples constructed in Section 2.2 are sufficient to prove Theorems 2.1.15

and 2.1.18. Some interesting questions also arise and are answered in Section 2.3

pertaining to the upper Banach density of countable unions of sets of upper Banach

density zero.

In Section 2.4, we create a flow under a function with base transformation a skew

product, which acts on a connected manifold, and which is totally minimal, totally

uniquely ergodic, and topologically mixing. This transformation has as a parameter

a function fC(T). We use conditions of Fayad ([Fa]) on flows under functions

to achieve topological mixing, and some conditions of Furstenberg ([Fu]) on skew

products to prove total unique ergodicity.

In Section 2.5, by a judicious choice of f, we use the examples of Section 2.4 to prove

Theorems 2.1.16 and 2.1.19.

In Section 2.6, we prove Theorem 2.1.21.

Finally, in Section 2.7 we give some open questions about strengthening our results.

2.2 Some general symbolic constructions

Definition 2.2.1. An alphabet is any finite set, whose elements are called letters.

16

Definition 2.2.2. A symbolic dynamical system with alphabet A is a topological

dynamical system (X, {σg}gG) where for any gG, σg is the shift homeomorphism

of Ω=AG defined by (σgω)(h)=ω(hg) for ωΩ, and X is any closed subset of Ω

invariant under each σg.

Definition 2.2.3. A word on the alphabet A is any element of AF for some finite

set F G. F is called the shape of w.

For any words v of length m and w of length n, we denote by vw their concatenation,

i.e. the word v[1]v[2]... v[m]w[1]w[2]... w[n] of length m+n. We denote by wk the

word torn . . . w given by the concatenation of k copies of w.

Definition 2.2.4. Given any symbolic dynamical system (X, {σg}gG), the language

of X, denoted by L(X), is the set of all words which appear as subwords of X.

All symbolic dynamical systems appearing in this chapter have alphabet {0, 1}, G=

Z. We also restrict ourselves in this chapter to words with shape {1, ..., n} for some

n. In other words, a word can be thought of for now as a finite string of letters

w=w(1)w(2)... w(n). The number of letters in a word w is called its length.

Definition 2.2.5. A word w of length n is a subword of a sequence u Ω if there

exists k Z such that u(i+k)=w(i) for 1<i<n. Analogously, w is a subword

of a word v of length m if there exists 0<k<m-n such that v(i+k)=w(i) for

1<i<n.

We will outline three constructions which algorithmically create X such that (X, σ)

have certain properties. (Here the "certain properties" in question depend on which

17

construction is used.) It should also be mentioned that many ideas from these con-

structions are taken from work of Hahn and Katznelson ([HaK]), where they also

algorithmically constructed symbolic topological dynamical systems with certain er-

godicity and mixing properties.

Construction 1: (Minimal) We define inductively nk and Ak, which are, respec-

tively, sequences of positive integers and sets of words on the alphabet {0, 1}. Each

word in Ak is of length nk. (We will use the term " Ak-word" to refer to a member of

Ak from now on.) We define these as follows: always define n1=1 and A1={0,1}.

Then, for any k>1, nk+1 is defined to be any integer greater than or equal to nk|Ak|

which is also a multiple of nk, and then Ak+1 is chosen to be the set of words of

length nk+1 which are concatenations of Ak-words, containing each Ak word in the

concatenation at least once.

We then define X to be the set of all xΩ which are limits of shifted Ak words In

other words, xX if there exist wkAk and mkZ such that for all large enough

i, x(i)=wk(i-mk). Since Ω is compact, such x exist by a standard diagonalization

argument. It is easily checkable that X is closed and a-invariant. The claim is that

regardless of the choice of the integers nk, as long as nk divides nk+1, and nk+1>

nk|Ak|, (X, σ) is minimal. It suffices to show that for any yX, {σny}nZ=X.

Choose any yX and wL(X). By the definition of X, there exists k and some

Ak word wk such that w is a subword of wk. wk is an Ak-word, so by definition,

every Ak+1-word contains wk, and therefore w, as a subword. Finally, note that again

by the definition of Construction 1, y is an biinfinite concatenation of Ak+1 words

18

This implies that y(1). . . y(2nk+1) contains some Ak+1 word, and therefore w, as a

subword, and so there exists nZ so that σny begins with w. Since w was an

arbitrary word in L(X), this implies that {σny}nZ=X, and so (X, σ) is minimal.

So, we have now demonstrated a way of constructing minimal (X, σ). We will now

make this construction a bit more complex in order to make (X, σ) totally minimal.

Construction 2: (Totally minimal) We define inductively nk and Ak, which are,

respectively, sequences of positive integers and sets of words on the alphabet {0, 1}.

Each word in Ak is of length nk. We define these as follows: always define n1=1

and A1={0,1}. Then, for any k>1, nk+1 is defined to be any integer greater than

or equal to (k!)2nk|Ak|+k!+nk2 , and then Ak+1 is chosen to be the set of words w of

length nk+1 which are concatenations of Ak-words and the word 1 with the following

properties: the word 1 does not appear at the beginning or end of w, only a single

1 can be concatenated between two Ak-words, and for every wAk, and for every

0<i<k!, w appears in w at an i (mod k!)-indexed place. That is, there exists

mi (mod k!) with w(m)w(m+1) . . . w(m+nk-1) =w. From now on, to refer

to this second condition, we say that every wAk occurs in w at places indexed by

all residue classes modulo k!.

Since this construction is a bit complicated, a few quick examples may be in or-

der. Suppose that n2=6, |A2|=4, and we choose n3=134. Say that A2=

{a, b, c, d}.Thenw=abcd1abcd1dabcdabcabcdab is an A3-word: each A2 word ap-

pears at least once beginning with a letter of w with an odd index, and at least

19

once beginning with a letter of w with an even index. Examples of words which

would not be A3-words include abcdllabcddabcdababcabcd (1 is concatenated twice

between d and a), abcdo lbcdldbcdaabcbbbbdc (occurrences of the word a begin only

with even-indexed letters), or abcdldcbabcdabcdabcdadb (wrong number of letters.)

We can then define X to again be the set of all xΩ which are limits of shifted

Ak-words. For this definition to make sense, it suffices to show that Ak is nonempty

for all k, since if this is true then X by compactness of Ω and a diagonalization

argument just as in Construction 1. A1 is nonempty, so it suffices to show that Ak

implies Ak+1 for all k. For any k, assume that wkAk. Then, enumerate the

elements of Ak by wk=a1,a2,...,a|Ak|,ankddnk+1-k1(k1nk|Ak|+1)-i(n+1)nk efine the words uk+1=a1k!a2k! . . . a|Ak|k!

and w=(uk+11)k!(a11)ia1 , where ink+1-k! (mod nk). Since

nk+1>(k!)2nk|Ak|+k!+nk2, w exists, and is a concatenation of Ak-words and the

word 1 with length nk+1. In w, at most a single 1 is concatenated between any two

Ak-words and 1 does not appear at the beginning or end of w. Also, since the length

of uk+1 is divisible by k!, and since all Ak-words are subwords of uk+1, all Ak words

appear in (uk+11)k! at places indexed by all residue classes modulo k!, and so all

Ak words appear in w at places indexed by all residue classes modulo k! as well.

Therefore, wAk+1, and so Ak+1 is nonempty.

We will show that (X, σ) is totally minimal. Fix any m>0. We wish to show that

for any yX, {σmny}nZ=X. Choose any such y, and fix any word wL(X).

By definition of X, there exists k and an Ak word wk such that w is a subword of

wk. Without loss of generality, we assume that k>m. By the construction, wk

20

occurs in every Ak+1-word, and it occurs at places indexed by every residue class

modulo k!. Since k>m, in particular this implies that wk, and therefore w, occurs

in every Ak+1-word at places indexed by every residue class modulo m. By definition

of Construction 2, y is a biinfinite concatenation of Ak+1-words and the word 1.

Therefore, y(1). . . y(2nk+1+2) contains some Ak+1-word as a subword, and so w

occurs in y(1). . . y(2nk+1+2) at places indexed by every residue class modulo m.

There then exists n so that σmny begins with w. Since w was an arbitrary word of

L(X), this shows that {σmny}nZ=X, and since m was arbitrary, that (X, σ) is

totally minimal.

We now define one more general type of construction, again more complex than the

last, so that the system created is always totally uniquely ergodic and topologically

mixing, in addition to being totally minimal. For this last construction, we first need

a couple of definitions.

Definition 2.2.6. For any integers 0<i<m and k, and wAk-1 and w

Ak, we define fri,m* (w, w) to be the ratio of the number of occurrences of w as a

concatenated Ak-1-rvord at i (mod m)-indexed places in w to the total number of

Ak-1-rvords concatenated in w.

We consider any positive integer to be equal to 0 (mod t) for the purposes of this

definition. An example is clearly in order: if A1={01, 10}, w=01, (in Construc-

tions 2 and 3, A1 is always taken to be {0, 1}, but here we deviate from this for

21

illustrative purposes) and w is the A2 word Ot |10|1|01 |10 (here vertical bars show

where breaks in the concatenation occur), then w occurs twice out of four A1-words,

so fr0,1* (w, w) =12. Since one of these occurrences begins at w(1) and one begins at

w(6), fr0,2*(w, w)=fr1,2*(w, w)=14. We make a quick note here that there could

be some ambiguity here if an Ak+1-word could be decomposed as a concatenation of

Ak words and ones in more than one way. For this reason, we just assume that when

computing fri,j* (w, w) , the definition of the Ak+1 word w includes its representation

as a concatenation of Ak-words and ones. (i.e. in the example given, w is defined

as the concatenation Of |10|1|01 |10 of A1-words and ones, rather than the nine-letter

word 011010110.)

Definition 2.2.7. Given any words w of length n and w of length n<n, and any

integers 0<i<m, define fri,m (w, w) to be the number of occurrences of w at i

(mod m)-indexed places in w, divided by n-n+1.

Taking the previous example again, fr0,1(w, w)=38, since 01 occurs three times as

a subword of 011010110. Since two of these occurrences begin at letters of w with

even indices and one begins at a letter of w with odd index, fr0,2(w, w)=28 and

fr1,2(w, w)=18.

Construction 3: (Totally minimal, totally uniquely ergodic, and topologi-

cally mixing) We define inductively nk and Ak, which are, respectively, sequences

of positive integers and sets of words on the alphabet {0, 1}. Each word in Ak is of

length nk. We define these as follows: always define n1=1 and A1={0,1}. Then,

we fix any sequence {dk} of positive reals such that k=1dk< and define, for each

22

k>1, some nk+1=Ck(k+1)!|Ak|nk+p for any integer Ck>nk>1dk and prime

nk<p<2nk (We may choose such a p by Bertrand's postulate. [HaW]) Note that

this implies that (nk, k!)=1 for all kN. We then define Ak+1 to be the set of words

w of length nk+1 with all of the same properties as in Construction 2, along with the

property that, for any wAk, and for any 0<i<k!, fri,k!*(w, w)[1-dkk!|Ak|, 1+dkk!|Ak|].

X is again taken to be the set of xΩ which are limits of shifted Ak words For

this definition to make sense, it suffices to show that Ak is nonempty for all k, since

if this is true then X by compactness of Ω and a diagonalization argument

just as in Construction 1. A1 is nonempty, so it suffices to show that Ak

implies Ak+1 for all k. For any k, assume that wkAk. Then, enumerate the

elements of Ak by wk=a1, a2, ..., a|Ak|, and define the words uk+1=a1k!a2k! . . . a|Ak|k!

and w= (uk+1)Ck(k+1)-p(uk+11)p. Clearly for large k, w exists, and is a concatenation

of Ak-words and the word 1 with length nk+1. In w, at most a single 1 is concatenated

between any two Ak-words and 1 does not appear at the beginning or end of w. Since

(nk, k!)=1, for every 0<i<k! and xAk, x appears in uk+1 exactly once as a

concatenated Ak-word at an i (mod k!)-indexed place. Therefore, x appears in w

exactly Ck(k+1) times as a concatenated Ak-word at i (mod k!)-indexed places, and

so fri,k!*(x, w)=1|Ak|nkk! . Since i and x were arbitrary, wAk+1, and so Ak+1.

(X, σ) is totally minimal by the same proof used for Construction 2. We claim that

(X, σ) is also totally uniquely ergodic. Take any word wL(X), and any fixed integer

j. We define two sequences {mk(j)} and {Mk(j)} as follows: mk(j) is the minimum value

23

of fri,j(w, w) , where 0<i<j and w ranges over all Ak word and Mk(j) is the

maximum value of fri,j (w, w) , where 0<i<j and w ranges over all Ak-words.

Suppose that mk(j) and Mk(j) are known, and that k>j. We wish to show that amk+1(j)

and Mk+1(j) are very close to each other. Let us consider any element w of Ak+1 and,

for any fixed 0<i<j, see how few occurrences of w there could possibly be at i

(mod j)-indexed places in w. By the definition of Construction 3, for every wAk,

and 0<i<k!, the ratio of the number of times w occurs as a concatenated Ak word

in w whose first letter is a letter of w whose index is equal to i (mod k!) to the total

number of Ak-words concatenated in w is at least 1-dkk!|Ak|. Since j divides k!, then for

any 0<i<j, the ratio of the number of times that w occurs as a concatenated Ak-

word at i (mod j)-indexed places in w to the total number of Ak-words concatenated

in w is at least 1-dkj|Ak|. Since the total number of Ak-words concatenated in w is at

least nk+1nk+1, this implies that the number of such occurrences of w in w is at least

1-dkj|Ak|nk+1nk+1 for any i and w. For any w and i, the number of times that w occurs at i

(mod j)-indexed places in w as a subword of an occurrence of w that occurs at an i

(mod j)-indexed place in w is then at least 1-dkj|Ak|nk+1nk+1(nk-|w|+1)fri-i(mod j),j(w, w).

Summing over all wAk and 0<i<j, the number of occurrences of w in w at i

(mod j)-indexed places is at least

(1 -dk)nk+1nk-|w|+1nk+1wAkm=0j-1frm,j(w,w)j|Ak|.

Since w was arbitrary in Ak+1 and 0<i<j was arbitrary,

24

mk+1(j)> (1 -dk)nk+1nk+1-|w|+1nk-|w|+1nk+1wAkm=0j-1frm,j(w,w)j|Ak|.

Let us now bound from above the number of occurrences of w in w at i (mod j)-

indexed places. By precisely the same reasons as above, for any 0<i<j, the

number of occurrences of w at i (mod j)-indexed places which lie entirely within a

concatenated Ak-word in w is not more than

(1 +dk)nk+1nk-|w|+1nkwAkm=0j-1frm,j(w,w)j|Ak|.

(The denominator of the first fraction changed because there are at most nk+1nkAk-

words concatenated in w.) However, it is possible that there are occurrences of w

in w which do not lie entirely within a concatenated Ak-word in w. The number

of such occurrences of w is not more than |w|+1 times the number of concatenated

Ak words in w, which in turn is less than or equal to (|w|+1) nk+1nk. This means that

the number of occurrences of w at i (mod j)-indexed places in w is bounded from

above by

(1 +dk)nk+1nk-|w|+1nkwAkm=0j-1frm,j(w,w)j|Ak|+(|w|+1)nk+1nk,

and since 0<i<j was arbitrary, this implies that

Mk+1(j)< (1 +dk)nk-|w|+1nkwAkm=0j-1frm,j(w,w)j|Ak|nk+1

nk+1-|w|+1

+nk+1|w|+1.

nk+1-|w|+1 nk

This implies that

25

Mk+1(j)-mk+1(j)<2dknk+1nk+1-|w|+1nk-|w|+1nkwAkm=0j-1frm,j(w,w)j|Ak|

+nk+1|w|+1.

nk+1-|w|+1 nk

Since frm,j(w, w)<1 for every 0<m<j and wAk, for large k this shows that

Mk+1(j)-mk+1(j)<2dk+2(|w|+1)nk, which clearly approaches zero as k.

We now note that since

mk+1(j)> (1 -dk)nk+1nk+1-|w|+1nk-|w|+1nk+1wAkm=0j-1frm,j(w,w)j|Ak|,

and since by definition frm,j(w, w)>mk(j) for all wAk, we see that mk+1(j)>

(1 -dk)nk+1nk+1-|w|+1nk-|w|+1nk+1mk(j), and so

mk+1(j)-mk(j)>mk(j)[(1-dk)(1+|w|nk+1-|w|+1)(1-|w|-1nk+1)-1]

>-mk(j)(dk+|w|nk+1)>-(dk+|w|nk).

By almost completely analogous reasoning, for large k

Mk+1(j)-Mk(j)<Mk(j)[(1+dk)(1+|w|-1nk+1-|w|+1)(1-|w|-1nk)-1]

+nk+1nk+1-|w|+1|w|+1nk<Mk(j)dk+2(|w|+1)nk<dk+2|w|+1nk.

Therefore,

mk+1(j)<Mk+1(j)<Mk(j)+dk+2|w|+1nk<mk(j) a 2dk-1+2(|w|+1)nk-1+dk+2|w|+1nk

26

<mk(j)+2dk-1+dk+4|w|+3nk-1,

so |mk+1(j)-mk(j)|<dk+2dk-1+4|w|+3nk-1. In a completely analogous fashion, |Mk+1(j)-

Mk(j)|<dk+2dk-1+3|w|+2nk-1. We know that k=1dk converges, and since nk>2k for

all k, k=11nk converges as well. Therefore, we see that the sequences {mk(j)} and

{Mk(j)} are Cauchy, and converge. Since we also showed that Mk(j)-mk(j)0, we

know that they have the same limit, call it (y.

This implies that for very large k, |fri,j(w, w)-α| is very small for every 0<i<j and

wAk. We claim that this, in turn, implies that for very large N, |fri,j(w, w)-α| is

very small for every word wL(X) of length N: fix any ε>0, and take k such that

|fri,j(w, w)-α|<: for every 0<i<j and wAk, and such that 1+|w|nk<ε4. Then

for any word wL(X) of length at least 8nkε, w is a subword of a concatenation

of Ak-words and copies of the word 1. The number of full Ak-words appearing in

the concatenation formi ng w iiss aatt lleeaasstt |w|nk+1-2, and aatt mmoosstt |w|nk. So, the number

of occurrences of w at i (mod j)-indexed places in w which are contained entirely

within a concatenated Ak-word is at least ((nk-|w|+1)|w|nk+1-2(nk-|w|+1))(α-ε2) >

|w|((1-ε4)-ε4)(α-ε2)>|w|(α-ε), and at most |w|nk-|w|+1nk(α+ε2)<|w|(α+ε2).

Since there are at most (|w|+1)|w|nk<|w|ε4 occurrences of w not contained entirely

within a concatenated Ak-word, this implies that fri,j(w, w) is at least (y -ε, and

at most a +ε. Since for any ε>0, this statement is true for any long enough word

wL(X) and 0<i<j, we see that 1ni=0n-1χ[w](σijy) a uniformly for yX.

Since w was arbitrary, and since characteristic functions of cylinder sets are dense in

C(X), 1ni=0n-1f(σijy) approaches a uniform limit for all fC(X), and so (X, σj)

27

is uniquely ergodic for every j N. Since an invariant measure for (X, σ) would be

invariant for any (X, σj) as well, the unique invariant measure is the same for every

j.

Finally, we claim that (X, σ) is also topologically mixing. Consider any words w, w

L(X). By construction, there exists k so that there are Ak words y, y with w a

subword of y and w a subword of y. We also claim that for any nk+16<i<5nk+16, there

exists an Ak+1 word bi where bi(i+1)bi(i+2) . . . bi(i+nk)=y, and similarly biAk+1

with bi(i+1)bi(i+2) . . . bi(i+nk)=y. We show only the existence of bi, as the proof for

bi is trivially similar. Consider any nk+16<i<5nk+16, and take j=i (mod nk). Then,

if we enumerate the elements of Ak by a1, a2, ..., a|Ak|, first define the word uk+1=

a1k!a2k!... a|Ak|k!, and then define the word yi=(uk+11)j(uk+1)Ck(k+1)-p(uk+11)p-j. yi

has the property that yi(i+1)yi(i+2) . . . yi(i+nk) is a concatenated Ak-word in yi as

long as yi(i+1)yi(i+2) . . . yi(i+nk) lies in the subword (uk+1)Ck(k+1)-p of yi, which is

true for large k and i(nk+16, 5nk+16). This means that if we reorder a1, ..., a|Ak| in the

definition of uk+1, we may create a word bi where bi(i+1)bi(i+2) . . . bi(i+nk)=y.

biAk+1, since for every 0<i<k! and xAk, x appears exactly C(k+1)

times in bi as a concatenated Ak-word at i (mod k!)-indexed places, implying that

fri,k!*(x, bi)=1|Ak|nkk! (This uses the fact that ( nk, k!)=1, which was already shown.)

We create bi in the same way for each i. Since w is a subword of y and w is a subword

of y, for every nk+16+nk<i<5nk+16-nk, it is easy to choose a word zi to be bj

for properly chosen j so that zi(i+1) . . . zi(i+|w|)=w, and similarly zi so that

zi(i+1) . . . zi(i+|w|)=w. For large k, this means that we can construct such zi

and zi for any i[nk+15, 4nk+15].

28

We will now use these zi and zi to prove that for any n>|w|+nk+1, there exists a

word xL(X) of length n such that rvxrv' L(X). We do this by proving a lemma:

Lemma 2.2.8. For any t >k+1, and for any 0<i, j <nt such that there exists an

At word x where x(i+1)x(i+2) . . . x(i+nk+1) and x(j+1)x(j+2) . . . x(j+nk+1) are

concatenated Ak+1-rvords in x, and for any trvo Ak+1 words z and z, there exists an

At word x where x(i+1)x(i+2)... x(i+nk+1)=z and x(j+1)x(j+2) . . . x(j+

nk+1)=z.

Proof: We prove this by induction. First we prove the base case t=k+2; take an

Ak+2 word x where x(i+1)x(i+2) . . . x(i+nk+1) and x(j+1)x(j+2) . . . x(j+nk+1)

are concatenated Ak+1-words in x, call them a and b respectively. Since x is an Ak+2-

word, there exists an occurrence of z at an ( i (mod (k+1)!)) (mod (k+1) !)-indexed

place, i.e. there exists ii (mod (A +1)!) such that x(i+1)x(i+2) . . . x(i+

nk+1)=z. Similarly, there exists jj (mod (A +1)!) such that x(j+1)x(j+

2) . . . x(j+nk+1)=z. We now create x by leaving almost all of x alone, but defining

x(i+1) . . . x(i+nk+1)=z, x(j+1) . . . x(j+nk+1)=z, x(i+1) . . . x(i+nk+1)=a,

and x(j+1) . . . x(j+nk+1)=b. This new word x is still a concatenation of Ak+1-

words and ones, and since we switched two pairs of Ak+1-words which occurred at

indices with the same residue class modulo (k+1)!, fri,(k+1)!*(w, x)=fri,(k+1)!*(w, x)

for all 0<i<(k+1)! and wAk+1. Therefore, x is an Ak+2-word, with z and z

occurring at the proper places, completing our proof of the base case.

Now, let us assume that the inductive hypothesis is true for a certain value of t,

and prove it for t+1. Consider an At+1 word x where x(i+1) . . . x(i+nk+1) and

29

x(j+1) . . . x(j+nk+1) are concatenated Ak+1-words in x, call them a and b. Call

the concatenated At-word that x(i+1) ... x(i+nk+1) is a subword of a, and denote

by b the corresponding At-word for x(j+1) . . . x(j+nk+1) b. From now on, when

we speak of these words a, b, a, b, we are talking about the pertinent occurrences at

the places within x already described. There are two cases; either a and b are the

same; i.e. the same At-word in x, occurring at the same place, or they are not. If a

and b do occur at the same place, then by the inductive hypothesis, there exists an

At-word c with an occurrence of z at the same place as a occurs in a=b, and an

occurrence of z at the same place as b occurs in a=b. If we can replace a=b

by c in x, then we will be done. If a and b do not occur at the same place, then

since a is a concatenated Ak+1-word in a, by the inductive hypothesis there exists

an At-word a such that a has z occurring at the same place where a occurs in a.

Similarly, there exists an At-word b such that b has an occurrence of z at the same

place where b occurs in b. If we replace a by a and b by b in x, then we will be

done. So regardless of which case we are in, our goal is to replace one or two chosen

At-words within x with one or two other At-words. We will show how to replace

two, which clearly implies that replacing one is possible. We wish to replace a by

a and b by b. We do this in exactly the same way as in the base case; say that

a=x(i+1) . . . x(i+nt) and b=x(j+1) . . . x(j+nt). Since aAt, there

exists i=i (mod t!) and j=j (mod t!) such that x(i+1) . . . x(i+nt)=a

and x(j+1) . . . x(j+nt)=b. As in the base case, we create x by making

x(i+1) . . . x(i+nt)=a and x(i+1) . . . x(i+nt)=a, x(j+1) . . . x(j+nt)=b,

30

and x(j+1) . . . x(j+nt)=b. Then x is an At+1- word, and by construction

x(i+1) . . . x(i+nk+1)=z and x(j+1) . . . x(j+nk+1)=z.

Choose any sequence {vm} of Am-words for all m>k+1. For any such m, take

Pm={ n : vm(n+1) ... vm(n+nk+1) is a concatenated Ak+1 word in vm }. Since

vm is a concatenation of Ak+1-words and ones, if we write the elements of Pm as

p1(m)<p2(m)<...<pt(m), then for any 1<l <t, pl+1(m)-pl(m)< a k+1+1. For any

1<l <t, and i, j[nk+15, 4nk+15], by Lemma 2.2.8, there exists an Am word v with the

property that v(p1(m)+1) ... v(p1(m)+nk+1)=zi and v(pl(m)+1) ... v(pl(m)+nk+1)=zj.

This implies that there is a subword of v of the form rvxrv' where the length of x is

pl(m)-p1(m)+(j-i)-|w|. We note that j-i can take any integer value between -3nk+15

and 3nk+15 inclusive. Therefore, the set of possible lengths of x for which rvxrv' L(X)

contains

l=2t{(|w|+pl(m)-p1(m)+[-3nk+15, 3nk+15])}.

When l is increased by one, pl(m) is increased by at most nk+1+1. This, along with

the fact that the intervals [-3nk+15, 3nk+15] have length 6nk+15, which for large k exceeds

nk+1+1, implies that this set of possible lengths of v contains [ |w|+p2(m)-p1(m) -

3nk+15, |w|+pt(m)-p1(m)+3nk+15][|w|+nk+1, |w|+nm-2nk+1]. Since this entire

argument could be made for any m, we see that for any n>|w|+nk+1, there exists

xL(X) of length n so that rvxrv' L(X). Then, for any nonempty open sets

31

U, VX, there exist w and w such that [w]U and [w]V. By the above

arguments, there exists N so that for any n>N, [w]σn[w], implying that

UσnV. This shows that (X, σ) is topologically mixing.

2.3 Some symbolic counterexamples

Proof of Theorem 2.1.15: We take the continuous function f(y)=y(0) for all

yX, and first note that

{yX : 1Nn=0N-1f(σpny) does not converge}

(n>0k>n{yX : fr0,1 (0, y(p1)y(p2). . . y(pk))<14})

(n>0k>n{yX : fr0,1(0, y(p1)y(p2)...y(pk))>34}),

and that the latter set, call it B, is clearly a Gδ. We will choose nk so that B is dense

in X. This will imply that B is a dense Gδ, and since X is a complete metric space,

by the Baire category theorem, that B is residual, which will prove Theorem 2.1.15.

Now let us describe the construction of {nk}.

Recall that we have assumed that the sequence {pn} has upper Banach density zero.

We define A:={pn :nN}. We also define the intervals of integers Bj=

[2j!, (j+1)!]N for every jN, and take any partition of N into infinitely many

disjoint infinite sets in N, call them C1, C2, . . .. Define the set D1=jC1Bj, and

32

then define the set A1={pn : nD1}+1. Next, choose some r2 large enough so that

(minCr2)!>2ċ2, and define D2=jCr2Bj, and then define A2={pn : nD2}+2.

Continuing in this way, we may inductively define Ak, Dk for all kN so that for

1ll k, Dk=jCrkBj for some rk with the property that (minCrk)! >2k, and

Ak={pn :nDk}+k. We will verify some properties of these sets. Most

importantly, we denote by H the union n=1An, and claim that d*(H)=0. We

show this by noting that H has a certain structure; H consists of shifted subintervals

of A, separated by gaps which approach infinity. More rigorously:

Lemma 2.3.1. There exist intervals Ik=[ak, bk]N and integers jk such that

H=k=1((AIk)+jk) and such that limk(min((AIk+1)+jk+1)-max((A

Ik)+jk))=.

Proof: Take the set Q=k=1Crk, and denote its members by q1<q2< . . ..

Then, for any k, Bqk is a subset of some Ds. The interval Ik is then defined to be

[p2(qk)!, p(qk+1)!)], (which means ak=p2(qk)! and bk=pqk(qk)!) and jk is defined to be s.

It is just a rewriting of the definition of the Ak that H=k=1((AIk)+jk) with

these notations. All that must be checked is that limk(min((AIk+1)+jk+1)-

max((AIk)+jk))=. We will show that ak+1+jk+1-bk-jk, which

implies the desired result. Since qk+1>qk, ak+1-bk=(qk+1)!>(qk)!. So, we must

simply show that (qk)!-jk. Suppose that Bqk is a subset of Ds. Then jk=s.

We also note that by construction, (minCrs)!>2s. But, since BqkDs, qkCrs,

and so minCrs<qk. Therefore, (qk)!>2s, and so (qk)!-jk=(qk)!-s>(qk)!2 , which

33

clearly shows that this quantity approaches , since {qk} is an increasing sequence

of integers.

We now prove a general lemma that implies, in particular, that d*(H)=0.

Lemma 2.3.2. If d*(A)=0, and if there exist intervals Ik=[ak, bk]N and integers

jk such that limk ( min ((AIk+1)+jk+1)-max((AIk)+jk))=, then the

set B=k=1(AIk)+jk has upper Banach density zero.

Proof: Fix ε>0. By the fact that d*(A)=0, there exists N such that for any interval

J of integers of length at least N, |AJ||J|<ε. Take J to be any interval of integers of

length exactly N. Since limk(min((AIk+1)+jk+1)-max((AIk)+jk))=,

there is some K such that if J has nonempty intersection with (AIk)+jk for some

k>K, it is disjoint from (AIk)+jk for every kk. Therefore, for intervals

J of integers of length N with large enough minimum element, JB consists of a

subset of a shifted copy of JA, and so |BJ||J|<|AJ||J|, for some interval J of integers

whose length is also N. This means that in this case, |BJ||J|<ε. We have then shown

that for every ε, there exist N, M such that for any interval of integers J of length

N with minJ>M, |BJ||J|<ε. We will show that this slightly modified definition

still implies that d*(B)=0. Again fix ε>0, and define M and N as was just done.

Now consider any interval of integers I with length at least N+Mε. Then, partition I

into subintervals: define I0=I {1, ..., M}, and then break II0 into consecutive

subintervals of length N, called I1, I2, ..., Ik. There may be one last subinterval left

34

over of length less than N; call it Ik+1 (which may be empty.) Note that |I|>Nk,

or N|I|<1k. We see that

|BI||I|=|BI0||I|+i=1k|BIi||I|+|BIk+1||I|<M|I|+N|I|(i=1k|BIi||Ii|)+N|I|

<M+N|I|+1k(kε)<2ε.

Since ε was arbitrary, d*(B)=0.

By combining Lemmas 2.3.1 and 2.3.2, d*(H)=0. We will now create X. We note

that this part of our construction uses only the fact that d*(H)=0, and no other

properties. We take n1=1 and A1={0,1}. We recall that {nk} must be a sequence

of integers with the following properties: for all k>1, nk+1=Ck(k+1)!|Ak|nk+p

for some positive integer Ck>nk>1dk and prime nk<p<2nk. We also require nk

to grow quickly enough so that for all k, and for any interval of integers I of length

at least nk+1, |IH||I|<dk2k!|Ak|nk. That we may choose such nk is a consequence of the

fact that d*(H)=0. Using these nk, we define Ak as in Construction 3. We now

prove a lemma:

Lemma 2.3.3. For any k N, m Z, and for any u {0, 1}N, there exists an

Ak-rvord vu,k,m such that vu,k,m(i-m) =u(i) for all i H[m+1, m+nk].

Proof: This is proved by induction on k. Clearly the hypothesis is true for k=1

and for any u, m. Now suppose it to be true for a particular k. We will show

35

that it is true for k+1 and every u, m. We again construct an auxiliary word

uk+1 : enumerate the elements of Ak by a1, a2, ..., a|Ak|. Then, we again define the

word uk+1=a1k!a2k!... a|Ak|k!. Define vk+1= (uk+11)p(uk+1)Ck(k+1)-p, where nk+1=

Ck(k+1)!|Ak|nk+p as above. We note that for any 0<i<k! and wAk,

w occurs exactly Ck(k+1) times as a concatenated Ak-word at i (mod k!)-indexed

places in vk+1. (This uses the fact that (nk, k!)=1 for all k, which has already

been shown.) Now, fix mZ. We wish to construct an Ak+1 word vu,k+1,m such

that vu,k+1,m(i-m)=u(i) for all iH[ m+1, m+ a k+1]. We begin with the

Ak+1 word vk+1. Clearly it is not necessarily true that vk+1(i-m) =u(i) for all

iH[m+1, m+nk+1]. We force this condition to be true by changing some of the

Ak-words concatenated in vk+1. We show that this is possible; for any concatenated

Ak- word in vk+1, say vk+1(j)vk+1(j+1) ... vk+1(j+nk-1), the necessary condition is

that ones or zeroes (depending on u) be introduced at digits whose indices are of the

form i-m for all iH[ m+j+1, ..., a +j+nk-1]. To do this, we replace this Ak-

word by vu,k,m+j, which by the inductive hypothesis has the correct digits of u at the

desired places. So, we may change vk+1 into a concatenation of Ak-words and ones,

call it vu,k+1,m, which has the proper digits of u in all desired places. This may be done

by changing at most |H[m+1, ..., m+nk+1]|<dknk+12k!|Ak|nk<Ck(k+1)dkAk-words.

Therefore, since for any 0<i<k! and wAk, w occurred in vk+1 as a concatenated

Ak-word at i (mod k!)-indexed places exactly Ck(k+1) times, w occurs in vu,k+1,m as

a concatenated Ak-word at i (mod k!)-indexed places between Ck(k+1)(1-dk) and

Ck(k+1)(1+dk) times. This implies that fri,k!*(w, vk+1,m)[1-dkk!|Ak|, 1+dkk!|Ak|], and since

36

i and w were arbitrary, that vk+1,m is an Ak+1-word. By induction, Lemma 2.3.3 is

proved.

This implies in particular that for every u, k there exists an Ak-word vu,k,-nk-1 with

vu,k,nk-1(i+nk-1)=u(i) for all iH(-nk-1, ..., nk-nk-1]. By a standard

diagonalization argument, there exists a sequence {kj} and xΩ such that for

all iZ, x(i)=vu,kj,-nkj-1(i+nkj-1) for all large enough j. Since for every j,

vu,kj,-nkj-1(i+nkj-1)=u(i) for all iH(-nkj-1, nkj-nkj-1], clearly x(i)=u(i)

for all iH. Since x is a limit of shifted Ak-words, xX. As mentioned above,

this entire construction could be done with any set of zero upper Banach density in

place of H, which lets us state the following corollary:

Corollary 2.3.4. For any C N with d*(C)=0, there exists (X, σ) totally mini-

mal, totally uniquely ergodic, topologically mixing, and with the property that for any

sequence u {0, 1}N, there exists xuX such that xu(i)=u(i) for all i C.

We use Corollary 2.3.4 to create (X, σ) which proves Theorem 2.1.15. Recall that

H=k=1Ak, where Ak={pn+k :nDk} for all k, Dk=jCrkBj for some

rk, and Bj=[2j!, (j+1)!]Z for all j. For each k, we write the elements of Crk

in increasing order as crk(1), crk(2), .... We now decompose H into two disjoint subsets;

define Ho={ mH : m=pn+k for some nBj where j=crk(i) for odd i},

and He= { mH : m=pn+k for some nBj where j=crk(i) for even i}.

Since d*(H)=0, we use Corollary 2.3.4 to create a totally minimal, totally uniquely

37

ergodic, and topologically mixing (X, σ) and xX with x(n)=0 for nHo and

x(n)=1 for nHe. Recall that we wish to show that for the continuous function

f : yy(0) from X to {0, 1}, the set of points y such that 1Nn=0N-1f(σpny) does

not converge is residual. We showed earlier that it is sufficient to show that the set

B=( n>0k>n{yX : fr0,1(0, y(p1)y(p2)...y(pk))<14})

(n>0k>n{yX : fr0,1(0, y(p1)y(p2)...y(pk))>34})

is dense in X. Fix any jZ. By the construction of H, {n : j+pnH}=Dk

for some k, x(i)=0 for all iHo, and x(i)=1 for all iHe. In particular,

x(j+pn)=0 for all n in Bcrk(i) for odd i and x(j+pn)=1 for all n in Bcrk(i) for even i.

But then for any odd integer i, (σjx)(pn)=0 for all integers n[2(crk(i))!, (crk(i)+1)!],

and so

fr0,1(0, (σjx)(p1)(σjx(p2)...(σjx)(p(crk(i)+1)!))>crk(i)-1crk(i)+1,

which is clearly larger than 34 for sufficiently large k. Similarly, for any even integer

i, (σjx)(pn)=1 for all integers n[2(crk(i))!, (crk(i)+1)!], and so

fr0,1(0, (σjx)(p1)(σjx)(p2)...(σjx)(p(crk(i)+1)!))<2crk(i)+1,

which is clearly less than 14 for sufficiently large k. Therefore, σjxB. Since jZ

was arbitrary, the orbit of x is a subset of B. Since X is minimal, B is dense, which

completes the proof of Theorem 2.1.15.

38

We note that this proof in fact shows that the set

(n>0k>n{yX : fr0,1(0, y(p1)y(p2)...y(pk!))<2k})

(n>0k>n{yX : fr0,1(0, y(p1)y(p2)...y(pk!))>k-2k})

is a residual set in X, and so we can also say that for a residual set of yX,

liminfN1Nn=0N-1f(σpny)=0=infxXf(x) and limsupN1Nn=0N-1f(σpny)=

1 =supxXf(x).

Proof of Theorem 2.1.18: We use Corollary 2.3.4. Consider any set C={pn}nN

with d*(C)=0. Choose any set CN with (C+1) C, 1C, d*(C)=0,

and |C(C+1) |=. Denote the elements of C(C+1) by b1<b2<. . ..

By Corollary 2.3.4, we may construct a totally minimal, totally uniquely ergodic,

topologically mixing (X, σ) with the property that for every u{0,1}N, there exists

xuX with xu(i)=u(i) for all iC. For every v{0,1}N, define some uv

{0,1}N by uv(1) =0, uv(a)=1 for all aC+1, and uv(bk)=v(k) for all k C.

Then, for any such v, xuv(i)=1 for all iC+1, xuv(1) =0, and xuv(bk)=v(k)

for all kZ. Since for all nN, pn+1 C+1, (σpnxuv)(1)=xuv(pn+1) =1,

whereas xuv(1) =0. It is then clear that xuv is not a limit point of {σpnxuv}. Since

xuv(bk)=v(k) for all k N, xuvxuv, for vv, and so the set {xuv}v{0,1}N is

uncountable.

39

It is natural to wonder about one aspect of the proof; why is it that in our construc-

tion, we only force certain digits to occur along shifted subsets of A, rather than

along entire shifted copies of A? The reason comes from a combinatorial fact which

is somewhat interesting in its own right:

Example 2.3.5. There exists a set DN with d*(D)=0 with the property that for

any infinite set G of integers, the set D+G={d+g :d D, g G} has upper

Banach density one.

Proof: We begin with the sequence dn=3|n|2, where |n|2 is the maximal integer

k so that 2k|n. So, {dn} begins 1, 3, 1, 9, 1, 3, 1, 27, 1, 3, 1, 9, 1, 3, 1, . . . Then, define

cn=i=1ndn. {cn} is then an increasing sequence of integers, with the property that

the nth gap cn+1-cn is dn+1 for all n. We first claim that d*({cn})=0. Choose any

positive integer k, and any 2k+1 consecutive elements ci, ..., ci+2k of the sequence

{cn}. There must be some integer j[0,2k -1] such that 2k|i+j. This means

that 3k|di+j, and so that ci+2k-ci=m=ii+2k-1dm>di+j>3k. This means that

any interval of integers of length less than 3k can contain at most 2k elements of

{cn} for any k, which implies that d*({cn})=0 since limk2k+13k=0. Therefore,

if we construct a new sequence by increasing the gaps {dn}, it will still have upper

Banach density zero. We will change {dn} countably many times, never decreasing

any element. In other words, we inductively construct, for every kN, a sequence

{dn(k)}, so that these sequences are nondecreasing in k, i.e. dn(k)>dn(k-1) for all n, k.

We add the additional hypothesis that for every k, d({n : dn(k)>dn})=0, in other

40

words that only a density zero subset of the elements of {dn} have been changed after

any step.

Step 1: We change dn for some infinite, but density zero, set of n, so that for every

positive integer m, there exists n such that dn=m. This is clearly possible; for

example, by increasing dn for a density zero set of odd n. Call the resulting sequence

{dn(1)}.

Step k: (k >1) Assume that we have already defined {dn(k-1)}, a sequence of integers

with the property that dn(k-1)>dn for all n, and that d({n :dn(k-1)>dn})=0.

Define the set RkNk by Rk= {(a1, ..., ak) :aiN, ai>3k}. Since { n :

dn(k-1)>dn} has density zero, there exist infinitely many intervals of integers Ij such

that |Ij|>2k+1 for all j, and so that dn(k-1)=dn for all nIj for any j. Therefore,

by the construction of the sequence {dn}, each Ij contains a subinterval Ij of integers

of length 2k so that for every j, and for all nIj, dn(k-1)<3k. We may also assume,

by passing to a subset if necessary, that the union of all Ij has density zero. We now

take any bijection φ from N to R2k, and for every j, if Ij={s, s+1, ..., s+2k-1},

and if φ(j)=(a1, ..., a2k)R2k, define ds(k)=a1, ds+1(k)=a2, ..., ds+2k-1(k)=a2k. After

making these changes on each Ij, for any m not in any Ij, define dm(k)=dm(k-1). This

defines dn(k) for all n. Since for every n, dn(k)>dn(k-1), by the inductive hypothesis we

see that dn(k)>dn for all n. Also, since d({n : dn(k)>dn(k-1)})=0, and since by the

inductive hypothesis, d({n : dn(k-1)>dn})=0, we see that d({n : dn(k)>dn})=0,

completing the inductive step.

It is a consequence of this construction that if we define Hk={n :dn(k)>dn(k-1)}

41

for every k, the sets Hk are pairwise disjoint. Therefore, the sequences {dn(k)} have

a pointwise limit, call it {en}. By the construction, for any k, and for any k-tuple

(a1, ..., ak) of integers all greater than 3k, there exists m such that dm+i-1(k)=ai for

1<i<k. And, since the Hk are disjoint, this means that em+i-1=ai for 1<i<k.

Now, define the sequence fn=k=1nen for all n. As already noted, since en>dn for

all n, D={fn} has upper Banach density zero. We claim that for any infinite set

G of integers, D+G contains arbitrarily long intervals of integers. Fix any infinite

G={gn}, with g1<g2<.... For any k, there exist m1 , m2 , . . . , m2k , m2k+1 so

that gmj+1-gmj>3k for every 1<j<2k. Then, by construction, there exists m

so that em+j=gm2k-j+2-gm2k-j+1+1 for 1<j<2k. This means that fm+i=

fm+j=1iem+j=fm+j=1i(gm2k-j+2-gm2k-j+1+1) =(fm+gm2k+1)-gm2k+1-i+i

for 1<i<2k. But then for each such i, fm+i+gm2k+1-. =fm+gm2k+1+iD+G.

This implies that {fm+gm2k+1+1, ..., fm+gm2k+1+2k} is an interval of integers of

length 2k which is a subset of D+G. Since k was arbitrary, D+G contains arbitrarily

long intervals and so has upper Banach density one.

This answers our question: if we had, in the proof of Theorem 2.1.15, tried to force

ones to occur along infinitely many shifted copies of our set A of upper Banach

density zero, it's possible that no matter what set G of shifts we used, we would

be attempting to force x(i)=1 for arbitrarily long intervals of integers i and some

xX, which would imply that 1"X, yielding the closed invariant set {1}X<

and contradicting the minimality of X.

42

2.4 Some general constructions on connected manifolds

We will now construct some totally minimal, totally uniquely ergodic, and topo-

logically mixing topological dynamical systems (X, T) for which X is a connected

manifold, and in Section 2.5 we will use such examples to prove Theorems 2.1.16 and

2.1.19. The constructions in question will use both skew products and flows under

functions. We will be repeatedly using the topological space T, which can most easily

be considered as the half-open interval [0, 1) with 0 and 1 identified and the operation

of addition (mod 1). (Whenever we refer to the addition of elements of T, it should

be understood to be addition (mod 1).) We also note that Tn is a metric space for

any n with metric d defined by d(x, y)=minu,vZnd¯(x+u, y+v), where d¯ is the

Euclidean metric in Rn. We denote by λk Lebesgue measure on Tk for any k>0.

For any k>1, irrational (y T and continuous self-map f of T, we define the

homeomorphism S=Sk,α,f on Tk as follows: for any v= (v1, ..., vk)Tk, (Sv)1=

v1+α, (Sv)2=f(v1)+v2, and for every j>2, (Sv)j=vj-1+vj.

Theorem 2.4.1. For any f differentiable with 12<f(x)<32 for all x T, Sk,α,f is

totally minimal and totally uniquely ergodic with respect to λk.

Proof: During the proof, since k, (y, and f are taken to be fixed, we suppress

notational dependence and refer to Sk,α,f simply as S. Fix any rectangles R=

i=1kRi and R=i=1kRi in Tk, where Ri and 77: are intervals of length C for every

1<i<k. Now, fix any positive integer p such that λ1 ((R1+Pα) R1)>c2. Fix any

r2R2, ..., rkRk, and define the set

43

E1={x1R1 : π1 (S(x1, r2, ..., rk))R1}.

(Here and elsewhere, πi : TkT is the usual projection map vvi for 1<i<k.)

Since π1 (S(x1, r2, ..., rk))=x1+Iα, by choice of p we know that E1 is an interval

with λ1(E1)>c2. Now, define the set

E2={x1R1 : πi(S(x1, r2, ..., rk))Ri, 1<i<2}.

We will examine the structure of E2 by bounding π2(S(x1,r2,...,rk))x1 from above and

below. For this, we note that π2 (S(x1, r2, ..., rk))=r2+i=0-1f(x1+iα), and

make the observation that for any i, f(x1+iα) is equal modulo one to an increasing

function in x1 whose slope is between 12 and 32. Therefore, considered as a function

of x1, π2 (S(x1, r2, ..., rk)) is equal modulo one to an increasing function from [0, 1)

to R whose derivative is in (2, 32) for all x1. This implies that E2 is a union of many

intervals separated by gaps of length less than 12, where the length of all intervals but

the first and last is greater than c32. For large p, this implies that E2 contains some

set F2 a union of intervals of length D2C for some constant D2, where λ1(F2)>B2C2

for some constant B2.

We proceed inductively: for any 2<i<k, assume that we are given Fi a union of

intervals whose lengths are DiCl-(i-1) for some constant Di, where λ1(Fi)>BiCi

for some constant Bi, and such that for any x1Fi, πj (S(x1, r2, ..., rk))Rj for

1<j<i. We now wish to define Fi+1. Define the set

44

Ei+1={x1Fi : πi+1 (S(x1, r2, ..., rk))Ri+1}.

For any interval I of length DiCl-(i-1) in Fi, let's examine I Ei+1. We note that

πi+1 (S(x1, r2, ..., rk))=ηi+1,(r2, ..., ri+1)+j=0-i f(x1+jα),

where ηi+1, is some function of r2, . . . ' ri+1 which does not depend on x1. So, as a

function of x1, πi+1 (S(x1, r2, ..., rk)) is equal modulo one to an increasing function

from [0, 1) to R whose derivative is between Ali and Bli for some constants A, B>0.

This implies that for large p, I Ei+1 is a union of many intervals separated by gaps

of length less than 1Bl-i, where the length of all but the first and last is greater than

cAl-i. For large p, this implies that I Ei+1 contains FI , i+1 a union of intervals of

length DCl-i where λ1(FI,i+1)>ECλ1(I) for some constants D, E. By taking Fi+1

to be the union of all FI,i+1, we see that Fi+1 is a union of intervals of length Di+1Cl-i,

where λ1(Fi+1)>Bi+1Ci+1 for some constants Di+1 and Bi+1. Since Fi+1Ei+1, for

any x1Fi+1, πj (S(x1, r2, ..., rk))Rj for 1<j<i+1.

By inductively proceeding in this way, we eventually arrive at a set Fk where λ1(Fk)>

BkCk for some constant Bk, and where S(x1, r2, ..., rk)R for every x1Fk. By

integrating over all possible r2, ..., rk, we see that λk(SRR)>BkC2k-1. We have

then shown that for any p with λ1((R1+kα)R1)>c2, λk(SRR)>BkC2k-1.

Denote by L the set of such p. Then if we define Rα to be the transformation xx+α

on T, then 1={n : Rαn(0)J}, where J={xT : λ1((R+x)R)>c2}. It is

easily checked that λ1(J)=C. This means that for any M, NN,

45

1N|L{0, M, ..., M(N-1)}|=1Ni=0N-1χJ((RαM)i0),

which approaches λ1(J)=C as N by total unique ergodicity of Rα with respect

to λ1. Then, for large N,

1Ni=0N-1λk(SMRR)>1N|L{0, M, ..., M(N-1)}|BkC2k-1BkC2k

Therefore,

limNinf1N=0N-1λk(SMRR)>Bkλk(R)λk(R). (2.1)

We have shown that Equation 2.1 holds for R and R arbitrary congruent cubes in Tk.

It is clear that it also holds for R and R disjoint unions of congruent cubes. Suppose

that for some M N, there exists a Lebesgue measurable SM-invariant set ATk

with λk(A)(0,1). By taking complements if necessary, without loss of generality we

may assume that λk(A)<12. By regularity of Lebesgue measure, there exists 6 and

A a union of cubes with side length ε such that (A)c is also a union of cubes of side

length ε, λk(A)<λk(A)<12, and λk(AA)<Bk2λk(A)2. Then, for any pN, since

SMA=A and λk(AA)<Bk2λk(A)2, λk(ASMA)<Bk2λk(A)2 as well. Similarly,

λk(AcSM(A)c)<Bk2λk(A)2. Therefore, λk(SMASM(A)c)<Bkλk(A)2 for all

pN. Since λk(A)<λk(A)<λk((A)c), this contradicts Equation 2.1.

So, SM is ergodic with respect to λk for every M >0. We claim that this also

implies unique ergodicity of SM for every M >0, which follows from an argument of

46

Furstenberg, and rests on the fact that SM is a skew product over an irrational circle

rotation. The following fact is shown in the proof of Lemma 2.1 in [Fu] on p. 578:

For any minimal system (X0, T0) which is uniquely ergodic with respect to a measure

μ0, and any skew product T which acts on X=X0×T by T(x0, y)=(T0x0, y+h(x0))

where h : X0T is a continuous function, if (X, T) is ergodic with respect to μ0×m,

then (X, T) is minimal and uniquely ergodic with respect to μ0×m.

Denote by (SM)(i) the action of S on its first i coordinates for any 1<i<k. Since

they are factors of SM, each (SM)(i) is ergodic with respect to λi. Also, for each

1<i<k, (SM)(i+1) is a skew product as described above with T0=(SM)(i). We

may then use Furstenberg's result and the fact that (SM)(1) is minimal and uniquely

ergodic with respect to λ1 (since it is an irrational circle rotation) to see that (SM)(2)

is minimal and uniquely ergodic with respect to λ2. We can continue inductively in

this fashion to arrive at the fact that SM is minimal and uniquely ergodic with respect

to λk. Since M was arbitrary, S is totally minimal and totally uniquely ergodic with

respect to λk.

We will now use these skew products to define flows under functions which have all

of the previous properties and are also topologically mixing. Define the continuous

function gT2R by

g(x, y)=2+Re(k=2e2πikxek+l=2e2πilyel).

47

Note that 1<g(x, y)<3 for all x, y. We then define the space X={(v, x, y, t) :

vTk, x, yT, 0<t<g(x, y)} where (v, x, y, g(x, y)) and (Sk,α,fv, x+γ, y+γ, 0)

are identified for all v, x, y. X is then homeomorphic to the mapping torus of Tk+2

and a continuous map, and so is a connected (k +3)-manifold. For any irrational

γ, γT, we then define the continuous map Tk+3," γ,γ ,, f : XX by

Tk+3,α,γ,γ,f(v, x, y, t)= ift+1ift+1>g(x,y)<g(x,y),.

Finally, we define μ=(Tk+2gdλk+2)-1λk+3=12λk+3 a Tk+3,α,γ,γ,f-invariant Borel

probability measure on X. We will prove the following:

Theorem 2.4.2. For any f differentiable with 12<f(x)<32 for all xT and any

irrational (y, γ, γT which are linearly independent and which satisfy qn>e3qn and

qn+1>e3qn´, where {qn} and {qn} are the digits in the continued fraction expansions

of γ and γ respectively, (X, Tk+3,α,γ,γ,f) is totally minimal, totally uniquely ergodic,

and topologically mixing.

Again, since (y, γ, γ, and f are fixed, for now we suppress the dependence on these

quantities in notation and denote the transformations Tk+3," γ,γ,f and Sk,α,f by T

and S respectively. We also make the notation, for any integer p>0, g(x, y)=

i=0-1g(x+iγ, y+iγ), and define g0(x, y)=0. The proof of Theorem 2.4.2 rests

mostly on the following lemma, which is essentially taken from [Fa] .

Lemma 2.4.3. For any suffiffifficiently large integer n>0, yT, p[12e2qn, 2e2qn´], and

any x0T with qnx0[16, 13],

48

lqneqn<|g(x,y)x(x0)|<7lqneqn.

Proof: g(x, y)=2P+Re(l=2X(,l)e,e2πilx+m=2Y(,m)eme2πimy) , where

X(l, l) =1-e2πilγ1-e2πilγ,

Y(l, m)=1-e2πimγ1-e2πimγ'.

The following facts are proved in [Fa], p. 454:

For all l N{0}, pN, |X(l, l) |<l. (2.2)

For all nN, l <qn, pN, |X(l, l) |<qn. (2.3)

For all nN, l (qn, 2qn), pN, |X(l, l) |<2qn. (2.4)

For any l<qn+12, |X(l, qn)|>2lπ. (2.5)

For any l<qn+12, |arg(X(l, qn))|<πl-1qn+1. (2.6)

We will use these to prove Lemma 2.4.3. It is easy to check that

g(x,y)x(x0) =Re(l=22πilX(l,l)ele2πilx0)

49

=Re(2πiqn|X(l,qn)|eqne2πiqnx0)+Re(l=2qn-12πilX(l,l)ele2πilx0)

+Re(l=qn+12qn-12πilX(l,l)ele2πilx0)+Re(l=2qn2πilX(l,l)ele2πilx0)

+Re(2πiqnX(l,qn)-|X(l,qn)|eqne2πiqnx0).

We bound the first term from above and below and the rest from above.

|Re(2πiqn|X(,qn)|eqne2πiqnx0)|=|X(,qn)|eqn2πqn|sin(2πqnx0)|, and since p[12e2qn, 2e2qn´],

p<qn+12. By (2.2) and (2.5), 2πl<|X(l, qn)|<p. Since qnx0 [ ;' 13] , 12<

|sin(2πqnx0)|<1. Therefore, 2qneqn<|Re(2πiqn|X(,qn)|eqne2πiqnx0)|<2πqneqn.

Next, by (2.3),

|Re(l=2qn-12πilX(l,l)ele2πilx0)|<2πl=2qn-1l|X(l,l)|el<2πl=2qn-1qnlel<2πqn2.

We similarly have

|Re(l=qn+12qn-12πilX(l,l)ele2πilx0)|<4πqn2.

Also, from (2.2), we can conclude that for large n

|Re(l=2qn2πilX(l,l)ele2πilx0)|<2πl=2qnl|X(l,l)|e<2πll=2qnlel<2πle1.5qn .

Finally, from (2.6) and (2.2), we see that

|X(l, qn)-|X(l, qn)||<2|X(l, qn)|arg(X(l, qn))<2πl-1qn+1l,

50

and so since l<2e2qn´ and qn+1>e3qn´, this implies that

|Re(2πiqnX(l,qn)-|X(l,qn)|eqne2πiqnx0)|<8π2e-qn´qnpeqn.

By combining all of these bounds,

2lqneqn-2πqn2-4πqn2-2πle1.5qn-8π2e-qn´klqneqn<|g(x,y)x(x0)

<2πlqneqn+2πqn2+4πqn2+2πle1.5qn+8π2e-qn´lqneqn,

and since l>12e2qn, for large n we have

lqneqn<|g(x,y)x(x0)|<7lqneqn.

The proof of the following fact is trivially similar:

Lemma 2.4.4. For any suffiffifficiently large integer n >0, x T, p [12e2qn´, 2e2qn+1],

and any y0T with qny0,

lqneqn´<|g(x,y)y(y0)|<7lqneqn´.

Proof of Theorem 2.4.2: Consider any m[53e2qn, 2e2qn´], and any cubes R=

i=1k+3Ri and R=i=1k+3Ri where Ri and Ri are intervals of length C for 1<i<k+3.

Take intervals Qk+1 and Qk+2 of length c2 central in Rk+1 and Rk+2 respectively.

51

Define R¯=i=1kRi. We also make the following definition: l(m, x, y, t) is the integer

p such that g(x, y)<m+t<g+1(x, y). Alternately, for any vTk,

Tm(v, x, y, t)=(S(m,x,y,t)v, x+P(m, x, y, t)γ, y+P(m, x, y, t)γ, m+t-g(m,x,y,t)(x, y)).

Now, fix any yQk+2 and tQk+3. We First define the set

E1={x1Qk+1 : qnx1[16, 13]}.

For large m, λ1(E1)>D1C for some constant D1>0, and E1 is a union of many

intervals where all but the first and last have length 16qn. By removing those, we can

define E1 a union of intervals of length 16qn where λ1(E1)>D2C for some constant

D2>0. We then define the set

E2={x1E1 : λ1(R1+P(m, x1, y, t)αR1)>C2, Qk+1+l(m, x1, y, t)γRk+1,

Qk+2+l(m, x1, y, t)γRk+2)}.

We wish to use Lemma 2.4.3 to analyze the structure of E2. First we note that since

1 <g(x, y)<3 for all x, y, by definition of l(m, x, y, t), l(m, x, y, t)<m+t<

3(P(m, x, y, t)+1), and so for large enough n, l(m, x, y, t)<m<103l(m, x, y, t), or

l(m, x, y, t)[.3m, m]. By our assumption on m, this means that for any x, y, t,

l(m, x, y, t)[12e2qn, 2e2qn´]. Now, fix any interval I of length 16qn in E1, and let

us examine E2I. By the preceding remarks and Lemma 2.4.3, (m,x1,y,t)qneqn<

|πk+3Tm(v,x,y,t)x(x1)|<7(m,x1,y,t)qneqn for every x1I. Since l(m, x, y, t)[.3m, m],

.3mqneqn<|πk+3Tm(v,x,y,t)x(x1)|<7mqneqn (2.7)

52

for every x1I. This means that πk+3Tm(v,x,y,t)x has the same sign for all x1I,

and without loss of generality we assume it to be positive. Let us define L to be the

set of possible values for l(m, x1, y, t) for x1I. (Since g is continuous for every

p, L is an interval of integers.) Then for any fixed pL, define I={x1I :

l(m, x1, y, t)=l}={x1I :g(x1, y)<m+t<g+1(x1, y)}={x1I :

g(x1, y)[m+t-g(x1+Pγ, y+lγ), m+t]}. Since 1<g(x1+Pγ, y+Pγ)<3,

by (2.7) 17mqneqn<m(I)<33mqneqn, or m(I)(17eqnmqn, 10eqnmqn) for all pL except the

smallest and largest, for which m(I) could be smaller.

Since m>e2qn, this means that the number of elements in L approaches infinity

as m does. Now, we note that E2I =L, I, where L is the set of pL

where λ1 ((R1+lα) R1)>c2, Qk+1+lγRk+1, and Qk+2+lγRk+2. Then

|L||L|=1|L|LχJ(Rα,γ,γ,(0)), where Rα,γ,γ is the rotation on T3 given by (a, b, c)

(a+α, b+γ, c+γ) and J={(a, b, c)T3 : λ1((R1+a)R1)>c2, Qk+1+b

Rk+1, Qk+2+cRk+2)}. Since (y, γ, and γ are rationally independent, Rα,γ,γ , is

uniquely ergodic, and so as m, |L||L|λ3(J)=c34.

Due to the already established bounds on λ1(I) for pL, this implies that there is

a constant D3>0 so that λ1(E2I) >D3C3λ1(I) for every interval I in E1. By

removing the possibly shorter first and last subintervals of E2I, we have FIE2I

a union of intervals of length greater than 17eqnmqn with λ1(FI)>D4C3λ1(I) for some

constant D4>0. We take F2 to be the union of all FI, and then λ1(F2)>D5C4 for

some constant D5>0. Finally, we define

53

E3={x1F2 : πk+3(Tm(v, x1, y, t))Rk+3vTk}.

Note that F2 is a union of intervals I, and so fix any such interval I of length greater

than 17eqnmqn. By definition, πk+3 (Tm(v, x1, y, t))=m+t-g(x1, y) for any x1I, and

πk+3(Tm(v, x1, y, t)) ranges monotonically from 0 to g(x1+Pγ, y+lγ) as x1 increases

over I. Since, by Lemma 2.4.3, qneqn<|g(x,y)x(x1)|<7qneqn, λ1(E3I)>D6Cλ1(I)

for some constant D6>0. By taking the union over all I, λ1 (E3)>D7C5 for some

D7>0.

Consider any x1E3. We know that λ1((R1+P(m, x1, y, t)α)R1)>c2, and by

the proof of Theorem 2.4.1, this implies that if we define the set Ax1={vR¯ :

πi(Tm(v, x1, y, t))Ri, 1 <i<k}, then λk(Ax1)>BkC2k-1 for some constant

Bk>0. But this means that for any vAx1,

Tm(v, x1, y, t)

=(S(m,x1,y,t)v, x1+P(m, x1, y, t)γ, y+P(m, x1, y, t)γ, m+t-g(m,x1,y,t)(x1, y))

is in R by definitions of E3 and Ax1. So, there exists a constant D8>0 such

that λk+1 ({(v, x1) :Tm(v, x1, y, t)R)} >D8C2k+4. By integrating over all

possible yQk+2, tQk+3, we see that there is a constant D9>0 such that

μ((TmR)R)>D94C2k+6=D9μ(R)μ(R).

This argument works for any large enough m[53e2qn, 2e2qn´] for some n. An analogous

argument using Lemma 2.4.4 and which involves varying y instead of x shows that

the same is true for any large enough m[53e2qn´, 2eqn+1]. However, this implies that

54

for all sufficiently large m and any congruent cubes R and R, μ((TmR)R)>

D9μ(R)μ(R), which implies that (X, T) is totally ergodic with respect to μ and

topologically mixing. It remains to show that (X, T) is in fact totally uniquely ergodic.

Our proof is similar to the argument of Furstenberg used earlier, however since this is

about a flow under a function and his argument was about skew products, we present

the proof in its entirety here. Fix any M N. Since TM is ergodic with respect to μ,

μ-a.e. every point of X is (TM, μ)-generic. Since μ is shift-invariant, and since shifts

in the last coordinate commute with TM, if a point (v, x, y, t)X is (TM, μ)-generic,

(v, x, y, t) is as well for all 0<t<g(x, y). This implies that for λk+2-a.e. (v, x, y),

the fiber {(v, x, y, t)}0t<g(x,y) consists of (TM, μ)-generic points. Denote by G this set

of (v, x, y) which give rise to (TM, μ)-generic fibers. Choose any (v, x, y, t)X. We

know that for large n, P(nM, x, y, t)[.3nM, nM], and that P(nM, x, y, t) is increasing

in n. Therefore, the set {P(nM, x, y, t) :nN} has positive density. The skew

product U on Tk+2 defined by U(v, x, y)=(Sk,α,fv, x+γ, y+γ) is totally uniquely

ergodic with respect to λk+2 for the same reason as in the proof of Theorem 2.4.1. (The

only difference is that here the base case is the rotation on T3 given by (x, y, z)

(x+α, y+γ, z+γ) instead of an irrational rotation on T. However, since (y, γ,

and γ are rationally independent, this rotation is also totally uniquely ergodic and

totally minimal.) This implies that the set {I : U(v, x, y)G} has density one.

Together, these facts imply that there exists n such that U(nM,x,y,t)(v, x, y)G, or

that TnM(v, x, y, t)=(g, s) for some gG and s R. This means that TnM(v, x, y, t)

is (TM, μ)-generic, and so (v, x, y, t) is as well. Since (v, x, y, t)X was arbitary,

every point in X is (TM, μ)-generic, and so TM is uniquely ergodic with respect to

55

μ. Since μ(U)>0 for every nonempty open set U, TM is minimal as well. Since M

was arbitrary, (X, T) is totally uniquely ergodic, totally minimal, and topologically

mixing.

2.5 Some counterexamples on connected manifolds

Proof of Theorem 2.1.19: Our transformation is T2d+7,α,γ,γ,f for properly chosen

(y, γ, γ, and f. We always take (y to be the golden ratio 5-12 because of a classical

fact from the theory of continued fractions:

Lemma 2.5.1. For any n N, the distance from no to the nearest integer is greater

than 13n.

γ and γ can be any irrational elements of T satisfying the hypotheses of Theo-

rem 2.4.2. What remains is to define f. Before doing so, we use our sequence {pn} to

construct another increasing sequence of integers. For any n, take wn=l(pn, 0,0,0).

In other words, for any vT2d+4, Tpn(v, 0,0,0)=(Swn(v), wnγ, wnγ, pn-gwn(0,0)).

(Here S=S2d+4,α,f and T=T2d+7,α,γ,γ,f .) As before, for large n, wn[.3pn, pn].

We claim that wn+1<(wn+1-wn)d+1 for all large enough n. This is because

wn+1-wn=l(pn+1-pn, wnγ, wnγ,pn-gwn(0,0)):

( Swn+1v, wn+1γ, wn+1γ, pn+1-gwn+1(0,0))=Tpn+1(v, 0,0,0)

=Tpn+1-pn(Tpn(v, 0,0,0))=Tpn+1-pn(Swnv, wnγ, wnγ, pn-gwn(0,0))

56

=(Swn+(pn+1-pn,wnγ,wnγ,pn-gwn(0,0))(Swn(v)),

(wn+l(pn+1-pn, wnγ, wnγ,pn-gwn(0,0)))γ,

(wn+l(pn+1-pn, wnγ, wnγ,pn-gwn(0,0)))γ,pn+1-gwn+1-1(0,0))

Therefore, since pn+1-pn, wn+1-wn[.3(pn+1-pn),pn+1-pn] for large n for

the same reasons as before. This implies for large n that wn+1<pn+1<(pn+1-pn)d<

(103(wn+1-wn))d<(103)d(wn+1-wn)d<(wn+1-wn)d+1 since wn+1-wn. We

wish to choose f so that Swn(0) is bounded away from 0, where 0T2d+4 is the zero

vector.

We define f as an infinite sum: F(v1)=v1+i=1cisxi,εi(v1) is a function from T

to R, and f(v1)=F(v1) (mod t) is a self-map of T. In this sum, ciR+ with

i=1ci<1, xiT and εi>0 will be chosen later, and the function sx,ε(y) for any

xT and ε>0 is a function defined by

sx,ε(y)=

The pertinent properties of sx,ε are that it is nonzero only on the interval [x-ε, x+

ε], it attains a maximum of επ at y=x, and that its derivative is bounded from

above in absolute value by 12. Since each term cisxi,εi in the definition of F is a

differentiable function with derivative bounded from above in absolute value by ci2,

and since i=1ci<1 and the identity function has derivative one everywhere, F is

a differentiable function with F(v1) for all v1 T. This shows that for any

57

choice of ci with i=1ci<1, and for any choice of xi, εi, by Theorem 2.4.2, (X, T) is

totally minimal, totally uniquely ergodic, and topologically mixing.

We wish to choose f so that 1Swn(0) is bounded away from 0. The only quantities still

to be chosen are ci, xi and εi. We note that for any 1<k<2d+4 and any j>k-1,

πk(Sj(0))=i=0j-k+1 f(iα). This can be proved by a quick induction, and is

left to the reader. In particular, if we make the notation yn=π2d+4(Swn(0)) for any

wn>2d+3, then yn=i=0wn-2d-3 f(iα). Our goal is to choose ci, xi, and εi

so that ywn=13 for all sufficiently large n. To do this, we choose xn=wn(y for all

n, and εn=inf0i<wn+1 , iwn|xn-iα|>13wn+1 by Lemma 2.5.1. This guarantees that

sxn,εn(iα)=0 for any 0<i<wn+1, iwn. This means that each choice of ci that

we make changes the values of yn for only n>i, and allows us to finally inductively

define ci.

Recall that our goal is to ensure that yn=i=0wn-2d-3 f(iα)=13 for all

sufficiently large n. Note that since {wn} is an increasing sequence of integers, wn>n

for all n. We have already shown that wn<(wn-wn-1)d+1 for all large n, and so

wn-wn-1>n1d+1, and so wn=w1+i=2n(wi-wi-1)>i=2ni1d+1>n2(n2)1d+1>

14n1+1d+1 for all large n, and so n=1wn-1 converges, a fact which will be important

momentarily. We choose N so that n=N+1wn-1<16π(2d+2)! . The procedure for

defining the sequence cn is then as follows: ci=0 for 1<i<N. For any n>N,

assume that yi=13 for N+1 <i<n. Then, we choose cn so that yn+1=13. Note that

for any n>1, yn=hn(c1, ..., cn-2)+cn-11πεn-1 for some hn : Tn-2T.

This means that taking cn=π(13-hn+1(c1,...,cn-1))(mod 1)εn(2d+2wn+1-wn-1) gives yn+1=13. Note that

58

>12(2d+2)!(wn+1-wn)2d+2 for large n, and recall that εn>13wn+1 for all

n by Lemma 2.5.1. This means that cn<6π(2d+2)!wn+1(wn+1-wn)-(2d+2) , which,

by the hypothesis on the sequence {wn}, is less than 6π(2d+2)!(wn+1)-1, again for

sufficiently large n. This means that n=1cn<6π(2d+2)!n=N+1wn+1-1<1 by

definition of N. We have then chosen cn so that d(Swn(0), 0)>d(yn, 0)=13 for all

n>N. For n<N, note that π1(Swn(0))=wn(y 0, since (y ( Q. This means that

for all n, d(Swn(0), 0)>min(13, min1nNd(wn(y, 0))>0.

However, by definition, π2d+4(Tpn(0,0,0,0))=π2d+4(Swn(0)) for all n. Therefore,

Tpn(0,0,0,0) is bounded away from (0, 0, 0, 0), and so since (X, T) is totally uniquely

ergodic, totally minimal, and topologically mixing, we are done.

We note that there was nothing special about the number 13 in this proof, and so the

proof of the following corollary is trivially similar:

Corollary 2.5.2. For any increasing sequence {wn} of integers with the property that

for some integer d, wn+1<(wn+1-wn)d+1 for all suffiffifficiently large n, and for any

sequence {zn}T, there exists f satisfying the hypotheses of Theorem 2.4.1 so that

for all suffiffifficiently large n, π2d+4((S2d+4,α,f)wn(0))=zn.

Proof of Theorem 2.1.16: Our transformation is T2d+9,α,γ,γ,f for the same (y, γ,

and γ as before. We use the same strategy as we did to prove Theorem 2.1.15; in

other words, we will be forcing certain types of nonrecurrence behavior along a set

comprised of a union of infinitely many shifted subsequences of {pn}.

59

We now proceed roughly as we did in proving Theorem 2.1.15. We define a sequence

{tn} by shifting different pn by different amounts. First, define the intervals of integers

Bj=[j!+1, (j+1)!]N for every jN, and take any partition of N into infinitely

many disjoint infinite sets C1, C2, . . .. We denote the elements of Ci, written in

increasing order, by ci(1), ci(2) - Choose some s1 large enough so that pn+1-pn>2.1

for n> (inf Cs1) !, define the set D1=jCs1Bj, and then define tn=pn+1 for all

nD1. Next, choose some s2 large enough so that pn+1-pn>2ċ2 for n> (inf Cs2) !,

and define D2=jCs2Bj, and then define tn=pn+2 for all nD2. Continuing in

this way, we may inductively define Dk for all kN so that Dk=jCskBj for some

sk with the property that pn+1-pn>2k for n> (inf Csk) !, and then define tn=pn+k

for all nDk. For any nk=1Dk, tn=pn. Note that by the construction, for

any n, k where tn=pn+k, it must be the case that nDk, and therefore that

n>(infCsk)!, and so that pn+1-pn>2k. Therefore, since tn+1>pn+1 for all n,

tn+1-tn>pn+1-pn-k>pn+1-pn2, and so {tn} is increasing. Since n-1> (inf Csk) !,

pn-pn-1>2k, and so in particular pn>2k. This implies that tn=pn+k<2pn

for all n. Finally, we see that tn+1<2pn+1<2(pn+1-pn)d<2d+1(tn+1-tn)d for

all large enough n. Since tn+1-tn>pn+1-pn2 as n, this means that

tn+1<(tn+1-tn)d+1 for large enough n.

We again define a sequence {wn}: for any n, take wn=l(tn, 0,0,0). In other words,

for any vT2d+6, Ttn(v, 0,0,0)=(Swn(v), wnγ, wnγ, tn-gwn(0,0)). (Here S=

S2d+6,α,f and T=T2d+9,α,γ,γ,f .) For exactly the same reasons as in the proof of

Theorem 2.1.19, wn+1<(wn+1-wn)d+2 for large n.

60

Therefore, by using Corollary 2.5.2, for any sequence {zn}T, we may choose f

such that π2d+6(Ttn(0,0,0,0))=π2d+6(Swn(0))=zn for all n. We define zn=13 for

any nDk where nBj for j=csk(i) with odd i, and zn=23 for any nDk where

nBj for j=csk(i) with even i. For any zn not defined by these conditions, zn may

be anything. (We can take zn=0 for such n if it is convenient.) Take hC(X)

such that h(v, x, y, t)=0 if v2d+6=23, h(v, x, y, t)=1 if v2d+6=13, infxXh(x)=0,

and supxXh(x)=1. Now, we note that

{zX : 1Nn=0N-1h(Tpnz) does not converge}

(n>0k>n{zX :. |{i.1<i<k,π2d+6(Tpi(z))=13}|k>34})

(n>0k>n{zX :. |{i.1<i<k,π2d+6(Tpi(z))=23}|k>34}),

and that the latter set, call it B, is clearly a Gδ. We will show that B is dense in X.

Choose any nonempty open set UX. By minimality of T, there is some k so that

Tk(0,0,0, O)U. By construction, tn=pn+k for all nDk. Also by construction,

Dk=jCskBj, and π2d+6 (Ttn(0,0,0,0)) =π2d+6(Tpn(Tk(0,0,0,0))) =13 for j=csk(i)

and i odd. But then for any odd integer i, π2d+6(Tpn(Tk(0,0,0,0)))=13 for any

n[(csk(i))!+1, (csk(i)+1)!] , and so

.|{i.1<i<(csk(i)+1)!,π2d+6(Tpi(Tk(0,0,0,0))=13}|(csk(i)+1)!>csk(i)csk(i)+1,

which is clearly larger than 34 for sufficiently large k. Similarly, for any even integer

i, π2d+6(Tpn(Tk(0,0,0,0)))=23 for any n[(csk(i))!+1, (csk(i)+1)!] , and so

61

.|{i.1<i<(csk(i)+1)!,π2d+6(Tpi(Tk(0,0,0,0)))=23}|(csk(i)+1)!>csk(i)csk(i)+1,

which is also greater than 34 for sufficiently large k. This implies that Tk(0,0,0,0)

B, and so that BU is nonempty. Since U was arbitrary, this shows that B is dense,

and so a dense Gδ. Therefore, B is a residual set by the Baire category theorem, and

for every zB, 1Nn=0N-1h(Tpnz) does not converge.

We note that exactly as in the proof of Theorem 2.1.15, this in fact shows that

for a residual set of zX, liminfN1Nn=0N-1h(Tpnz)=0=infxXh(x) and

limsupN1Nn=0N-1h(Tpnz)=1 =supxXh(x).

2.6 A counterexample about simultaneous recurrence

Given a totally minimal system (X, T), any point xX, and any two positive integers

r and s, since (X, Tr) and (X, Ts) are minimal it must be true that x{Trnx}nN

and x{Tsnx}nN. It is then the case that there exist sequences of positive integers

{ni} and {mi} such that Trnixx and Tsmixx. By Theorem 2.1.17, for a

residual set of x, it is possible to choose ni=mi. To prove Theorem 2.1.21, we must

exhibit an example of a totally minimal system for which this residual set is not all

of X.

Proof of Theorem 2.1.21: For any n, we define a symbolic dynamical system

(Xn, σ) as follows: define xnΩ={0,1}Z by xn(i):=χAn ( iαn (mod 1)), where

62

{(yn}nN is any sequence of rationally independent irrational reals, and An={0}

i=1[nn2i, n+1n2i) . Then, define a to be the left shift on Ω, and T to be the countable

product of a on ΩN : T(ω1, ω2, ...) :=(σω1, σω2, ...). X is then defined to be the

orbit closure of x= (x1, x2, ...) under T:X={Tnx}nZ.

We claim that (X, T) and x prove Theorem 2.1.21. Consider any r>s>0, and

choose n>r. Consider a sequence {ni} of integers where Trnixx and Tsnix

x. Then clearly σrnixnxn and σsnixnxn. Since xn(0)=1, this implies

that xn(rni)=xn(sni)=1 for all large enough i. Therefore, rni(yn (mod 1), sni(yn

(mod 1) An for all large i. This implies that rsni(yn (mod 1) rAn (mod 1) sAn

(mod t) for all large i. However, suprAn=r(n+1)n2<1, so rAn (mod 1) =rAn and

sAn (mod 1) =sAn. We then see that rsni(yn (mod 1) rAnsAn. If rsni(yn

(mod 1) 0, then rsni(yn [rnn2j, r(n+1)n2j)[snn2k, s(n+1)n2k) for some positive integers j, k.

Since n>r, s, it must be the case that j=k, and then we have a contradiction since

rn=(r-1)n+n>sn+n=s(n+1). Therefore, rsni(yn (mod t) =0, and since

(ynQ, ni=0 for all large enough i.

It remains to be shown that (X, T) is totally minimal. Note that for any mZ,

(X, Tm) is a system defined similarly to (X, T), with the sequence of irrationals

{(yn}nN given by (yn=m(yn for n N. Therefore, it suffices to show that (X, T) is

minimal, since the definition of (X, T) requires only that the set of irrationals {(yn}N

is rationally independent, and if this is true, it will be true of {(yn}nN as well. To

show that (X, T) is minimal, we must show that the orbit of any y=(y1, y2, ...) X

under T is dense in X. It suffices to show that for any finite set of words {wi}1ik

63

with to; a subword of xi for 1<i<k, that there exists m >0 such that σm(yi)[wi]

for 1<i<k.

Choose any 1<i<k. Since wi is a subword of xi, there exists p such that σpxi[wi].

This implies that if we denote by pi the length of wi, then the set j=0i-1Bj(i) contains

p(y (mod 1), where Bj(i)=(Ai-j(yi) (mod t) if wi(j)=1 and Bj(i)=((Ai-j(yi)

(mod 1) )c if wi(j)=0. We now claim that the fact that j=0i-1Bj(i) is nonempty

implies that it contains an open interval. Note that for each j, Bj(i) is either a union

of half-open intervals or a union of half-open intervals with a singleton, and that due

to the irrationality of (yi and the rationality of all endpoints of intervals in Ai, the

set of endpoints for intervals in Bj(i) is disjoint from the set of endpoints of intervals

in Bj(i), for any 0<jj<pi. Since j=0i-1Bj(i), there exist IjBj(i) such

that j=0i-1Ij, each Ij is either a half-open interval or a singleton, at most one

Ij is a singleton, and the Ij have no endpoints in common. This implies that either

j=0i-1Ij contains an interval, implying that nj=0i-1Bj(i) does as well, or that for some

0<j<pi, Ij is the singleton { -jαi (mod 1)}. In this case, Ij is not a singleton for

jj, and so 0j<i,jj, Ij contains an interval I, and Ij={ -jαi (mod 1)} lies in

the interior. Choose ε such that Ij+εI. Since Ij is a singleton, Bj(i),=(Ai-j(yi)

(mod 1). Choose k such that n+1n2k<ε. Then [nn2k, n+1n2k)-j(yi (mod 1) -Bj, and by

our choice of ε, [nn2k, n+1n2k)-j(yi (mod t) I as well. Therefore, ([nn2k, n+1n2k)-j(yi

(mod 1) ) (0j<i,jj, Ij) contains an interval, and since IjBj(i) for jj and

[nn2k, n+1n2k)-j(yi (mod 1) -Bj(i),, we see that j=0i-1Bj(i) contains an interval as well.

For each 1<i<k, denote by Ji some interval contained in j=0i-1Bj(i) Choose

64

some δ<min1ik|Ji|. We now note that since the sequence {(yn}nN is rationally

independent, the set {(nα1 (mod 1), nα2 (mod 1), . . . , n(yk (mod 1))}nN is dense

in [0, 1)k. Therefore, there exists r such that the set B={(nα1 (mod 1), nα2

(mod 1), . . . , n(yk (mod 1)) }0n<r is δ-dense, meaning that every point in [0, 1)k is

within a distance of less than δ from some point in B. This means that {(nα1

(mod 1), nα2 (mod 1), . . . , n(yk (mod 1)) }sn<s+r=B+(sα1, sα2, . . . , s(yk) (mod t)

is also δ-dense for any sZ. We note that then for any 1 <i<k and any

mZ, m(yi (mod 1) Ji implies that m(yi (mod 1) j=0i-1Bj(i), and so that

xi(m)xi(m+1) . . . xi(m+li-1) =wi, or that σm(xi)[wi]. Define P=max1ikli.

Now, choose any y= (y1, y2, ...) X. Since y{Tnx}nZ, there exists p such that

yi(0)yi(1) . . . yi(r+l)=xi(p)xi(p+1) . . . xi(p+r+P) for 1<i<k. Then, by the above

comments, there exists m[p, p+r]Z such that xi(m)xi(m+1) . . . xi(m+li-1) =wi

for 1<i<k. Therefore, yi (rn -p) yi (rn -p+1) . . . yi (rn -p+l) =xi(m)xi(m+

1) . . . xi(m+l) for 1<i<k, and since p>pi for 1<i<k, σm-p(yi)[wi] for

1<i<k. Since {wi}1ik were arbitrary, we have shown that the orbit of y under

T is dense in X, and since yX was arbitrary, that (X, T) is minimal. We already

showed that this in fact implies that (X, T) is totally minimal, and so are done.

2.7 Questions

There are some natural questions motivated by these results. For any totally minimal

and totally uniquely ergodic system (X, T), any nonempty open set UX with

65

μ(U)(0,1), and any xU, take the set A= {a N : TnxU}. Since (X, T) is

uniquely ergodic, the density d(A):=limn|{1,2,...,n}A|n of A equals μ(U)>0, and

so the sequence {an} of the elements of A written in increasing order does not satisfy

the hypotheses of any of our theorems. However, since xU, and since TanxU

for all n, Tanx is bounded away from x.

For a similar example, take T to be a totally minimal and totally uniquely ergodic

isometry of a complete metric space (X, d) with diameter greater than 2. Take x, y

X with d(x, y)>2, and define fC(X) where f(z)=0 for all zB1(x) and

f(z)=1 for all zB1(y). Then, take the sets A={nN : TnxB12(y)} and B=

{nN : TnxB12(x)}. Since (X, T) is uniquely ergodic, d(A)=μ(B12(y))>0 and

d(B)=μ(B12(x))>0. For any zB12(x), d(Tanz, y)<d(Tanz, Tanx)+d(Tanx, y)=

d(z, x)+d(Tanx, y)<12+12=1, and so f(Tanz)=1. Also for any zB12(x),

d(Tbnz, x)<d(Tbnz, Tbnx)+d(Tbnx, x)=d(z, x)+d(Tbnx, x)<12+12=1, and so

f(Tbnz)=0. This means that by making a sequence {cn} by alternately choosing

longer and longer subsequences of {an} and {bn}, we could create such {cn} with

d({cn})>0 (so {cn} does not satisfy the hypotheses of any of our theorems) where

IIIII N1Nn=1Nf(Tcnz) fails to converge for any z in the second category set B12(x).

The point of these examples is to show that our hypotheses are certainly not the only

ones under which examples of the types constructed in this paper exist. This brings

up the following questions:

Question 2.7.1. For what increasing sequences {pn} of integers does there exist a

66

totally minimal and totally uniquely ergodic system (X, T) and a point xX such

that Tpnx is bounded away from xl.?

Question 2.7.2. For what increasing sequences {pn} of integers does there exist a

totally minimal and totally uniquely ergodic system (X, T) and a function fC(X)

such that limN1Nn=1Nf(Tpnx) fails to converge for a set of x of second category l.?

Also, it is interesting that we could create examples for a wider class of sequences

{pn} when X was not connected. We would like to know whether or not this is

necessary, i.e.

Question 2.7.3. Given an increasing sequence {pn} of integers and a totally minimal

totally uniquely ergodic topological dynamical system (X, T) and x X such that Tpnx

is bounded away from x, must there exist a system with the same properties where X

is a connected space?

Question 2.7.4. Given an increasing sequence {pn} of integers and a totally minimal

totally uniquely ergodic topological dynamical system (X, T) and fC(X) such that

limN1Nn=1Nf(Tpnx) fails to converge for a set of x of second category, must

there exist a system with the same properties where X is a connected space?

67

(JIAPTER 3

PERTURBATIONS OF MULTIDIMENSIONAL SHIFTS

OF FINITE TYPE

3.1 Introduction

Consider a symbolic dynamical system with positive topological entropy htop(X). (A

rigorous definition of topological entropy appears below, but informally it is the expo-

nential growth rate of the number of different n-letter words appearing as subwords

in elements of X.) Fix any word wA[1,...,n] for nN which appears as a subword

of some element of X. Then, define XwΩ to be the set of all elements of X in

which w is not a subword. Clearly XwX, and so the topological entropy of Xw

is not more than that of X. We wish to estimate the size of the drop in topological

entropy htop(X)-htop(Xw), and how this quantity behaves as n. To more

rigorously state and approach this problem, we begin with some definitions. Some of

these terms have already been defined in Chapter 2, but we repeat the definitions for

the more general setup of Chapter 3.

We denote by Rj1,j2,...,jd the j1×j2× . . . ×jd parallelepiped i=1d{1, 2, ..., ji} in Zd,

and use the shorthand notation Γj for the cube Rj,j,...,j . j1, . . . , jd are the sizes of

Rj1,...,jd, and j is called the size of Γj. We use d to denote the ι metric on Rd : for

68

p, qRd, d(p, q):=|p-q|=max1id|pi-qi|. For each 1<i<d, we denote by

ei the ith element of the standard basis for Zd, and for p, qRd, the distance in

the ei-direction between p and q is defined to be |pi-qi|.

Definition 3.1.1. An alphabet is a finite set. For any alphabet A, we define the Zd-

shift action {σv}vZd on Ω=AZd as follows: for any vZd and xΩ, (σv(x))(u)=

x(v+u) for all uZd .

Definition 3.1.2. AZd-subshift on an alphabet A is a set XΩ with the following

trvo properties:

(i) X is shift-invariant, meaning that for any xX and pZd, σp(x)X.

(ii) X is closed in the product topology on Q.

For any Zd-subshift X, (X, {σv}vZd) is a symbolic dynamical system.

Definition 3.1.3. A Borel probability measure of Ω supported on a subshift X is er-

godic if the measure-preserving dynamical system (X, B(X), μ, {Tv}vZd) is ergodic,

where B(X) is the Borel σ-algebra of X.

Definition 3.1.4. A word u on the alphabet A is any mapping from a non-empty

subset S of Zd to A, where S is called the shape of u. For words u and u, where

u has shape S, we say that u is a subword of u if there exists pZd such that

u(q)=u(q+p) for all qS. In this case, we say that u occurs in u at S+p, or

that u(S+p)=u.

Definition 3.1.5. A word w with shape S is periodic with respect to vZd if

S(S-v) and w(u)=w(u+v) for all uS(S-v). We also say that v is

a period of w. A word is aperiodic if it has no periods.

69

Definition 3.1.6. For any TSZd, we say that trvo words u and u with shape

S agree on T if u(T)=u(T), i.e. u and u have the same letters on T.

Definition 3.1.7. The boundary of thickness k of a subset S of Zd, which is

denoted by S(k), is the set of points p of S for which there exists a point q ZdS

with d(p, q)<k. Whenever we refer to the boundary of a shape S, we mean the

boundary of thickness one.

Definition 3.1.8. T Zd is called a copy of a shape S if T =S+v for some

v Zd. This v is then called the difference of S and T, denoted by T-S. We say

that a point p S corresponds to a point q T if p =q+(S-T).

Definition 3.1.9. The language of aZd-subshift X, denoted by L(X), is the set of

words which appear as subwords of elements of X. The set of words with a particular

shape S which are in the language of X is denoted by LS(X).

Definition 3.1.10. For aZd-subshift X, and for any finite shape SZd, the number

of words in LS(X) is denoted by HS(X), and the natural logarithm of this quantity

is denoted by hS(X).

Definition 3.1.11. AZd-shift of finite type is aZd-subshift defined by specifying

a finite collection of words on A, call this collection F, and then taking X =ΩF

to be the set of elements of Ω which do not contain any member of F as subwords.

Different collections F could induce the same shift of finite type; for any fixed X,

the type of X is the minimum nonnegative integer t such that for some F consisting

entirely of words with shape Γt, X =ΩF. A shift of type trvo is called a Markov

shift.

70

Definition 3.1.12. A word v with shape S forces a word w with shape T in a shift

of finite type X if any x X with x(S)=v also has x(T)=w.

Definition 3.1.13. A subshift X is topologically transitive if there exists xX

for which {σp(x)}pZd is dense in X, i.e. if there is xX which contains every word

in L(X) as a subword.

Definition 3.1.14. The topological entropy of a subshift X, denoted by htop(X),

is defined by

hRj1,j2, ' jd(X)

htop(X)=li.qg˙1,j2,, j1j2ċ. . jd

Definition 3.1.15. The measure-theoretic entropy of a subshift X with respect

to a shift-invariant probability measure μ on X, which is denoted by hμ(X), is defined

by

1

hμ(X)= lim f(μ([w])),

j1,j2,...,jdj1j2ċ. . jd

wLRj1, ' jd(X)

where f(x)=-xlnx for x>0 and f(0)=0.

Both of these limits exist by subadditivity.

We now restate the question from the beginning of Chapter 3. Suppose that a Z-

subshift X on an alphabet A is given, and consider any n-letter word wLΓn(X).

One can then define the subshift Xw as consisting of all elements of X which do not

contain w as a subword. Since XwX, htop(Xw)<htop(X), but it is natural to

wonder how much the entropy decreases by. For example, is it necessarily the case

71

that htop(X)-htop(Xw)0 as n? It is not hard to check that this is not always

the case. For example, if X is minimal, then every xX contains every wL(X) as

a subword. This means that for any word wL(X), Xw=, and so htop(Xw)=0.

This means that minimality is possibly too globally defined for our purposes. It is

then natural to move to the completely local definition of a shift of finite type. In

this setup, as was already mentioned in Chapter 1, there is the following result of

Douglas Lind:

Theorem 3.1.16. ([L], p. 360, Theorem 3) For any topologically transitive Z-shift

of finite type X=ΩF with positive topological entropy htop(X), there exist constants

CX, DX, and NX such that for any n> NX and any word wLΓn(X), if we denote

by Xw the shift of finite type ΩFu{w}, then

CXehtop(X)n<htop(X)-htop(Xw)<DXehtop(X)n.

Lind was apparently examining this question in order to prove some results about

the automorphism group of a shift of finite type; he needed to know that htop(X)-

htop(Xw)0 as the size n of the word w approaches infinity. ([BoyLR]). Some

work on bounds of this type in the case where X=AΩ was also done by Guibas and

Odylzko. ([GO1], [GO2])

We wish to prove a version of Theorem 3.1.16 for multidimensional shifts of finite

type, i.e. shifts of finite type on Zd with d>1. In the remainder of this section, we

will discuss two relatively significant changes that we make in the statement of the

multidimensional extension. This discussion will have the dual purposes of motivating

72

our result and illustrating some of the issues that arise in higher-dimensional symbolic

dynamics. Our methods are similar in places to those used in [QT], where the effects

on entropy of removing words from shifts of finite type are also examined.

The first problem in giving amultidimensional version of Theorem 3.1.16 comes in

changing the exponents in the denominators. The most reasonable guess is that

instead of using ehtop(X)n, in the extension wwee should have ehtop(X)nd However, we

will show that this is a bit too much to hope for. Consider the following examples:

take Y=Ω. Y is clearly a shift of finite type, with the set of forbidden words .

For every jN, we define a shift of finite type as follows: define the alphabet A(j)=

LΓj(Y)={0,1}Γj, and define fj : Y(A(j))Zd by (fj(x))(p)=x(p+Γj-(1, ..., 1))

for every xY and pZd. This map is usually called the higher block presentation

of Y. (see [LM]) In other words, for every pZd, (fj(x))(p) is defined to be the

subword of x which lies in Γj+p- (1, ..., 1), or a copy of Γj whose least vertex in

the usual lexicographic order on Zd is p. Figure 3.1 illustrates the action of f2 on a

typical element of Y for d=2: (the lower-leftmost entry of each array is at the point

(1, ..., 1) Zd)

In Figure 3.1, the horizontal and vertical lines separate letters of the alphabet. So,

the left half shows a word with shape Γ4 whose letters are elements of A, namely

zeroes and ones, and the right half shows a word with shape Γ3 on the alphabet A(2),

which are themselves words with shape Γ2 whose letters are zeroes and ones.

With fj so defined, fj(Y) is a subshift with alphabet A(j). In fact it is easily checkable

that fj(Y) is a Markov shift; x(A(j))Zd is an element of fj(Y) if and only if for

73

1101000111

00 01 11 11 f2 00 01 01 11 11 11

1

1

0

1

0

0

1

1

0

1

1

1

1

0

0

0

1 1

0 0

1 0

0 1

0 1

1 1

0 0

0 1

0 1

1 1

1 1

1 1

0 1

1 0

1 1

0 0

1 1

0 0

Figure 3.1: f2 's action on a sample element of Y

every pZd and 1<i<d, x(p)(Rj,...,j,j-1,j,...,j+ei)=x(p+ei)(Rj,...,j,j-1,j,...,j),

where the j-1 is in the ith subindex in both cases. Since fj is a shift-commuting

bicontinuous bijection between Y and fj(Y), fj(Y) is topologically conjugate to Y

for every j O.

Since Y and fj(Y) are such canonical examples of shifts of finite type (the full shift Y

is in some sense the simplest shift of finite type, and every fj(Y) is conjugate to it),

any meaningful extension of Lind's result should certainly apply to them. fj can be

considered as a map between words just as easily as a function between subshifts: in

the diagram above, if one disregards the dots surrounding the words, then f2 can be

thought of as sending uLΓ4(Y) to f2(u)LΓ3(f2(Y)). For this reason, and since fj

is a topological conjugacy for all j, if we consider Yw the shift of finite type resulting

74

from removing some word w with shape Γn from L(Y), and (fj(Y))fj(w) the shift of

finite type resulting from removing fj(w) from L(fj(Y)), then Yw and (fj(Y))fj(w)

are topologically conjugate as well, and therefore have the same topological entropy.

In other words, the removal of a Γn word from L(Y) results in the same drop in

topological entropy as the removal of a Γn-j+1 word from L(fj(Y)). Now, let us

suppose that Lind's result can be extended by simply changing the n in the exponent

to an nd. This would result in something that looks like this:

Possible Theorem 3.1.17. For any d >1 and any topologically transitive shift

X =ΩF of finite type with positive topological entropy htop(X), there exist constants

CX, DX, and NX such that for any n > NX and any word w LΓn(X), if we denote

by Xw the shift of finite type ΩFu{w}, then

CXehtop(X)nd<htop(X)-htop(Xw)<DXehtop(X)nd.

We can now show that this possible extension cannot be true; suppose that there

exist such constants CY, DY, and NY, which satisfy Possible Theorem 3.1.17 for Y,

and for any j>1, Cfj(Y), Dfj(Y), and Nfj(Y), which satisfy Possible Theorem 3.1.17

for fj(Y). Then, for any word wLΓn(Y), where n> NY,

CYehtop(Y)nd<htop(Y)-htop(Yw)<DYehtop(Y)nd.

By the previous argument, this means that for any j,

CYehtop(Y)nd<htop(fj(Y))-htop((fj(Y))fj(w))<DYehtop(Y)nd. (3. 1)

75

And by the definitions of Cfj(Y), Dfj(Y), and Nfj(Y), and since fj(w)LΓn-j+1(fj(Y)),

as long as n-j+1 >Nfj(Y), we know that

Cfj(Y)ehtop(fj(Y))(n-j+1)d<htop(fj(Y))-htop((fj(Y))fj(w))<Dfj(Y)ehtop(fj(Y))(n-j+1)d. (3.2)

However, since d>1, for large enough n, since htop(fj(Y)) =htop(Y), ehtop(fj(Y))nd

grows much more quickly than ehtop(Y)(n-j+1)d Therefore (3.1) and (3.2) contradict

each other. The problem that must be addressed is that due to examples such as

these, there must be some sort of a "tolerance range" around n in any meaningful

multidimensional version of Theorem 3.1.16. In other words, we should amend our

possible extension to the following:

Possible Theorem 3.1.18. For any d >1 and any topologically transitive shift

X =ΩF of finite type with positive topological entropy htop(X), there exist constants

CX, DX>0, AX, BX, and NXN such that for any n > NX and any word w

LΓn(X), if we denote by Xw the shift of finite type ΩFu{w}, then

CXehtop(X)(n+AX)d<htop(X)-htop(Xw)<DXehtop(X)(n+BX)d.

Since j could be arbitrarily large in the definition of fj(Y), it must be the case that AX

and BX depend on X rather than being absolute constants. In fact, such constants

AX and BX could be thought of as being present in Theorem 3.1.16 as well, but

hidden inside the constants CX and DX. This new version is still not true, however.

76

Figure 3.2: A portion of a sample element of Z

This time, the problem is not with the conclusion, but the hypotheses. Topological

transitivity is not nearly a strong enough hypothesis in several dimensions to get

the type of result that we are after, and in fact we will see that topological mixing

is not nearly good enough either. To show this, we consider a subshift called the

checkerboard island shift, which we call Z, defined by Quas and §ahin in [QS]. Z is

a Z2-Markov shift, on an alphabet A with thirteen letters. Z is also topologically

mixing; for a proof see [QS].

All letters of A and elements of LΓ2(X) are displayed in Figure 3.2. In other words,

a two-by-two word of symbols is in L(Z) if and only if it appears as a subword of the

word in Figure 3.2. Consider the word with shape Γ8 in Figure 3.3, which we call a4.

It will not be shown here (details are in [QS]), but an occurrence of a4 in any element

of Z forces the arrows and diagonal lines surrounding a4 in Figure 3.2, along with

a boundary of thickness one filled with empty spaces. In other words, wherever a4,

77

which has shape Γ8, appears, it forces a word, which we call b4, of shape Γ14. b4

appears in Figure 3.4.

Figure 3.3: a4

Figure 3.4: b4

In fact, for every n>4, there is an analogous word an (a checkerboard of size Γ2n-2

with a boundary filled with arrows and diagonal lines in an analogous way to that

78

of a4) which forces a word bn of size Γ4n-2 around it. We claim that for any n, the

shift Zan obtained by removing an from L(Z) and the shift Zbn obtained by removing

bn from L(Z) are equal. Since an is a subword of bn, it is obvious that ZanZbn.

Suppose that Zbn≦̸Zan. Then there exists xZ which has an occurrence of

an, but no occurrence of bn, which, as has already been stated, is not possible. So,

Zan=Zbn . This implies that htop(Z)-htop(Zan)=htop(Z)-htop(Zbn). However, since

anLΓ2n(Z) and bnLΓ4n-2(Z), we have a contradiction to Possible Theorem 3.1.18

for the same reason as we did before. Since this is true for all n, if we wish to adjust

the conditions of the conclusion again, we will have to change our "tolerance range"

from [n+AX, n+BX] to something like [nAX, nBX], which would weaken the result

considerably. To obtain a conclusion of the strength we want, we need a stronger

mixing property than topological mixing. The notion which we use is that of strong

irreducibility.

Definition 3.1.19. AZd-subshift X is strongly irreducible with uniform filling

length R if for any trvo subsets S, T of Zd, and for all p Zd such that d(S, T+p)>

R (that is, for all q S and r T+p, d ( q, r)>R), and for any elements x, xX,

there exists y X such that y(S)=x(S) and y(T+p)=x(T+p).

For the sake of comparing, we also include a definition of topological mixing whose

formulation looks similar to that of strong irreducibility:

Definition 3.1.20. AZd-subshift X is topologically mixing if for any trvo subsets

S and T of Zd, there exists RS,T such that for all p Zd with the property that

79

d(S, T+p)>R, and for any elements x, xX, there exists yX such that

y(S)=x(S) and y(T+p)=x(T+p).

The difference in the two definitions is subtle but important: for strong irreducibility,

the distance R that must be present between two shapes in order to make interpolation

possible is independent of the shapes involved, whereas in topological mixing, this

distance can depend on the shapes. This distinction is irrelevant in one dimension,

since strong irreducibility and topological mixing are the same concept for Z-subshifts.

(To see this, note that for a Z-shift of finite type, a distance sufficient for interpolating

between two letters of the alphabet is sufficient to interpolate between any two words.)

However, strong irreducibility is a strictly stronger notion in more than one dimension.

Pertinent for our purposes is the fact that if x(S) =w for some x in a stongly

irreducible shift of finite type X, then the occurrence of w at S can only "force"

letters to appear in the region of Zd wrttnn distance R of S. This is because for any

vZd with d(v, S)>R and any aA, by the definition of strong irreducibility

there is some yX with y(S)=w and y(v)=a. For this reason, the checkerboard

island system Z is topologically mixing but not strongly irreducible. Armed with this

stronger hypothesis, we can now refine Possible Theorem 3.1.18 a bit more:

Possible Theorem 3.1.21. For any d >1 and any strongly irreducible shift X =ΩF

of finite type with uniform filling length R and positive topological entropy htop(X),

there exist constants CX, DX>0, AX, BX, and NXN such that for any n > NX

and any word w LΓn(X), if we denote by Xw the shift of finite type ΩFu{w}, then

80

CXehtop(X)(n+AX)d<htop(X)-htop(Xw)<DXehtop(X)(n+BX)d.

In fact the above is now provable with correct choices for the pertinent constants,

and so we state our main result:

Theorem 3.1.22. For any d >1 and any strongly irreducible shift X =ΩF of finite

type with uniform filling length R and positive topological entropy htop(X), there exist

constants NXN and DXR such that for any n > NX and any word w LΓn(X),

if we denote by Xw the shift of finite type ΩFu{w}, then

1ehtop(X)(n+44R+70)d<htop(X)-htop(Xw)<DXehtop(X)(n-2R)d.

Before beginning with the preliminaries of a proof, we make an observation: our

example shifts of finite type fj(Y) used above to show the necessity of the constants

AX and BX in this statement did not actually show that AX and BX are not the

same; it is possible that Afj(Y)=Bf, ċ(Y)=j for all j. However, we may now introduce

another example to show that in fact AX and BX must be distinct for certain X.

Consider, again for any j>1, and any dimension d, the shift of finite type Zj on

the alphabet {0, 1} defined by the set of forbidden words Fj={w{0,1}Γj :

w has at least two ones}. In other words, Zj consists of all infinite words of zeroes

and ones on Zd such that any two ones are a d-distance of at least j from each other.

By definition, Zj is clearly a shift of finite type. We also claim that it is strongly

irreducible with R=j- 1: consider any words xLS(Zj) and xLT(Zj) such

that d(S, T)>j-1. Take y{0,1}Zd defined by y(S)=x, y(T)=x, and y(p)=0

81

for any pST. We claim that yZj : suppose, on the contrary, that yZj.

Then there exist a, bZd such that d(a, b)<j, and y(a)=y(b)=1. If a, bS,

then x contains two ones a d-distance less than j apart and cannot be in L(X), a

contradiction. We arrive at a similar contradiction if a, bT. Clearly we cannot

have aS and bT since d(S, T)>j-1, and similarly it cannot be the case that

aT and bS. But since the only ones in y are in S or T, we have exhausted

all possible cases. Therefore, yZj. By definition, Zj is strongly irreducible with

R=j-1. We claim that in the language of Possible Theorem 3.1.21, it must be the

case that AZjBZj, and in fact that AZj-BZj>2j- 2 =2R.

To prove this, we first make a definition: for any n>0, we define the word an with

shape Γnj+1 by an(p)=1 if and only if all coordinates of p are equal to one modulo

j. anL(Zj) since it contains no two ones a d-distance of less than j apart. Let

us also for any n>0 define the word bn with shape Γ(n+2)j-1 where bn has an as

a subword occupying the central Γnj+1, and has zeroes on all of Γ(n+2)j-1(j-1). By the

definition of Zj, any occurrence of an in an element of Zj forces an occurrence of bn

containing it. Therefore, as argued for fj(Y), (Zj)an=(Zj)bn for any n. However,

the size of the shape of bn is 2j- 2 bigger than that of an for any n, and so for any

AZj, BZj, CZj, and DZj which satisfy Possible Theorem 3.1.21, it must be the case

that AZj-BZj>2j - 2=2R for any j. Although this does not show that the

bounds in Theorem 3.1.22 are optimal, it does show that AX- BX must be at least

linear in R, and so the value of 46R+70 for AX - BX attained in Theorem 3.1.22

can be improved by at most additive and multiplicative constants.

82

Before beginning to prove our main result, we now briefly outline the content of this

paper. The main idea in the proof of Theorem 3.1.22 is to use strong irreducibility

to find a word ww where w and w agree on Γn(t), and then define a map which

replaces all occurrences of w by w in large words vLΓk(X). In order to know how

many-to-one this map is, it is important to know how many times w and w appear in

a "typical" vLΓk(X). In Section 3.2, we use ergodic measures of maximal entropy

to show that for fixed n, any wLΓn(X), and large k, there are "many" words v

in LΓk(X) for which we have good bounds from above and below on the number of

occurrences of w in v. Namely, for a large set of v, the number of occurrences of w is

roughly kdμ¯([w]), where μ¯ is an ergodic measure of maximal entropy on X. We then

use a result of Burton and Steif ([BuSl], [BuS2]) to give upper and lower bounds on

μ¯([w]) of the type e-htop(X)(n+A)d

There is a slight problem though; it is entirely possible that a replacement of w

by w could create a new occurrence of w, which could make this replacement map

extremely unwieldy and hard to deal with. What we would like is a word w such

that replacing w by w can never create a new occurrence of w. Unfortunately, this

is impossible for some choices of w. In Section 3.3, we prove a slightly weaker fact

that is still sufficient for our purposes; for any wLΓn(X), there exists a subword

wLΓm(X) with m not much smaller than n for which there exists ww where

w and w agree on Γm(t) and where a replacement of w by w can never create a new

occurrence of w.

Section 3.4 contains the proof of Theorem 3.1.22. The proof is done by considering

83

maps between L(X) and L(Xw) which either introduce or destroy occurrences of w

(which is done by using the result of Section 3.3 to perform various replacements) , and

estimating the sizes of images or preimages under these maps. We can make certain

helpful assumptions about the words in which these replacements take place (such as

the approximate number of occurrences of w) by using the results of Section 3.2.

In Section 3.5, we deal with a simplified version of our main result where the word w in

question is already chosen to have a nice property. (namely, that there exists w such

that replacing w by w can neither accidentally create new w or w or destroy existing

w or w ) In this case, we prove that (ln2)μ¯w([w])<htop(X)-htop(Xw)<2 (In 2) μ-([w])

for any μ¯ an ergodic measure of maximal entropy on X and μ¯w an ergodic measure

of maximal entropy on Xw . We then show that it is not possible to prove a lower

bound on htop(X)-htop(Xw) of the form Cμ¯([w]) for constant C.

In Section 3.6, we prove a corollary using our methods: for any alphabet A, and

given any set of words w1, . . . , wm where the size of wi+1 is much greater than that

of wi for 1<i<m, Yw1,...,wm, where Y=Ω is the full shift. This can be seen

to be related to some of the so-called undecidability questions in Zd shifts of finite

type; given a set F of words, it is algorithmically undecidable whether or not ΩF

is nonempty. The corollary described gives a condition under which this question is

easily answered.

Finally, in Section 3.7, we give some open questions that arise in the course of proving

our results.

84

3.2 Some measure-theoretic preliminaries

We will now move towards proving Theorem 3.1.22. We first need a lemma about

strongly irreducible systems:

Lemma 3.2.1. For any Zd-subshift X which is strongly irreducible with uniform fill-

ing length R and for any rectangular prism Rj1,j2,...,jd, ehtop(X)j1j2...jd<HRj1,j2, ' jd(X)

<ehtop(X)(j1+R)(j2+R)...(jd+R).

Proof: We first prove the first inequality: take any rectangular prism Rj1,...,jd . For

any kN, since Rkj1,...,kjd can be partitioned into td disjoint copies of Rj1,...,jd,

HRkj1, ' kjd(X)<HRj1, ' jd(X)kd Take natural logarithms of both sides and divide by

kdj1...jd:

hRkj1,,kjd(X)kdj1...jd<hRj1,,jd(X)j1...jd,

and by letting k approach infinity, we see that htop(X)<hRj1,,jd(X)j1...jd, which implies

that ehtop(X)j1...jd<HRj1, ' jd(X).

We prove the second inequality in almost the same way: again consider any rect-

angular prism Rj1,...,jd . For any positive integer k, Rk(j1+R),k(j2+R),...,k(jd+R) can be

broken into kd copies of Rj1+R,...,jd+R, and we can choose copies of Rj1,...,jd inside each

Rj1+R,...,jd+R so that the d-distance between any pair of these Rj1,...,jd is greater than

R. This is illustrated in Figure 3.5.

(A note on figures: all figures in this paper are drawn with d=2, but this is in some

sense without loss of generality; all of the ideas that we use can be illustrated via

85

Figure 3.5: Rk(j1+R),k(j2+R),...,k(jd+R)

two-dimensional pictures. All constructions and descriptions are carried out in full

generality. )

By strong irreducibility, for any way of filling these kd copies of Rj1,...,jd with words in

L(X), the remainder can be filled to make a word in LRk(j1+R),k(j2+R), ' k(jd+R)(X). This

implies that HRj1, ' jd(X)kd<HRk(j1+R),k(j2+R), ' k(jd+R)(X). Take natural logarithms of

both sides and divide by kd(j1+R)(j2+R)... (jd+R):

hRj1,,jd(X)(j1+R)...(jd+R)<hRk(j1+R),,k(jd+R)(X)kd(j1+R)...(jd+R),

and by letting k approach infinity, we see that hRj1,,jd(X)(j1+R)...(jd+R)<htop(X), or that

HRj1, ' jd(X)<ehtop(X)(j1+R)(j2+R)...(jd+R).

86

To prove Theorem 3.1.22, we will be repeatedly using the concept of a measure of

maximal entropy.

Definition 3.2.2. A shift-invariant probability measure μ on a subshift X is called

a measure of maximal entropy if hμ(X)=htop(X).

Such measures are said to have maximal entropy because of the following Variational

Principle:

Theorem 3.2.3. For any Zd-subshift X, htop(X)=suphμ(X), where μ ranges over

all shift-invariant Borel probability measures on X. This supremum is achieved for

some μ.

See [M] for a proof. A useful observation is that since the extreme points of the

set of measures of maximal entropy are ergodic, and since the set of measures of

maximal entropy is nonempty for subshifts, any subshift also has an ergodic measure

of maximal entropy. (See [Wal] for details.) We need the following proposition:

Proposition 3.2.4. ( [BuS2], p. 281, Proposition 1.20) Let μ be a measure of maximal

entropy for aZd-shift X of finite type t. Then for any shape UZd, the conditional

distribution of μ on U given a fixed word wL(UC)(t) (X) is uniform over all words

xLU(X) such that the word y rvith shape U (& ) defined by y(U)=x and

y((Uc)(t))=w is in L(X).

We have the following convenient restatement:

Proposition 3.2.5. For any Zd-shift X of finite type t, and for any measure μ of

maximal entropy on X, and for any word w LS(X), and for any shape S(t)T

87

S, μ([w])=μ([w(T)])|LS(X)[w(T)]|. In other words, any trvo words with the same shape S which

agree on T have the same measure under μ.

Proof: Consider any two words w, wLS(X) with w(T)=w(T). We claim that

((SS(t))c)(t)S(t). Consider p((SS(t))c)(t) By definition, p(SS(t))c

and there exists qSS(t) such that d(p, q)<t. Suppose that pSc. Then,

since d(p, q)<t and since qS, by definition, qS(t), a contradiction. Therefore,

pS. Since p(SS(t))c, this implies that pS(t). Therefore, ((SS(t))c)(t)

S(t). We define x=w(SS(t)) and x=w(SS(t)). Since S(t)T, and since

((SS(t))c)(t)S(t), w(((SS(t))c)(t))=w(((SS(t))c)(t)). Therefore, if we de-

fine z=w(((SS(t))c)(t)), then since either x or x could fill SS(t) while z fills

((SS(t))c)(t), we may take U=SS(t) and use Proposition 3.2.4 to see that

μ(w)=μ(w).

Lemma 3.2.6. For any strongly irreducible Zd- shift X of finite type t with uniform

filling length R, and for any measure μ of maximal entropy on X, and for any word

u with shape S a subset of ΓM,

1H(X),(su(SC)(t+R))(su(SC)(t+R))(t),<μ([u])<1HSS(R)(X).

Proof: Before we begin the proof, we recall that for any positive integer j, S(j) is

the boundary of thickness j of S, and that (Sc)(j) is the boundary of thickness j of

88

the complement of S, i.e. the set of all points pS for which there exists qS with

d(p, q)<j.

Fix uLS(X). We begin by bounding μ([u]) from below. Choose any word v

L(su(Sc)(t+R))(t)(X). We claim that d(S, (S (Sc)(t+R))(t))>R. Consider any pS

and q (SU(Sc)(t+R))(t). By definition, there exists rS (Sc)t+R such that

d(q, r)<t. Also by definition, d(p, r)>R+t. Therefore, by the triangle inequality,

d(p, q)>R, as claimed. This means that by strong irreducibility, there exists a

word wvLsu(Sc)(t+R)(X) such that wv(S) =u and wv((S (Sc)(t+R))(t))=v.

By Proposition 3.2.5, μ([wv])=μ([v])|Lsu(SC)(t+R)(X)[v]|>μ([v])H(su(SC)(t+R))(su(SC)(t+R))(t)(X).

The inequality comes from the fact that the number of words in L(X) with shape

S (Sc)(t+R) with the fixed word v on (S(Sc)(t+R))(t) is clearly less than or equal

to the number of words in L(X) with shape (S (Sc)(t+R))(S (Sc)(t+R))(t) For

clarity, we rewrite:

μ([wv])=μ([v])H(X),(su(SC)(t+R))(su(SC)(t+R))(t), ċ (3.3)

If we sum (3.3) over all possible choices for v, then we get

vL(Su(SC)(t+R))(t)(X)μ([wv])=vL(X)(su(SC)(t+R))(t)μ([v])H(X),(su(SC)(t+R))(su(SC)(t+R))(t),ċ (3.4)

Note that all wv are distinct, and for all wv, wv(S)=u. Therefore,

vL(Su(SC)(t+R))(t)[wv](X)[u],

89

and so vL(su(SC)(t+R))(t)(X)μ([wv])<μ([u]). Since vL(su(SC)(t+R))(t)(X)[v]=X,

vL(su(SC)(t+R))(t)(X)μ([v])=1. By combining these facts with (3.4), we see that

μ([u])>1H(X),(su(SC)(t+R))(su(SC)(t+R))(t), ċ

We now bound μ([u]) from above. Choose any word vLsu(SC)(t) (X) with v(S)=u.

We wish to use Proposition 3.2.5 with (Sc)(t) as our T. To do so, we need to know that

(S (Sc)(t))(t)(Sc)(t)SU (Sc)(t). The second containment is trivial. To prove the

first, suppose that p (S (Sc)(t))(t). By definition, there then exists qS (Sc)(t)

such that d(p, q)<t. Since qS (Sc)(t), in particular qS, or qSc. Therefore,

since d(p, q)<t, p(Sc)(t) by definition. We now apply Proposition 3.2.5:

μ([v])=μ([v((Sc)(t))])|Lsu(Sc)(t)(X)[v((Sc)(t))]| (3.5)

We claim that d(SS(R), (Sc)(t))>R; choose any pSS(R) and q(Sc)(t).

Clearly qSc. If d(p, q)<R, then pS(R), a contradiction. Thus, d(p, q)>R. This

implies that for any v, and for any vLSS(R)(X), there exists yLsu(SC)(t) (X) with

y((Sc)(t))=v((Sc)(t)) and y(SS(R))=v. Therefore, |Lsu(Sc)(t)(X)[v((Sc)(t))]|>

HSS(R) (X). By using this fact and summing (3.5) over all possible v, we get

vLsu(SC)(t)(X)[u]μ([v])<vLsu(SC)(t)(X)[u]μ([v((Sc)(t))])HSS(R)(X). (3.6)

Note that since all v are distinct, vLsu(SC)(t)(X)[u][v]=[u], and so

90

vLsu(SC)(t)(X)[u]μ([v])=μ([u]). Also note that since all v are distinct, but all have

v(S)=U, it must be the case that all v((Sc)(t)) are distinct, and so

vLsu(SC)(t)(X)[u]μ([v((Sc)(t))])<1.

Combining these facts with (3.6), we see that

μ([u])<1HSS(R)(X),

which, along with the lower bound on μ([u]) already achieved, completes the proof.

To avoid confusion, from now on we use μ¯ to denote an ergodic measure of maximal

entropy on a subshift X, and μ to denote a shift-invariant probability measure which

may or may not be of maximal entropy.

Lemma 3.2.7. For any Zd-subshift X, and for any shift-invariant ergodic probability

measure μ on X, and for any finite set of words u1LS1(X), u2LS2(X), ..., uj

LSj(X) and ε>0, if we denote by Ak,ε,μ,u1,...,uj(X) the set of words in LΓk(X) which

have be rween kd(μ([ui])-ε) and kd(μ([ui])+ε) occurrences of ui for all 1<i<j,

then liminfkln|Ak,ε,μ,u1,,uj(X)|kd>hμ(X).

Proof: (For brevity, we use the shorthand notation μ(Ak,ε,μ,u1,...,uj(X)) for the mea-

sure of the union of all cylinder sets corresponding to words in Ak,ε,μ,u1,...,uj (X).) Fix

any ε>0. Firstly, we notice that it is a simple consequence of Birkhoff's ergodic

theorem that limkμ(Ak,ε,μ,u1,...,uj(X))=1: for each 1<i<j,

91

limk1kdpΓkχ[ui] (σp(x))=μ([ui]) for almost every xX,

where a represents the Zdacti˙on by shifts on X. Convergence almost everywhere

implies convergence in measure, meaning that

limkμ ({ x : 1kdpΓkχ[ui](σp(x))(μ([ui])-ε, μ([ui])+ε) for 1<i<j} )=1. (3.7)

Denote by Ak,ε,μ,u1,...,uj (X) the set whose measure is taken on the lefthand side of

(3.7). Then whether or not xX lies in Ak,ε,μ,u1,...,uj(X) depends only on its values

on Γk+M, where M is the minimum integer such that all Si are subsets of some copy

of ΓM. Therefore, Ak,ε,μ,u1,...,uj(X) is a union of cylinder sets of words in LΓk+M(X).

It is not hard to see that for large k, any word whose cylinder set is a subset of

Ak,ε2,μ,u1,...,uj (X) is an element of Ak+M,ε,μ,u1,...,uj(X). Combining this with the fact

that (3.7) is true for any ε>0, we see that limkμ(Ak,ε,μ,u1,...,uj(X))=1.

We now write out the formula for the measure-theoretic entropy of X:

hμ(X, ξ)=limk1kduLΓn(X)f(μ([u])),

where f(x)=-xlnx for x>0 and f(0)=0. Since ξ is a measure-theoretic

generator for the a-algebra of Borel sets in X, we know that hμ(X, ξ)=hμ(X). (For

more information on measure-theoretic and topological generators, see [Wal])We now

partition LΓk (X) into the two pieces Ak , ε , μ , u1 , ... , uj (X) and Ak , ε , μ , u1 , ... , uj (X)c :

92

f(μ([u]))

hμ(X)=limk(1kd

uAk,ε,μ,u1, ' uj(X)

+1kduAk,ε,μ,u1, ' uj(X)cf(μ([u]))).

To estimate each summand, we use the easily checkable fact that for any set of

nonnegative reals α1, α2, . . ., (yj whose sum is β, the maximum value of i=1jf(αi)

occurs when all terms are equal, and is β In jβ. Using this, we see that the above sum

is bounded from above by

1kd(μ(Ak,ε,μ,u1,...,uj(X))ln|Ak,ε,μ,u1,...,uj(X)|μ(Ak,ε,μ,u1,...,uj(X)))

+1kd((1-μ(Ak,ε,μ,u1,...,uj(X)))lnHΓk,(X)1-μ(Ak,εμ,u1,...,uj(X))).

Since, as k, μ(Ak,ε,μ,u1,...,uj(X)) approaches 1 and ln|LΓk(X)|kd approaches htop(X)<

, the second term approaches zero as k. By replacing μ(Ak,ε,μ,u1,...,uj(X)) by

1 in the limit in the first term, we see that

hμ(X)<limkinf1kd In |Ak,ε,μ,u1,...,uj(X)|,

which was exactly what needed to be shown.

Corollary 3.2.8. For any strongly irreducible Zd-shift X of finite type t with uniform

filling length R, for any ε>0, and for any positive integers k and M, denote by

Ak,ε,M(X) the set of words u in LΓk(X) with be rween

93

kd(1H(X),(siu(sic)(t+R))(siu(siC)(t+R))(t), -ε) and kd(1H(X),sisi(R), +ε) occurrences of ui

LSi(X) for every word ui with shape SiΓM. Then,

limkln|Ak,ε,M(X)|kd=htop(X).

Proof: Using a combination of Lemmas 3.2.6 and 3.2.7 for any fixed μ¯ an ergodic

measure of maximal entropy on X shows that

limkinf ln|Ak,ε,M(X)|kd>hμ¯(X)=htop(X).

And by the definition of topological entropy,

limksupln|Ak,ε,M(X)|kd<limksuphΓk(X)kd=htop(X).

3.3 A replacement theorem

We must prove an auxiliary theorem about replacements which is mostly combinato-

rial in nature, and which will be integral in the proof of the upper bound portion of

Theorem 3.1.22.

Theorem 3.3.1. For any strongly irreducible Zd-Markov shift X with uniform filling

length R, there exists N depending on X such that for any choice of wLΓn(X)

with n>N, there exists wLΓm(X) a subword of w, where m>n-n1-13d, and

wLΓm(X) so that w and w agree on the boundary, and with the property that

94

replacing an occurrence of w by w in any element of X cannot possibly create a nerv

occurrence of w.

Proof: First let us make sure that the statement of Theorem 3.3.1 is totally clear.

What this means is that we find w, wLΓm(X) such that w is a subword of w, w

and w agree on the boundary, and such that the following is true: for any xX

with x(S) =w for some S a copy of Γm, if one defines xX by replacing x(S)

by w, then for any T a copy of Γm such that x(T)=w, it must be the case that

1

1

x(T)=w. We define l =(dlnnhtop(X))d¯+1 or l ={

(dlnnhtop(X))d¯+2, whichever is

odd. Then, HLΓ, (X) is, by Lemma 3.2.1, at least ehtop(X)ld>edlnn=nd. This is

greater than the number of words with shape Γl which occur as subwords of w, and

so therefore there exists a word aLΓ, (X) which is not a subword of w. This allows

us to make a definition:

Definition 3.3.2. Given a subword u of w, choose a subword of u of shape Γ2R+l

which does not contain any portion of the boundary of u, and use strong irreducibil-

ity to replace this subword by any word of shape Γ2R+l which has a as the subword

occupying its central Γl. The word that u has become after this replacement is called

a standard replacement of u, and so is any word created in this way.

Choosing the Γ2R+l to be replaced to be disjoint from the boundary of u ensures that

any standard replacement of u agrees on the boundary with u. This means that since

X is a Markov shift, any occurrence of u in a point xX can be replaced by a

standard replacement u, and the resulting element of Ω is still in X.

Definition 3.3.3. A subword u LΓk(X) of w has Property A if for every standard

95

Figure 3.6: A standard replacement of u

replacement u of u, replacing u by u in some xX could possibly create a nerv

occurrence of u. In other words, for every standard replacement u of u, there exists

xX and S, T copies of Γk such that x(S)=u, x(T)u, and if we define xX

by replacing x(S) by u, then x(T)=u.

Clearly, if we can find u a subword of w which is large enough and does not have

Property A, we will be done.

The organization of this induction is a bit odd, so we give a quick overview. We will

construct a sequence wj of subwords of w, where w1=w and each is a subword of

the previous. Each wj has shape a cube. We will be able to show that one of these wj

does not have Property A, and that it is large enough in the sense of Theorem 3.3.1.

However, the construction of these wj depends on a fact about periodicity of subwords

of w which have Property A, which we must prove first. We will set up and prove

96

this periodicity fact for any wj even though wj has not yet been defined for j>1.

This is not a problem though; this proof depends only on wj being a subword of w,

which will be clearly true once the definitions of wj are able to be made.

Let's assume that wj has Property A for some j. For the purpose of the following

constructions, it is helpful to picture wj as fixed in space, so say that wj has shape

Γnj, i.e. the cube with size nj which lies entirely within the positive octant of Zd

and has a vertex at (1, 1, ..., 1). To make sure to distinguish it from other copies of

Γnj, we call the shape that this fixed occurrence of wj occupies Bj. We take a copy

of Γ3(2R+l)nj(1-12d) or Γ3(2R+l)(nj(1-12d)-1) (call it Aj, and its size kj), where the size

is chosen to have the same parity as nj, and which is central in Bj, and partition

it into (3(nj(1-12d)))d or (3(nj(1-12d))-1)d disjoint copies of Γ2R+l . Consider only

the interior copies of Γ2R+l in this partition, i.e. the ones which are disjoint from the

boundary of Aj. For large nj, there are more than 2dnjd-12 of these, and to each one

we may associate a standard replacement of wj, and since wj has Property A, also

a copy of Γnj which overlaps Bj which could somehow be filled with wj at the same

time that Bj is filled with the standard replacement in question. We use Rj to denote

the set (possibly with repetitions) of these copies of Γnj.

Recall that each of these represents a possible new occurrence of wj created by per-

forming a standard replacement. Since any of the standard replacements that we

would perform is done by changing only letters in a particular Γ2R+l in Bj, it must

be the case that every element of Rj contains some portion of its associated Γ2R+l .

(Otherwise, the supposed "new" occurrence of wj would have already existed before

97

Figure 3.7: A sample element S of Rj

the replacement.) Also, since every standard replacement results in some occurrence

of a in Aj, and since a is not a subword of wj, none of the elements in Rj contains

all of its associated Γ2R+l . In Figure 3.7, the element of Rj shown, which we call

S, is associated to the standard replacement given by changing the letters in the

highlighted Γ2R+l .

We first make the claim that Rj contains no repeated elements, i.e. that two distinct

copies of Γ2R+l must be associated to distinct elements of Rj.

In Figure 3.8, suppose that S is associated to the standard replacements correspond-

ing to each of U and V, the two highlighted copies of Γ2R+l . This means that there

98

Figure 3.8: An element S of Rj associated to two standard replacements

exists a word uLBjS(X) with u(S)=wj and u(Bj)=wj, where wj is a standard

replacement of wj made by changing only the letters in U. The letters in US must

be changed from their values in wj in order for the occurrence of wj in S to be "new."

Therefore, u(BjU)=wj(BjU), and u(US)wj(US). Similarly, there exists

a word uLBjS(X) with u(S)=wj, u(BjV)=wj(BjV), and u(VS)

wj(VS). Since U and V are disjoint, this implies that u(US)wj(US) and

u(US)=wj(US), and so that u(US) u(US). But u(S)=u(S)=wj,

and since USS we have a contradiction. Now we know that Rj consists of more

than 2 dnjd-12 distinct copies of Γnj.

99

Definition 3.3.4. A suboctant of Bj is a subcube of Bj which shares vertices with

both Bj and Aj, is co ntained in Bj, and whose intersection with Aj contains only the

shared vertex. (See Figure 3.9.)

Figure 3.9: The suboctants of Bj

It is fairly clear upon observation that each element of Rj must contain some suboc-

tant of Bj, since each element of Rj contains some portion of Aj. Since Rj has more

than 2dnjd-12 distinct elements, there is ssoomm e suboctant 0j which is contained in more

than njd-12 of the copies of Γnj in Rj.

We now prove a fact about the periodicity of a set which is contained in two elements

of Rj. Suppose that two distinct elements S and S of Rj are given such that S-S,

which we denote by v, is a multiple of ei for some 1<i<d. We make the additional

too

assumption that both S and S contain some suboctant O of Bj, and assume without

loss of generality that O is the least suboctant lexicographically of Bj, i.e. O has the

point (1, ..., 1) as a vertex, and that v is a negative multiple of ei. (This is without

loss of generality because one can make these assumptions simply by renaming S and

Sand/or reflecting Bj several times, which does not affect our proof.) We will show

that certain portions of wj must be periodic based just on these hypotheses.

Figure 3.10: Elements S, S of Rj whose difference is a multiple of ei

In Figure 3.10, t is any point of Bj such that t+v is also in Bj, O is the suboctant

of Bj opposite O, q is the vertex shared by O and Bj, r is the vertex of Bj opposite

101

q, p is the vertex of S corresponding to r in Bj, and v=t-(p+v). P and P are

subcubes of O and O, respectively, which share vertices with Aj and whose size is

nj-kj2-kj, or the size of O minus the size of Aj. (The dotted line portions are copies

of Aj to remind us of the definition of P and P, but are themselves irrelevant and

are not named.) We define t to be r+v, giving us two new points t and t+v.

Since S, SRj, they may each be filled with wj when Bj is filled with the correct

standard replacement of wj. Call U the copy of Γ2R+l on which wj is altered to make

the standard replacement which allows S to be filled with wj, and V the copy of

Γ2R+l on which wj is altered to make the standard replacement which allows S to be

filled with wj. (A quick note: in several figures in this paper, we represent a point

pZd by the vector which points from 0 to p.)

We first claim that if t, t+vP, then t, t+vBj(U V). Suppose that

t, t+vP. Since S contains some portion of V, but does not contain all of V, the

same can be said of Aj ; S contains some portion of Aj, but not all of Aj. Therefore,

some coordinate of p+v is between nj-kj2+1 and nj+kj2. (We always use the word

"between" in the inclusive sense.) The fact that S has nonempty intersection with

Aj also implies that all coordinates of p+v are between nj-kj2+1 and nj, since all

coordinates of all elements of Aj are at lleuausutt nj-kj2+1. Since tP, every coordinate

of t is between kj+1 and nj-kj2. This implies that every coordinate of v is nonpositive

and at least -nj+1, and also that one coordinate of v is at least -nj-kj2+1. Then,

t=r+v is in Bj, and has one coordinate at least nj+kj2+1, and therefore does not

lie in Aj, and so is not in U or V. Since t+vP, the same argument shows that

102

t+v is in Bj, but does not lie in U or V. Clearly, since U and V are disjoint from

the boundary of Aj by their construction, t, t+v are not in U or V either.

Suppose that t, t+v, t, t+vBj(U V). Then by noting that S can be filled with

wj when Bj is filled with a standard replacement of wj which agrees with wj outside

of UV, we can infer that wj(t)=wj(t+v) (this is because tS corresponds

to t+vBj), and by noting that S can be filled with wj when Bj is filled with

a standard replacement of wj which agrees with wj outside of UV, we infer that

wj(t)=wj(t) and wj(t+v)=wj(t+v). (This is because tS corresponds to

tBj, and t+vS corresponds to t+vBj) This implies that wj(t)=wj(t+v).

This conclusion was dependent only on the fact that t, t+v, t, t+vBj(U V).

We have already shown that this is true for any t, t+vP, and so wj(P) is periodic

with respect to v.

Note that in the course of this argument, we have also shown that wj(t)=wj(t+v).

We claim that any pair of points in P which are separated by v are t and t+v

for some choice of t, t+vBj(U V). To do this, we just determine t from t :

if tP, then every coordinate of v is between -nj-kj2 and -kj+1. As already

mentioned, all coordinates of p+v are between nj-kj2+1 and nj, and one coordinate

is between nj-kj2+1 and nj+kj2. This implies that p+v+v=t has all coordinates

between 1 and nj, and so is in Bj, and that one coordinate is between 1 and nj-kj2+1,

which implies that t either does not lie in Aj or lies in the boundary of Aj, and, in

either case does not lie in U or V. Since t+vP, the same argument shows that

t+v is in Bj, but does not lie in U or V. We have then shown that t and t+v could

103

be any pair of points in P separated by v, and so the fact that wj(t)=wj(t+v) in

the above argument shows that wj(P) is also periodic with respect to v.

We will use this fact momentarily; for now let's recall that we earlier showed that

some suboctant Oj of Bj is contained in more than njd-12 distinct elements of Rj. As

before, denote by q the vertex of Bj shared by Bj and Oj, and by r the vertex of Bj

opposite q. It is clear upon observation that for any element S of Rj which contains

Oj, the vertex of S which corresponds to r in Bj must be contained in Bj. Fix any ei

in Zd. Bj can be partitioned into njd-1 sets {w+mei}0m<nj where w ranges over all

points w in Bj with wi=1, and by tthllee above comments, there are more than njd-12

distinct points in Bj which are vertices of elements of Rj which contain Oj. By the

pigeonhole principle, this implies that one of the sets {w+mei}0m<nj contains more

than nj of these vertices. Again by the pigeonhole principle, this implies that there

are two elements of Rj, call them S and S, which contain Oj and where S-S is a

multiple of ei whose length is less than nj. We make the notation vi(j):=S-S.

This in turn implies that wj is periodic with respect to vi(j) on the regions above

described as P and P, and since these regions are dependent on Oj and Oj, we call

them Pj and Pj . Since this can be done for all 1<i<d, wj(Pj) and wj(Pj) are

periodic with respect to v1(j), v2(j), . . . , vd(j) where for each 1<i<d, vi(j) is a multiple

of ei with length less than nj. This implies that their lengths are less than n as

well, since nj<n.

Definition 3.3.5. A word w is purely periodic if for some positive integers n1, ..., nd,

it is periodic with respect to niei for 1<i<d.

104

We have then shown that if wj has Property A, then there are regions Pj and Pj

as described above so that wj(Pj) and wj(Pj) are purely periodic with all periods

less than n. In particular, the preceding is true for j=1 and w1=w. For

each 1 <i<d, we can then choose vi(1) a multiple of ei which is a period for

w1(P1) and w1(P1) and which has length less than n. We now choose w2 a subword

of w1. For sufficiently large n1, we take w2 to be the subword of w1 with shape

B2=Γn1-((10ċ32d)(2R+l)n11-12d) and which still has q, the vertex shared by O1 and

B1, as a vertex. The purpose of this is to cause the forced purely periodic portion P1

of w1 to contain a purely periodic central cube within w2. We will choose subsequent

wj to be purely periodic on a central cube as well, which we will denote by Cj, and

so wj(Cj) will always be a purely periodic subword of wj. In fact, for each j, we will

choose wj+1 so that wj+1(Cj+1) is a subword of wj(Cj), and so each wj(Cj) will be

periodic with respect to vi(1) for 1<i<d. Due to the construction of w2, we can

take C2 to have shape Γ10ċ32d(2R+l)n11-12d in w2. We also move B2 in space to lie

entirely within the positive octant of Zd with one vertex at (1, ..., 1) (in other words,

B2=Γn2), so that the phrasing of the earlier arguments still works.

This choice of w2 was done to create C2 : from now on we will follow a different

inductive procedure. We need one more definition:

Definition 3.3.6. A superoctant of any Bj (j > 1) is a subcube which shares

vertices with both Bj and Cj, is contained in Bj, and contains Cj.

There is then a natural one-to-one correspondence between the superoctants and

suboctants of Bj ; for every superoctant, there is exactly one suboctant which is a

105

subset of it. For each j>2 where wj has Property A, we now describe how to

construct wj+1. For each such j, we define Tj to be the set of superoctants O¯ of Bj

such that wj(O¯) is periodic with respect to vi(1) for 1<i<d. We denote the size of Cj

by mj. Since w1(P1) is periodic with respect to each vi(1), and since w1(P1)=w2(O¯)

for a superoctant O¯ of w2, we have then already shown that T2 contains O, and

m2=10 ċ32d(2R+l) n11-12d. We assume that mj>10(2R+l)n11-12d for all j, and

in fact this will be clear once the construction is finished.

Let's now suppose that for some j>2, wj has Property A. The idea is that we take

wj+1 to be a suitable subword of wj where Cj+1 is smaller than Cj, but Tj+1 is strictly

larger than Tj. To show this, we have to prove a couple of claims. Firstly, we claim

that any superoctant of Bj on which wj is purely periodic with periods less than n

and which contains Cj must be periodic with respect to each vi(1), i.e. must be in Tj.

Since mj>10(2R+l)n1-12d, clearly mj>4n for all n.

We now claim that if Tj contains any pair of opposite superoctants O¯ and O¯ of Bj,

then neither of the suboctants O and O associated to them can possibly be contained

in any element of Rj.

We again assume without loss of generality that O is the least suboctant lexicograph-

ically of Bj. In Figure 3.11, wj(O¯O¯) (the portion of wj shaded) is purely periodic

with periods vi(1), and U is the Γ2R+l which is changed to make the standard replace-

ment corresponding to SRj. We assume that OS and derive a contradiction.

We have denoted by C the region (US)+(Bk-S), i.e. C in Bj corresponds to

US in S. CO¯ because OS, and for some 1<i<d, it is the case that

106

Figure 3.11: An element S of Rj which contains O

C consists of points whose ith coordinate is between nj-kj+1 and nj. We then

choose ui which is a multiple of vi(1) such that C+ui is a subset of O¯ and U+ui

is a subset of O¯ which is disjoint from U (this can be done since vi(1) has length less

than n; some multiple of it is greater than kj, but still less than nj-kj2) and take

D=C+ui and E= (&|"1 S)+ui. It must be the case that filling Bj with a standard

replacement for wj in which no letters of BjU are changed and at least some letters

in US are changed can be done simultaneously with filling S with wj. Call the

word with shape BjS with this property u. We can restate our assumptions then

107

by saying u(BjU)=wj(BjU), u(US) wj(US), and u(S)=wj. Since

US in S corresponds to C in Bj, and since u(S)=wj, it must be the case that

u(US)=wj(C). By periodicity of wj(O¯) with respect to ui, wj(C)=wj(D).

Since D in Bj corresponds to E in S, and since u(S)=wj, wj(D)=u(E). Since

u(BjU)=wj(BjU) and E and U are disjoint, u(E)=wj(E). Finally, by period-

icity of wj(O¯) with respect to ui, wj(E)=wj(US). But we have then shown that

u(US)=wj(US), a contradiction. Therefore, if two opposite superoctants are

contained in Tj, then neither of the suboctants associated to them may be contained

in any element of Rj.

Bj 0j' Ojy

Pj'

oj¯ cj

Aj

oj Pj

Figure 3.12: Bj

Finally, we may make our inductive step. Consider any j>2 and wj with Property

A. By the earlier argument, we may then conclude that there are subcubes Pj and Pj

108

as defined above so that wj(Pj) and wj(Pj) are purely periodic with periods at most

n. However, in the proof of this, we made use of the fact that the suboctant Oj of

Bj corresponding to Pj is contained in many elements of Rj. Therefore, by the fact

just shown, one of Oj or Oj corresponds to a superoctant of Bj which is not in Tj.

First, we take a subword of wj, call it wj, obtained by removing the boundary of

thickness kj from Bj. In other words, we take wj to be the subword whose shape is

the central Γnj-2kj of Bj, which we call Bj.

Pj' BJ

cj

Aj

Pj

Figure 3.13: Bj

wj(Pj) and wj(Pj) are now subwords of wj occupying suboctants of Bj. By construc-

tion, wj(Pj) and wj(Pj) are also purely periodic with periods at most n. However,

109

one of them is associated with a superoctant which is not in Tj, and we may as-

sume that without loss of generality it is Pj. Also, since the size mj of Cj is at

least 10(2R+l) n1-12d , and since the size of Aj is at most 3(2R+l)n1-12d , the overlap

between Pj and Cj is a cube whose size is at least mj3. Denote this cube by Cj+1.

Bj+1 is defined to be a subcube of Bj which shares the same vertex with Bj that

Bj shares with Pj, with the correct size so that the overlap between Pj and Cj is

central. In doing this, we reduce the size of wj by at most mj. wj+1 is defined to be

wj (Bj+1). Then, since wj+1(Cj+1) is a subword of wj(Cj), it is also purely periodic

with respect to each vi(1), and by construction, Cj+1 is central in Bj+1, so indeed Cj+1

has the necessary properties. Each superoctant in Tj corresponds to a superoctant

in Tj+1, and wj+1(Pj) now occupies a superoctant of Bj+1, call it O, on which wj+1

is purely periodic with periods at most n. We claim that this means that O¯Tj+1

as well. For each 1<i<d, choose a period vi(j) of wj+1(O¯) which is a multiple

of ei whose length is less than n. For every j, as already mentioned, wj+1(Cj+1)

is periodic with respect to vi(1) for 1<i<d. Now, consider any pO¯ such that

p+vk(1)O¯ for some fixed 1<k<d. Since the size mj+1 of Cj+1 is much greater

than n, there exists some linear combination i=1daivi(j) with aiZ, call it v, such

that p+vCj+1 and (p+v)+vk(1)Cj+1. Then, wj+1(p)=wj+1(p+v) since v is

a sum of periods for wj+1(O¯). wj+1(p+v)=wj+1(p+v+vk(1)) since wj+1(Cj+1) 1s

periodic with respect to vk(1) Finally, wj+1(p+v+vk(1)) =wj+1(p+vk(1)) since v is

a sum of periods for wj+1(O¯), and therefore is itself a period of wj+1(O¯). Then, by

definition, wj+1(O¯) is periodic with respect to vk(1) as well, and since k was arbitrary,

with respect to every vi(1) for 1<i<d. By definition, this shows that O¯Tj+1.

110

Since O¯ corresponds to a superoctant of Bj that is not in Tj, Tj+1 contains at least

one more element than Tj, and mj+1>mj3. We again move Bj in space so that it

lies entirely within the positive octant of Zd and has one vertex at (1, ..., 1). Since

m2>10 ċ32d(2R+l)n1-12d, this implies that mj>10(2R+l)n1-12d for 1<j<2d.

We will show that this induction never need proceed beyond j=2d, and so in the

process will justify the claim already used that mj>10(2R+l)n1-12d for all j that

we consider.

If each wj for 1<j<2d has Property A, this means that T2d+1 must have more

than 2d elements. But, there are only 2d superoctants of B2d+1, and so we have a

contradiction. This implies that some wj must not have Property A, which we call

w. w is created by at most 2d truncations of w, each of which reduces the size by at

most k2+m2, and so since k2<m2 for large enough n, the size m of w is at least

n-1O ċ 2d+132d(2R+l) n1-12d. Since for large n, 2R+l <clnn for some constant

c, we can then say that there exists N for which m>n-K In nċ n1-12d for n>N

and some constant K. Then, take N>N so that A lnn<n16d for n>N. This

implies that for n>N, m>n-n1-13d. Since w does not have Property A, there

exists a standard replacement of w, call it w, which agrees on the boundary with w

by definition of standard replacement, and with the property that replacing w by w

in some xX cannot possibly create a new occurrence of w. Thus, Theorem 3.3.1

is proved.

nt

We can prove as a corollary a version of Theorem 3.3.1 for any strongly irreducible

shift of finite type:

Corollary 3.3.7. For any strongly irreducible Zd-shift X of finite type t with uniform

filling length R, there exists N depending on X such that for any choice of wLΓn(X)

with n>N, there exists a subword w of w, with shape Γm, where m>n-n1-13d, and

wLΓm(X) so that w and w agree on the boundary of thickness t, and with the

property that replacing an occurrence of w by w in an element of X cannot possibly

create a nerv occurrence of w.

Proof: Fix X a strongly irreducible Zd-shift of finite type t and uniform filling length

R. Take the alphabet A(t) whose letters are the elements of LΓt(X), i.e. the words

in the language of X whose shape is Γt, and then define a map f from X to (A(t))Zd

where for any pZd, (f(x))(p) is the subword of x which occupies Γt+p, i.e. the

copy of Γt whose least vertex lexicographically is p. Then, f(X) is a Markov shift

with alphabet A(t) . The reader can check that f(X) is still strongly irreducible, but

with a uniform filling length of R+t-1. In short, this is because two shapes which

are a distance of j apart in an element of x correspond to shapes a distance of j+t-1

apart in f(x) for j C.

Since f(X) is a strongly irreducible Markov shift, by Theorem 3.3.1 there exists N

such that for any wLΓn(f(X)) with n>N, there exist wLΓm a subword of

w, with m>n-n1-13d, and w with the same shape as w and which agrees on the

boundary with w such that replacing w with w in xf(X) cannot result in the

creation of a new occurrence of w.

112

Now, take any word wLΓn(X) with n>N+t-1. Although f is defined

as a map from one subshift to another, it should be fairly clear that as defined,

it can also function as a map from LΓk+t-1(X) to LΓk(f(X)). So we can define

f(w)LΓn-t+1(f(X)). For ease of notation, we define n¯=n-t+1, so n¯>N.

Then, by Theorem 3.3.1, we can find vLΓm(f(X)) a subword of f(w), with

m>n¯-n¯1-13d, and v which agrees with v on the boundary so that replacing v

with v in any element of f(X) cannot create a new occurrence of w. f-1(v) and

f-1(v) are then subwords of w with shape Γm+t-1 which agree on the boundary

of thickness t. We wish to show that replacing f-1(v) by f-1(v) in some xX

cannot possibly result in a new occurrence of f-1(v). Suppose that this is not the

case. Then there is xX and two copies of Γm+t-1, call them S and S, such that

x(S)=f-1(v), x(S)f-1(v), and if we define xX by replacing x(S) by f-1(v),

then x(S)=f-1(v). If we apply f to all of the objects in the above description,

we arrive at a contradiction to the definition of v and v. It is therefore the case

that replacing f-1(v) with f-1(v) in any xX cannot create a new occurrrence of

f-1(v). Since v is a subword of f(w), f-1(v) is a subword of w. m>n¯-n¯1-13d,

and the size of the shape of f-1(v) is m+t-1 >n-n¯1-13d>n-n1-13d, and so we

are done.

Corollary 3.3.7 is the main tool in the proof of Theorem 3.1.22; very roughly, the

idea is that for each occurrences of w in some large word in X, we destroy this

occurrence of w by replacing the occurrence of w which is a subword of it by w.

113

The requirement that new occurrences of the destroyed words not appear during this

process is to ensure that the process of getting rid of these occurrences of w takes

as few steps as possible. It is natural to wonder why we go to the trouble of dealing

with subwords in the statement of Corollary 3.3.7; i.e., why is it not possible, instead

of dealing with the intermediate step of taking a subword of w, to just find a word

ww such that replacing w by w can never result in a new occurrence of w being

created? The answer is that there are examples of strongly irreducible shifts of finite

type X and words wL(X) for which this is impossible! Here is a quick example.

Consider, for any dimension d and any n>1, the full shift Y=Ω, and the word

wLΓn(Y) defined by w(v)=1 if v1=v2=...=vd=1, and w(v)=0 if 1<vi<n

for 1<i<d and vi>1 for some 1<i<d. w then has zeroes at every point of

Γn except the least lexicographically. We claim that for any ww, there is some

xY such that x(Γn)=w, and replacing x(Γn) by w creates a new occurrence of

w. Consider any ww. Suppose that w contains a one, i.e. there is some vΓn

such that w(v)=1. Since ww, we can assume that v(1,1, ..., 1). Consider

xY defined by taking x(1, 1, ..., 1)=1 and x(v)=0 for all other vZd . Then,

x(Γn)=w. Create a new xY by replacing x(Γn) by w. Then, it is not hard to

check that x(Γn+ (v- (1, 1, ..., 1)))=w, and that x(Γn+(v-(1,1, ..., 1))) is the

word consisting of all zeroes, and so not equal to w. This means that the replacement

involved in changing x to x created a new occurrence of w. The only other possibility

for w is that w is the word consisting of all zeroes, i.e. w(v)=0 for all vΓn. If

this is the case, then define xY by taking x(0, 1, ..., 1)=x(1,1, ..., 1) =1 and

x(v)=0 for all other vZd . Again, x(Γn)=w. Create xY by replacing x(Γn)

114

by w. Then, it is not hard to check that x(Γn-e1)=w, and that x(Γn-e1) had two

ones, and is thus not equal to w. This means that in this case also, the replacement

involved in changing x to x created a new occurrence of w. We have then shown

that for any ww, it is possible that a replacement of w by w could create a new

occurrence of w, and therefore the extra step of taking subwords in Theorem 3.3.1

and Corollary 3.3.7 is in fact necessary.

3.4 The proof of the main result

Proof of Theorem 3.1.22: We take X to be a strongly irreducible shift of finite

type t. For any word w in LΓn(X), we call Xw the shift of finite type resulting

from removing w from the language of X. We begin by proving an upper bound

on htop(X)-htop(Xw) for sufficiently large n. By Corollary 3.3.7, as long as n is

sufficiently large, there exists wLΓm(X) a subword of w with m>n-n1-13d and

w with the same shape as w and which agrees with w on Γm(t) with the property

that replacing w by w cannot create new occurrences of w.

For any k>m, we will define a mapping φk : LΓk+4m(X)LΓk(Xw). φk will

actually be defined as a composition of three maps: (yk : LΓk+4m(X)LΓk+4m(X),

βk : (yk(LΓk+4m(X))LΓk+4m(Xw), and γk : ( βk(yk)(LΓk+4m(X))LΓk(Xw). To

define these we need a definition:

Definition 3.4.1. Trvo overlapping occurrences of the word w which occur in copies

S and S of Γm are said to have overlap of type B if |S -S|>m2-2n1-13d.

115

We point out an important property of this definition: given a fixed occurrence of w,

there are at most 3dmd copies of Γm which could be filled with an overlapping w, and

so there are at most 3dmd words which are made up of a pair of w whose overlap is of

type B. Call these words u1 , u2, . . . , ub, where b<3dmd, and call Si the shape of ui for

any 1<i<b. Given any uLΓk+4m(X), we define (yk(v) by finding each occurrence

of any of the ui within u, and in each one, replacing the first lexicographically of

the two occurrences of w which make up ui by w. In order to make this operation

well-defined, we must specify an order for these replacements to be done in; let us

just choose the usual lexicographic order, and search for and replace each ui in turn.

(This means that we first replace the lexicographically first occurrence of u1, then

find the new lexicographically first occurrence of u1 and replace it, and continue until

no u1 remain. We then perform the same procedure for u2, u3, etc.) Since replacing

w by w can never create a new occurrence of w, the resulting Γk+4m-word, which

we call (yk(u), contains no ui, i.e. contains no pair of occurrences of w with overlap

of type B.

For any (yk(u)(yk(Γk+4m(X)), we define ( βk(yk)(u) as follows: find the first

occurrence of w lexicographically in (yk(u), and replace the w which is a subword of

this occurrence of w by w. Denote by i the word that w becomes when its subword

w is replaced by w. Repeat this procedure until there are no occurrences of w left,

and call the resulting Γk+4m word ( βk(yk)(u). There is one fact that we must check;

namely that when performing any of these replacements, we do not accidentally create

a new occurrence of w. For a contradiction, assume that a particular replacement of

w by w could create a new occurrence of w.

116

Figure 3.14: Intersecting occurrences of w

In Figure 3.14, S is a copy of Γn which is filled with w, T is a copy of Γm which is

filled with w, U is the Γ2R+l on which letters are changed to change the word on T

from w to w, S is a copy of Γn which will be filled with w once the replacement

is done, T is the copy of Γm in S corresponding to T in S, which will therefore be

filled with w once the replacement is done, and Aj is the copy of Γ3(2R+l)nj1-12d or

Γ3(2R+l)(nj1-12d-1) central in S in which U must lie, which comes from the proof

of Theorem 3.3.1. For the same reasons as in the proof of Theorem 3.3.1, S cannot

contain all of U, but must contain some nonempty subset of U. Since the replacement

cannot possibly create a new occurrence of w, T must have already been filled with

117

w before the replacement occurred. This implies that before the replacement, T

and T were filled with overlapping occurrences of w. We claim that these two

occurrences had overlap of type B. Since UAj, and since the size of Aj is at most

3(2R+l) n1-12d, the distance in the ei-direction between the center of U and the

center of Aj is less than 32(2R+l)n1-12d for each 1<i<d. Aj is central in S, so

the center of Aj is the same as the center of S. Since T is a subcube of S whose

size is at most n1-13d shorter the distance in the ei-direction between the center of S

and the center of T is less than 12n1-13d for each 1<i<d. For the same reason, the

distance in the ei-direction between the center of S and the center of T is less than

12n1-13d for each 1<i<d. Finally, since U intersects the boundary of S, and since

U has size 2R+l, there exists 1<i<d for which the distance in the ei-direction

between the center of S and the center of&is between m2-(R+l2) and m2+(R+l2).

Putting all of these facts together, we see that there exists 1<i<d so that the

distance in the ei-direction between the center of T and the center of T is between

m2-32(2R+l)n1-12d-n1-13d-(R+l2) and m2+32(2R+l)n1-12d+n1-13d+(R+l2), which

implies that for large enough n, it is between m2-2n1-13d and m2+2n1-13d, which

implies that before the replacement, T and T were indeed filled with occurrences

of w with overlap of type B. However, (yk(u) contained no pair of occurrences of w

with overlap of type B, and so since all replacements in the definition of βk involve

replacing some w with w, no new occurrences of w can be created during these

replacements. Therefore, there can never be a pair of occurrences of w with overlap

of type B during the application of βk to a member of (yk(Γk+4m). We therefore have

a contradiction. Our original assumption was wrong, and none of these replacements

ns

can create a new occurrence of w. So, when these replacements are finished, we have

a word with shape Γk+4m, which we called ( βk(yk)(u), which has no occurrences of

w.

The definition of γk is much simpler: for any uΓk+4m(X), γk(u) is the subword of u

occupying its central Γk. For any uΓk+4m(X), we define φk(u)=(γkβk(yk)(u).

We now claim that φk(u) is in L(Xw), i.e. can be extended to a configuration on Zd

which is in X and contains no occurrences of w. To do this, we inductively define

a sequence of words dj, where djLΓk+4jm(X) and each dj has no occurrences of

w. Take d0=φk(u) and d1=(βk(yk)(u). For any dj for j>0, define dj+1 as

follows: since djL(X), it can be extended to an infinite configuration in X. Use

this fact to create a word djLΓk+4(j+1)m(X) which has dj as the word occupying

its central Γk+4jm. We then create dj+1 by performing the same replacements as

in the definitions of (yk and then βk, in other words first finding and replacing any

occurrences of ui for any 1<i<b, and then finding and replacing any occurrences

of w. For the same reasons as above, dj+1 has no occurrences of w. Since dj+1 agrees

on the boundary of thickness t with dj, dj+1L(X), completing the inductive step

and allowing us to define dj for all j. We also note that for any j, since dj contained

no occurrences of w or ui for any 1<i<b, any occurrences of these words in dj

must have had nonempty overlap with the newly created portion, i.e. must have been

contained in the boundary of thickness 4m of Γk+4(j+1)m. As a result, dj+1 and dj

must agree on their respective central Γk+4(j-1)m, since no replacements could have

affected that portion. This means that we may define the "limit" of the dj : for each

119

j, since djL(X), there exists zjX with zj(Γk+4jm)=dj. Since dj+1 and dj agree

on their respective central Γk+4(j-1)m, the limit of the zj exists, call it x. Since X

is a subshift, it is closed and so xX. Since the central Γk of each dj is equal to

φk(u), φk(u) is a subword of x. Finally, since each dj has no occurrences of w and x

is a limit of the dj, x has no occurrences of w. This means that φk(u)LΓk(Xw) as

claimed.

We have now defined φk on all of LΓk+4m(X). For any ε>0, we can then deal

with the restriction of φk to Ak+4m,ε,3n(X). Since the shape of w is ΓnΓ3n, any

element of Ak+4m,ε,3n(X) has fewer than (k+4m)d(1HΓn-2R(X)+ε) occurrences of w,

and since each ui has shape a subset of Γ3n, any element of Ak+4m,ε,3n(X) has fewer

than (k+4m)d(1H(X),sisi(R), +ε) occurrences of ui for each 1<i<b. By Lemma 3.2.1,

HΓn-2R(X)>ehtop(X)(n-2R)d We now bound HSiSi(R)(X) in a similar fashion. First

let's look at a picture of SiSi(R) for any 1<j<b. Recall that Si is the union of

two overlapping copies of Γm, so SiSi(R) is the union of two overlapping copies of

Γm-2R concentric with the original copies of Γm. By the definition of type B overlap,

there exists 1<i<d for which the distance in the ei-direction between the centers

of these two Γm-2R is between m2-2n1-13d and m2+2n1-13d.

In Figure 3.15, we denote the directio n in question by ei, the two copies of Γm-2R

by S and S, and the distance in the ei-direction between the centers of S and S

by c. We then define T to be a rectangular prism which is a subset of S a distance

of R+1 away from T. The sizes of T are m-2R in every direction but ei, and

c-R>m-2n1-13d-R in the ei direction

120

1ei

Figure 3.15: SiSi(R)

Clearly by just taking very large m we can assume that this dimension is greater

than m-3n1-13d. Since d(T, S)=R+1 and T, SSiSi(R), by strong irreducibil-

ity HSiSi(R)(X)>HT(X)HS(X)>ehtop(X)[(m-2R)d-1(m2-3n1-13d)+(m-2R)d]. R, t, and

n1-13d are all small in relation to m for large enough m, and m asympotically ap-

proaches n as n, so for large n, this bound is greater than ehtop(X)1.4nd Using

these bounds, we can rewrite the above statement about Ak+4m,ε,3n(X) a little more

easily: every element of Ak+4m,ε,3n(X) has fewer than (k+4m)d(e-htop(X)(n-2R)d+ε)

occurrences of w and fewer than (k+4m)d(e-1.4htop(X)nd+ε) occurrences of any ui

for 1<i<b.

121

We now prove the upper bound on htop(X)-htop(Xw). We do this by proving

upper bounds on |αk-1(v)Ak+4m,ε,3n(X)| for any v(yk(LΓk+4m(X)), on |βk-1(v)

(yk(Ak+4m,ε,3n(X)) | for any vLΓk(X), and on |γk-1(v)| for any vLΓk(Xw). The

first upper bound is fairly straightforward: consider any uAk+4m,ε,3n(X). Since

b<3dmd, the total number of letters in u which are part of an occurrence of any

ui is at most (2md)(3dmd)(k+4m)d(e-1.4htop(X)nd+ε), which for large m is less than

(k+4m)d(e-1.3htop(X)nd+3d+1m2dε). Now, consider any v(yk(LΓk+4m(X)). By

the previous reasoning, any uαk-1(v)Ak+4m,ε,3n(X) differs from v on less than

(k+4m)d(e-1.3htop(X)nd+3d+1m2dε) letters, and so the number of possible u is at most

|A|(k+4m)d(e-13htop(X)nd+3d+1m2dε)i=0+3d+1m2dε)(k+4m)d(e-13htop(X)nd

<|A|(k+4m)d(e-13htop(X)nd+3d+1m2dε) (k+4m)d

<e(k+4m)de-htop(X)(n-2R)d+3d+1m2dεln|A|

for large values of n.

The second upper bound, the one on |βk-1(v)αk(Ak+4m,ε,3n(X))| for any vLΓk(X),

is slightly more difficult. Consider any u(yk(Ak+4m,ε,3n(X)). Since u is in the image

of (yk, u has no occurrences of any ui . Consider any pair of overlapping occurrences of

w in u, say at S and S copies of Γn. Then these occurrences of w contain occurrences

of w as subwords, say at T and T copies of Γm with TS, TS, and T-T=

S-S. Also define U to be the copy of Γ2R+l in T which would be changed to change

u(T) to w, and U to be the corresponding copy of Γ2R+l in T. Since there are no

122

occurrences of ui, it must be the case that |T -T|<m2-2n1-13d. This implies

that T contains the central Γ3(2R+l)n1-12d of S for large n, and so that T contains U.

Similarly, T contains U. Therefore, if u(T) is replaced by w sometime during the

replacements defining βk, then u(S) will be changed to something other than w, and if

u(T) is replaced by w at some point in these replacements, then u(S) will be changed

to something other than w. In other words, in any replacement involved in changing

u to βk(u) , if u(S) is changed from w to w¯, then all occurrences of w with nonempty

overlap with S are also changed. Since new occurrences of w cannot be created during

these replacements, this implies that (βk(u))(S)=i for any such S. Note that we

have also shown that the copies of Γn where replacements occur in the application of

βk must be disjoint. We now wish to bound from above the number of occurrences

of i in βk(u) for any u(yk(Ak+4m,ε,3n(X)). By the definition of Ak+4m,ε,3n(X), for

any uAk+4m,ε,3n(X), u has at most (k+4m)d(e-htop(X)(n-2R)d+ε) occurrences of

w¯. We now claim that each of the replacements involved in (yk and βk could create at

most 9d new occurrences of w¯, which rests on an aperiodicity property of w¯: namely,

for large n, and for any xX with x(S) =x(S)=i for S and S copies of Γn,

|S-S|>n3.

To show this, recall that in the definition of w, we ensure that w contains a sub-

word aLΓ2R+, (X) which is not a subword of w. In addition, a lies in the central

Γ3(2R+l)n1-12d in w. Also recall that w is asubword of w whose shape has size at

least n-n1-13d. Together, these imply that for large n, i contains a as a subword of its

central Γn4. Since i and w agree outside this central Γn4, and since a is not a subword

of w, this implies that any occurrences of a in i have nonempty intersection with its

123

central Γn4, and that there is at least one such occurrence of a. Now consider xX

such that x(S)=x(S)=w¯, where SS. Also consider T a copy of Γ2R+l such that

x(T)=a. There are two possibilities: either T lies inside S or T does not lie inside

S. If T lies inside S, then since x(T)=a, T must have nonempty intersection with

the central Γn4 of S. Again since x(T)=a, T must have nonempty intersection with

the central Γn4 of S as well. This implies that |S-S|<n4+(2R+l). However,

then S-S is a period of w¯. Since 0|S-S|<n4+(2R+l) <n3 for large

n, there is some positive integer k so that T+k(S-S) lies in S, but has empty

intersection with the empty central Γn4 of S. But by periodicity of i with respect to

S-S, x(T+k(S-S))=x(T)=a, which gives a contradiction. Therefore, it must

be the case that T does not lie inside S. Since T has nonempty intersection with the

central Γn4 of S, this implies that |S-S|>3n8-(2R+l) >n3 for large n.

Now consider any replacement of w by w in the replacements defining (yk and βk,

and suppose that this replacement occurs at S a copy of Γm. Specifically, suppose

that v, v LΓk+4m(X) where v(S)=w, v(S)=w, and v and v agree outside of S.

Also assume that T a copy of Γn is a location for an occurrence of i newly created

in this replacement, i.e. v(T)=w¯, v(T)w¯. Since v and v agree outside S, S and

T must have nonempty intersection, meaning that any possible location for T given

a fixed S lies in a copy of Γm+2n concentric with S. However, by the aperiodicity

fact about i above, any T, T both of which contain newly created occurrences of

i must satisfy |T-T|>n3. This means that at each replacement, there are no

more than 9d newly created occurrences of w¯. We now bound from above the total

number of replacements performed in applying (yk to uAk+4m,ε,3n(X). As already

124

shown, for large n the total number of occurrences of any ui in uAk+4m,ε,3n(X) is

smaller than 3dmd(k+4m)d(e-1.4htop(X)nd+ε), and so the total number of occurrences

of i created during these replacements is less than (27m)d(k+4m)d(e-1.4htop(X)nd+

ε), which is less than (k+4m)d(e-htop(X)(n-2R)d+(27m)dε) for large n. We now

bound from above the total number of replacements performed in applying βk to

(yk(u). For this, we need to bound from above the total number of occurrences

of w in (yk(u). u itself had at most (k+4m)d(e-htop(X)(n-2R)d+ε) occurrences of

w by definition of Ak+4m,ε,3n(X). While changing u to (yk(u), less than 3dmd(k+

4m)d(e-1.4htop(X)nd+ε) replacements are made. Each one of these could create at

most (m+2n)d new occurrences of w by the same reasoning we used for i (we

just don't have any aperiodicity facts about w to use.) Therefore, (yk(u) has less than

(k+4m)d(e-htop(X)(n-2R)d+ε)+(m+2n)d3dmd(k+4m)d(e-1.4htop(X)nd+ε) occurrences

of w, which is less than (k+4m)d(2e-htop(X)(n-2R)d+(12m)dε) for large n. This

means that the number of replacements involved in changing (yk(u) to (βk(yk)(u)

is less than (k+4m)d(2e-htop(X)(n-2R)d+(12m)dε) as well. Since each of these can

create at most 9d new occurrences of w¯, the number of occurrences of i created

during these replacements is less than (k+4m)d(2ċ9de-htop(X)(n-2R)d+(108m)dε).

So, the total number of occurrences of i created in the process of changing u to

( βk(yk)(u) is less than (k+4m)d((2ċ9d+1)e-htop(X)(n-2R)d+((108m)d+(27m)d)ε).

Since u itself contained at most (k+4m)d(e-htop(X)(n-2R)d+ε) occurrences of i

by definition of Ak+4m,ε,3n(X), this means that ( βk(yk)(u) contains less than (k +

4m)d(20de-htop(X)(n-2R)d+(216m)dε) occurrences of w¯.

We now collect the three facts we need to bound |βk-1(v)(yk(Ak+4m,ε,3n(X))| from

125

above for any vLΓk(X). Firstly, for any uAk+4m,ε,3n(X), we know that for any

S a copy of Γn which is the location of a replacement during the process of changing

(yk(u) to (βk(yk)(u), ((βk(yk)(u))(S)=w¯. In other words, once w is changed to i

during the application of βk to (yk(u), that occurrence of i will not be changed during

future replacements. Secondly, all replacements perfromed in the application of βk

to (yk(u) are disjoint. This means that knowing ( βk(yk)(u), along with knowing the

locations of all copies of Γn where replacements occur in changing (yk(u) to (βk(yk)(u),

is enough to uniquely determine (yk(u). Finally, ( βk(yk)(u) contains fewer than

(k+4m)d(20de-htop(X)(n-2R)d+(216m)dε) occurrences of w¯. This implies that for any

vLΓk(X), |βk-1(v)(yk(Ak+4m,ε,3n(X))|<2(k+4m)d(20de-htop(X)(n-2R)d+(216m)dε), the

total number of subsets of the locations of occurrences of i in v which could have

been the locations of replacements in the application of βk.

Finally, we give an upper bound on |γk-1(v)| for any vLΓk(X), which is straight-

forward: any uγk-1(v') must have its central Γk filled with v. This means that

|γk-1 (v) |<|A|(k+4m)d-kd<|A|5dmkd-1 for large k.

Putting all of these upper bounds together, we see that |φk-1(v)Ak+4m,ε,3n(X)|<

e(k+4m)de-htop(X)(n-2R)d+ln|A|3d+1m2dε . 2(k+4m)d(20de-htop(X)(n-2R)d+(216m)dε) . |A|5dmkd-1

<e(k+4m)dCe-htop(X)(n-2R)d+Dε

for constants C independent of n and k and D independent of k. This implies that

HΓk(Xw)>|Ak+4m,ε,3n(X)|e(k+4m)dCe-htop(X)(n-2R)d+Dε.

126

We then take natural logarithms of both sides, divide by (k+4m)d, and let k

to get

limkkd(k+4m)dhΓk(Xw)kd>limkln|Ak+4m,ε,3n(X)|(k+4m)d-Ce-htop(X)(n-2R)d-Dε,

and by using the definition of entropy and Corollary 3.2.8, we are left with

htop(Xw)>htop(X)-Ce -htop(X)(n-2R)d-Dε.

ε was arbitrary, so we may let it approach zero, leaving

htop(X)-htop(Xw)<Cehtop(X)(n-2R)d .

A different tactic must be used to prove a lower bound for htop(X)-htop(Xw). We

in a sense proceed in the opposite way from our upper bound: we will define a

map ψk which sends any word in LΓk(Xw) to a subset of LΓk(X) such that for any

uuLΓk(Xw), ψk(u) and ψk(u) are disjoint. This map will create occurrences of

w instead of eliminating them.

We first need another auxiliary combinatorial theorem, again about a sort of lack of

periodicity.

Theorem 3.4.2. For any d>1 and any strongly irreducible Zd-shift X of finite

type t with uniform filling length R, there exists a constant W dependent on X such

that for all suffiffifficiently large j, there is a word wj,dLΓjW(2W)(X) with the following

aperiodicity condition: there cannot exist trvo overlapping occurrences of wj,d such

that one has nonempty intersection with the empty central Γ(j-4)W of the other.

127

Figure 3.16: Disallowed and allowed pairs of overlapping wj,d

Proof: To make the statement of Theorem 3.4.2 more clear, we refer to Figure 3.16,

which gives an example of a pair of overlapping occurrences of wj,d which is ruled out

by the given aperiodicity condition (left) and a pair which is not. (right)

We begin by giving a concrete example, for any fixed d and j>3d+5, of an aperiodic

word aj,d in {0, 1}Γj . For any pΓj, we define aj,d(p)=1 if any pi for 1<i<d-1

is not equal to 1 or j. This leaves only aj,d(p) where each of p1, p2, . . . , pd-1 are 1 or j

to be defined. We think of this undefined portion as a set of 2d-1 one-dimensional j-

letter words: for every (p1, p2, ..., pd-1){1, j}d-1, we think of the yet-to-be-defined

aj,d(p1, p2, ..., pd-1,1), aj,d(p1,p2, ...,pd-1,2), ...,

aj,d(p1, p2, ..., pd-1, j) as the letters of a j-letter word. Now, to fill these portions in,

we define 2d-1j-letter words, which we will call h0, h1 , . . . , h2d-1-1 , such that no two

may overlap each other, i.e. if the final k letters of hi are equal to the initial k letters

128

of hi for some k>0, then k=j and i=i. We do this in the following way: the first

four letters of any hi are defined to be 0001, and the final four letters are defined to

be Ottt. The 3d-3 letters following the initial 0001 in any hi are defined as follows:

concatenate d-l th ree-letter words, determined by i'sd-1-digit binary expansion:

each 0 in the binary expansion corresponds to the word 001, and each 1 in the binary

expansion corresponds to Of 1. The remaining j-(3d+5) letters of hi which precede

the final Ottt are alternating zeroes and ones, beginning with a zero.

An example should be helpful: suppose that d=3 and j=18. Then we create

four 18-letter words h0, h1, h2, and h3. The initial four letters of h0 are 0001. The

next 3d-3=6 letters are dependent on the two-digit binary expansion of the

subscript 0: since it is 00, the next six letters are 001001. The next j-(3d+5)=4

letters are alternating zeroes and ones beginning with a zero, i.e. 0101, and the

final four letters are Otll. This gives h0=000100100101010111. Using similar

reasoning, we see that h1=000100101101010111, h2=000101100101010111, and

h3= 000101101101010111.

We claim that this set of 2d-1 words has the nonoverlapping property described earlier.

Suppose that for some k>0 and 0<i, i<2d-1, the initial k letters of hi are the

same as the final k letters of hi. k cannot equal 1 or 2, because the first k letters

of hi would then be 0 or 00, and hi does not end with either of those words. So,

k>3. Then the initial k letters of hi begin with 000, and since the only place that

000 occurs in hi is at the beginning, this implies that k=j, and since {hi}i=02d-1-1

contains 2d-1 distinct words, that i=i.

129

We now, for every choice of (p1,p2, ...,pd-1){1, j}d-1, choose an hi, and define

aj,d(p1, p2, ..., pd-1, i)=hi(i) for 1<i<j. Assigning the 2d-1hi to the 2d-1 choices

for (p1, p2, ..., pd-1) in a one-to-one way completes the definition of aj,d. We now make

the claim that aj,d is aperiodic. Suppose not; then there exists v= (v1, v2, ..., vd)0

with |vi|<j for 1<i<d such that aj,d(r)=aj,d(r+v) for all rΓj such that

r+vΓj. Since -v could serve as a period as well, we may assume without loss of

generality that vd>0. We choose a vertex q of Γj,d based on v: for each 1<i<d,

if vi>0, qi=1. If vi<0, qi=j. In this way, we ensure that q+vΓj, and

therefore that aj,d(q)=aj,d(q+v). Since qi{1, j} for 1<i<d-1 and qd=1,

by construction aj,d(q)=0. Therefore, aj,d(q+v)=0. However, again due to

construction, the only zeroes in aj,d lie at points whose first d -1 coordinates are

either 1 or j. This implies that the first d-1 coordinates of q+v are either 1 or j.

Let's denote by hi the j-letter word where hi(k)=aj,d(q+(k-1)ed) for 1<k<j,

and by hi the j-letter word where hi(k)=aj,d(q+v+ (k-1 -vd)ed). Due to the

supposed periodicity of aj,d, the first j-vd letters of hi are the same as the final

j-vd letters of hi. Since no two distinct words in {hn}n=12d-1 may overlap, vd=0 and

hi=hi. This implies that the first d -1 coordinates of q are the same as the first

Z-t coordinates of q+v, and therefore that the first Z-t coordinates of v are zero,

implying that v=0, a contradiction. Thus, aj,d is aperiodic.

We now use aj,d-1 as a tool to create, for any j>3d+2, a word bj,d{0,1}Γj(2) with

the aperiodicity property outlined in the conclusion of Theorem 3.4.2: there cannot

exist two overlapping occurrences of bj,d such that one has nonempty intersection

with the empty central Γj-4 of the other. bj,d(p) is defined to be 0 if pd=1 or 2,

130

and defined to be 1 if 3<pd<j-1. If pd=j, then bj,d(v)=aj,d-1(p1, ...,pd-1).

Suppose that bj,d is periodic with respect to v0. Since our supposed overlap is of

the form described it must be the case that |vi|<j-2 for 1<i<d. We can again

assume without loss of generality that vd>0. Choose a point q of Γj(2) as follows:

for each 1<i<d-2, take qi=1 if vi>0, and qi=j - 1 if vi<0. Choose

qd-1=j-1 -vd-1 if vd-1>0, and qd-1=1 -vd-1 if vd-1<0. Finally, take qd=1.

For any rZd all of whose coordinates are zero or one the first Z-t coordinates of

q+r are between 1 and j, and the dth coordinate is 1 or 2, implying that q+rΓj(2).

This means that a copy of Γ2 with its least vertex lexicographically at q is a subset of

Γj(2), call it S. Since S is composed entirely of points whose dth coordinate is 1 or 2,

bj,d(S) is filled with zeroes. Again, for any point r all of whose coordinates are zero or

one, the first d-2 coordinates of q+v+r are between 1 and j, the Z-dth coordinate

is 1, 2, j-1, or j, and the dth coordinate is between 1 and j. Therefore, any such

q+v+r is in Γj(2), rmplying that a copy of Γ2 with its least vertex lexicograhically at

q+v is a subset of Γj(2) as well, which is then S+v. By periodicity, bj,d(S+v) must

be filled with zeroes as well. However, by construction, for any copy of Γ2 which is

filled with zeroes in bj,d, the dth coordinate of its least vertex lexicographically is 1.

Since qd=1, it must be the case that vd=0. This implies that aj,d-1 is periodic

with period (v1, v2, ..., vd-1), which implies that vi=0 for 1<i<d-1 as well, and

so v=0, a contradiction. Therefore, bj,d has the desired aperiodicity property. Note,

however that this does not prove Theorem 3.4.2, since bj,d is a word in the full shift,

and there is no reason for it to be in L(X).

We now turn to our shift of finite type X. First, we will be constructing a word

131

yLΓR+4(X) for which Xy is nonempty. Obviously, finding a word y such that Xy

is nonempty could be done by using the already proven upper bound for htop(X)-

htop(Xy), but this may require y to have shape with very large size. We need the

shape of y to have a small size to prove as tight a lower bound as possible, and for

our proof, we need a lower bound on HΓk(X) for relatively small k. For large k,

Lemma 3.2.1 gives an exponential lower bound for HΓk(X), but doesn't really tell us

anything about small k. For this reason, we prove the following lemma:

Lemma 3.4.3. For any Zd-subshift X with htop(X)>0, and finite S Zd, HS(X)>

|S|.

Proof: We write S={s1, s2, ..., s|S|} where si comes before si+1 in the usual lex-

icographical order for 1<i<|S|, and make the notation Si={s1, ..., si} for all

1<i<|S|. Suppose that HS(X)<|S|. Then, since HS1(X)=|A|>1, it must

be the case that for some 1<j<|S|, HSj+1(X)<HSj(X). Since SjSj+1, this

means that for every word wLSj(X), there is a unique way to extend it to a word

in LSj+1(X). In other words, for any xX, given x(Sj), x(sj+1) is forced. By shift-

invariance of X, for any xX, given x(Sj-sj+1), x(0) is forced. Since S is finite,

take N>diam(S). Then, Sj-sj+1 consists of elements of Zd within a d-distance

of less than N from 0 and lexicographically less than 0. For this reason, we make

the notation Hn,d={ v(Γ2n-1 - n) : v is less than 0 lexicographically}, where

n= (n, n, ..., n). It is then clear that Sj-sj+1HN,d, and so we note that x(HN,d)

forces x(0). We need a few more pieces of notation; In,d={v(Γ2n-1-n) : vd<0},

Gn,k,d={ vZd : vi>0 for 1<i<d, i=1dni-1vi<k}, and Ln,k,d=(Γk+n-

132

n)Γk. We quickly note a few useful facts about these sets which can be checked by

the reader: Hn,d+1=In,d (Hn,d×{0}), and Gn,k,d+1=iLk=1nd(Gn,k-ind,d×{i}). a e

now claim that if x(Hn,d) forces x(0) for all xX, then x(Ln,k,d) forces x(Gn,k,d) for

any k. We prove this claim by induction on d.

For d=1, the claim is fairly easy to check (and is in fact a classical theorem due

to Hedlund and Morse ([HeM]) ); it amounts to showing that if any n consecutive

letters of any x force the next, then any n consecutive letters of x force the next k for

any k. But this is clear; if x(1), x(2), . . . , x(n) force x(n+1), then x(2), . . . , x(n+1)

force x(n+2), and we can proceed like this indefinitely. Thus, the claim is proven

for d=1. Now suppose that it is true for a fixed d, and we will prove it for d+1.

This proof will also be by induction; since we wish to show that x(Ln,k,d+1) forces

x(Gn,k,d+1), and since Gn,k,d+1=iLk=1nd(Gn,k-ind,d×{i}), it suffices to show that

x(Ln,k,d+1i=1j(Gn,k-ind,d×{i})) forces x(Gn,k-(j+1)nd,d×{j+1}) for every 0<

j<knd. Fix any such j, and suppose that x(Ln,k,d+1i=1j(Gn,k-ind,d×{i}))

is given. We will show that x(Gn,k-(j+1)nd,d×{j+1}) is forced. Consider any

v=(v, j+1) Gn,k-(j+1)nd,d×{j+1}. We claim that v+In,d+1Ln,k,d+1

i=1j(Gn,k-ind,d×{j}). Since In,d+1=(Γ2n-1-n)× {-n+1, ..., 0}, it suffices to

show that v+(Γ2n-1-n)(Ln,k,d+1i=1j(Gn,k-ind,d×{j}))(Zd×{i}) for

j-n<i<j. Since (Ln,k,d+1i=1j(Gn,k-ind,d×{j}))(Zd×{i})=(Ln,k,d

Γk)×{i} for -n+1<i<0, and (Ln,k,d+1i=1j(Gn,k-ind,d×{j})) (Zd×{i})=

(Ln,k,dGn,k-ind,d)×{i} for i>0, and since Gn,k-(i+1)nd,dGn,k-ind,dΓk for

i>0, it suffices to show that Gn,k-(i+1)nd,d+ ( Γ2n-1 - n) Ln,k,dGn,k-ind,d for

i>0. Consider any vGn,k-(i+1)nd,d and any v(Γ2n-1-n). By definition of

133

Gn,k-(i+1)nd,d, i=1dvini-1<k-(i+1)nd, and vi>0 for 1<i<d. By choice of v,

-n+1<vi<n-1 for 1<i<d. Therefore, -n+1 <(v+v)i for 1<i<d, and

i=1d(v+v)ini-1<(k-(i+1)nd)+(i=0d-1ni)<k-ind. There are then two cases; if

any coordinate of v+v is nonpositive, then v+vLn,k,d, and if all coordinates are

positive, then by definition, v+vGn,k-ind,d. So, indeed Gn,k-(i+1)nd,d+(Γ2n-1-

n) Ln,k,d Gn,k-ind,d, and so for any vGn,k-(j+1)nd,d×{j+1}, v+In,d+1

Ln,k,d+1i=1j(Gn,k-ind,d×{i}). This means that since Hn,d+1=In,d+1U(Hn,d×{0}),

for any v(Gn,k-(j+1)nd,d×{j+1}), x(v+(Hn,d×{0})) forces x(v). But now since

Ln,k-(j+1)nd,d×{j+1} Ln,k,d+1, by the inductive hypothesis x(Gn,k-(j+1)nd,d) 1s

in fact forced by x(Ln,k,d+1i=1j(Gn,k-ind,d×{i})). Since j was arbitrary here, as

described above this shows that x(Ln,k,d+1) forces x(Gn,k,d+1) as claimed, and so by

induction we see that for any d, if x(Hn,d) forces x(0) for all xX, then x(Ln,k,d)

forces x(Gn,k,d) for any k.

We finally return to our original example of X for which HS(X)<S. Recall that

we showed that there is then some N such that x(HN,d) forces x(0) for all x. By

the claim just shown, x(LN,k,d) then forces x(GN,k,d) for all k. This then shows that

HGN,k,d(X)<HLN,k,d(X) for any k. Note that ΓkdNdGN,k,d for all k. Therefore,

HΓn(X)<HGN,ndNd,d(X)<HLN,ndNd,d(X) for all n. Since |LN,ndNd,d|=(ndNd+

N)d-(ndNd)d<2Nd(ndNd)d-1=Cnd-1 for large n and a constant C independent

of n, we see that HΓn(X)<|A|Cnd-1 for all n, and so by definition of entropy,

htop(X)=limnlnHΓn(X)nd<limsupnCnd-1ln|A|nd=0. Therefore, htop(X)=0, a

contradiction to the hypotheses. Our initial assumption was therefore wrong, and so

HS(X)>|S| for all finite shapes S.

134

Before moving on with our proof, we must comment on Lemma 3.4.3. In the course

of the proof, we show that if for some total order < on Zd which is preserved under

addition, some finite set SZd, and some tS with t>s for all sS, x(S) forces

x(t), then the topological entropy of X is zero. It is natural to wonder whether the

assumption about ordering is necessary. In fact it is:

Lemma 3.4.4. For any finite SZd and tS such that there is no addition-

preserved total order on Zd with respect to which t is greater than all elements of S,

there exists a subshift X such that x(S) forces x(t) for all xX, but htop(X)>0.

Proof: Suppose that for some SZd finite and tS, it is the case that for no

addition-preserved total order on Zd is t greater than all elements of S. We claim

that this implies that t lies in the convex hull of S. To see this, suppose that t does

not lie in the convex hull of S. Then clearly there is a d-1-dimensional hyperplane

L of Rd such that t and S lie on opposite sides of L. Define the subspace L of Rd

which equals L-P for any pL. Without loss of generality, we may assume that

LZd=, since such an assumption requires only a small rotation of L. Then,

define an addition-preserved total order < on Zd by arbitrarily fixing one of the open

half-spaces H that L splits Rd into and defining v<v if v-vH {0}. Then,

since t and S are on opposite sides of L, for all sS, t-s is on the same side of L,

and so either t<s for all sS or t>s for all sS. By reversing < if necessary, we

may assume without loss of generality that t is greater than all elements of S under

<. This is a contradiction, and so t lies in the convex hull of S.

135

To make things easier, by shift-invariance, we can without loss of generality assume

that t=0:=(0, ..., 0). We will, for any S such that 0 lies in the convex hull of S,

construct a subshift X with positive topological entropy such that x(S) forces x(0)

for any xX. By decomposing the convex hull of S into simplices, we can, for some

1<d<d, find S a linearly independent subset of S of cardinality d+1 such that

the convex hull of S contains 0 and is contained in a d-dimensional subspace of Rd,

call it L. Denote the elements of S by s1, s2, ..., sd+1. Then there exist positive

reals n1 , . . . , nd+1 such that 0=i=1d+1nisi. Since all elements of S are in Zd, we can

take all ni to be rational, and so without loss of generality integers. This implies that

sd+1=i=1d-nind+1,si. Take N>nind+1,+1 for all 1<i<d. Take the d'-dimensional

parallelepiped R={i=1drisi : 0<ri<N} contained in L. One may then tessellate

L with copies of R:L=vZd(R+i=1dNvisi). (Intersecting with Zd on both sides

leads to a similar tessellation of LZd by copies of RZd.) If LRd<' then choose

linearly independent u1, ..., ud-d in Zd such that (LZd)+j=1d-d(Zuj)=Zd. We

will define X, a subshift with alphabet A={0,1} ×(RZd), such that x(S) forces

x(0) for any xX, which clearly implies that x(S) forces x(0) for any xX as well.

Choose any y Q. We define xX as follows: for any vZd and r(RZd),

define x(r+i=1dNvisi+j=1d-dvj+duj)=(y(v), r). Due to the already mentioned

tessellation of LZd by copies of RZd and tessellation of Zd by copies of LZd,

this defines x on all of Zd. Take X to be the set of all elements of Ω constructed in

this way. X is not necessarily shift-invariant, but by the construction, it is invariant

under shifts by Nsi for 1<i<d and uj for 1<j<d-d. Since the collection

{Ns1, Ns2, ..., Nsd, u1, ..., ud-d} is linearly independent, this means that a finite

136

union of shifts of X, call it X, is a shift-invariant set. It is also clear that X is closed,

and thus it is a subshift.

We first show that x(S) forces x(0) for any xX. By the definition of X, it is

sufficient to show that x(S+p) forces x(p) for any xX and pZd . Since pZd,

there exists some r(RZd) and vZd such that p=r+i=1dNvisi+j=1d-dvj+duj.

Since rR, r=i=1drisi for some positive reals ri. For any 1<i<d, r+si is

in R unless the si coefficient of r+si is greater than or equal to N. Therefore,

if r+siR for all 1 <i<d, then it must be the case that ri>N-1 for

all 1<i<d. But then since sd+1=i=1d-nind+1,si as noted earlier, and since

nind+1,<N-1 for all 1<i<d, r+sd+1=i=1d(ri-nind+1,)si, with all coefficients

in [0, N). This implies that r+sd+1R. We then showed that if r+siR for all

1<i<d, then r+sd+1R. In other words, there is some 1<j<d+1 such

that r+sjR. Fix this j. Then, p+sj=(r+sj)+i=1dNvisi+j=1d-dvj+duj.

Say that x(p+sj)=(r+sj, q) for some m{0,1}. Then by definition of X, since

p and p+sj lie in the same copy of R, it must be the case that x(p)=(r, q). This

means that x(p+sj) forced x(p) Since 1<j<d+1, this shows that x(S+p)

forces x(p) for any pZd, and so that x(S) forces x(0) for any xX.

It remains to show that htop(X)>0. Choose M >|Nsi|, |uj| for 1<i<d,

1<j<d-d. Then, R+{i=1dNkisi+j=1d-dkj+duj :|ki|<K}[-(K+

1)Md, (K+1)Md]d for any K. By the definition of X, there are at least 2|[-K,K]d|

ways to fill letters of x on R+{i=1dNkisi+j=1d-dkj+duj : |ki|<K}, and so we

see that H[-(K+1)Md,(K+1)Md]d(X)=HΓ2(K+1)Md+1(X)>2(2K+1)d This means that

137

hΓ2(K+1)Md+1(X)(2(K+1)Md+1)d>(ln2)(2K+1)d(2(K+1)Md+1)d>1(4Md)d

for all K, and therefore, by letting K approach infinity, we see that htop(X)>1(4Md)d>

0.

We claim that for a strongly irreducible shift of finite type X, HΓR+4(X)>(2R+4)d.

We define S1={1, 2} ×{1, 2, ..., R+4}d-1ΓR+4 and S2={R+3, R+4}×

{1, 2, ..., R+4}d-1ΓR+4. Then d(S1, S2)>R, and so by strong irreducibility

HΓR+4(X)>HS1(X)HS2(X), which is greater than 4(R+4)2d-2 by Lemma 3.4.3.

Since R+4>4,

4 (R+4)2d-2>4d-1(R+4)d>2d(R+4)d=(2R+8)d>(2R+4)d .

Therefore, HΓR+4(X)>(2R+4)d. Consider any yLΓ3R+7(X). Then, since there

are (2R+4)d copies of ΓR+4 contained in Γ3R+7, there are at most (2R+4)d different

words in LΓR+4(X) which are subwords of y. Since HΓR+4(X)>(2R+4)d, this means

that there is zLΓR+4(X) such that z is not a subword of y. Then, again using strong

irreducibility, we create xX such that for any vZd, x(ΓR+4+(2R+4)v)=z.

(See Figure 3.17.) Then, for any S a copy of Γ3R+7, x(S) contains a z, and therefore

x(S)y. Thus, y is not a subword of x, and so Xy contains at least one point and

1s nonempty.

138

R+3R+l

Figure 3.17: A point xXy

We will use y and bj,d to create, for W=8R+14, and for any j>3d+2, a word

wj,dLΓjW(2W)(X) such that there cannot exist two overlapping occurrences of wj,d

where one has nonempty intersection with the empty central Γ(j-4)W of the other.

To do this, we first partition ΓjW(2W) into disjoint copies of ΓW. The disjoint copies of

ΓW then have an obvious bijective correspondence to tthllee points of Γj(2), illustrated in

Figure 3.18.

We then use each entry of bj,d to assign entries of wj,d to the corresponding ΓW in

ΓjW(2W) For pΓj(2), if bj,d(p)=0, then the least copy lexicographically of ΓW-R in

the ΓW corresponding to p is filled with any word in LΓW-R(Xy), i.e. a word without

any occurrences of y. If bj,d(p)=1, then the least copy lexicographically of ΓW-R in

the ΓW corresponding to p has 2d occurrences of y placed inside it, each one sharing

a vertex with it.

139

The remainder of I jW(2W) is then filled to make a word in L(X) by using strong irre-

ducibility, since all of the filled portions are a distance of at least R+1 from each

other. We claim that this word wj,dLΓjW(2W)(X) has the desired aperiodicity con-

dition. Suppose not; then there exist two overlapping occurrences of wj,d such that

one has nonempty overlap with the empty central Γ(j-4)W of the other, which implies

that wj,d is periodic with respect to some v0 with |vi|<(j-2)W for 1<i<d.

We then define v by defining vi to be the closest multiple of W to vi for 1<i<d.

If two are equally close, choose either. In this way, we ensure that |vi-vi|<W2 for

each 1<i<d. We make the notation v :=v-v. Since each coordinate of v is

divisible by W, vW has integer coordinates.

Figure 3.18: The correspondence between copies of ΓW and points in Γj(2)

140

R+l rw

y y

R+l

y y

R+l R+1

Figure 3.19: How a copy of ΓW is filled if bj,d(p)=1

Since |vi|<(j-2)W for all i, every coordinate of vW has absolute value at most

j- 2. This implies that either v=0 or vW is the difference between two overlapping

occurrences of Γj(2), one of which has nonempty intersection with the empty central

Γj-4 of the other. Assume that the latter is the case. Then, by the already proven

aperiodicity condition on bn,d, this implies that there exist q, q+vWΓj(2) such that

bj,d(q)bj,d(q+vW). Without loss of generality, we assume that bj,d(q)=1 and

bj,d(q+vW)=0.

Let's call S the central ΓW-2R in the ΓW in ΓjW(2W) corresponding to q and T the

central ΓW-2R in the ΓW in ΓjW(2W) corresponding ttoo q+vW. Then T-S=v, and

wj,d is periodic with respect to v, wj,d(S(T-v))=wj,d((S+v)T). T-v=

(T-v)-v=S-v. Since every coordinate of v is at most W2, S(S-v),

and hence S(T-v), is nonempty. In fact, it is a rectangular prism Rs1,...,sd where

si>(W-2R)-W2=3R+7,1<i<d. Therefore, S(T-v) contains one of

141

the 2d copies of Γ3R+7 in S which share a vertex with S. Since bj,d(q)=1, and since

S is the central ΓW-2R in the ΓW corresponding to q, every one of these subcubes

of S contains an occurrence of y in wj,d. Therefore, wj,d(S(T-v)) has y as a

subword. However, since (S+v)TT, wj,d((S+v)T) is a subword of wj,d(T).

Since bj,d(q+vR)=0, and since T is the central ΓW-2R of the ΓW corresponding to

q+vR, wj,d(T) contains no occurrences of y. Therefore, wj,d((S+v)T) contains

no occurrences of y either. Since wj,d(S(T-v))=wj,d((S+v)T), we have a

contradiction.

The only remaining case is when v=0, i.e. |vi|<W2 for 1<i<d. We then simply

take an integer multiple nv of v such that some coordinate of nv is greater than W2,

but at most W. Then, nv is also a period of wj,d, and we can repeat the argument

just made for the same contradiction.

wo,d ΓW

w

Figure 3.20: f

142

We now return to our proof of the lower bound on htop(X)-htop(Xw). We suppose

that our word wLΓn(X) which we are removing from L(X) has n>(3d+5)W.

Take 0 to be the smallest integer such that (0-4)W >n+2R. Then, (0-4)W >

(3d+5)W, and so 0>3d+2. We also define n=oW+2R. By the definition of 0,

n<n+4R+5W<n+44R+70. Take μ¯w to be any ergodic measure of maximal

entropy on Xw . Since

wLΓn(Xw)μ¯w([w])=μ¯w(wLΓn(Xw)[w])=μ¯w(Xw)=1,

there must exist some word fLΓn, (Xw) such that μ¯w([f])>1HΓn,(Xw) . We note that

since XwX, HΓn, (Xw)<HΓn, (X). By Lemma 3.2.1, HΓn, (X)<ehtop(X)(n+R)d

Therefore, μ¯w([f])>e-htop(X)(n+R)d

This means that by using Lemma 3.2.7, we see that if, for any ε>0, we denote

by Bk,ε(Xw) the set of words in LΓk(Xw) which have at least kd(e-htop(X)(n+R)d-ε)

occurrences of f, then limkln|Bk,ε(Xw)|kd=htop(Xw).

In Figure 3.20, we show a word fLΓn-2R,(X) constructed as follows: the copy of

IoW(2W) is filled with wo,d as constructed in Theorem 3.4.2, and the central Γn is filled

with w. The remaining shaded portion is filled using strong irreducibility to create a

word fL(X).

We may now finally describe our map ψk. Consider any uBk,ε(Xw). By definition,

u has at least kd(e-htop(X)(n+R)d-ε) occurrences of f. We choose a set of disjoint

occurrences of f using a simple algorithm: choose any occurrence of f to begin, call

it f(1). At most 3 ddn occurrences of f overlap f(1), and so we choose any occurrence

143

of f which does not overlap f(1), and call it f(2). f(3) is any occurrence of f which

does not overlap f(1) or f(2), and we continue in this fashion. We are able to choose

at least kd3-dn-d(e-htop(X)(n+R)d-ε) disjoint occurrences of f in u in this way. We

then choose any nonempty subset of this chosen set of occurrences of f, and for each

chosen occurrence of f, if its shape is U a copy of Γn , we use strong irreducibility to

replace the central Γn-2R of U by f. ψk(u) is defined to be the set of words which

could be created by performing such replacements on u. The cardinality of ψk(u) is

then at least 2kd3-dn-d(e-htop(X)(n+R)d-ε)-1 for any uBk,ε(Xw).

We claim that ψk(u)ψk(u)= for any uuBk,ε(Xw). To show this, we

first show an auxiliary claim: that no occurrence of f is ever incidentally created

in the replacement process of ψk. In other words, when some occurrences of f in

uBk,ε(Xw) are replaced by words containing f to create an element of ψk(u) , the

only occurrences of f in the result are the ones which are subwords of each replaced

occurrence of f. Suppose that this is not the case; that for some vBk,ε(Xw) and

vψk(v), v contains an occurrence of f which occupies a copy of Γn-2R, which we

call B, which was not the central Γn-2R of a copy of Γn which was occupied by one

of the replaced occurrences of f in v. Denote by B the central Γn of B and by B the

copy of Γn in which B is central. Then v(B)=w. Since vL(Xw), v(B)w.

This implies that one of the replacements made had nonempty intersection with B,

otherwise v(B)=v(B), a contradiction. Call the Γn where this replacement

was made C, and call its central Γn-2RC. By our hypothesis, B was not one

of the replaced copies of Γn , and so BC. Since all of the replacements made

were disjoint, v(C)=f. We know that v(B)=f as well. Since CB,

144

|B-C|<n+n-n2=n+n2. But, n+n2=n-oW2+oW+n2. Since n=oW+2R+2t,

n-oW2+oW+n2=R+t+oW+n2. It was part of the definition of 0 that (0-4)W >n+

2R+2t, so R+t<(0-4)W-n2. Therefore, R+t+oW+n2<(0-4)W-n2+oW+n2=(0-2)W.

We have then shown that |B-C|<(0-2)W. Since B is central in B and C

is central in C, B-C=B-C and so |B-C|<(0-2)W as well. We make

one more notation: call E the boundary of thickness 2W of B and F the boundary

of thickness 2W of C. Since BC, EF. Then, since v(B)=v(C)=f,

v(E)=v(F)=wo,d. Also, E-F=B-C, so |E-F|<(0-2)W,

which implies that E has nonempty intersection with the central Γ(0-4)W of F, a

contradiction to the aperiodicity property of wo,d.

We have then shown that the only occurrences of f in any vψk(v) for some

vBk,ε(Xw) are those which occupy the central Γn-2R of replaced occurrences of f.

We also have a sort of converse result: for any such v, v, and for every U a copy Γn

which is filled with an occurrence of f in v which is replaced, the central Γn-2R of U

is filled with f in v. This fact rests on the disjointness of the replacements made in

the definition of ψk ; since these replacements are disjoint, each letter is changed at

most once.

Now, consider any vψk(v) for vBk,ε(Xw). Since vψk(v), v has at least one

occurrence of f. Since the only occurrences of f in v come from replacements during

the application of ψk, we know that for every U a copy of Γn-2R with v(U)=f, if

we call U the copy of Γn in which U is central, it must be the case that v(U)=f.

We also know that these are the only replacements performed in turning v into v,

145

since any others would have resulted in more occurrences of f. So, v determines the

set of replacements which were made in v. However, trivially v=v outside of the

regions where replacements occurred, and so v determines the letters of v outside

regions where replacements occurred, as well. Therefore, v is uniquely determined by

v, and so we know that for vvBk,ε(Xw), ψk(v)ψk(v)=.

Since |ψk(v)|>2kd3-dn-d(e-htop(X)(n+R)d-ε)-1 >2kd3-d+1n-d(e-htop(X)(n+R)d-ε) for v

Bk,ε(Xw), we have shown that

HΓk(X)>2kd3-d+1n-d(e-htop(X)(n+R)d-ε)|Bk,ε(Xw)|

for large k. Take natural logarithms of both sides, divide by kd, and let k to

see that

htop(X)>(ln2)(3-d+1n-d(e-htop(X)(n+R)d-ε))+htop(Xw).

Since ε was arbitrary, we allow it to approach zero, and so

htop(X)-htop(Xw)> III 2 . 3-d+1n-de-htop(X)(n+R)d,

which for large enough n, gives

htop(X)-htop(Xw)>e-htop(X)(n+R+1)d,

or, by replacing n by its maximum possible value n+44R+69,

htop(X)-htop(Xw)>1ehtop(X)(n+44R+70)d .

146

Combining this with the earlier upper bound on htop(X)-htop(Xw), we have

1ehtop(X)(n+44R+70)d<htop(X)-htop(Xw)<DXehtop(X)(n-2R)d.

3.5 A closer look at the main result

We here wish to briefly review the proof of Theorem 3.1.22, and point out that most

of the complications encountered in the proof were due to possible problems with

periodicity of w. If some assumptions about w's aperiodicity are made, the proof can

be simplified significantly. We will skip some of the details of these proofs, as they

are just simplified versions of arguments already made in the main text.

Theorem 3.5.1. For any strongly irreducible shift X =ΩF of finite type with uniform

filling length R and positive topological entropy htop(X), and for any words w, w

LΓn(X) such that w and w agree on the boundary of thickness t and any replacement

of w by w or vice versa cannot create or destroy any other occurrences of w and w,

if we denote by Xw the shift of finite type ΩFu{w}, then for any μ-w an ergodic measure

of maximal entropy on Xw and any μ- an ergodic measure of maximal entropy on X,

(ln2)μ¯w([w])<htop(X)-htop(Xw)<2(ln2)μ¯([w]).

Proof: To prove an upper bound on htop(X)-htop(Xw), define a mapping θk (similar

to φk from the upper bound portion of the proof of Theorem 3.1.22) on LΓk+2n(X):

147

given vLΓk+2n(X), one replaces the occurrences of w by w, going in order lexico-

graphically. Since none of these replacements can make new occurrences of w, this

process terminates. The word occupying the central Γk of the resulting word is then in

LΓk(Xw) for the same reasons as before, and we call this word θk(v). We then wish to

bound from above the size of 0 k-1 (v) Ak+2n , ε , μ¯ , w , w (X) for any vLΓk+2n(X) and any

ergodic measure of maximal entropy μ¯ on X. Recall that Ak+2n , ε , μ¯ , w , w (X) is the set of

words in LΓk+2n(X) which have between (k+2n)d(μ¯([x])-ε) and (k+2n)d(μ¯([x])+ε)

occurrences of x for x=w, w.

Consider any xθk-1 (v) Ak+2n , ε , μ¯ , w , w (X). Because of the fact that occurrences of

w and w are not incidentally created nor destroyed during the replacements made in

the process of changing x to v, the word occupying a shape S a copy of Γn will be

changed from w to w at some point during these replacements if and only if x(S)=w.

If x(S) =w, then v(S)=w, since once an occurrence of w is created, it cannot

be eliminated in any of the other replacements made in the process of changing x

to θk(x)=v. Therefore, any replacements made are at copies S of Γn such that

v(S)=w. We have then shown that if we denote by S1, . . . , Sp the set of copies S

of Γn such that v(S)=w, then x agrees with v on Γk(i=1pSi), and for each i,

x(Si)=w or x(Si)=w. This shows that there are at most 2p choices for x(Γk), and

so since there are at most |A|(k+2n)d-kd ways to fill Γk+2n(n) with letters, that |θk-1(v)|<

2p|A|(k+2n)d-kd But, v=θk(x), and xAk+2n,ε,μ¯,w,w(X). Therefore, p, the number

of occurrences of w in v, is equal to q+r, where q is the number of occurrences of

w in x and r is the number of occurrences of w in x. By definition of Ak+2n,ε,n(X),

q<(k+2n)d(μ¯([w])+ε) and r< (k+2n)d(μ¯([w])+ε). By Proposition 3.2.5,

148

since w and w agree on their boundary of thickness t,μ¯([w])=μ¯([w]). Therefore,

p<2(k+2n)d(μ¯([w])+ε), and so we see that if 0 k-1(v) Ak+2n , ε , μ¯ , w , w (X) is nonempty,

then it has cardinality at most 22(k+2n)d(μ¯([w])+ε)|A|(k+2n)d-kd This implies that

HΓk(Xw)>|Ak+2n,ε,μ¯,w,w(X)|22(k+2n)d(μ¯([w])+ε)|A|(k+2n)d-kd.

Take natural logarithms of both sides, divide by (k+2n)d, and let k to get

limkkd(k+2n)dhΓk(Xw)kd>limkln|Ak+2n,ε,μ¯,w,w(X)|(k+2n)d-2ln2(μ¯([w])+ε)

-ln|A|((k+2n)d-kd)(k+2n)d,

and by using the definition of entropy and Lemma 3.2.7, we see that

htop(Xw)>hμ¯(X)-2(ln2)(μ¯([w])+ε).

Since μ¯ is a measure of maximal entropy, and since ε can be arbitrarily small, we may

rearrange to arrive at

htop(X)-htop(Xw)<2(ln2)μ¯([w]).

We could then use Lemma 3.2.6 to further bound this from above, but we leave the

bound as it is for now, to emphasize the fact that our upper bound comes from μ¯([w]).

We can also give a similarly shortened proof of a lower bound on htop(X)-htop(Xw).

Define a map δk (similar to ψk from the proof of the lower bound portion of Theo-

rem 3.1.22) from LΓk(Xw) to 7(LΓk(X)) as follows: for any vLΓk(Xw) which has

at least one occurrence of w, take any nonempty subset of the occurrences of w in v

149

and replace them all by w. It is possible to perform all of these replacements simulta-

neously because the hypotheses of Theorem 3.5.1 imply that the portions of any two

occurrences of w which would need to be changed to turn them into w are disjoint.

(If not, then a replacement of one of them by w would destroy the other.) Therefore,

for any ergodic measure of maximal entropy μ¯w on Xw and for any vAk , ε , μ¯2 , w (X) ,

|δk(v)|>2kd(μ¯w([w])-ε)-1. For exactly the same reasons as in the earlier lower bound

proof, δk(v)δk(v)= for vvAk,ε,μ¯w,w(Xw). Therefore,

HΓk(X)>(2kd(μ¯w([w])-ε)-1) |Ak,ε,μ¯w,w(Xw)|.

Take natural logarithms of both sides, divide by kd, and let k to see that

htop(X)>(ln2)(μ¯w([w])-ε)+limsupln|Ak,ε,μ¯w,w(Xw)|kd,

k

and by using Lemma 3.2.7 and letting ε approach zero since it is arbitrary,

htop(X) > (In 2) μ-w([w])+hμ¯w(Xw).

Finally, we use the fact that μ¯w is a measure of maximal entropy on Xw to rewrite as

htop(X)-htop(Xw)>(ln2)μ¯w([w]).

We recall that the upper and lower bounds in Theorem 3.1.16 (the one-dimensional

case) have a ratio bounded away from zero and infinity, but that we showed that it is

150

impossible to have such close bounds in the multidimensional case which depend only

on the size of w. It is then natural to wonder whether or not it is possible, at least

under the hypotheses on w from Theorem 3.5.1 and for large n, to have a lower bound

of the form Cμ¯([w]) for htop(X)-htop(Xw). It turns out that this cannot be true if

there are multiple measures of maximal entropy. In one dimension, the uniqueness

of a measure of maximal entropy of any mixing shift of finite type is well-known,

but Burton and Steif ([BuSl], [BuS2]) have shown that there are multidimensional

strongly irreducible shifts of finite type which have more than one measure of maximal

entropy. In order to show that this type of lower bound is too good to be true, we

need a lemma:

Lemma 3.5.2. If X is a strongly irreducible Zd-shift of finite type t and if we denote

by En(X) the set of words w LΓn(X) for which there exists wLΓn(X) where w

and w agree on the boundary of thickness t and such that replacing w by w or vice

versa can never incidentally create nor destroy an occurrence of w or w, then for

any ergodic measure μ¯ of maximal entropy on X, limnμ¯(En(X))=1.

Proof: We claim that as long as n is sufficiently large, and w has the property

that given any two disjoint copies S and T of Γn contained in Γn, w(S)w(T),

then wEn(X). Consider any w with the property just described. Then create

a standard replacement of w in the same way as in the proof of Theorem 3.3.1, by

1

changing w only on a central copy of Γ2R+l . (Recall that l =(dlnnhtop(X))d¯+1 or

l =(dlnnhtop(X))1d+2 and that l is chosen so that there exists aLΓ, (X) so that a

is not a subword of w. We before chose l to be odd, but here we take it to have the

151

same parity as n so that there is a central Γl within Γn.) We denote by K the central

copy of Γ2R+l on which w is changed to create w, and so w(ΓnK) =w(ΓnK).

We have four cases that we would like to show are impossible:

Case 1: Suppose that a replacement of w by w could possibly create a new occur-

rence of w. More rigorously, that there exists xX and copies T and T of Γn such

that x(T)=w, and if we define by x the element of X created by replacing x(T) by

w, then x(T)=w, but x(T)w. This means that x(T)=w and x(T)=w.

Since the central Γl in w is occupied by a, this implies that |T-T|>n-l2, or else

x(T)=w would have to contain this occurrence of a. However, in order for x(T)

to be a newly created occurrence of w, it must be the case that T has nonempty

overlap with K+(T-Γn), and so |T-T|<n+(2R+l)2. By this upper bound on

|T-T|, TT contains a cube of size at least n-(2R+l)2, which must contain a copy

of Γn disjoint from the very small sets K+(T-Γn) and K+(T-Γn), call it U.

Since x(T)=w, x(U)=w(U-(T-Γn)). Since U is disjoint from K+(T-Γn),

w(U-(T-Γn))=w(U-(T-Γn)). Since x(T)=w, x(U)=w(U-(T-Γn)).

Therefore, w(U-(T-Γn))=w(U-(T-Γn)). But, since |T-T|>n-l2, which

is greater than n for large n, U-(T-Γn) and U-(T-Γn) are disjoint copies of

Γn, and so we have a contradiction.

Case 2: Suppose that a replacement of w by w could possibly destroy an existing

occurrence of w. More rigorously, that there exists xX and copies T and T of

Γn such that x(T)=w, and if we define by x the element of X created by replacing

152

x(T) by w, then x(T)w, but x(T)=w. This means that x(T)=w and

x(T)=w. Everything from here proceeds to a contradiction exactly as in Case 1.

Case 3: Suppose that a replacement of w by w could possibly destroy an existing

occurrence of w. More rigorously, that there exists xX and copies T and T of Γn

such that x(T)=w, and if we define by x the element of X created by replacing x(T)

by w, then x(T)w, but x(T)=w. This means that x(T)=w and x(T)=w.

However, since x(T)w, it must be the case that K+(T-Γn) has nonempty

intersection with T. This again implies that |T-T|<n+(2R+l)2 . It is then possible

to choose a positive integer k so that n<|k(T-T)|<n-n. Then choose

any U a copy of Γn so that U, U+k(T-T)T. Since x(T)=w, x(U)=w(U).

Since x(T)=w, x(U)=w(U+(T-T)). So, w(U)=w(U+(T-T)). In the same

way, we can see that w(U)=w(U+k(T-T)), and since |k(T-T)|>n, U and

U+k(T-T) are disjoint and so we have a contradiction.

Case 4: Suppose that a replacement of w by w could possibly create a new occur-

rence of w. More rigorously, that there exists xX and copies T and T of Γn such

that x(T)=w, and if we define by x the element of X created by replacing x(T) by

w, then x(T)=w, but x(T)w. This means that x(T)=w and x(T)=w.

However, since x(T)w, it must be the case that K+(T-Γn) has nonempty

intersection with T. This again implies that |T-T|<n+(2R+l)2 . It is then pos-

sible to choose a positive integer k so that n<|k(T-T)|<n-n. Then,

choose any U a copy of Γn so that U, U+k(T-T)T, and both are disjoint

from K+(T-Γn). Then, since x(T)=w, x(U)=w(U). Since x(T)=w,

153

x(U)=w(U+(T-T)). So, w(U)=w(U+(T-T)). In the same way, we can

see that w(U)=w(U+k(T-T)). Since U and U+k(T-T) are disjoint from

K+(T-Γn), w(U)=w(U) and w(U+k(T-T))=w(U+k(T-T)). Therefore,

w(U)=w(U+k(T-T)). Since |k(T-T)|>n, U and U+k(T-T) are disjoint

and so we have a contradiction.

Therefore, as long as w has w(S)w(T) for S and T disjoint copies of Γn in Γn,

wEn(X). This implies that

LΓn(X)En(X)vLΓn(X)((,([v]([v]+u))+u))u[-n,n]d,|u|>nu[-nn]d.

(3.8)

We can now use Lemma 3.2.6 to see that for any vLΓn(X), any u with |u|>

n, and any ergodic measure of maximal entropy μ¯ on X,

μ¯([v]([v]+u))<1H(X),(Γn(Γn+u))(Γn(Γn+u))(R), ċ

Since |u|>n,

(Γn(Γn+u))(Γn(Γn+u))(R)

=((Γn)(Γn)(R))(((Γn)(Γn)(R))+u).

Since |u|>n, d(((Γn)(Γn)(R)), ((Γn)(Γn)(R))+u)>n>R for

large n, and so by strong irreducibility,

H(((Γn)(Γn)(R))(((Γn)(Γn)(R))+u)X) =H(((Γn)(Γn)(R))X)2,

154

which is at least ehtop(X)2(n-2R)d by Lemma 3.2.1. Combining this with shift-invariance

of μ¯ and (3.8), we see that

μ¯(LΓn(X)En(X))< HΓn(X)(2n)2de-htop(X)2(n-2R)d

Again by Lemma 3.2.1, HΓn(X)<ehtop(X)(n+R)d Therefore,

μ¯(LΓn(X)En(X))<(2n)2de-htop(X)(2(n-2R)d-(n+R)d),

which clearly approaches zero as n.

Now, suppose that for any strongly irreducible Zd-shift of finite type X, there exist

constants C, N>0 and an ergodic measure of maximal entropy μ¯ on X such that

for any wEn(X) with n>N, Cμ¯([w])<htop(X)-htop(Xw). For a contradiction,

choose any X with more than one measure of maximal entropy. Since the extreme

points of the set of measures of maximal entropy are the ergodic measures of maximal

entropy, there must exist some ergodic measure of maximal entropy μ¯μ¯ on X.

Then we have

Cμ¯([w])<htop(X)-htop(Xw)<(2ln2)μ¯([w]) (3.9)

for all wEn with n>N. However, since μ¯ and μ¯ are ergodic, they must be

mutually singular. Therefore, for any ε>0, there exists some open set Uε such that

μ¯(Uε)<ε and μ¯(Uε)>1 -ε. By Lemma 3.5.2, for large enough nμ¯(UεEn)>1 -2ε,

155

and obviously μ¯(UεEn)<ε. Since Uε is an open set in X, it is a union of cylinder

sets. This means that there exists N>N so that for any n>N, there exists some

VεUε a union of cylinder sets of words in LΓn(X) with μ¯(VεEn)>1 -3ε, and

again μ¯(VεEn)<ε. However, we assumed that Cμ¯([w])<htop(X)-htop(Xw)<

(2 In n) μ¯([w]) for all wEn and n>N, and so since VεEn is a union of cylinder

sets associated to En-words, for any n>N we may sum (3.9) over wVεEn to see

that Cμ¯(VεEn)<( 2 In 2) μ¯(VεEn), and so Cε<( 2 In 2) (1 -3ε), a contradiction

for small enough ε. This implies that if d>1, then there does not exist a lower

bound on htop(X)-htop(Xw) of the form Cμ¯([w]) which holds for all words wEn

for sufficiently large n.

3.6 An application to an undecidability question

One of the complexities of multidimensional symbolic dynamics is that it is algorith-

mically undecidable, given just the set F of forbidden words, whether or not a shift

of finite type in Zd is nonempty. (See [B], [R], [Wan] for details.) One application of

Corollary 3.3.7 is a condition under which ΩF is nonempty:

Theorem 3.6.1. For any alphabet A, there exist F, GN such that for any m>0

and any finite set of words Fm={wkLΓnk(X) :1<k<m} satisfying n1>G

and nk>F(nk-1)4d2 for 1<k<m, ΩFm.

To prove this, we need the following lemma:

Lemma 3.6.2. For any strongly irreducible Zd-shift X=ΩF of finite type t with

uniform filling length R, there exist C, EN dependent only on d such that for

156

any wLΓn(X) with n>max(C(R+1)4d2, t2, E), if we denote by Xw the shift of

finite type ΩFu{w}, then there is some wLΓm(X) a subword of w such that Xw , is

nonempty and strongly irreducible with uniform filling length at most 2n+R.

Proof: Suppose that such a shift X is given. By Corollary 3.3.7, there exists N such

that for any wLΓn(X) with n>N, there exist w, wLΓm(X) such that w is a

subword of w, w and w agree on Γn(t), and replacing an occurrence of w by w in an

element of X cannot possibly create a new occurrence of w. (We are not using the

full force of Corollary 3.3.7 here, in that we do not care about the size m of w, only

that w exists.) By reviewing the proof of Corollary 3.3.7, we see that a sufficient

condition for the existence of w is that

(n-t+1) -(C ' (ln(n-t+1)htop(X)+R))(n-t+1)1-12d>0

for some constant C depending only on d. Some algebraic manipulation shows that

the above is implied if n>C2d(ln(n-t+1)htop(X)+R)2d+t. By Lemma 3.2.1, ehtop(X)(R+1)d>

HΓ1(X)>2, so htop(X)>ln2(R+1)d. Therefore, it is sufficient to have n>C2d(1ln2(R+

1)d In n+R)2d+t. Since (R+1)d>R and n>t2, for any n>1 a sufficient

condition is n>2C2d(2lnn(R+1)d)2d+n. There is a constant E dependent

only on d so that if n>E, (ln n)2d<n. For such n, it would be enough to have

n>2C2dn(2(R+1)d)2d+n, or by dividing both sides by n and then squaring

both sides, n>C(R+1)4d2 for some constant C dependent only on d. So, as long as

n>max(C(R+1)4d2, t2, E), such a w exists.

Consider any two shapes S, TZd such that d(S, T)>2n+R, and any words y

157

LS(Xw) and zLT(Xw). Since yLS(Xw), there exists yLsu(Sc)(n)(Xw) such

that y(S)=y. Similarly, there exists zLTU(Tc)(n)(Xw) such that z(T)=z. For

any pS (Sc)(n), by definition there exists pS such that d(p,p)<n. Similarly,

for any qT(Tc)(n), there exists qT such that d(q, q)<n. But d(S, T)>2n+R,

so d(p, q)>2n+R7. By the triangle inequality, d(p, q)>R. Therefore, since p, q were

arbitrary, d(S (Sc)(n), TU (Tc)(n))>R, and so by strong irreducibility of X there

exists xX such that x(S (Sc)(n))=y and x(T (Tc)(n))=z. We now fix any

ordering of the elements of Zd with a least element (say lexicographically with respect

to polar coordinates) and replace each element of w by w in turn with respect to

this order. In this way, we eventually arrive at xX which has no occurrences of w

and is thus an element of Xw. Note that since x(S (Sc)(n))=y and x(T (Tc)(n))

are words in L(Xw), they had no occurrences of w. Therefore, any of the replaced

occurrences of w had nonempty intersection with ((S (Sc)(n))u (T (Tc)(n)))c

Consider such a replaced occurrence which occupies U a copy of Γm. From the fact

just noted, there exists pU such that pS (Sc)(n) T (Tc)(n). This implies that

d(p, S) >n and d(p, T)>n. Therefore, since the size of U is m<n, U is disjoint

from both S and T. Since U is an arbitrary replaced occurrence of w, this implies

that x remained unchanged on S and T throughout the process of changing it to x,

and so x(S)=x(S)=y and x(T)=x(T)=z. By definition, we have shown that

Xw , is nonempty and strongly irreducible with uniform filling length at most 2n+R.

Proof of Theorem 3.6.1: We prove Theorem 3.6.1 by induction. Our inductive

158

hypothesis is that for any Fm as described in the hypotheses, ΩFm contains a strongly

irreducible shift of finite type with uniform filling length less than 4nm. First note

that by Theorem 3.1.22 and Lemma 3.6.2, there certainly exists G such that for

any word w1AΓn1 with n1>G, there is a subword w1 of w1 such that Ω{w1} is

nonempty and strongly irreducible with uniform filling length R1<4n1. We take

this G to be greater than the E from Lemma 3.6.2, and take our F to be 54d2C,

where C is from Lemma 3.6.2. Now suppose the hypothesis to be true for m, and

consider any Fm+1={w1 , w2, . . . , wm+1} satisfying the hypotheses of Theorem 3.6.1.

By the inductive hypothesis, Ω{w1,w2,...,wm} contains a strongly irreducible shift Xm

of finite type tm with uniform filling length Rm<4nm. There are two cases: either

wm+1L(Xm) or not. If wm+1L(Xm), then (Xm)wm+1=Xm, and so since

Xm=(Xm)wm+1ΩFm+1, in this case ΩFm+1 contains a strongly irreducible shift

of finite type with uniform filling length Rm<4nm<4nm+1 and the inductive

hypothesis is verified for m+1. So, we suppose that wm+1L(Xm). We will show

that nm+1>max(C(Rm+1)4d2, tm2, E). By the hypotheses of Theorem 3.6.1, we see

that nm+1>F(nm)4d2=54d2C(nm)4d2>C(5nm)4d2>C(Rm+1)4d2 Since the largest

size of a word in F is nm, tm<nm, and so nm+1>F(nm)4d2>tm2. Finally, nm+1>

n1>G>E. Therefore, wm+1 satisfies the hypotheses of Lemma 3.6.2 and so there

is wm+1 a subword of wm+1 such that (Xm)wm+1 is nonempty and strongly irreducible

with uniform filling length less than 42 m+1. Also, (Xm)wm+1(Xm)wm+1ΩFm+1,

and so the induction is complete. We have then shown that for any m, ΩFm contains

a nonempty shift and is therefore nonempty itself.

159

3.7 Questions

A few questions suggest themselves based on our results. The first, and perhaps most

obvious, is simply how the bounds in Theorem 3.1.22 could be improved. (We suspect

that at least the lower bound is far from optimal.)

Question 3.7.1. Can the constants AX and BX in the statement of Possible Theo-

rem 3.1.21 be improved from their values in Theorem 3.1.22 l.?

We know at least that the gap between AX and BX can be improved only up to

possible additive and multiplicative constants; we showed earlier that there are ex-

amples which force AX- BX>2R for strongly irreducible shifts of finite type with

arbitrarily large R, and our Theorem 3.1.22 has AX- BX=46R+70. However, it

would be nice to achieve optimal values.

Next, recall that we showed in the introduction that no result nearly as strong as

Theorem 3.1.22 can be true for shifts of finite type which are only assumed to be

topologically mixing. However, it is probable that something can be said. So, we can

ask

Question 3.7.2. If X is a topologically mixing Zd-shift X=ΩF of finite type with

positive topological entropy htop(X), and if we denote by Xw the shift of finite type

ΩFu{w}, then is it true that for every ε>0, there exists N such that for all w

LΓn(X) with n>N, htop(X)-htop(Xw)<ε ? If so, can anything be said about

the rate at which htop(X)-htop(Xw) approaches zero as the size n of wLΓn(X)

approaches infinity l.?

160

If it is the case that htop(X)-htop(Xw) approaches zero as n, then it seems

likely that some new techniques would be necessary for a proof, since there exist

non-strongly irreducible shifts of finite type whose languages contain words w for

which there exists no word ww agreeing with w on its border. For example,

in the checkerboard island shift Z, any square word which is a finite portion of a

checkerboard is forced by its boundary.

161

BIBLIOGRAPHY

[B] ROBERT BERGER, The undecidability of the domino problem, Mem. Amer.

Math. Soc. 66 (1966)

[Be] VITALY BERGELSON, Ergodic Ramsey Theory- an Update, Ergodic Theory

of Zd-actions, M. Pollicott and K. Schmidt, editors, London Math. Soc.

Lecture Notes Series 228, Cambridge University Press (1996), 1-61.

[BeL] VITALY BERGELSON AND ALEXANDER LEIBIVIAN, Polynomial extensions

of van der Waerden's and Szemeredi's theorems, Journal of AMS 9 (1996),

no. 3, 725-753.

[Bi] G.D. BIRKHOFF, Proof of the ergodic theorem, Proc. Nat. Acad. Sci. USA,

17 (1931) 656-660.

[Bou] JEAN BOURGAIN, Poin rwise ergodic theorems for arithmetic sets, Publ.

Math. IHES 69 (1989), no. 3, 5-45.

[Bou2] JEAN BOURGAIN, On poin rwise ergodic theorems for arithmetic sets, C. R.

Acad. Sci. Paris Sir. I Math. 305 (1987), no. 10, 397-402.

[BoyLR] MICHAEL BOYLE, DOUGLAS LIND, AND DANIEL RUDOLPH, The auto-

morphism group of a shift of finite type, Trans. Amer. Math. Soc., 306

(1988), no. 1, 71-114.

[BuSl] ROBERT BURTON AND JEFFREY STEIF, Non-uniqueness of measures of

maximal entropy, Ergodic Theory Dynam. Systems, 14 (1994), no. 2. 213-

235.

[BuS2] ROBERT BURTON AND JEFFREY STEIF, Nerv results on measures of max-

imal entropy, Israel J. Math. 89 (1995), nos. 1-3, 275-300.

[Fa] BASSAIVI FAYAD, Analytic mixing reparametrizations of irrational florvs, Er-

godic Theory Dynam. Systems 22 (2002), no. 2, 437-468.

162

[Fu]

HILLEL FURSTENBERG, Strict ergodicity and transformation of the torus,

Amer. J. Math. 83 (1961), 573-601.

[GO1] LEO GUIBAS AND ANDREW ODLYZKO, 6aximal prefix-synchronized codes,

SIAM J. Appl. Math. 35 (1978), 401-418.

[GO2] LEO GUIBAS AND ANDREW ODLYZKO, Periods in strings, J. Combin. The-

ory Ser. A, 30 (1981), 19-42.

[HaK] FRANK HAHN AND YITZHAK KATZNELSON, On the entropy of uniquely

ergodic transformations, Trans. Amer. Math. Soc. 126 (1967), 335-360.

[HaW] G.H. HARDY AND E.IVI. WRIGHT, An introduction to the theory of num-

bers, Clarendon Press, 1979.

[HeM] GUSTAV HEDLUND AND MARSTON Morse, Symbolic dynamics, Amer. J.

Math., 60 (1938), no. 4, 815-866.

[L] DOUGLAS LIND, Perturbations of shifts of finite type, Siam J. Discrete

Math. 2 (1989), no. 3, 350-365.

[LM DOUGLAS LIND AND BRIAN Marcus, An Introduction to Symbolic Dy-

namics and Coding, Cambridge University Press, 1995.

[M] MICHAEL MISIUREWICZ, A short proof of the variational principle for a

Z+n-action on a compact space, Asterisque 40 (1975), 147-157.

[QS] ANTHONY QUAS AND A.A. §AHIN, Entropy gaps and locally maximal en-

tropy in Zd subshifts, Ergodic Theory Dynam. Systems 23 (2003), 1227-

1245.

[QT] ANTHONY QUAS AND PAUL Trow, Subshifts of multi-dimensional shifts

of finite type, Ergodic Theory Dynam. Systems 20 (2000), no. 3, 859-874.

[R] RAPHAEL IVI. ROBINSON, Undecidability and nonperiodicity for tilings of

the plane, Invent. Math 12 (1971), 177-209.

[Wal] PETER WALTERS, An Introduction to Ergodic Theory, Springer-Verlag,

1982.

[Wan] HAO WANG, Proving theorems by pattern recognition II, AT&T Bell Labs.

Tech. J. 40 (1961), 1-41.

163

[Wi]

MAacuteTEacute WIERDL, Poin rwise ergodic theorem along the prime numbers, Israel

J. Math. 64 (1988), no. 3, 315-336.

164