Partially Observed Systems
Zijing Hu
November 23, 2022
Contents
1 State Estimation for a Markov Chain
2 Optimal Control of a Noisily Observed MDP
3 Linear Quadratic Gaussian Systems
  3.1 Gaussian Random Variable
  3.2 The Kalman Filtering Problem
*This note is based on ECEN 755: Stochastic Systems by Dr. P. R. Kumar, TAMU.
1 State Estimation for a Markov Chain
There are three generic state estimation problems in dynamical systems, given the observation history $y^t := (y_0, y_1, \dots, y_t)$:

1. Filtering: find an estimator $\Pr(x_t = i \mid y^t)$ of the current state $x_t$.
2. Smoothing: find an estimator $\Pr(x_s = i \mid y^t)$ of the state $x_s$ at some previous time $s < t$.
3. Prediction: find an estimator $\Pr(x_s = i \mid y^t)$ of the state $x_s$ at some future time $s > t$.

Suppose that we get noisy observations $y_t \in Y$ of the state $x_t$, and that the observation probabilities
\[
\Pr(y_t = j \mid x_t = i)
\]
are already known. Having observed $y^t = [y_0, y_1, \dots, y_t]$, we want to compute $\Pr(x_t = i \mid y^t)$ and $\Pr(x_{t+1} = i \mid y^t)$.
Theorem 1.1 (Normalized filtering equation). First, the one-step prediction:
\[
\Pr(x_{t+1} = i \mid y^t)
= \sum_{i'} \Pr(x_{t+1} = i,\, x_t = i' \mid y^t)
= \sum_{i'} \Pr(x_t = i' \mid y^t)\, \Pr(x_{t+1} = i \mid x_t = i',\, y^t)
= \sum_{i'} \Pr(x_t = i' \mid y^t)\, \Pr(x_{t+1} = i \mid x_t = i')
\]
Then, the measurement update:
\[
\begin{aligned}
\Pr(x_{t+1} = i \mid y^{t+1})
&= \frac{\Pr(x_{t+1} = i,\, y^{t+1})}{\Pr(y^{t+1})}
= \frac{\Pr(x_{t+1} = i,\, y^t,\, y_{t+1})}{\Pr(y^{t+1})} \\
&= \frac{\Pr(x_{t+1} = i,\, y^t)\, \Pr(y_{t+1} \mid x_{t+1} = i,\, y^t)}{\Pr(y^{t+1})} \\
&= \frac{\Pr(x_{t+1} = i \mid y^t)\, \Pr(y_{t+1} \mid x_{t+1} = i)}{\Pr(y^{t+1}) / \Pr(y^t)} \\
&= \frac{\Pr(x_{t+1} = i \mid y^t)\, \Pr(y_{t+1} \mid x_{t+1} = i)}{\sum_j \Pr(x_{t+1} = j \mid y^t)\, \Pr(y_{t+1} \mid x_{t+1} = j)} \\
&= \frac{\Pr(x_{t+1} = i \mid y^t)\, \Pr(y_{t+1} \mid x_{t+1} = i)}{\text{Normalization Factor}} \\
&= \frac{\sum_{i'} \Pr(x_t = i' \mid y^t)\, \Pr(x_{t+1} = i \mid x_t = i')\, \Pr(y_{t+1} \mid x_{t+1} = i)}{\text{Normalization Factor}}
\end{aligned}
\]
For a controlled Markov process with transition probabilities $p_{ij}(u)$, the same recursion defines the update map $\pi_{t+1|t+1} = T_{t+1}(\pi_{t|t},\, y_{t+1},\, u_t)$.
Suppose the initial distribution $p_0(i) = \Pr(x_0 = i)$ is given. Then
\[
\Pr(x_0 = i \mid y_0)
= \frac{\Pr(x_0 = i,\, y_0)}{\Pr(y_0)}
= \frac{\Pr(y_0 \mid x_0 = i)\, \Pr(x_0 = i)}{\sum_{i'} \Pr(y_0 \mid x_0 = i')\, \Pr(x_0 = i')}
\]
We can write these formulas in matrix form. Suppose that
\[
p_{t|t} = \begin{bmatrix} p(x_t = 1 \mid y^t) & p(x_t = 2 \mid y^t) & \cdots & p(x_t = n \mid y^t) \end{bmatrix}
\]
\[
D_t = \begin{bmatrix} p(y_t \mid x_t = 1) & & & \\ & p(y_t \mid x_t = 2) & & \\ & & \ddots & \\ & & & p(y_t \mid x_t = n) \end{bmatrix},
\qquad
e = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
\]
Then we have
\[
p_{t+1|t+1} = \frac{p_{t|t}\, P_{t+1}\, D_{t+1}}{p_{t|t}\, P_{t+1}\, D_{t+1}\, e},
\qquad
p_{0|0} = \frac{p_0 D_0}{p_0 D_0 e}
\]
where $P_t$ is the state transition probability matrix of the dynamics and $D_t$ is given by the known observation probabilities.
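As a concrete illustration, here is a minimal numerical sketch of this recursion (assuming Python with numpy; the two-state transition matrix, observation matrix, and observation sequence below are made up for illustration, not from the course):

    import numpy as np

    def hmm_filter_step(p, P, d):
        """One step of the normalized filtering equation.

        p : filtered distribution p_{t|t} as a row vector of length n
        P : transition matrix, P[i, j] = Pr(x_{t+1} = j | x_t = i)
        d : likelihoods of the new observation, d[j] = Pr(y_{t+1} | x_{t+1} = j)
        """
        unnormalized = (p @ P) * d                 # p_{t|t} P_{t+1} D_{t+1}
        return unnormalized / unnormalized.sum()   # divide by p_{t|t} P_{t+1} D_{t+1} e

    def hmm_filter(p0, P, likelihoods):
        """Run the filter; likelihoods[t][j] = Pr(y_t | x_t = j)."""
        p = p0 * likelihoods[0]
        p = p / p.sum()                            # p_{0|0} = p_0 D_0 / (p_0 D_0 e)
        trajectory = [p]
        for d in likelihoods[1:]:
            p = hmm_filter_step(p, P, d)
            trajectory.append(p)
        return np.array(trajectory)

    # Illustrative two-state example (all numbers are made up):
    P = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
    B = np.array([[0.7, 0.3],                      # B[i, y] = Pr(y_t = y | x_t = i)
                  [0.4, 0.6]])
    y_obs = [0, 1, 1, 0]
    p0 = np.array([0.5, 0.5])
    print(hmm_filter(p0, P, [B[:, y] for y in y_obs]))

Each row of the printed trajectory is a filtered distribution $p_{t|t}$; each step costs $O(n^2)$ for the prediction plus $O(n)$ for the normalization.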
2 Optimal Control of a Noisily Observed MDP
Definition 2.1. Separation Principle (a principle of separation of estimation and control).

0. Input action $u_{k-1}$ to the system and observe $y_k$.
1. Compute the state estimate $\pi_{k|k}$ from $\pi_{k-1|k-1}$, $y_k$, and $u_{k-1}$.
2. Generate the action $u_k = g(\pi_{k|k})$ from the state estimate $\pi_{k|k}$ and the policy $g$.
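In code, this estimate-then-act loop is only a few lines. The sketch below is schematic: the filter update T (the normalized filtering equation of Theorem 1.1 with controlled transitions $p_{ij}(u)$), the belief policy g, and the system simulator are all hypothetical names assumed to be supplied by the caller.

    def separated_control_loop(pi, g, T, system, horizon):
        """Separation principle: estimate the state, then act on the estimate.

        pi     : current belief pi_{k|k}
        g      : separated policy mapping a belief to an action
        T      : filter update, T(pi, y, u) -> next belief (Theorem 1.1)
        system : simulator with step(u) returning the next observation
        """
        for _ in range(horizon):
            u = g(pi)           # 2. act on the current belief
            y = system.step(u)  # 0. apply the action, observe y_{k+1}
            pi = T(pi, y, u)    # 1. update the belief with (y, u)
        return pi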
The expected cost of a history-dependent policy can be written as
\[
J_k^g = E^g\left[ \sum_{l=k}^{N-1} c_l(x_l, u_l) + c_N(x_N) \,\middle|\, z^k \right]
\]
where $z^k = (y^k, u^{k-1}) = \{y_1, \dots, y_k, u_1, \dots, u_{k-1}\}$.
Theorem 2.2. Define
\[
V_N(\pi) = E\left[ c_N(x_N) \mid \pi_{N|N} = \pi \right] = \sum_i c_N(i)\, \pi_i
\]
\[
V_k(\pi_{k|k}(z^k)) = \min_u E\left[ c_k(x_k, u) + V_{k+1}\big(T_k(\pi_{k|k}, y_{k+1}, u)\big) \,\middle|\, z^k \right]
\]
Then we have (1) $V_k(\pi_{k|k}(z^k)) \le J_k^g$ for all $k$ and $g$, and (2) if $g$ is a separated policy which attains the "min", then $g$ is optimal.
Proof. We have $J_N^g = V_N(\pi_{N|N})$. Suppose the claim is true for $N, N-1, \dots, k+1$. Then
\[
\begin{aligned}
J_k^g &= E^g\left[ \sum_{l=k}^{N-1} c_l(x_l, u_l) + c_N(x_N) \,\middle|\, z^k \right] \\
&= E^g\left[ c_k(x_k, u_k) + \sum_{l=k+1}^{N-1} c_l(x_l, u_l) + c_N(x_N) \,\middle|\, z^k \right] \\
&= E^g\left[ c_k(x_k, u_k) + E^g\left[ \sum_{l=k+1}^{N-1} c_l(x_l, u_l) + c_N(x_N) \,\middle|\, z^{k+1} \right] \,\middle|\, z^k \right] \\
&\ge E^g\left[ c_k(x_k, u_k) + V_{k+1}(\pi_{k+1|k+1}) \,\middle|\, z^k \right] \\
&= E^g\left[ c_k(x_k, u_k) + V_{k+1}\big(T_k(\pi_{k|k}, y_{k+1}, u_k)\big) \,\middle|\, z^k \right] \\
&= E^g\left[ E^g\left[ c_k(x_k, u_k) + V_{k+1}\big(T_k(\pi_{k|k}, y_{k+1}, u_k)\big) \,\middle|\, z^k \right] \,\middle|\, z^k \right] \\
&\ge E^g\left[ \min_u E^g\left[ c_k(x_k, u) + V_{k+1}\big(T_k(\pi_{k|k}, y_{k+1}, u)\big) \,\middle|\, z^k \right] \,\middle|\, z^k \right] \\
&= E^g\left[ V_k(\pi_{k|k}(z^k)) \,\middle|\, z^k \right] = V_k(\pi_{k|k}(z^k))
\end{aligned}
\]
where the first inequality uses the induction hypothesis $J_{k+1}^g \ge V_{k+1}(\pi_{k+1|k+1})$, and the second holds because the realized action $u_k$ can do no better than the minimizing one. $\square$
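For a two-state chain the belief is a single number $\pi = \Pr(x_k = 1 \mid z^k)$, so the recursion of Theorem 2.2 can be approximated numerically by backward induction on a grid over $[0, 1]$. The following rough sketch assumes a hypothetical two-state, two-action, two-observation model; the grid and the interpolation of $V_{k+1}$ are numerical approximations, not part of the theorem.

    import numpy as np

    # Hypothetical two-state, two-action, two-observation model.
    P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),   # P[u][i, j] = p_ij(u)
         1: np.array([[0.5, 0.5], [0.5, 0.5]])}
    B = np.array([[0.7, 0.3], [0.4, 0.6]])        # B[i, y] = Pr(y | x = i)
    c = np.array([[0.0, 1.0], [2.0, 1.0]])        # c[i, u], running cost
    c_N = np.array([0.0, 5.0])                    # terminal cost

    grid = np.linspace(0.0, 1.0, 201)             # belief = Pr(x = 1)

    def backward_induction(N):
        V = (1 - grid) * c_N[0] + grid * c_N[1]   # V_N(pi) = sum_i c_N(i) pi_i
        for _ in range(N):
            Q = np.zeros((len(grid), 2))
            for u in (0, 1):
                for m, pi1 in enumerate(grid):
                    pi = np.array([1 - pi1, pi1])
                    pred = pi @ P[u]              # one-step prediction
                    stage = pi @ c[:, u]          # E[c_k(x_k, u) | z^k]
                    future = 0.0
                    for y in (0, 1):
                        p_y = pred @ B[:, y]      # Pr(y_{k+1} | pi, u)
                        post = pred * B[:, y] / p_y   # T_k(pi, y, u)
                        future += p_y * np.interp(post[1], grid, V)
                    Q[m, u] = stage + future
            V = Q.min(axis=1)                     # minimize over actions
        return V

    print(backward_induction(N=5)[:5])            # V_0 at the first grid points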
3 Linear Quadratic Gaussian Systems
3.1 Gaussian Random Variable
Basic concepts:

Uncorrelated: $E\big[(X - \bar X)(Y - \bar Y)\big] = 0$

Independent: $f_{XY}(X, Y) = f_X(X) f_Y(Y)$

Independent $\Rightarrow$ uncorrelated, but in general uncorrelated $\not\Rightarrow$ independent.

Theorem 3.1. For jointly Gaussian random variables, independent $\Leftrightarrow$ uncorrelated.
Proof. The jointly Gaussian density is
\[
f(x, y) = \frac{1}{2\pi \sigma_x \sigma_y (1 - \rho^2)^{1/2}}\, e^{-q/2}
\]
\[
q = \frac{1}{1 - \rho^2} \left[ \left( \frac{x - \mu_x}{\sigma_x} \right)^2 - 2\rho \left( \frac{x - \mu_x}{\sigma_x} \right) \left( \frac{y - \mu_y}{\sigma_y} \right) + \left( \frac{y - \mu_y}{\sigma_y} \right)^2 \right]
\]
It reduces to a product of two normal densities when $\rho = 0$. $\square$
Theorem 3.2. Suppose $X, Y \sim f_{XY}(X, Y)$, in which $Y$ is observable but $X$ is not. Let $g(Y)$ be an estimate of $X$ based on $Y$, and denote by $g^*$ the estimator that minimizes $E\big[(X - g(Y))^2\big]$. Then
\[
g^*(Y) = E[X \mid Y]
\]
Proof. With $g^*(Y) = E[X \mid Y]$, the cross term vanishes for any $g$:
\[
\begin{aligned}
E\big[(X - g^*(Y))(g^*(Y) - g(Y))\big]
&= E\Big[ E\big[ (X - g^*(Y))(g^*(Y) - g(Y)) \mid Y \big] \Big] \\
&= E\big[ (E[X \mid Y] - g^*(Y))(g^*(Y) - g(Y)) \big] = 0
\end{aligned}
\]
Hence
\[
\begin{aligned}
E\big[(X - g(Y))^2\big]
&= E\big[(X - g^*(Y))^2\big] + E\big[(g^*(Y) - g(Y))^2\big] + 2E\big[(X - g^*(Y))(g^*(Y) - g(Y))\big] \\
&\ge E\big[(X - g^*(Y))^2\big]
\end{aligned}
\]
$\square$
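A quick Monte Carlo illustration of this optimality (the linear-Gaussian model below is arbitrary, chosen so that $E[X \mid Y] = 0.5\,Y$ by construction, with tanh as an arbitrary competing estimator):

    import numpy as np

    rng = np.random.default_rng(42)
    Y = rng.normal(0.0, 1.0, size=100_000)
    X = 0.5 * Y + rng.normal(0.0, 0.3, size=100_000)   # E[X | Y] = 0.5 Y

    mse_star = np.mean((X - 0.5 * Y) ** 2)      # conditional mean
    mse_other = np.mean((X - np.tanh(Y)) ** 2)  # arbitrary competitor
    print(mse_star, mse_other)                  # mse_star is smaller (~0.09)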
Theorem 3.3. Suppose $(X, Y)$ are jointly Gaussian distributed. Then we have
\[
\hat X = E[X \mid Y] = \bar x + \Sigma_{xy} \Sigma_{yy}^{-1} (Y - \bar y)
\]
\[
\operatorname{Var}(X - \hat X) = \Sigma_{xx} - \Sigma_{xy} \Sigma_{yy}^{-1} \Sigma_{yx}
\]
and
\[
E\big[(X - \hat X) Y\big] = 0
\]
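These formulas are easy to sanity-check by Monte Carlo; in the sketch below the mean vector and covariance entries are arbitrary illustrative numbers.

    import numpy as np

    rng = np.random.default_rng(0)
    mean = np.array([1.0, -2.0])                 # [x_bar, y_bar]
    Sigma = np.array([[2.0, 0.8],
                      [0.8, 1.5]])               # [[Sxx, Sxy], [Syx, Syy]]
    XY = rng.multivariate_normal(mean, Sigma, size=200_000)
    X, Y = XY[:, 0], XY[:, 1]

    # X_hat = x_bar + Sxy Syy^{-1} (Y - y_bar)
    X_hat = mean[0] + Sigma[0, 1] / Sigma[1, 1] * (Y - mean[1])

    # Error variance should match Sxx - Sxy Syy^{-1} Syx.
    print(np.var(X - X_hat), Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1])

    # Orthogonality: E[(X - X_hat) Y] should be close to 0.
    print(np.mean((X - X_hat) * Y))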
Theorem 3.4. Suppose $(X, Y, Z)$ are jointly Gaussian distributed and $Y$ and $Z$ are independent. Then we have
\[
\begin{aligned}
E[X \mid Y, Z]
&= \bar X + \Sigma_{XY} \Sigma_{YY}^{-1} (Y - \bar Y) + \Sigma_{XZ} \Sigma_{ZZ}^{-1} (Z - \bar Z) \\
&= \bar X + \big[ \Sigma_{XY},\ \Sigma_{XZ} \big]
\begin{bmatrix} \Sigma_{YY}^{-1} & 0 \\ 0 & \Sigma_{ZZ}^{-1} \end{bmatrix}
\begin{bmatrix} Y - \bar Y \\ Z - \bar Z \end{bmatrix}
\end{aligned}
\]
3.2 The Kalman Filtering Problem
Notation:
\[
x_{t+1} = A x_t + G w_t
\]
where $x_0 \sim N(\bar x_0, \Sigma_0)$ is independent of $w_t \sim N(0, Q)$. The state $x_t$ cannot be observed, but we can observe
\[
y_t = C x_t + H v_t, \qquad v_t \sim N(0, R)
\]
We are interested in estimating the current state and predicting the next state:
\[
\hat x_{t|t} = E[x_t \mid y^t] \quad \text{and} \quad \Sigma_{t|t} = \operatorname{Cov}\big(x_t - \hat x_{t|t}\big)
\]
\[
\hat x_{t+1|t} = E[x_{t+1} \mid y^t] \quad \text{and} \quad \Sigma_{t+1|t} = \operatorname{Cov}\big(x_{t+1} - \hat x_{t+1|t}\big)
\]
The estimation and prediction rely on two types of updates.

Measurement update: $\hat x_{t|t-1} \to \hat x_{t|t}$ and $\Sigma_{t|t-1} \to \Sigma_{t|t}$.
\[
\hat x_{t|t} = E\big[ x_t \mid y^{t-1}, y_t \big]
= E\big[ x_t \mid y^{t-1}, \tilde y_{t|t-1} \big]
= \hat x_{t|t-1} + \Sigma_{x_t \tilde y_{t|t-1}} \Sigma_{\tilde y_{t|t-1} \tilde y_{t|t-1}}^{-1}\, \tilde y_{t|t-1}
\]
where the last equality uses Theorem 3.4, since the innovation $\tilde y_{t|t-1}$ is uncorrelated with, and hence (being jointly Gaussian) independent of, $y^{t-1}$.
The innovation is
\[
\tilde y_{t|t-1} = y_t - \hat y_{t|t-1}
= y_t - E\big[ C x_t + H v_t \mid y^{t-1} \big]
= y_t - C \hat x_{t|t-1}
= C \tilde x_{t|t-1} + H v_t
\]
where $\tilde x_{t|t-1} = x_t - \hat x_{t|t-1}$ is the prediction error. Its covariances are
\[
\Sigma_{\tilde y_{t|t-1} \tilde y_{t|t-1}} = \operatorname{Cov}\big( y_t - \hat y_{t|t-1} \big) = C \Sigma_{t|t-1} C^\top + H R H^\top
\]
\[
\begin{aligned}
\Sigma_{x_t \tilde y_{t|t-1}}
&= E\Big[ x_t \big( C \tilde x_{t|t-1} + H v_t \big)^\top \Big]
= E\Big[ \big( \hat x_{t|t-1} + \tilde x_{t|t-1} \big) \tilde x_{t|t-1}^\top C^\top \Big] \\
&= E\big[ \tilde x_{t|t-1} \tilde x_{t|t-1}^\top \big] C^\top
= \Sigma_{t|t-1} C^\top
\end{aligned}
\]
The matrix $K_t = \Sigma_{x_t \tilde y_{t|t-1}} \Sigma_{\tilde y_{t|t-1} \tilde y_{t|t-1}}^{-1}$ is called the Kalman gain.
Time update: $\hat x_{t|t} \to \hat x_{t+1|t}$ and $\Sigma_{t|t} \to \Sigma_{t+1|t}$.
\[
\hat x_{t+1|t} = E\big[ x_{t+1} \mid y^t \big]
= E\big[ A x_t + G w_t \mid y^t \big]
= A E\big[ x_t \mid y^t \big] + G E\big[ w_t \mid y^t \big]
= A \hat x_{t|t}
\]
\[
\begin{aligned}
\Sigma_{t+1|t} &= \operatorname{Cov}\big( x_{t+1} - \hat x_{t+1|t} \big)
= \operatorname{Cov}\big( A x_t + G w_t - A \hat x_{t|t} \big) \\
&= \operatorname{Cov}\big( A x_t - A \hat x_{t|t} \big) + \operatorname{Cov}\big( G w_t \big)
= A \Sigma_{t|t} A^\top + G Q G^\top
\end{aligned}
\]
Each iteration follows three steps.

Compute the Kalman gain:
\[
K_t = \Sigma_{t|t-1} C^\top \big( C \Sigma_{t|t-1} C^\top + H R H^\top \big)^{-1}
\]
Measurement update:
\[
\hat x_{t|t} = \hat x_{t|t-1} + K_t \big( y_t - C \hat x_{t|t-1} \big), \qquad
\Sigma_{t|t} = (I - K_t C)\, \Sigma_{t|t-1}
\]
Time update:
\[
\hat x_{t+1|t} = A \hat x_{t|t}, \qquad
\Sigma_{t+1|t} = A \Sigma_{t|t} A^\top + G Q G^\top
\]
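Putting the three steps together gives a compact filter. The sketch below assumes Python with numpy and treats $(\bar x_0, \Sigma_0)$ as the initial prediction $\hat x_{0|-1}, \Sigma_{0|-1}$; the example system matrices are made up for illustration.

    import numpy as np

    def kalman_filter(y_seq, A, G, C, H, Q, R, x0_bar, Sigma0):
        """Run the three-step recursion over observations y_0, ..., y_T."""
        n = A.shape[0]
        x_pred, S_pred = x0_bar, Sigma0               # x_hat_{0|-1}, Sigma_{0|-1}
        estimates = []
        for y in y_seq:
            # 1. Kalman gain: K_t = S C' (C S C' + H R H')^{-1}
            S_y = C @ S_pred @ C.T + H @ R @ H.T
            K = S_pred @ C.T @ np.linalg.inv(S_y)
            # 2. Measurement update
            x_filt = x_pred + K @ (y - C @ x_pred)
            S_filt = (np.eye(n) - K @ C) @ S_pred
            estimates.append(x_filt)
            # 3. Time update
            x_pred = A @ x_filt
            S_pred = A @ S_filt @ A.T + G @ Q @ G.T
        return np.array(estimates)

    # Illustrative example: 2-D constant-velocity state, 1-D position observation.
    rng = np.random.default_rng(1)
    A = np.array([[1.0, 1.0], [0.0, 1.0]]); G = np.eye(2)
    C = np.array([[1.0, 0.0]]); H = np.eye(1)
    Q = 0.01 * np.eye(2); R = np.array([[0.5]])
    x, ys = np.zeros(2), []
    for _ in range(50):
        ys.append(C @ x + H @ rng.multivariate_normal(np.zeros(1), R))
        x = A @ x + G @ rng.multivariate_normal(np.zeros(2), Q)
    print(kalman_filter(ys, A, G, C, H, Q, R, np.zeros(2), np.eye(2))[-1])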