Partially Observed Systems
Zijing Hu
November 23, 2022
Contents
1 State Estimation for a Markov Chain
2 Optimal Control of a Noisily Observed MDP
3 Linear Quadratic Gaussian Systems
  3.1 Gaussian Random Variable
  3.2 The Kalman Filtering Problem
*This note is based on ECEN 755: Stochastic Systems by Dr. P. R. Kumar, TAMU.
1 State Estimation for a Markov Chain
There are three generic state estimation problems in dynamical systems, given the observation history $y^t := (y_0, y_1, \dots, y_t)$:

1. Filtering: find an estimator $\Pr(x_t = i \mid y^t)$ of the current state $x_t$.
2. Smoothing: find an estimator $\Pr(x_s = i \mid y^t)$ of the state $x_s$ at some previous time $s < t$.
3. Prediction: find an estimator $\Pr(x_s = i \mid y^t)$ of the state $x_s$ at some future time $s > t$.

Suppose that we get noisy observations $y_t \in Y$ of the state $x_t$, and that the observation probabilities
\[
\Pr(y_t = j \mid x_t = i)
\]
are already known. Having observed $y^t = [y_0, y_1, \dots, y_t]$, we want to compute $\Pr(x_t = i \mid y^t)$ and $\Pr(x_{t+1} = i \mid y^t)$.
Theorem 1.1 (Normalized filtering equation). First, the one-step prediction:
\[
\Pr(x_{t+1} = i \mid y^t)
= \sum_{i'} \Pr(x_{t+1} = i,\, x_t = i' \mid y^t)
= \sum_{i'} \Pr(x_t = i' \mid y^t)\, \Pr(x_{t+1} = i \mid x_t = i',\, y^t)
= \sum_{i'} \Pr(x_t = i' \mid y^t)\, \Pr(x_{t+1} = i \mid x_t = i')
\]
Then, the measurement update:
\[
\begin{aligned}
\Pr(x_{t+1} = i \mid y^{t+1})
&= \frac{\Pr(x_{t+1} = i,\, y^{t+1})}{\Pr(y^{t+1})}
= \frac{\Pr(x_{t+1} = i,\, y^t,\, y_{t+1})}{\Pr(y^{t+1})} \\
&= \frac{\Pr(x_{t+1} = i,\, y^t)\, \Pr(y_{t+1} \mid x_{t+1} = i,\, y^t)}{\Pr(y^{t+1})} \\
&= \frac{\Pr(x_{t+1} = i \mid y^t)\, \Pr(y_{t+1} \mid x_{t+1} = i)}{\Pr(y^{t+1}) / \Pr(y^t)} \\
&= \frac{\Pr(x_{t+1} = i \mid y^t)\, \Pr(y_{t+1} \mid x_{t+1} = i)}{\sum_j \Pr(x_{t+1} = j \mid y^t)\, \Pr(y_{t+1} \mid x_{t+1} = j)} \\
&= \frac{\Pr(x_{t+1} = i \mid y^t)\, \Pr(y_{t+1} \mid x_{t+1} = i)}{\text{Normalization Factor}} \\
&= \frac{\sum_{i'} \Pr(x_t = i' \mid y^t)\, \Pr(x_{t+1} = i \mid x_t = i')\, \Pr(y_{t+1} \mid x_{t+1} = i)}{\text{Normalization Factor}}
\end{aligned}
\]
For a controlled Markov process with transition probabilities $p_{ij}(u)$, the same recursion defines the update map $\pi_{t+1|t+1} = T_{t+1}(\pi_{t|t},\, y_{t+1},\, u_t)$.
Suppose the initial distribution $p_0(i) = \Pr(x_0 = i)$ is given. Then
\[
\Pr(x_0 = i \mid y_0)
= \frac{\Pr(x_0 = i,\, y_0)}{\Pr(y_0)}
= \frac{\Pr(y_0 \mid x_0 = i)\, \Pr(x_0 = i)}{\sum_{i'} \Pr(y_0 \mid x_0 = i')\, \Pr(x_0 = i')}
\]
We can write these formulas in matrix form. Suppose that
\[
p_{t|t} = \begin{bmatrix} p(x_t = 1 \mid y^t) & p(x_t = 2 \mid y^t) & \cdots & p(x_t = n \mid y^t) \end{bmatrix}
\]
\[
D_t = \begin{bmatrix} p(y_t \mid x_t = 1) & & & \\ & p(y_t \mid x_t = 2) & & \\ & & \ddots & \\ & & & p(y_t \mid x_t = n) \end{bmatrix},
\qquad
e = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
\]
Then we have
\[
p_{t+1|t+1} = \frac{p_{t|t}\, P_{t+1}\, D_{t+1}}{p_{t|t}\, P_{t+1}\, D_{t+1}\, e},
\qquad
p_{0|0} = \frac{p_0 D_0}{p_0 D_0 e}
\]
where $P_t$ is the state transition probability matrix of the dynamics and $D_t$ is given by the known observation probabilities.
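As a concrete illustration, here is a minimal numerical sketch of this recursion (assuming Python with numpy; the two-state transition matrix, observation matrix, and observation sequence below are made up for illustration, not from the course):

    import numpy as np

    def hmm_filter_step(p, P, d):
        """One step of the normalized filtering equation.

        p : filtered distribution p_{t|t} as a row vector of length n
        P : transition matrix, P[i, j] = Pr(x_{t+1} = j | x_t = i)
        d : likelihoods of the new observation, d[j] = Pr(y_{t+1} | x_{t+1} = j)
        """
        unnormalized = (p @ P) * d                 # p_{t|t} P_{t+1} D_{t+1}
        return unnormalized / unnormalized.sum()   # divide by p_{t|t} P_{t+1} D_{t+1} e

    def hmm_filter(p0, P, likelihoods):
        """Run the filter; likelihoods[t][j] = Pr(y_t | x_t = j)."""
        p = p0 * likelihoods[0]
        p = p / p.sum()                            # p_{0|0} = p_0 D_0 / (p_0 D_0 e)
        trajectory = [p]
        for d in likelihoods[1:]:
            p = hmm_filter_step(p, P, d)
            trajectory.append(p)
        return np.array(trajectory)

    # Illustrative two-state example (all numbers are made up):
    P = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
    B = np.array([[0.7, 0.3],                      # B[i, y] = Pr(y_t = y | x_t = i)
                  [0.4, 0.6]])
    y_obs = [0, 1, 1, 0]
    p0 = np.array([0.5, 0.5])
    print(hmm_filter(p0, P, [B[:, y] for y in y_obs]))

Each row of the printed trajectory is a filtered distribution $p_{t|t}$; each step costs $O(n^2)$ for the prediction plus $O(n)$ for the normalization.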
2 Optimal Control of a Noisily Observed MDP
Definition 2.1. Separation Principle (a principle of separation of estimation and control).

0. Input action $u_{k-1}$ to the system and observe $y_k$.
1. Compute the state estimate $\pi_{k|k}$ from $\pi_{k-1|k-1}$, $y_k$, and $u_{k-1}$.
2. Generate the action $u_k = g(\pi_{k|k})$ from the state estimate $\pi_{k|k}$ and the policy $g$.
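In code, this estimate-then-act loop is only a few lines. The sketch below is schematic: the filter update T (the normalized filtering equation of Theorem 1.1 with controlled transitions $p_{ij}(u)$), the belief policy g, and the system simulator are all hypothetical names assumed to be supplied by the caller.

    def separated_control_loop(pi, g, T, system, horizon):
        """Separation principle: estimate the state, then act on the estimate.

        pi     : current belief pi_{k|k}
        g      : separated policy mapping a belief to an action
        T      : filter update, T(pi, y, u) -> next belief (Theorem 1.1)
        system : simulator with step(u) returning the next observation
        """
        for _ in range(horizon):
            u = g(pi)           # 2. act on the current belief
            y = system.step(u)  # 0. apply the action, observe y_{k+1}
            pi = T(pi, y, u)    # 1. update the belief with (y, u)
        return pi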
The expected cost of a history-dependent policy can be written as
\[
J_k^g = E^g\left[ \sum_{l=k}^{N-1} c_l(x_l, u_l) + c_N(x_N) \,\middle|\, z^k \right]
\]
where $z^k = (y^k, u^{k-1}) = \{y_1, \dots, y_k, u_1, \dots, u_{k-1}\}$.
Theorem 2.2. Define
\[
V_N(\pi) = E\left[ c_N(x_N) \mid \pi_{N|N} = \pi \right] = \sum_i c_N(i)\, \pi_i
\]
\[
V_k(\pi_{k|k}(z^k)) = \min_u E\left[ c_k(x_k, u) + V_{k+1}\big(T_k(\pi_{k|k}, y_{k+1}, u)\big) \,\middle|\, z^k \right]
\]
Then we have (1) $V_k(\pi_{k|k}(z^k)) \le J_k^g$ for all $k$ and $g$, and (2) if $g$ is a separated policy which attains the "min", then $g$ is optimal.
Proof. We have $J_N^g = V_N(\pi_{N|N})$. Suppose the claim is true for $N, N-1, \dots, k+1$. Then
\[
\begin{aligned}
J_k^g &= E^g\left[ \sum_{l=k}^{N-1} c_l(x_l, u_l) + c_N(x_N) \,\middle|\, z^k \right] \\
&= E^g\left[ c_k(x_k, u_k) + \sum_{l=k+1}^{N-1} c_l(x_l, u_l) + c_N(x_N) \,\middle|\, z^k \right] \\
&= E^g\left[ c_k(x_k, u_k) + E^g\left[ \sum_{l=k+1}^{N-1} c_l(x_l, u_l) + c_N(x_N) \,\middle|\, z^{k+1} \right] \,\middle|\, z^k \right] \\
&\ge E^g\left[ c_k(x_k, u_k) + V_{k+1}(\pi_{k+1|k+1}) \,\middle|\, z^k \right] \\
&= E^g\left[ c_k(x_k, u_k) + V_{k+1}\big(T_k(\pi_{k|k}, y_{k+1}, u_k)\big) \,\middle|\, z^k \right] \\
&= E^g\left[ E^g\left[ c_k(x_k, u_k) + V_{k+1}\big(T_k(\pi_{k|k}, y_{k+1}, u_k)\big) \,\middle|\, z^k \right] \,\middle|\, z^k \right] \\
&\ge E^g\left[ \min_u E^g\left[ c_k(x_k, u) + V_{k+1}\big(T_k(\pi_{k|k}, y_{k+1}, u)\big) \,\middle|\, z^k \right] \,\middle|\, z^k \right] \\
&= E^g\left[ V_k(\pi_{k|k}(z^k)) \,\middle|\, z^k \right] = V_k(\pi_{k|k}(z^k))
\end{aligned}
\]
where the first inequality uses the induction hypothesis $J_{k+1}^g \ge V_{k+1}(\pi_{k+1|k+1})$, and the second holds because the realized action $u_k$ can do no better than the minimizing one. $\square$
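For a two-state chain the belief is a single number $\pi = \Pr(x_k = 1 \mid z^k)$, so the recursion of Theorem 2.2 can be approximated numerically by backward induction on a grid over $[0, 1]$. The following rough sketch assumes a hypothetical two-state, two-action, two-observation model; the grid and the interpolation of $V_{k+1}$ are numerical approximations, not part of the theorem.

    import numpy as np

    # Hypothetical two-state, two-action, two-observation model.
    P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),   # P[u][i, j] = p_ij(u)
         1: np.array([[0.5, 0.5], [0.5, 0.5]])}
    B = np.array([[0.7, 0.3], [0.4, 0.6]])        # B[i, y] = Pr(y | x = i)
    c = np.array([[0.0, 1.0], [2.0, 1.0]])        # c[i, u], running cost
    c_N = np.array([0.0, 5.0])                    # terminal cost

    grid = np.linspace(0.0, 1.0, 201)             # belief = Pr(x = 1)

    def backward_induction(N):
        V = (1 - grid) * c_N[0] + grid * c_N[1]   # V_N(pi) = sum_i c_N(i) pi_i
        for _ in range(N):
            Q = np.zeros((len(grid), 2))
            for u in (0, 1):
                for m, pi1 in enumerate(grid):
                    pi = np.array([1 - pi1, pi1])
                    pred = pi @ P[u]              # one-step prediction
                    stage = pi @ c[:, u]          # E[c_k(x_k, u) | z^k]
                    future = 0.0
                    for y in (0, 1):
                        p_y = pred @ B[:, y]      # Pr(y_{k+1} | pi, u)
                        post = pred * B[:, y] / p_y   # T_k(pi, y, u)
                        future += p_y * np.interp(post[1], grid, V)
                    Q[m, u] = stage + future
            V = Q.min(axis=1)                     # minimize over actions
        return V

    print(backward_induction(N=5)[:5])            # V_0 at the first grid points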
3 Linear Quadratic Gaussian Systems
3.1 Gaussian Random Variable
Basic concepts:

Uncorrelated: $E\big[(X - \bar X)(Y - \bar Y)\big] = 0$

Independent: $f_{XY}(X, Y) = f_X(X) f_Y(Y)$

Independent $\Rightarrow$ uncorrelated, but in general uncorrelated $\not\Rightarrow$ independent.

Theorem 3.1. For jointly Gaussian random variables, independent $\Leftrightarrow$ uncorrelated.
Proof. The jointly Gaussian density is
\[
f(x, y) = \frac{1}{2\pi \sigma_x \sigma_y (1 - \rho^2)^{1/2}}\, e^{-q/2}
\]
\[
q = \frac{1}{1 - \rho^2} \left[ \left( \frac{x - \mu_x}{\sigma_x} \right)^2 - 2\rho \left( \frac{x - \mu_x}{\sigma_x} \right) \left( \frac{y - \mu_y}{\sigma_y} \right) + \left( \frac{y - \mu_y}{\sigma_y} \right)^2 \right]
\]
It reduces to a product of two normal densities when $\rho = 0$. $\square$
Theorem 3.2. Suppose $X, Y \sim f_{XY}(X, Y)$, in which $Y$ is observable but $X$ is not. Let $g(Y)$ be an estimate of $X$ based on $Y$, and denote by $g^*$ the estimator that minimizes $E\big[(X - g(Y))^2\big]$. Then
\[
g^*(Y) = E[X \mid Y]
\]
Proof. With $g^*(Y) = E[X \mid Y]$, the cross term vanishes for any $g$:
\[
\begin{aligned}
E\big[(X - g^*(Y))(g^*(Y) - g(Y))\big]
&= E\Big[ E\big[ (X - g^*(Y))(g^*(Y) - g(Y)) \mid Y \big] \Big] \\
&= E\big[ (E[X \mid Y] - g^*(Y))(g^*(Y) - g(Y)) \big] = 0
\end{aligned}
\]
Hence
\[
\begin{aligned}
E\big[(X - g(Y))^2\big]
&= E\big[(X - g^*(Y))^2\big] + E\big[(g^*(Y) - g(Y))^2\big] + 2E\big[(X - g^*(Y))(g^*(Y) - g(Y))\big] \\
&\ge E\big[(X - g^*(Y))^2\big]
\end{aligned}
\]
$\square$
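A quick Monte Carlo illustration of this optimality (the linear-Gaussian model below is arbitrary, chosen so that $E[X \mid Y] = 0.5\,Y$ by construction, with tanh as an arbitrary competing estimator):

    import numpy as np

    rng = np.random.default_rng(42)
    Y = rng.normal(0.0, 1.0, size=100_000)
    X = 0.5 * Y + rng.normal(0.0, 0.3, size=100_000)   # E[X | Y] = 0.5 Y

    mse_star = np.mean((X - 0.5 * Y) ** 2)      # conditional mean
    mse_other = np.mean((X - np.tanh(Y)) ** 2)  # arbitrary competitor
    print(mse_star, mse_other)                  # mse_star is smaller (~0.09)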
Theorem 3.3. Suppose $(X, Y)$ are jointly Gaussian distributed. Then we have
\[
\hat X = E[X \mid Y] = \bar x + \Sigma_{xy} \Sigma_{yy}^{-1} (Y - \bar y)
\]
\[
\operatorname{Var}(X - \hat X) = \Sigma_{xx} - \Sigma_{xy} \Sigma_{yy}^{-1} \Sigma_{yx}
\]
and
\[
E\big[(X - \hat X) Y\big] = 0
\]
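These formulas are easy to sanity-check by Monte Carlo; in the sketch below the mean vector and covariance entries are arbitrary illustrative numbers.

    import numpy as np

    rng = np.random.default_rng(0)
    mean = np.array([1.0, -2.0])                 # [x_bar, y_bar]
    Sigma = np.array([[2.0, 0.8],
                      [0.8, 1.5]])               # [[Sxx, Sxy], [Syx, Syy]]
    XY = rng.multivariate_normal(mean, Sigma, size=200_000)
    X, Y = XY[:, 0], XY[:, 1]

    # X_hat = x_bar + Sxy Syy^{-1} (Y - y_bar)
    X_hat = mean[0] + Sigma[0, 1] / Sigma[1, 1] * (Y - mean[1])

    # Error variance should match Sxx - Sxy Syy^{-1} Syx.
    print(np.var(X - X_hat), Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1])

    # Orthogonality: E[(X - X_hat) Y] should be close to 0.
    print(np.mean((X - X_hat) * Y))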
Theorem 3.4. Suppose $(X, Y, Z)$ are jointly Gaussian distributed and $Y$ and $Z$ are independent. Then we have
\[
\begin{aligned}
E[X \mid Y, Z]
&= \bar X + \Sigma_{XY} \Sigma_{YY}^{-1} (Y - \bar Y) + \Sigma_{XZ} \Sigma_{ZZ}^{-1} (Z - \bar Z) \\
&= \bar X + \big[ \Sigma_{XY},\ \Sigma_{XZ} \big]
\begin{bmatrix} \Sigma_{YY}^{-1} & 0 \\ 0 & \Sigma_{ZZ}^{-1} \end{bmatrix}
\begin{bmatrix} Y - \bar Y \\ Z - \bar Z \end{bmatrix}
\end{aligned}
\]
3.2 The Kalman Filtering Problem
Notation:
\[
x_{t+1} = A x_t + G w_t
\]
where $x_0 \sim N(\bar x_0, \Sigma_0)$ is independent of $w_t \sim N(0, Q)$. The state $x_t$ cannot be observed, but we can observe
\[
y_t = C x_t + H v_t, \qquad v_t \sim N(0, R)
\]
We are interested in estimating the current state and predicting the next state:
\[
\hat x_{t|t} = E[x_t \mid y^t] \quad \text{and} \quad \Sigma_{t|t} = \operatorname{Cov}\big(x_t - \hat x_{t|t}\big)
\]
\[
\hat x_{t+1|t} = E[x_{t+1} \mid y^t] \quad \text{and} \quad \Sigma_{t+1|t} = \operatorname{Cov}\big(x_{t+1} - \hat x_{t+1|t}\big)
\]
The estimation and prediction rely on two types of updates.

Measurement update: $\hat x_{t|t-1} \to \hat x_{t|t}$ and $\Sigma_{t|t-1} \to \Sigma_{t|t}$.
\[
\hat x_{t|t} = E\big[ x_t \mid y^{t-1}, y_t \big]
= E\big[ x_t \mid y^{t-1}, \tilde y_{t|t-1} \big]
= \hat x_{t|t-1} + \Sigma_{x_t \tilde y_{t|t-1}} \Sigma_{\tilde y_{t|t-1} \tilde y_{t|t-1}}^{-1}\, \tilde y_{t|t-1}
\]
where the last equality uses Theorem 3.4, since the innovation $\tilde y_{t|t-1}$ is uncorrelated with, and hence (being jointly Gaussian) independent of, $y^{t-1}$.
The innovation is
\[
\tilde y_{t|t-1} = y_t - \hat y_{t|t-1}
= y_t - E\big[ C x_t + H v_t \mid y^{t-1} \big]
= y_t - C \hat x_{t|t-1}
= C \tilde x_{t|t-1} + H v_t
\]
where $\tilde x_{t|t-1} = x_t - \hat x_{t|t-1}$ is the prediction error. Its covariances are
\[
\Sigma_{\tilde y_{t|t-1} \tilde y_{t|t-1}} = \operatorname{Cov}\big( y_t - \hat y_{t|t-1} \big) = C \Sigma_{t|t-1} C^\top + H R H^\top
\]
\[
\begin{aligned}
\Sigma_{x_t \tilde y_{t|t-1}}
&= E\Big[ x_t \big( C \tilde x_{t|t-1} + H v_t \big)^\top \Big]
= E\Big[ \big( \hat x_{t|t-1} + \tilde x_{t|t-1} \big) \tilde x_{t|t-1}^\top C^\top \Big] \\
&= E\big[ \tilde x_{t|t-1} \tilde x_{t|t-1}^\top \big] C^\top
= \Sigma_{t|t-1} C^\top
\end{aligned}
\]
The matrix $K_t = \Sigma_{x_t \tilde y_{t|t-1}} \Sigma_{\tilde y_{t|t-1} \tilde y_{t|t-1}}^{-1}$ is called the Kalman gain.
Time update: $\hat x_{t|t} \to \hat x_{t+1|t}$ and $\Sigma_{t|t} \to \Sigma_{t+1|t}$.
\[
\hat x_{t+1|t} = E\big[ x_{t+1} \mid y^t \big]
= E\big[ A x_t + G w_t \mid y^t \big]
= A E\big[ x_t \mid y^t \big] + G E\big[ w_t \mid y^t \big]
= A \hat x_{t|t}
\]
\[
\begin{aligned}
\Sigma_{t+1|t} &= \operatorname{Cov}\big( x_{t+1} - \hat x_{t+1|t} \big)
= \operatorname{Cov}\big( A x_t + G w_t - A \hat x_{t|t} \big) \\
&= \operatorname{Cov}\big( A x_t - A \hat x_{t|t} \big) + \operatorname{Cov}\big( G w_t \big)
= A \Sigma_{t|t} A^\top + G Q G^\top
\end{aligned}
\]
Each iteration follows three steps.

Compute the Kalman gain:
\[
K_t = \Sigma_{t|t-1} C^\top \big( C \Sigma_{t|t-1} C^\top + H R H^\top \big)^{-1}
\]
Measurement update:
\[
\hat x_{t|t} = \hat x_{t|t-1} + K_t \big( y_t - C \hat x_{t|t-1} \big), \qquad
\Sigma_{t|t} = (I - K_t C)\, \Sigma_{t|t-1}
\]
Time update:
\[
\hat x_{t+1|t} = A \hat x_{t|t}, \qquad
\Sigma_{t+1|t} = A \Sigma_{t|t} A^\top + G Q G^\top
\]
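Putting the three steps together gives a compact filter. The sketch below assumes Python with numpy and treats $(\bar x_0, \Sigma_0)$ as the initial prediction $\hat x_{0|-1}, \Sigma_{0|-1}$; the example system matrices are made up for illustration.

    import numpy as np

    def kalman_filter(y_seq, A, G, C, H, Q, R, x0_bar, Sigma0):
        """Run the three-step recursion over observations y_0, ..., y_T."""
        n = A.shape[0]
        x_pred, S_pred = x0_bar, Sigma0               # x_hat_{0|-1}, Sigma_{0|-1}
        estimates = []
        for y in y_seq:
            # 1. Kalman gain: K_t = S C' (C S C' + H R H')^{-1}
            S_y = C @ S_pred @ C.T + H @ R @ H.T
            K = S_pred @ C.T @ np.linalg.inv(S_y)
            # 2. Measurement update
            x_filt = x_pred + K @ (y - C @ x_pred)
            S_filt = (np.eye(n) - K @ C) @ S_pred
            estimates.append(x_filt)
            # 3. Time update
            x_pred = A @ x_filt
            S_pred = A @ S_filt @ A.T + G @ Q @ G.T
        return np.array(estimates)

    # Illustrative example: 2-D constant-velocity state, 1-D position observation.
    rng = np.random.default_rng(1)
    A = np.array([[1.0, 1.0], [0.0, 1.0]]); G = np.eye(2)
    C = np.array([[1.0, 0.0]]); H = np.eye(1)
    Q = 0.01 * np.eye(2); R = np.array([[0.5]])
    x, ys = np.zeros(2), []
    for _ in range(50):
        ys.append(C @ x + H @ rng.multivariate_normal(np.zeros(1), R))
        x = A @ x + G @ rng.multivariate_normal(np.zeros(2), Q)
    print(kalman_filter(ys, A, G, C, H, Q, R, np.zeros(2), np.eye(2))[-1])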