A General Correction for Measurement Error

Zijing Hu

November 13, 2022

Hu (2008) shows that, under certain assumptions, one can recover the distribution of the true value of a variable of interest using three measurements, one of which needs to be binary.

Suppose that $X$, $Y$, and $Z$ are observed variables (measured with error) that depend on the latent true variable $X^\star$. $X$ and $Z$ need to be discretized if they are continuous measurements. $Y \in \{0, 1\}$ is binary.

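Assumption 1: $X \perp Y \perp Z \mid X^\star$, or $X \perp Y \perp Z \mid X^\star, W$, where $W$ are covariates, making the assumption less restrictive. With this assumption, we have

\[
\begin{aligned}
\Pr(X, Y, Z) &= \sum_{X^\star} \Pr(X, Y, Z, X^\star) \\
&= \sum_{X^\star} \Pr(X \mid Y, Z, X^\star) \Pr(Y \mid Z, X^\star) \Pr(Z \mid X^\star) \Pr(X^\star) \\
&= \sum_{X^\star} \Pr(X \mid X^\star) \Pr(Y \mid X^\star) \Pr(Z \mid X^\star) \Pr(X^\star) \quad \text{(given the assumption)}
\end{aligned}
\]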

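Thus, we can rewrite the above formula in matrix form:

\[
M_{XYZ} = M_{X \mid X^\star} \, D_{Y \mid X^\star} \, M_{Z, X^\star}'
\]

where, with $K$ denoting the common number of support points of $X$, $Z$, and $X^\star$,

\[
M_{XYZ} =
\begin{pmatrix}
\Pr(Y, X = x_1, Z = z_1) & \cdots & \Pr(Y, X = x_1, Z = z_K) \\
\vdots & \ddots & \vdots \\
\Pr(Y, X = x_K, Z = z_1) & \cdots & \Pr(Y, X = x_K, Z = z_K)
\end{pmatrix}
\]

\[
M_{X \mid X^\star} =
\begin{pmatrix}
\Pr(X = x_1 \mid X^\star = x^\star_1) & \cdots & \Pr(X = x_1 \mid X^\star = x^\star_K) \\
\vdots & \ddots & \vdots \\
\Pr(X = x_K \mid X^\star = x^\star_1) & \cdots & \Pr(X = x_K \mid X^\star = x^\star_K)
\end{pmatrix}
\]

\[
D_{Y \mid X^\star} =
\begin{pmatrix}
\Pr(Y \mid X^\star = x^\star_1) & & \\
& \ddots & \\
& & \Pr(Y \mid X^\star = x^\star_K)
\end{pmatrix}
\]

\[
M_{Z, X^\star} =
\begin{pmatrix}
\Pr(Z = z_1, X^\star = x^\star_1) & \cdots & \Pr(Z = z_1, X^\star = x^\star_K) \\
\vdots & \ddots & \vdots \\
\Pr(Z = z_K, X^\star = x^\star_1) & \cdots & \Pr(Z = z_K, X^\star = x^\star_K)
\end{pmatrix}
\]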
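As a sanity check on the matrix form, the following minimal sketch compares the direct sum over the latent variable with the matrix product. All numbers are hypothetical and chosen only for illustration (three categories, $Y = 1$ fixed); the variable names are not from the original note.

```python
import numpy as np

# Hypothetical population quantities for K = 3 categories (illustrative only).
K = 3
p_xstar = np.array([0.5, 0.3, 0.2])            # Pr(X* = s)
M_x_given_xs = np.array([[0.8, 0.1, 0.1],      # [j, s] = Pr(X = x_j | X* = s)
                         [0.1, 0.7, 0.2],
                         [0.1, 0.2, 0.7]])
M_z_given_xs = np.array([[0.7, 0.2, 0.1],      # [k, s] = Pr(Z = z_k | X* = s)
                         [0.2, 0.6, 0.2],
                         [0.1, 0.2, 0.7]])
p_y_given_xs = np.array([0.2, 0.5, 0.9])       # Pr(Y = 1 | X* = s)

# Joint matrix of Z and X*: [k, s] = Pr(Z = z_k, X* = s).
M_z_xs = M_z_given_xs * p_xstar

# Direct evaluation: sum over the latent variable under Assumption 1.
M_xyz_direct = np.zeros((K, K))
for j in range(K):
    for k in range(K):
        for s in range(K):
            M_xyz_direct[j, k] += (M_x_given_xs[j, s] * p_y_given_xs[s]
                                   * M_z_given_xs[k, s] * p_xstar[s])

# Matrix form: M_{X|X*} D_{Y|X*} M'_{Z,X*}.
M_xyz_matrix = M_x_given_xs @ np.diag(p_y_given_xs) @ M_z_xs.T

print(np.allclose(M_xyz_direct, M_xyz_matrix))
```

The entries of either matrix sum to $\Pr(Y = 1)$, which gives a second quick check on the construction.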







We also have

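\[
\begin{aligned}
\Pr(X, Z) &= \sum_{X^\star} \Pr(X, Z, X^\star) \\
&= \sum_{X^\star} \Pr(X \mid Z, X^\star) \Pr(Z \mid X^\star) \Pr(X^\star) \\
&= \sum_{X^\star} \Pr(X \mid X^\star) \Pr(Z \mid X^\star) \Pr(X^\star) \quad \text{(given the assumption)}
\end{aligned}
\]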

Therefore, the matrix form of the above formula is

\[
M_{XZ} = M_{X \mid X^\star} \, M_{Z, X^\star}'
\]

Assumption 2: $M_{X \mid X^\star}$ and $M_{Z, X^\star}$ are full-rank (invertible) matrices, which can be tested through the rank of the observable matrix $M_{XZ}$. Then we have

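\[
\begin{aligned}
M_{XZ}^{-1} &= \left(M_{Z, X^\star}'\right)^{-1} M_{X \mid X^\star}^{-1} \\
M_{XYZ} \, M_{XZ}^{-1} &= M_{X \mid X^\star} \, D_{Y \mid X^\star} \, M_{Z, X^\star}' \left(M_{Z, X^\star}'\right)^{-1} M_{X \mid X^\star}^{-1} \\
&= M_{X \mid X^\star} \, D_{Y \mid X^\star} \, M_{X \mid X^\star}^{-1} \\
&= B \Lambda B^{-1}
\end{aligned}
\]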

where Λ and B can be obtained using eigendecomposition.
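The identification step can be illustrated numerically. The sketch below builds the two observable matrices from hypothetical population values (the numbers and variable names are assumptions made for illustration, with $Y = 1$ fixed), checks the rank condition on the observable $X$-$Z$ matrix, and then applies `numpy.linalg.eig` to the product of the first matrix with the inverse of the second.

```python
import numpy as np

# Hypothetical population quantities for K = 3 (illustrative numbers only).
p_xstar = np.array([0.5, 0.3, 0.2])            # Pr(X* = s)
M_x_given_xs = np.array([[0.8, 0.1, 0.1],      # [j, s] = Pr(X = x_j | X* = s)
                         [0.1, 0.7, 0.2],
                         [0.1, 0.2, 0.7]])
M_z_given_xs = np.array([[0.7, 0.2, 0.1],      # [k, s] = Pr(Z = z_k | X* = s)
                         [0.2, 0.6, 0.2],
                         [0.1, 0.2, 0.7]])
p_y_given_xs = np.array([0.2, 0.5, 0.9])       # Pr(Y = 1 | X* = s)

# Observable matrices implied by the model.
M_z_xs = M_z_given_xs * p_xstar                # [k, s] = Pr(Z = z_k, X* = s)
M_xyz = M_x_given_xs @ np.diag(p_y_given_xs) @ M_z_xs.T
M_xz = M_x_given_xs @ M_z_xs.T

# Rank condition checked on the observable matrix.
full_rank = np.linalg.matrix_rank(M_xz) == M_xz.shape[0]

# Eigendecomposition of M_XYZ M_XZ^{-1} = B Lambda B^{-1}.
A = M_xyz @ np.linalg.inv(M_xz)
eigvals, eigvecs = np.linalg.eig(A)
Lambda = np.diag(eigvals.real)
B = eigvecs.real

print(full_rank)
print(np.allclose(A, B @ Lambda @ np.linalg.inv(B)))
print(np.sort(eigvals.real))
```

Under these assumed values the sorted eigenvalues coincide with the conditional probabilities of $Y = 1$ given $X^\star$, up to floating-point error. The eigenvectors come back with unit Euclidean norm and in arbitrary order, so they still need rescaling and reordering before they can be read as conditional distributions.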

Computational tricks

• In practice, $Y$ needs to be fixed at a given value, i.e., $Y = y$.

• We use control variables $W$ so that Assumption 1 holds conditionally on $W$.

• The eigendecomposition returns the eigenvalues in arbitrary order, so one needs to make a reasonable ordering assumption, e.g., $\Pr(y \mid X^\star = 1, w) > \Pr(y \mid X^\star = 0, w)$, to reorder the diagonal elements of $\Lambda$ and match them with the true values of $X^\star$.

• The elements in each column of $B$ need to be normalized to sum to 1, since the columns of $B$ are probability distributions. An alternative way to constrain the parameters is to use an extremum estimator:

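\[
\hat{\theta} = \arg\min_{\theta} \left\lVert M_{XyZ \mid w} \, M_{XZ \mid w}^{-1} - M_{X \mid X^\star} \, D_{y \mid X^\star} \, M_{X \mid X^\star}^{-1} \right\rVert
\]

s.t. (i) $\theta_i \in [0, 1] \;\; \forall i$

(ii) $\Pr(y \mid X^\star, w)$ is a monotonic function of $X^\star$

(iii) $\Pr(X = x \mid X^\star = i, w) \neq \Pr(X = x \mid X^\star = j, w) \;\; \forall i \neq j$

where $\theta$ collects the unknown entries of $M_{X \mid X^\star}$ and $D_{y \mid X^\star}$, and the observable matrices are computed as sample frequencies over the $N$ observations:

\[
\left[ M_{XyZ \mid w} \right]_{j,k} = \frac{\sum_{i=1}^{N} \mathbf{1}(X_i = x_j, Y_i = y, Z_i = z_k) \, \mathbf{1}(W_i = w)}{\sum_{i=1}^{N} \mathbf{1}(W_i = w)}
\]

\[
\left[ M_{XZ \mid w} \right]_{j,k} = \frac{\sum_{i=1}^{N} \mathbf{1}(X_i = x_j, Z_i = z_k) \, \mathbf{1}(W_i = w)}{\sum_{i=1}^{N} \mathbf{1}(W_i = w)}
\]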
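The tricks above can be sketched end to end on simulated data. Everything in the block is an assumption made for illustration: the data-generating process, the misclassification rates, the helper name `freq_matrix`, and the choice of an increasing ordering for $\Pr(y \mid X^\star, w)$. It uses the plain eigendecomposition with column normalization and reordering, not the constrained extremum estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 1_000_000, 2

# Hypothetical data-generating process (all numbers are illustrative).
W = rng.integers(0, 2, size=N)                         # binary covariate
X_star = rng.binomial(1, 0.4, size=N)                  # latent binary variable
X = np.where(rng.random(N) < 0.15, 1 - X_star, X_star) # 15% misclassification
Z = np.where(rng.random(N) < 0.25, 1 - X_star, X_star) # 25% misclassification
Y = (rng.random(N) < np.where(X_star == 1, 0.8, 0.3)).astype(int)


def freq_matrix(extra, X, Z, W, w, K):
    """[j, k] = sum_i 1(X_i=j, Z_i=k) * extra_i * 1(W_i=w) / sum_i 1(W_i=w)."""
    sel = W == w
    M = np.zeros((K, K))
    for j in range(K):
        for k in range(K):
            M[j, k] = np.sum((X == j) & (Z == k) & extra & sel) / np.sum(sel)
    return M


w, y = 1, 1                                            # fix W = w and Y = y
M_xyz = freq_matrix(Y == y, X, Z, W, w, K)
M_xz = freq_matrix(np.ones(N, dtype=bool), X, Z, W, w, K)

eigvals, eigvecs = np.linalg.eig(M_xyz @ np.linalg.inv(M_xz))
eigvals, eigvecs = eigvals.real, eigvecs.real

# Normalize each eigenvector so that its entries sum to one.
B = eigvecs / eigvecs.sum(axis=0, keepdims=True)

# Reorder assuming Pr(y | X*, w) is increasing in X*.
order = np.argsort(eigvals)
Lambda_hat = eigvals[order]                            # estimates Pr(Y = y | X* = s, w)
B_hat = B[:, order]                                    # estimates Pr(X = j | X* = s, w)

print(Lambda_hat)
print(B_hat)
```

Dividing each eigenvector by its column sum fixes both the arbitrary scale and the arbitrary sign returned by the solver. In finite samples the normalized entries are not guaranteed to lie in $[0, 1]$, which is one motivation for the constrained extremum estimator.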

References

Hu, Yingyao (2008). “Identification and estimation of nonlinear models with misclassification error using instrumental variables: A general solution”. In: Journal of Econometrics 144(1), pp. 27–61.