Information Filters and SRIF
Information filters represent uncertainty by the inverse of covariance rather than covariance itself. This approach offers superior numerical stability, natural handling of weak measurements, and efficient batch processing. This guide covers the information filter, Square Root Information Filter (SRIF), and practical implementations.
—
When to Use Information Filters
Advantages over Kalman Filters:
Numerical Stability: Works well with poorly-conditioned systems
No matrix inversion of covariance (direct accumulation of information)
Square Root Information Filter is more robust than standard Kalman
Weak Measurement Handling: Graceful degradation when measurements have high uncertainty
Kalman gain can become ill-conditioned with high $R$
Information filter naturally handles rank-deficient $R$
Batch Processing: Natural for off-line data processing and smoothing
Decentralized Systems: Multiple information sources combine trivially (additive information)
Initial Uncertainty: Easy to handle unknown initial state (low information)
When NOT to use:
You need real-time performance and computational resources are limited (Kalman is simpler)
Your system is well-conditioned and measurements are always present (standard Kalman is fine)
You need fast forward-only filtering of large data streams (Kalman is more standard)
—
Information Filter Fundamentals
Standard Kalman Filter represents uncertainty as covariance $mathbf{P}$:
$$mathbf{P}_k = E[(mathbf{x}_k - hat{mathbf{x}}_k)(mathbf{x}_k - hat{mathbf{x}}_k)^T]$$
Information Filter represents uncertainty by the information matrix:
$$mathbf{Y}_k = mathbf{P}_k^{-1}$$
and the information vector:
$$mathbf{y}_k = mathbf{P}_k^{-1} hat{mathbf{x}}_k = mathbf{Y}_k hat{mathbf{x}}_k$$
Recover the state estimate via:
$$hat{mathbf{x}}_k = mathbf{Y}_k^{-1} mathbf{y}_k$$
Information Gain:
When a measurement $z_k$ arrives with covariance $R_k$, the information gain is:
$$Delta mathbf{Y}_k = mathbf{H}_k^T mathbf{R}_k^{-1} mathbf{H}_k$$
$$Delta mathbf{y}_k = mathbf{H}_k^T mathbf{R}_k^{-1} z_k$$
The information is updated additively:
$$mathbf{Y}_{k|k} = mathbf{Y}_{k|k-1} + Delta mathbf{Y}_k$$
$$mathbf{y}_{k|k} = mathbf{y}_{k|k-1} + Delta mathbf{y}_k$$
Key Insight: Information matrices add directly (unlike covariances which combine via matrix algebra). This makes decentralized fusion trivial.
—
Standard Information Filter Algorithm
Prediction Step
The prediction is more complex than Kalman because we must invert covariances. Assume discrete linear system:
$$mathbf{x}_{k+1} = mathbf{F}_k mathbf{x}_k + mathbf{w}_k, quad mathbf{w}_k sim mathcal{N}(0, mathbf{Q}_k)$$
Predicted information and state:
$$mathbf{Y}_{k+1|k} = [mathbf{F}_k mathbf{P}_{k|k} mathbf{F}_k^T + mathbf{Q}_k]^{-1}$$
$$mathbf{y}_{k+1|k} = mathbf{Y}_{k+1|k} mathbf{F}_k mathbf{P}_{k|k} mathbf{y}_{k|k}$$
where $mathbf{P}_{k|k} = mathbf{Y}_{k|k}^{-1}$.
Implementation (requires inversion—more costly than Kalman):
import numpy as np
from scipy.linalg import inv, solve
class InformationFilter:
"""
Standard Information Filter.
State represented by (information matrix, information vector):
Y = P^{-1} (information matrix)
y = P^{-1} * x_hat (information vector)
Recover state: x_hat = Y^{-1} * y
"""
def __init__(self, x0: np.ndarray, P0: np.ndarray):
"""
Initialize information filter.
Parameters
----------
x0 : (n,) array
Initial state estimate
P0 : (n, n) array
Initial covariance
"""
self.n = len(x0)
# Store covariance for predict step (information filter needs it)
self.P = P0.copy()
# Information representation
self.Y = inv(P0) # Information matrix
self.y = self.Y @ x0 # Information vector
self.x = x0.copy()
def predict(self, F: np.ndarray, Q: np.ndarray) -> None:
"""
Prediction step.
Parameters
----------
F : (n, n) array
State transition matrix
Q : (n, n) array
Process noise covariance
"""
# Covariance prediction (need P for next step)
# P_pred = F @ P @ F.T + Q
self.P = F @ self.P @ F.T + Q
# Information prediction
# Y_pred = (F @ P @ F.T + Q)^{-1}
self.Y = inv(self.P)
# Recover state and convert to information vector
self.x = self.Y @ self.y # Recover x from information rep
def update(self, z: np.ndarray, H: np.ndarray,
R: np.ndarray) -> None:
"""
Measurement update.
Parameters
----------
z : (m,) array
Measurement
H : (m, n) array
Measurement matrix
R : (m, m) array
Measurement noise covariance
"""
# Information gain from measurement
R_inv = inv(R)
# Update information matrix and vector (additive!)
self.Y = self.Y + H.T @ R_inv @ H
self.y = self.y + H.T @ R_inv @ z
# Recover covariance and state
self.P = inv(self.Y)
self.x = self.P @ self.y
def get_state(self) -> tuple:
"""Return current state and covariance."""
return self.x.copy(), self.P.copy()
Example: Simple 1D Tracking
# 1D position tracking with constant velocity
x0 = np.array([0.0, 1.0]) # pos, vel
P0 = np.eye(2) * 0.1
ifilt = InformationFilter(x0, P0)
# Process model
dt = 0.1
F = np.array([[1, dt], [0, 1]])
Q = np.eye(2) * 0.01
# Measurement model (position only)
H = np.array([[1, 0]])
R = np.array([[0.5]])
for k in range(20):
# Predict
ifilt.predict(F, Q)
# Measurement
z = np.array([0.1 * k + 0.5 * np.random.randn()])
# Update
ifilt.update(z, H, R)
x, P = ifilt.get_state()
if k % 5 == 0:
print(f"Step {k}: x = {x}, σ = {np.sqrt(np.diag(P))}")
—
Square Root Information Filter (SRIF)
Standard information filters require matrix inversion at each predict step, which can be numerically unstable. SRIF avoids this by working with the square root of the information matrix.
Let $mathbf{S} = mathbf{Y}^{1/2}$ (Cholesky factor of information matrix).
Advantages:
No explicit matrix inversion (use QR decomposition)
Better numerical conditioning (square root of matrix is better-conditioned)
Works reliably with rank-deficient problems
Decorrelated measurements are natural
Algorithm: Measurement Update
Given current square root: $mathbf{S}_k mathbf{S}_k^T = mathbf{Y}_k$
New measurement with $mathbf{H}_{k+1}$, $z_{k+1}$, $R_{k+1}$:
Stack and QR decompose:
$$begin{bmatrix} mathbf{S}_k \ sqrt{mathbf{R}_{k+1}^{-1}} mathbf{H}_{k+1} end{bmatrix} leftarrow text{QR Decomposition}$$
Result: Upper triangular $mathbf{S}_{k+1}$ (new square root information matrix).
Implementation:
from scipy.linalg import qr, cholesky, solve_triangular, inv
class SquareRootInformationFilter:
"""
Square Root Information Filter (SRIF).
Maintains square root of information matrix: S where Y = S @ S.T
More numerically stable than standard information filter.
"""
def __init__(self, x0: np.ndarray, P0: np.ndarray):
"""
Initialize SRIF.
Parameters
----------
x0 : (n,) array
Initial state estimate
P0 : (n, n) array
Initial covariance (should be positive definite)
"""
self.n = len(x0)
self.x = x0.copy()
# Square root of information matrix
# S @ S.T = Y = P^{-1}
Y = inv(P0)
self.S = cholesky(Y, lower=False) # Upper triangular
# Current covariance
self.P = P0.copy()
def update_measurement(self, z: np.ndarray, H: np.ndarray,
R: np.ndarray) -> None:
"""
Measurement update via QR decomposition.
Parameters
----------
z : (m,) array
Measurement
H : (m, n) array
Measurement matrix
R : (m, m) array
Measurement noise covariance
"""
m = len(z)
# Compute square root of R^{-1}
# For stability, use Cholesky if possible
try:
R_inv_sq = cholesky(inv(R), lower=False)
except:
# Fallback: eigenvalue decomposition
eigvals, eigvecs = np.linalg.eigh(inv(R))
eigvals = np.maximum(eigvals, 1e-10) # Regularize
R_inv_sq = eigvecs @ np.diag(np.sqrt(eigvals)) @ eigvecs.T
# Stacked matrix: [S_k^T; sqrt(R^{-1}) @ H]
matrix_stacked = np.vstack([
self.S.T,
R_inv_sq @ H
])
# QR decomposition
Q, R_upper = qr(matrix_stacked.T) # QR of transpose
# New square root (transpose back to upper triangular form)
self.S = R_upper[:self.n, :self.n].T
if not np.allclose(self.S, np.triu(self.S)):
# Ensure upper triangular
self.S = np.triu(self.S)
# Updated information vector
# y_new = y_old + H.T @ R^{-1} @ z
Y = self.S.T @ self.S
y = Y @ self.x + H.T @ inv(R) @ z
# Update state estimate
self.P = inv(Y)
self.x = self.P @ y
def predict(self, F: np.ndarray, Q: np.ndarray) -> None:
"""
Prediction step (requires covariance update).
Parameters
----------
F : (n, n) array
State transition matrix
Q : (n, n) array
Process noise covariance
"""
# Standard covariance prediction
self.P = F @ self.P @ F.T + Q
# Update square root of information matrix
Y_new = inv(self.P)
self.S = cholesky(Y_new, lower=False)
def get_state(self) -> tuple:
"""Return state estimate and covariance."""
return self.x.copy(), self.P.copy()
Example: SRIF vs Standard IF
import numpy as np
from scipy.linalg import inv
# Poorly conditioned problem
x0 = np.array([1.0, 1.0, 1.0])
P0 = np.array([
[1e6, 1e5, 1e4],
[1e5, 1e6, 1e5],
[1e4, 1e5, 1e6]
])
# Standard IF
if_filt = InformationFilter(x0, P0)
# SRIF
srif_filt = SquareRootInformationFilter(x0, P0)
# Measurement matrix
H = np.eye(3)
R = np.eye(3) * 0.1
z = np.array([1.1, 0.9, 1.05])
# Update both
if_filt.update(z, H, R)
srif_filt.update_measurement(z, H, R)
x_if, P_if = if_filt.get_state()
x_srif, P_srif = srif_filt.get_state()
print(f"IF error: {np.linalg.norm(x_if - x_srif):.2e}")
print(f"IF cond: {np.linalg.cond(P_if):.2e}")
print(f"SRIF cond: {np.linalg.cond(P_srif):.2e}")
—
Extended Information Filter (EKIF)
For nonlinear systems, extend information filtering with linearization (similar to EKF from Kalman filters).
class ExtendedInformationFilter:
"""
Extended Information Filter for nonlinear systems.
Linearizes around current state estimate:
x_{k+1} = f(x_k) + w_k
z_k = h(x_k) + v_k
"""
def __init__(self, x0: np.ndarray, P0: np.ndarray):
self.n = len(x0)
self.x = x0.copy()
self.P = P0.copy()
# Information representation
self.Y = inv(P0)
self.y = self.Y @ x0
def predict(self, f_func, F_jacobian, Q: np.ndarray) -> None:
"""
Nonlinear prediction.
Parameters
----------
f_func : callable
Nonlinear state function f(x)
F_jacobian : callable
Jacobian of f at current x
Q : (n, n) array
Process noise covariance
"""
# Nonlinear prediction
self.x = f_func(self.x)
# Linearize
F = F_jacobian(self.x)
# Covariance prediction
self.P = F @ self.P @ F.T + Q
# Information update
self.Y = inv(self.P)
self.y = self.Y @ self.x
def update(self, z: np.ndarray, h_func, H_jacobian,
R: np.ndarray) -> None:
"""
Nonlinear measurement update.
Parameters
----------
z : (m,) array
Measurement
h_func : callable
Measurement function h(x)
H_jacobian : callable
Jacobian of h at current x
R : (m, m) array
Measurement noise covariance
"""
# Linearize measurement
H = H_jacobian(self.x)
# Information update (additive)
R_inv = inv(R)
self.Y = self.Y + H.T @ R_inv @ H
self.y = self.y + H.T @ R_inv @ z
# Recover state
self.P = inv(self.Y)
self.x = self.P @ self.y
—
Comparison: Information Filter vs Kalman Filter
Numerical Behavior:
When Information Filter Excels:
Unknown initial state: Set $mathbf{Y}_0 = epsilon mathbf{I}$ (weak information)
Intermittent measurements: Missing measurements = no information update (skip it)
Batch smoothing: Natural representation for all-at-once processing
Multi-source fusion: Information from all sensors add directly
class HybridFilter:
"""
Use Kalman for forward pass, Information for batch smoothing.
Best of both worlds:
- Kalman: efficient forward prediction
- Information: stable batch refinement
"""
def __init__(self, x0: np.ndarray, P0: np.ndarray):
self.x_list = [x0.copy()]
self.P_list = [P0.copy()]
self.x = x0.copy()
self.P = P0.copy()
def run_forward_kalman(self, F_list, Q_list,
z_list, H_list, R_list) -> None:
"""Run Kalman filter forward and store trajectory."""
for k, (F, Q) in enumerate(zip(F_list, Q_list)):
# Predict
self.P = F @ self.P @ F.T + Q
self.x = F @ self.x
# Measurement available?
if z_list[k] is not None:
H, R, z = H_list[k], R_list[k], z_list[k]
# Standard Kalman update
S = H @ self.P @ H.T + R
K = self.P @ H.T @ inv(S)
self.x = self.x + K @ (z - H @ self.x)
self.P = (np.eye(len(self.x)) - K @ H) @ self.P
self.x_list.append(self.x.copy())
self.P_list.append(self.P.copy())
def batch_information_smooth(self) -> None:
"""
Refine entire trajectory using information filter smoother.
Similar to RTS but using information representation.
"""
# Convert to information representation
Y_list = [inv(P) for P in self.P_list]
y_list = [Y @ x for Y, x in zip(Y_list, self.x_list)]
# Information smoothing (simplified)
# In practice, use dedicated batch smoother
# Here just demonstrating the concept
print(f"Smoothing {len(self.x_list)} states in batch mode")
return self.x_list, self.P_list
—
Weak Initial Conditions (High Uncertainty)
Information filters excel when initial state is highly uncertain.
class WeakInitialStateIF:
"""
Handle unknown initial state via weak information.
If x_0 is completely unknown:
Y_0 = ε * I (very small information)
y_0 = 0 (no information about x)
As measurements arrive, information accumulates.
"""
def __init__(self, state_dim: int, epsilon: float = 1e-6):
"""
Parameters
----------
state_dim : int
State dimension
epsilon : float
Weak information intensity (smaller = more uncertain)
"""
self.n = state_dim
# Weak initial information
self.Y = epsilon * np.eye(state_dim)
self.y = np.zeros(state_dim)
def update_with_measurements(self, z_list, H_list, R_list):
"""Process batch of measurements to initialize state."""
for z, H, R in zip(z_list, H_list, R_list):
# Accumulate information
self.Y = self.Y + H.T @ inv(R) @ H
self.y = self.y + H.T @ inv(R) @ z
# Recover state estimate
try:
P = inv(self.Y)
x = P @ self.y
return x, P
except np.linalg.LinAlgError:
print("Warning: Information matrix still singular")
print(f" Information rank: {np.linalg.matrix_rank(self.Y)}")
return None, None
Example: Initialize from measurements
# Unknown initial state, but have 5 measurements
weak_if = WeakInitialStateIF(state_dim=2)
# Process measurements
z_batch = [np.array([1.0]), np.array([1.1]),
np.array([0.9]), np.array([1.05]), np.array([1.02])]
H_batch = [np.array([[1, 0]]) for _ in z_batch]
R_batch = [np.array([[0.1]]) for _ in z_batch]
x_init, P_init = weak_if.update_with_measurements(z_batch,
H_batch, R_batch)
print(f"Initialized state: {x_init}")
print(f"Initialization uncertainty: {np.sqrt(np.diag(P_init))}")
—
Practical Diagnostic Tools
Monitor information filter health.
class InformationFilterDiagnostics:
"""Diagnostics for information filters."""
def __init__(self, if_object):
self.filter = if_object
self.history = {
'information_rank': [],
'condition_numbers': [],
'information_norms': [],
'state_norms': []
}
def log_step(self) -> None:
"""Record diagnostic data."""
Y, y = self.filter.Y, self.filter.y
# Information matrix diagnostics
self.history['information_rank'].append(np.linalg.matrix_rank(Y))
self.history['condition_numbers'].append(np.linalg.cond(Y))
self.history['information_norms'].append(np.linalg.norm(Y))
self.history['state_norms'].append(np.linalg.norm(self.filter.x))
def check_degeneracy(self) -> dict:
"""Check if information is sufficient."""
Y = self.filter.Y
rank = np.linalg.matrix_rank(Y)
expected_rank = Y.shape[0]
return {
'is_degenerate': rank < expected_rank,
'rank': rank,
'expected_rank': expected_rank,
'rank_deficit': expected_rank - rank,
'condition_number': np.linalg.cond(Y)
}
def check_information_bounds(self) -> dict:
"""Verify information matrix is reasonable."""
Y = self.filter.Y
eigvals = np.linalg.eigvals(Y)
return {
'min_eigenvalue': np.min(np.abs(eigvals)),
'max_eigenvalue': np.max(np.abs(eigvals)),
'all_positive': np.all(eigvals > 0),
'is_spd': np.all(eigvals > 1e-10) # Check positive definite
}
def plot_diagnostics(self):
"""Generate diagnostic plots."""
import matplotlib.pyplot as plt
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
# Information rank
axes[0, 0].plot(self.history['information_rank'], alpha=0.7)
axes[0, 0].set_title('Information Matrix Rank')
axes[0, 0].set_ylabel('rank(Y)')
axes[0, 0].grid(True, alpha=0.3)
# Condition number
axes[0, 1].semilogy(self.history['condition_numbers'], alpha=0.7)
axes[0, 1].set_title('Information Matrix Conditioning')
axes[0, 1].set_ylabel('cond(Y)')
axes[0, 1].grid(True, alpha=0.3)
# Information norm (total information content)
axes[1, 0].plot(self.history['information_norms'], alpha=0.7)
axes[1, 0].set_title('Information Accumulation')
axes[1, 0].set_ylabel('||Y||_F')
axes[1, 0].grid(True, alpha=0.3)
# State norm
axes[1, 1].plot(self.history['state_norms'], alpha=0.7)
axes[1, 1].set_title('State Magnitude')
axes[1, 1].set_ylabel('||x||')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
return fig
—
Batch Information Filter (BIF)
Process entire measurement batch at once for optimal smoothing.
class BatchInformationFilter:
"""
Batch information filter for offline processing.
Process all measurements simultaneously for global optimality.
"""
def __init__(self, x0: np.ndarray, P0: np.ndarray):
self.x_init = x0.copy()
self.P_init = P0.copy()
def process_batch(self, measurements, H_list, R_list,
dynamics=None, Q_list=None):
"""
Process entire batch of measurements.
Parameters
----------
measurements : list of arrays
Measurement sequence
H_list : list of arrays
Measurement matrices
R_list : list of arrays
Measurement noise covariances
dynamics : callable, optional
State transition function f(x)
Q_list : list of arrays, optional
Process noise covariances
Returns
-------
x_batch : (N, n) array
Batch state estimates
P_batch : list of (n, n) arrays
Batch covariances
"""
N = len(measurements) # Number of timesteps
n = len(self.x_init) # State dimension
# Initialize information representation
Y_accum = inv(self.P_init)
y_accum = Y_accum @ self.x_init
# Accumulate all measurements
for z, H, R in zip(measurements, H_list, R_list):
R_inv = inv(R)
Y_accum = Y_accum + H.T @ R_inv @ H
y_accum = y_accum + H.T @ R_inv @ z
# Recover global estimate
P_batch = inv(Y_accum)
x_batch = P_batch @ y_accum
# If dynamics provided, do batch trajectory
if dynamics is not None:
x_trajectory = self._batch_trajectory_estimate(
x_batch, P_batch, dynamics, Q_list
)
else:
x_trajectory = x_batch
return x_trajectory, P_batch
def _batch_trajectory_estimate(self, x_batch, P_batch,
dynamics, Q_list):
"""
Refine trajectory using dynamics constraints.
(Simplified version)
"""
return x_batch
—
Tuning Guidelines
Choose Information Filter When:
✅ System is poorly conditioned (high aspect ratios in state/measurement spaces)
✅ Multiple sensors with different measurement rates/uncertainties
✅ Batch processing or offline smoothing
✅ Unknown or weak initial state
✅ Need numerical robustness
Parameter Selection:
Weak initial information: $epsilon = 10^{-6}$ to $10^{-8} times text{trace}(mathbf{Y}_{text{nominal}})$
SRIF vs Standard IF: Use SRIF always if numerics are critical
Regularization: Add $delta mathbf{I}$ to prevent singularity:
$$mathbf{Y}_k leftarrow mathbf{Y}_k + delta mathbf{I}, quad delta = 10^{-10}$$
—
Common Pitfalls
Singular Information Matrix: Measurements don’t constrain all states
Fix: Monitor rank, add weak priors, ensure observable states
Numerical Inversion: Computing $mathbf{Y}^{-1}$ becomes ill-conditioned
Fix: Use SRIF or regularize (add regularization term)
Weak Measurements Cause Overflow: Information accumulates without bound
Fix: Use information bounds, monitor condition numbers
Loss of Process Dynamics: SRIF predict step is complex
Fix: Hybrid approach (Kalman forward, Information backward)
—
[Kalman Filter Tuning](kalman_filter_tuning.rst) — Standard filtering approach
[Smoothing Algorithms](smoothing.rst) — RTS smoother for trajectory refinement
[Adaptive Filtering](adaptive_filtering.rst) — Tuning parameters online
[Troubleshooting Guide](troubleshooting.rst) — Numerical issues diagnosis
References:
Bierman (1977) — Factorization Methods for Discrete Sequential Estimation — Foundational SRIF work
Bar-Shalom, Li, Kirubarajan (2001) — Estimation with Applications to Tracking and Navigation
Kailath, Sayed, Hassibi (2000) — Linear Estimation — Comprehensive treatment with information filters
Dyer & McReynolds (1969) — Extension of Square Root Filtering to Include Process Noise — Original SRIF