First-Estimate Jacobian Estimators

The observability and consistency of VINS is crucial due to its ability to provide: (i) the minimal conditions for initialization, (ii) insights into what states are unrecoverable, and (iii) degenerate motions which can hurt the performance of the system. Naive EKF-based VINS estimators have been shown to be inconsistent due to spurious information gain along unobservable directions and have required the creation of "observability aware" filters which explicitly address the inaccurate information gains causing filter over-confidence (the estimated uncertainty is smaller than the true). Bar-Shalom et al. [2] make a convincing argument of why estimator consistency is crucial for accuracy and robust estimation:

Since the filter gain is based on the filter-calculated error covariances, it follows that consistency is necessary for filter optimality: Wrong covariance yield wrong gain. This is why consistency evaluation is vital for verifying a filter design – it amounts to evaluation of estimator optimality.

In this section, we will introduce First-estimates Jacobian (FEJ) [20] [21] methodology which can guarantee the observability properties of VINS and improve the estimation consistency.

EKF Linearized Error-State System

When developing an Extended Kalman filter (EKF), one needs to linearize the nonlinear motion and measurement models about some linearization points. This linearization is one of the sources of error causing inaccuracies in the estimates (in addition to, for example, model errors and measurement noise). To simplify the derivations, in what follows, we consider the state contains the inertial navigation state and a single environmental feature (no biases).

\begin{align*} \mathbf{x}_k &= \begin{bmatrix} {}_G^{I_{k}} \bar{q}{}^{\top} & {}^G\mathbf{p}_{{I}_{k}}^{\top} & {}^G\mathbf{v}_{{I}_{k}}^{\top} & {}^G\mathbf{p}_{f}^{\top} \end{bmatrix}^{\top} \end{align*}

We refer the reader to [21] [27] [18] [19] [6] for more detailed derivations of the full state variables. Let us consider the following simplified linearized error-state visual-inertial system with the IMU kinematic motion model and camera measurement update:

\begin{align*} \tilde{\mathbf{x}}_{k|k-1} &= \mathbf{\Phi}_{(k,k-1)}~\tilde{\mathbf{x}}_{k-1|k-1} + \mathbf{G}_{k}\mathbf{w}_{k} \\ \tilde{\mathbf{z}}_{k} &= \mathbf{H}_{k}~\tilde{\mathbf{x}}_{k|k-1}+\mathbf{n}_{k} \end{align*}

Note that we use the left quaternion error state (see [Indirect Kalman Filter for 3D Attitude Estimation] [40] and ov_type::JPLQuat for details). For simplicity we assume that the camera and IMU frame have an identity transform. The state-transition (or system Jacobian) matrix from timestep k-1 to k can be derived as (see [IMU Propagation Derivations] for more details):

\begin{align*} \mathbf{\Phi}_{(k,k-1)}&= \begin{bmatrix} {}^{I_{k}}_{I_{k-1}}\mathbf R & \mathbf 0_{3\times3} & \mathbf 0_{3\times3} & \mathbf 0_{3\times3} \\[1em] \empty -{}^{I_{k-1}}_{G}\mathbf{R}^\top \lfloor \boldsymbol\alpha(k,k-1) \times\rfloor & \mathbf I_{3\times3} & (t_{k}-t_{k-1})\mathbf I_{3\times3} & \mathbf 0_{3\times3} \\[1em] \empty -{}^{I_{k-1}}_{G}\mathbf{R}^\top \lfloor \boldsymbol\beta(k,k-1) \times\rfloor & \mathbf 0_{3\times3} & \mathbf I_{3\times3} & \mathbf 0_{3\times3} \\[1em] \empty \mathbf 0_{3\times3} & \mathbf 0_{3\times3} & \mathbf 0_{3\times3} & \mathbf I_{3\times3} \end{bmatrix} \\[2em] \boldsymbol\alpha(k,k-1) &= \int_{t_{k-1}}^{k} \int_{t_{k-1}}^{s} {}^{I_{k-1}}_{\tau}\mathbf R (\mathbf a(\tau)-\mathbf b_a - \mathbf w_a) d\tau ds \\ \boldsymbol\beta(k,k-1) &= \int_{t_{k-1}}^{t_k} {}^{I_{k-1}}_{\tau}\mathbf R (\mathbf a(\tau)-\mathbf b_a - \mathbf w_a) d\tau \end{align*}

where $\mathbf a(\tau)$ is the true acceleration at time $\tau $ , ${}^{I_{k}}_{I_{k-1}}\mathbf R$ is computed using the gyroscope angular velocity measurements, and ${}^{G}\mathbf g = [0 ~ 0 ~ 9.81]^\top$ is gravity in the global frame of reference. During propagation one would need to solve these integrals using either analytical or numerical integration.

We can compute the measurement Jacobian of a given feature based on the perspective projection camera model at the k-th timestep as follows (see [Camera Measurement Update] for more details):

\begin{align*} \mathbf{H}_{k} &= \mathbf H_{proj,k}~\mathbf H_{state,k} \\ &= \begin{bmatrix} \frac{1}{{}^Iz} & 0 & \frac{-{}^Ix}{({}^Iz)^2} \\ 0 & \frac{1}{{}^Iz} & \frac{-{}^Iy}{({}^Iz)^2} \\ \end{bmatrix} \begin{bmatrix} \lfloor {}^{I_k}_{G}\mathbf{R}({}^{G}\mathbf{p}_f-{}^{G}\mathbf{p}_{I_k}) \times\rfloor & -{}^{I_k}_{G}\mathbf{R} & \mathbf 0_{3\times3} & {}^{I_k}_{G}\mathbf{R} \end{bmatrix} \\ &= \mathbf H_{proj,k}~ {}^{I_k}_{G}\mathbf{R} \begin{bmatrix} \lfloor ({}^{G}\mathbf{p}_f-{}^{G}\mathbf{p}_{I_k}) \times\rfloor {}^{I_k}_{G}\mathbf{R}^\top & -\mathbf I_{3\times3} & \mathbf 0_{3\times3} & \mathbf I_{3\times3} \end{bmatrix} \end{align*}

Linearized System Observability

System observability plays a crucial role in state estimation since it provides a deep insight about the system's geometrical properties and determines the minimal measurement modalities needed. With the state translation matrix, $\mathbf{\Phi}_{(k,0)}$ , and measurement Jacobian at timestep k, $\mathbf{H}_{k}$ , the observability matrix of this linearized system is defined by:

\begin{align*} \mathcal{O}= \begin{bmatrix} \mathbf{H}_{0}\mathbf{\Phi}_{(0,0)} \\ \mathbf{H}_{1}\mathbf{\Phi}_{(1,0)} \\ \mathbf{H}_{2}\mathbf{\Phi}_{(2,0)} \\ \vdots \\ \mathbf{H}_{k}\mathbf{\Phi}_{(k,0)} \\ \end{bmatrix} \end{align*}

If $\mathcal{O}$ is of full column rank then the system is fully observable. A nullspace of it (i.e., $ \mathbf{H}_{k}\mathbf{\Phi}_{(k,0)}\mathcal{N} = \mathbf 0 $ ) describes the unobservable state subspace which can not be recovered with given measurements. Note that while we simplify here and only consider the block row of the observability matrix when performing observability analysis, we need to ensure that this nullsapce holds for the entire matrix (i.e., each block row share the same nullspace). This can be achieved by ensuring that the nullspace does not vary with time, nor contain any measurements. For the k-th block row of this $\mathcal{O}$ matrix, we have:

\begin{align*} \mathbf{H}_{k}\mathbf{\Phi}_{(k,0)} &= \empty \mathbf H_{proj,k}~ {}^{I_k}_{G}\mathbf{R} \begin{bmatrix} \boldsymbol\Gamma_1 & \boldsymbol\Gamma_2 & \boldsymbol\Gamma_3 & \boldsymbol\Gamma_4 \end{bmatrix} \\[2em] \empty \empty \boldsymbol\Gamma_1 &= \Big\lfloor \Big({}^{G}\mathbf{p}_f-{}^{G}\mathbf{p}_{I_k} + {}^{I_{0}}_{G}\mathbf{R}^\top \boldsymbol\alpha(k,0) \Big) \times\Big\rfloor {}^{I_0}_{G}\mathbf{R}^\top \\ &= \Big\lfloor \Big({}^{G}\mathbf{p}_f- {}^{G}\mathbf{p}_{I_0}-{}^{G}\mathbf{v}_{I_0}(t_k-t_0)-\frac{1}{2}{}^G\mathbf{g}(t_k-t_0)^2 \Big) \times\Big\rfloor {}^{I_0}_{G}\mathbf{R}^\top \\ \boldsymbol\Gamma_2 &= -\mathbf I_{3\times3} \\ \boldsymbol\Gamma_3 &= -(t_{k}-t_0) \mathbf I_{3\times3} \\ \boldsymbol\Gamma_4 &= \mathbf I_{3\times3} \end{align*}

It is straightforward to verify the right nullspace spans four directions, i.e.,:

\begin{align*} \mathcal{N} &= \begin{bmatrix} {}^{I_{0}}_{G}\mathbf{R}{}^G\mathbf{g} & \mathbf 0_{3\times3} \\ -\lfloor {}^{G}\mathbf{p}_{I_0} \times\rfloor{}^G\mathbf{g} & \mathbf{I}_{3\times3} \\ -\lfloor {}^{G}\mathbf{v}_{I_0} \times\rfloor{}^G\mathbf{g} & \mathbf{0}_{3\times3} \\ -\lfloor {}^{G}\mathbf{p}_{f} \times\rfloor{}^G\mathbf{g} & \mathbf{I}_{3\times3} \\ \end{bmatrix} \end{align*}

where $ \mathcal{N} $ should be 4dof corresponding to global rotation about the gravity (yaw) and global translation of our visual-inertial systems.

First Estimate Jacobians

It has been showed that standard EKF based-VINS, which always computes the state translation matrix and the measurement Jacobian using the current state estimates, has the global yaw orientation appear to be observable and has an incorrectly reduced 3dof nullspace dimention [18]. This causes the filter mistakenly gaining extra information and becoming overconfident in the yaw.

To solve this issue, the First-Estimate Jacobains (FEJ) [20] [21] methodology can be applied. It evaluates the linearized system state transition matrix and Jacobians at the same estimate (i.e., the first estimates) over all time periods and thus ensures that the 4dof unobservable VINS subspace does not gain spurious information. The application of FEJ is simple yet effective, let us consider how to modify the propagation and update linearizations.


Let us first consider a small thought experiment of how the standard Kalman filter computes its state transition matrix. From a timestep zero to one it will use the current estimates from state zero forward in time. At the next timestep after it updates the state with measurements from other sensors, it will compute the state transition with the updated values to evolve the state to timestep two. This causes a miss-match in the "continuity" of the state transition matrix which when multiply sequentially should represent the evolution from time zero to time two.

\begin{align*} \mathbf{\Phi}_{(k+1,k-1)}(\mathbf{x}_{k+1|k},\mathbf{x}_{k-1|k-1}) \neq \mathbf{\Phi}_{(k+1,k)}(\mathbf{x}_{k+1|k},\mathbf{x}_{k|k}) ~ \mathbf{\Phi}_{(k,k-1)}(\mathbf{x}_{k|k-1},\mathbf{x}_{k-1|k-1}) \end{align*}

As shown above, we wish to compute the state transition matrix from the k-1 timestep given all k-1 measurements up until the current propagated timestep k+1 given all k measurements. The right side of the above equation is how one would normally perform this in a Kalman filter framework. $\mathbf{\Phi}_{(k,k-1)}(\mathbf{x}_{k|k-1},\mathbf{x}_{k-1|k-1})$ corresponds to propagating from the k-1 update time to the k timestep. One would then normally perform the k'th update to the state and then propagate from this updated state to the newest timestep (i.e. the $ \mathbf{\Phi}_{(k+1,k)}(\mathbf{x}_{k+1|k},\mathbf{x}_{k|k}) $ state transition matrix). This clearly is different then if one was to compute the state transition from time k-1 to the k+1 timestep as the second state transition is evaluated at the different $\mathbf{x}_{k|k}$ linearization point! To fix this, we can change the linearization point we evaluate these at:

\begin{align*} \mathbf{\Phi}_{(k+1,k-1)}(\mathbf{x}_{k+1|k},\mathbf{x}_{k-1|k-1}) = \mathbf{\Phi}_{(k+1,k)}(\mathbf{x}_{k+1|k},\mathbf{x}_{k|k-1}) ~ \mathbf{\Phi}_{(k,k-1)}(\mathbf{x}_{k|k-1},\mathbf{x}_{k-1|k-1}) \end{align*}


We also need to ensure that our measurement Jacobians match the linearization point of the state transition matrix. Let us consider the simple case at timestamp k with current state estimate $\hat{\mathbf{x}}_k$ gives the following observability matrix:

\begin{align*} \hat{\mathcal{O}}= \begin{bmatrix} \hat{\mathbf{H}}_{0}\mathbf{\Phi}_{(0,0)} \\ \vdots \\ \hat{\mathbf{H}}_{k}\mathbf{\Phi}_{(k,0)} \\ \end{bmatrix} \end{align*}

It is easy to verify that $ \mathcal{N} $ , derived from the previous section, is not valid for the whole entire matrix (i.e., $ \mathbf{H}_{k}\mathbf{\Phi}_{(k,0)}\mathcal{N} \neq \mathbf 0 $ ). For example, the feature estimate $ {}^{G}\hat{\mathbf{p}}_{f}(t_0) \neq ... \neq {}^{G}\hat{\mathbf{p}}_{f}(t_k) $ in each row of $ \hat{\mathcal{O}}$ are different, thus the nullsace does not hold. Specifically the first column $ \mathcal{N} $ is invalid, and causes the dimension of unobservable subspace to become 3 (i.e., $ \texttt{dim}(\hat{\mathcal{O}}) =3 $ ). This will cause the filter to gain extra information along the yaw unobservable direction and hurt the estimation performance. Specifically this leads to larger errors, erroneously smaller uncertainties, and inconsistency (see [18] [27] for detailed proof and discussion).

To assure $ \mathbf{H}_{k}\mathbf{\Phi}_{(k,0)}\mathcal{N} = \mathbf 0 $ , one can fix the linearization point over all time. A natural choice of state linearization point is to use the first state estimate $ \bar{\mathbf{x}}_k $ . Note that for the IMU state, its first state estimate will be $ {\mathbf{x}}_{k|k-1}$ which also match the linearization point in the state transition matrix. For the features, the initial triangulated value (i.e., at the time we initialize the feature into state vector) is its first estimate. As such, we can derive the linearized measurement update function as:

\begin{align*} \tilde{\mathbf{z}}_{k+1} = \mathbf{z}_{k+1} - \mathbf{h}(\hat{\mathbf{x}}_k) \simeq \bar{\mathbf{H}}_{k} (\mathbf{x}_{k} -\hat{\mathbf{x}}_k ) + \mathbf{n}_k \end{align*}

where $ \mathbf{h}(\cdot) $ is the nonlinear measurement function and $ \bar{\mathbf{H}}_{k} $ is the First-Estimate Jacobian which only consist the first state estimates $ \bar{\mathbf{x}}_k $ . It is not difficult to confirm that the observability matrix $ \mathcal{O} $ with the FEJ state transition matrix and measurement Jacobians yields the correct 4dof unobservable nullspace. As such, the system observability properties can be preserved.

While not utilizing the current (best) state estimates for linearization does introduce linearization errors, it has been shown that the inconsistencies far outweigh this. Follow up works which have also tried to address this problem include FEJ2 [6] and OC [21] [18] [19]. Both of these works try to handle the problem of ensuring correct observability properties while compensating or leveraging the best estimates.