Visual-Inertial Simulator

B-Spline Interpolation


At the center of the simulator is an $\mathbb{SE}(3)$ b-spline which allows for the calculation of the pose, velocity, and accelerations at any given timestep along a given trajectory. We follow the work of Mueggler et al. [13] and Patron et al. [14] in which given a series of uniformly distributed "control points" poses the pose $\{S\}$ at a given timestep $t_s$ can be interpolated by:

\begin{align*} {}^{G}_{S}\mathbf{T}(u(t_s)) &= {}^{G}_{i-1}\mathbf{T}~\mathbf{A}_0~\mathbf{A}_1~\mathbf{A}_2 \label{eq:poseinterp} \\[0.75em] \mathbf{A}_j &= \mathrm{exp}\Big({B}_j(u(t))~{}^{i-1+j}_{i+j}\mathbf{\Omega} \Big) \\ {}^{i-1}_{i}\mathbf{\Omega} &= \mathrm{log}\Big( {}^{G}_{i-1}\mathbf{T}^{-1}~{}^{G}_{i}\mathbf{T} \Big) \\[0.75em] {B}_0(u(t)) &= \frac{1}{3!}~(5+3u-3u^2+u^3) \\ {B}_1(u(t)) &= \frac{1}{3!}~(1+3u+3u^2-2u^3) \\ {B}_2(u(t)) &= \frac{1}{3!}~(u^3) \end{align*}

where $u(t_s)=(t_s-t_i)/(t_{i+1}-t_i)$ , $\mathrm{exp}(\cdot)$ , $\mathrm{log}(\cdot)$ are the $\mathbb{SE}(3)$ matrix exponential ov_core::exp_se3 and logarithm ov_core::log_se3. The frame notations can be seen in the above figure and we refer the reader to the ov_core::BsplineSE3 class for more details. The above equation can be interpretative as compounding the fractions portions of the bounding poses to the first pose ${}^{G}_{i-1}\mathbf{T}$ . From this above equation, it is simple to take the derivative in respect to time, thus allowing the computation of the velocity and acceleration at any point.

The only needed input into the simulator is a pose trajectory which we will then uniformly sample to construct control points for this spline. This spline is then used to both generate the inertial measurements while also providing the pose information needed to generate visual-bearing measurements.

Inertial Measurements

To incorporate inertial measurements from a IMU sensor, we can leverage the continuous nature and $C^2$ -continuity of our cubic B-spline. We can define the sensor measurement from a IMU as follows:

\begin{align*} {}^I\boldsymbol{\omega}_m(t) &= {}^I\boldsymbol{\omega}(t) + \mathbf{b}_\omega + \mathbf{n}_\omega \\ {}^I\mathbf{a}_m(t) &= {}^I\mathbf{a}(t) + {}^{I(t)}_G\mathbf{R}{}^G\mathbf{g} + \mathbf{b}_a + \mathbf{n}_a \\ \dot{\mathbf{b}}_\omega &= \mathbf{n}_{wg} \\ \dot{\mathbf{b}}_a &= \mathbf{n}_{wa} \end{align*}

where each measurement is corrupted with some white noise and random-walk bias. To obtain the true measurements from our $\mathbb{SE}(3)$ b-spline we can do the following:

\begin{align*} {}^I\boldsymbol{\omega}(t) &= \mathrm{vee}\Big( {}^{G}_{I}\mathbf{R}(u(t))^\top {}^{G}_{I}\dot{\mathbf{R}}(u(t)) \Big) \\ {}^I\mathbf{a}(t) &= {}^{G}_{I}\mathbf{R}(u(t))^\top {}^{G}\ddot{\mathbf{p}}_{I}(u(t)) \end{align*}

where $\mathrm{vee}(\cdot)$ returns the vector portion of the skew-symmetric matrix (see ov_core::vee). These are then corrupted using the random walk biases and corresponding white noises. For example we have the following:

\begin{align*} \omega_m(t) &= \omega(t) + b_\omega(t) + \sigma_w\frac{1}{\sqrt{\Delta t}}\textrm{gennoise}(0,1) \\ b_\omega(t+\Delta t) &= b_\omega(t) + \sigma_{wg}\sqrt{\Delta t}~\textrm{gennoise}(0,1) \\ t &= t+\Delta t \end{align*}

Note that this is repeated per-scalar value as compared to the vector and identically for the accelerometer readings. The $\textrm{gennoise}(m,v)$ function generates a random scalar float with mean m and variance v. The $\Delta t$ is our sensor sampling rate that we advance time forward with.

Visual-Bearing Measurement

The first step that we perform after creating the b-spline trajectory is the generation of a map of point features. To generate these features, we increment along the spline at a fixed interval and ensure that all cameras see enough features in the map. If there are not enough features in the given frame, we generate new features by sending random rays from the camera out and assigning a random depth. This feature is then added to the map so that it can be projected into future frames.

After the map generation phase, we generate feature measurements by projecting them into the current frame. Projected features are limited to being with-in the field of view of the camera, in front of the camera, and close in distance. Pixel noise can be directly added to the raw pixel values.