Mar 5, 2024: FarfetchFusion

Challenges:

Proposed Design: Disentangled Fusion

Separate static (ears, nose, forehead) and dynamic (mouth, eyes) facial information

Combine static info from multiple frames while fusing only the dynamic parts from recent frames

⇒ leverage spatio-temporal redundancy in multi-view video streams

⇒ reduce processing time of static information
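
A minimal sketch of this fusion rule, assuming per-voxel TSDF/weight arrays and a precomputed dynamic-region mask (the names and array layout are illustrative, not from the paper):

```python
import numpy as np

def disentangled_fuse(static_tsdf, static_weight, frame_tsdf, frame_weight, dynamic_mask):
    """Fuse one frame's TSDF into the model.

    Static regions (ears, nose, forehead): weighted running average over
    many frames, so noise averages out and the work is amortized.
    Dynamic regions (mouth, eyes): taken from the most recent frame only,
    so expressions stay current.
    """
    fused, weight = static_tsdf.copy(), static_weight.copy()

    s = ~dynamic_mask  # static voxels
    w = static_weight[s] + frame_weight[s]
    fused[s] = (static_tsdf[s] * static_weight[s]
                + frame_tsdf[s] * frame_weight[s]) / np.maximum(w, 1e-6)
    weight[s] = w

    fused[dynamic_mask] = frame_tsdf[dynamic_mask]    # latest frame wins
    weight[dynamic_mask] = frame_weight[dynamic_mask]
    return fused, weight
```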

TSDF (Truncated Signed Distance Function: the signed distance from a voxel to the nearest surface, truncated to a narrow band) - stored in a hash map
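
A minimal sketch of the hash-map storage, keyed by integer voxel coordinates; the 4 mm voxel size and field names are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Voxel:
    tsdf: float    # truncated signed distance from voxel to nearest surface
    weight: float  # how many observations have been fused so far
    color: tuple   # (r, g, b)

VOXEL_SIZE = 0.004  # 4 mm, an assumed resolution
tsdf_volume: dict[tuple[int, int, int], Voxel] = {}

def voxel_key(x, y, z):
    """World-space position -> integer voxel coordinates (the hash key)."""
    return (int(x // VOXEL_SIZE), int(y // VOXEL_SIZE), int(z // VOXEL_SIZE))

# Only voxels near an observed surface ever get an entry, so memory grows
# with surface area rather than with the full bounding volume.
tsdf_volume[voxel_key(0.01, 0.02, 0.55)] = Voxel(0.0, 1.0, (205, 180, 165))
```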

Volumetric Fusion

  1. Alignment (feature extraction + registration)

    1. detect facial landmarks (mouth, eyes, ears, etc.) on the 2D images
    2. lift them to 3D landmarks using the depth images ⇒ the 3D landmark correspondences are used to estimate similarity-transform parameters (see the alignment sketch after this list)
  2. Fusion

    1. Step 1: remove the previous frame's hash entries (to avoid afterimage/ghosting artifacts)
    2. Step 2: cast a ray from each pixel of the RGB-D image and allocate hash entries for the voxels the ray intersects (so only voxels on or near the face surface are allocated)
    3. Step 3: set each voxel's TSDF value (the difference between the voxel's z-axis position and the measured depth value; see the TSDF update sketch after this list)
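
Alignment sketch: the notes only say that the 3D landmark pairs yield similarity-transform parameters; the standard Umeyama estimator below is one common way to compute them, and is an assumption here rather than the paper's confirmed method:

```python
import numpy as np

def similarity_transform(src, dst):
    """Return s, R, t minimizing ||dst - (s * R @ src + t)||^2.

    src, dst: (N, 3) arrays of corresponding 3D landmarks.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)           # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U) * np.linalg.det(Vt))
    D = np.diag([1.0, 1.0, d])                 # guards against reflections
    R = U @ D @ Vt                             # rotation
    var_src = src_c.var(axis=0).sum()          # mean squared distance to centroid
    s = np.trace(np.diag(S) @ D) / var_src     # isotropic scale
    t = mu_d - s * R @ mu_s                    # translation
    return s, R, t
```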
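
TSDF update sketch for Steps 2-3, with assumed pinhole intrinsics and truncation band; a plain dict of (tsdf, weight) pairs stands in for the hash map:

```python
import numpy as np

FX = FY = 600.0
CX, CY = 320.0, 240.0         # assumed pinhole intrinsics
VOXEL_SIZE, MU = 0.004, 0.02  # 4 mm voxels, 2 cm truncation band

def integrate_pixel(tsdf_volume, u, v, depth):
    """Allocate voxels along the pixel's ray and set their TSDF values."""
    if depth <= 0:
        return
    ray = np.array([(u - CX) / FX, (v - CY) / FY, 1.0])  # back-projected ray
    # Sample only inside the truncation band around the measured surface,
    # so only voxels on or near the face surface are ever allocated.
    for z in np.arange(depth - MU, depth + MU, VOXEL_SIZE):
        p = ray * z
        key = tuple((p // VOXEL_SIZE).astype(int))        # hash-map key
        tsdf = np.clip((depth - z) / MU, -1.0, 1.0)       # voxel z vs. depth, truncated
        old_tsdf, old_w = tsdf_volume.get(key, (0.0, 0.0))
        w = old_w + 1.0
        tsdf_volume[key] = ((old_tsdf * old_w + tsdf) / w, w)  # running average
```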


Rendering

  1. Rasterization

    → highly optimized, but requires converting the implicit TSDF voxels into explicit 3D geometry first

    → Marching Cubes algorithm extracts a polygon mesh from the TSDF voxels (see the Marching Cubes sketch after this list)

  2. Raycasting: generate rendered images directly, without converting TSDF voxels into explicit 3D data

    → find surface voxels (TSDF values closest to zero) by casting a ray from the viewer's point of view (see the raycasting sketch after this list)

    → the hit voxel's color information is used to form the 2D pixel
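
Marching Cubes sketch, using scikit-image's implementation as a stand-in (the paper's actual tooling is not specified in these notes):

```python
import numpy as np
from skimage import measure

# Toy TSDF volume: signed distance to a sphere of radius 10 voxels.
grid = np.mgrid[:32, :32, :32]
tsdf = np.sqrt(((grid - 16) ** 2).sum(axis=0)) - 10.0

# Extract the zero level set (the surface) as vertices + triangle faces,
# which a standard rasterizer can then render.
verts, faces, normals, _ = measure.marching_cubes(tsdf, level=0.0)
print(verts.shape, faces.shape)
```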
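
Raycasting sketch: march along the viewing ray and locate the TSDF zero crossing; the step size and the `sample_tsdf` lookup function are illustrative assumptions:

```python
import numpy as np

def raycast_pixel(sample_tsdf, origin, direction, t_near=0.2, t_far=1.5, step=0.004):
    """Return the 3D surface point seen along one pixel's ray, or None.

    sample_tsdf: callable mapping a 3D point to its TSDF value
    (positive in front of the surface, negative behind it).
    """
    prev_t, prev_d = t_near, sample_tsdf(origin + t_near * direction)
    for t in np.arange(t_near + step, t_far, step):
        d = sample_tsdf(origin + t * direction)
        if prev_d > 0 >= d:  # sign change: the ray crossed the surface
            # Linear interpolation refines the zero crossing.
            t_hit = prev_t + step * prev_d / (prev_d - d)
            return origin + t_hit * direction  # look up color here for the 2D pixel
        prev_t, prev_d = t, d
    return None
```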

System Design:
