Apache Hudi

When you define a Hudi table, you must choose one of two supported data storage types.

Supported Hudi dataset storage types:

  1. Copy on write
  2. Merge on read

When you create a Hudi dataset, you specify that the dataset is either copy on write or merge on read.

  • Copy on Write (CoW) – Data is stored in a columnar format (Parquet), and each update creates a new version of files during a write. CoW is the default storage type.
  • Merge on Read (MoR) – Data is stored using a combination of columnar (Parquet) and row-based (Avro) formats. Updates are logged to row-based delta files and are compacted as needed to create new versions of the columnar files.
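
As a rough illustration of how the storage type is chosen at write time, the sketch below builds the option map that a Spark writer would pass to Hudi. The helper `hudi_write_options`, the short names `cow` and `mor`, and the table and field names (`trips`, `trip_id`, `updated_at`) are hypothetical and used only for this example. The option key `hoodie.datasource.write.table.type` and its values `COPY_ON_WRITE` and `MERGE_ON_READ` are the ones used by recent Hudi releases; older releases may use a different key, so check the documentation for your version.

```python
# Hypothetical short names mapped to the values Hudi expects for the table type.
TABLE_TYPES = {"cow": "COPY_ON_WRITE", "mor": "MERGE_ON_READ"}


def hudi_write_options(table_name, record_key, precombine_field, table_type="cow"):
    """Build a Hudi write-option dict (illustrative helper, not part of Hudi).

    table_type defaults to "cow" because copy on write is Hudi's default.
    """
    if table_type not in TABLE_TYPES:
        raise ValueError(f"table_type must be one of {sorted(TABLE_TYPES)}")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": TABLE_TYPES[table_type],
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
    }


# Example: options for a merge-on-read table (names are made up).
options = hudi_write_options("trips", "trip_id", "updated_at", table_type="mor")

# Assuming a Spark session with the Hudi bundle available and a DataFrame `df`,
# the write itself would look roughly like this (path is a placeholder):
# df.write.format("hudi").options(**options).mode("append").save("/tmp/hudi/trips")
```

Omitting `table_type` produces a copy-on-write table, which matches the default noted above. The storage type is set when the table is first created, so it is best treated as a one-time decision for a given table path.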
