Apache NiFi: JOLT Transformations Part 1

Apache NiFi Part 3

Posted by Craig Johnston on Saturday, December 10, 2022

This article introduces JOLT (JSON to JSON transformation Language) in Apache NiFi, covering the fundamental operations: shift, default, and remove.

If you’ve ever pulled data from a third-party API and tried to insert it into a database, you know the JSON structure rarely matches your schema. Field names differ, nested objects need flattening, some fields need defaults, and others need to go away entirely. JOLT lets you define those transformations declaratively instead of writing code. I use it for reshaping webhook payloads, flattening API responses, and cleaning up data before it hits Postgres.

What is JOLT?

JOLT is a JSON transformation library that lets you restructure JSON documents declaratively. In NiFi, the JoltTransformJSON processor applies JOLT specifications to FlowFile content.

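JOLT provides the following operations. A spec is a JSON array of them, and they run in the order they appear in that array, not in the order listed here:

  • shift - Move and rename data
  • default - Add default values for missing fields
  • remove - Delete fields
  • sort - Sort map keys alphabetically
  • cardinality - Control array/single value handling
  • modify - Transform values with functions (written as modify-overwrite-beta or modify-default-beta in a spec)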

Shift Operation

The shift operation moves data from the input to the output. The left side matches input paths, and the right side defines output paths.

Basic Field Mapping

Input:

{
  "firstName": "John",
  "lastName": "Doe",
  "age": 30
}

Spec:

[
  {
    "operation": "shift",
    "spec": {
      "firstName": "first_name",
      "lastName": "last_name",
      "age": "user_age"
    }
  }
]

Output:

{
  "first_name": "John",
  "last_name": "Doe",
  "user_age": 30
}
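
To make the left-side/right-side idea concrete, here is a minimal plain-Java sketch of what a flat shift does. It is not JOLT's implementation, and the class and method names (FlatShiftSketch, shift) are hypothetical. It handles only top-level keys, and it shows one behavior worth remembering: input fields that the spec does not mention are not copied to the output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FlatShiftSketch {
    // Copies each input field named in the spec to its new output key.
    // Fields the spec does not mention are left out of the output.
    static Map<String, Object> shift(Map<String, Object> input, Map<String, String> spec) {
        Map<String, Object> output = new LinkedHashMap<>();
        for (Map.Entry<String, String> rule : spec.entrySet()) {
            if (input.containsKey(rule.getKey())) {
                output.put(rule.getValue(), input.get(rule.getKey()));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, Object> input = new LinkedHashMap<>();
        input.put("firstName", "John");
        input.put("age", 30);
        input.put("nickname", "JD"); // hypothetical extra field, absent from the spec

        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("firstName", "first_name");
        spec.put("age", "user_age");

        System.out.println(shift(input, spec)); // {first_name=John, user_age=30}
    }
}
```

If a field has to survive a shift, it needs its own entry in the spec.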

Nested Object Mapping

Input:

{
  "user": {
    "name": "John",
    "contact": {
      "email": "[email protected]",
      "phone": "555-1234"
    }
  }
}

Spec:

[
  {
    "operation": "shift",
    "spec": {
      "user": {
        "name": "profile.displayName",
        "contact": {
          "email": "profile.email",
          "phone": "profile.telephone"
        }
      }
    }
  }
]

Output:

{
  "profile": {
    "displayName": "John",
    "email": "[email protected]",
    "telephone": "555-1234"
  }
}
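
The dotted output paths used above (for example profile.displayName) can be read as "walk or create each segment, then write the value at the last one". The following plain-Java sketch illustrates that reading. It is an assumption-laden illustration rather than JOLT code: the names DottedPathSketch and put are hypothetical, and array syntax such as items[0] is deliberately not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DottedPathSketch {
    // Writes value at a dotted path, creating intermediate maps as needed.
    // Assumes every intermediate segment is either absent or already a map.
    @SuppressWarnings("unchecked")
    static void put(Map<String, Object> root, String dottedPath, Object value) {
        String[] keys = dottedPath.split("\\.");
        Map<String, Object> current = root;
        for (int i = 0; i < keys.length - 1; i++) {
            Object child = current.computeIfAbsent(keys[i], k -> new LinkedHashMap<String, Object>());
            current = (Map<String, Object>) child;
        }
        current.put(keys[keys.length - 1], value);
    }

    public static void main(String[] args) {
        Map<String, Object> out = new LinkedHashMap<>();
        put(out, "profile.displayName", "John");
        put(out, "profile.telephone", "555-1234");
        System.out.println(out); // {profile={displayName=John, telephone=555-1234}}
    }
}
```

Two paths that share a prefix land in the same nested object, which is why several input fields can be gathered under one output key.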

Flattening Nested Structures

Input:

{
  "order": {
    "id": "ORD-123",
    "customer": {
      "name": "Jane",
      "address": {
        "city": "Seattle",
        "state": "WA"
      }
    }
  }
}

Spec:

[
  {
    "operation": "shift",
    "spec": {
      "order": {
        "id": "orderId",
        "customer": {
          "name": "customerName",
          "address": {
            "city": "city",
            "state": "state"
          }
        }
      }
    }
  }
]

Output:

{
  "orderId": "ORD-123",
  "customerName": "Jane",
  "city": "Seattle",
  "state": "WA"
}

Creating Nested from Flat

Input:

{
  "orderId": "ORD-123",
  "customerName": "Jane",
  "customerEmail": "[email protected]",
  "itemName": "Widget",
  "itemPrice": 29.99
}

Spec:

[
  {
    "operation": "shift",
    "spec": {
      "orderId": "order.id",
      "customerName": "order.customer.name",
      "customerEmail": "order.customer.email",
      "itemName": "order.items[0].name",
      "itemPrice": "order.items[0].price"
    }
  }
]

Output:

{
  "order": {
    "id": "ORD-123",
    "customer": {
      "name": "Jane",
      "email": "[email protected]"
    },
    "items": [
      {
        "name": "Widget",
        "price": 29.99
      }
    ]
  }
}

Using Wildcards

The * wildcard matches any key at the current level.

Input:

{
  "data": {
    "user1": {"name": "Alice", "score": 85},
    "user2": {"name": "Bob", "score": 92},
    "user3": {"name": "Charlie", "score": 78}
  }
}

Spec:

[
  {
    "operation": "shift",
    "spec": {
      "data": {
        "*": {
          "name": "users.&1.username",
          "score": "users.&1.points"
        }
      }
    }
  }
]

Output:

{
  "users": {
    "user1": {"username": "Alice", "points": 85},
    "user2": {"username": "Bob", "points": 92},
    "user3": {"username": "Charlie", "points": 78}
  }
}

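Key Reference (&)

The & symbol references a key matched on the input side, counted in levels up the spec from the current position. Use the bracketed form, such as [&1], only when the referenced key is a numeric array index; for object keys, write the reference as a plain path segment, such as users.&1.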

  • & or &0 - Current matched key
  • &1 - Parent matched key
  • &2 - Grandparent matched key

Input:

{
  "departments": {
    "engineering": ["Alice", "Bob"],
    "marketing": ["Charlie", "Diana"]
  }
}

Spec:

[
  {
    "operation": "shift",
    "spec": {
      "departments": {
        "*": {
          "*": "employees.&1.members[]"
        }
      }
    }
  }
]

Output:

{
  "employees": {
    "engineering": {"members": ["Alice", "Bob"]},
    "marketing": {"members": ["Charlie", "Diana"]}
  }
}
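
One way to picture & is as a lookup into the list of input keys matched so far, counted back from the current key. The sketch below expands & references in an output path under that model. It is a simplified illustration and not JOLT's parser: the names AmpersandSketch and expand are hypothetical, and only the plain &N form is handled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AmpersandSketch {
    // Matches "&" followed by an optional level number; a bare "&" means level 0.
    private static final Pattern REF = Pattern.compile("&(\\d*)");

    // matchedKeys holds the input keys matched so far: root first, current key last.
    static String expand(String outputPath, List<String> matchedKeys) {
        Matcher m = REF.matcher(outputPath);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int levelsUp = m.group(1).isEmpty() ? 0 : Integer.parseInt(m.group(1));
            String key = matchedKeys.get(matchedKeys.size() - 1 - levelsUp);
            m.appendReplacement(out, Matcher.quoteReplacement(key));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> matched = Arrays.asList("data", "user1", "name");
        System.out.println(expand("users.&1.username", matched)); // users.user1.username
    }
}
```

With the matched keys data, user1, name, &0 is name, &1 is user1, and &2 is data.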

Default Operation

The default operation adds values for missing fields without overwriting existing ones.

Input:

{
  "name": "John",
  "status": "active"
}

Spec:

[
  {
    "operation": "default",
    "spec": {
      "name": "Unknown",
      "email": "[email protected]",
      "status": "pending",
      "role": "user",
      "metadata": {
        "version": "1.0",
        "source": "api"
      }
    }
  }
]

Output:

{
  "name": "John",
  "status": "active",
  "email": "[email protected]",
  "role": "user",
  "metadata": {
    "version": "1.0",
    "source": "api"
  }
}

Note that name and status retain their original values.
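
The "fill in only what is missing" rule can be sketched as a recursive merge in plain Java. This is an illustration of the behavior described here for literal keys only, not JOLT's implementation; the names DefaultSketch and applyDefaults are hypothetical, and wildcards and arrays are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DefaultSketch {
    // Adds each default only where the data has no such key; recurses into nested maps.
    @SuppressWarnings("unchecked")
    static void applyDefaults(Map<String, Object> data, Map<String, Object> defaults) {
        for (Map.Entry<String, Object> d : defaults.entrySet()) {
            Object existing = data.get(d.getKey());
            if (!data.containsKey(d.getKey())) {
                data.put(d.getKey(), d.getValue()); // missing: add the default
            } else if (existing instanceof Map && d.getValue() instanceof Map) {
                applyDefaults((Map<String, Object>) existing, (Map<String, Object>) d.getValue());
            }
            // otherwise the existing value is kept as it is
        }
    }

    public static void main(String[] args) {
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("name", "John");
        Map<String, Object> defaults = new LinkedHashMap<>();
        defaults.put("name", "Unknown");
        defaults.put("role", "user");
        applyDefaults(data, defaults);
        System.out.println(data); // {name=John, role=user}
    }
}
```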

Nested Defaults

Input:

{
  "user": {
    "id": 123
  }
}

Spec:

[
  {
    "operation": "default",
    "spec": {
      "user": {
        "name": "Anonymous",
        "preferences": {
          "theme": "light",
          "notifications": true
        }
      }
    }
  }
]

Output:

{
  "user": {
    "id": 123,
    "name": "Anonymous",
    "preferences": {
      "theme": "light",
      "notifications": true
    }
  }
}

Remove Operation

The remove operation deletes the fields named in the spec from the document it receives. The empty string on the right side is only a placeholder.

Input:

{
  "id": 123,
  "name": "John",
  "password": "secret123",
  "ssn": "123-45-6789",
  "email": "[email protected]"
}

Spec:

[
  {
    "operation": "remove",
    "spec": {
      "password": "",
      "ssn": ""
    }
  }
]

Output:

{
  "id": 123,
  "name": "John",
  "email": "[email protected]"
}
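
As a rough mental model, remove walks the spec and the data together: a string value in the spec marks a key for deletion, while a nested object in the spec means "descend and keep looking". The plain-Java sketch below follows that model for literal keys only. It is not JOLT's code, and the names RemoveSketch and remove are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RemoveSketch {
    // Deletes keys marked with a non-map value in the spec; recurses where the spec nests.
    @SuppressWarnings("unchecked")
    static void remove(Map<String, Object> data, Map<String, Object> spec) {
        for (Map.Entry<String, Object> rule : spec.entrySet()) {
            Object target = data.get(rule.getKey());
            if (rule.getValue() instanceof Map) {
                if (target instanceof Map) {
                    remove((Map<String, Object>) target, (Map<String, Object>) rule.getValue());
                }
            } else {
                data.remove(rule.getKey()); // no-op when the key is absent
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("id", 123);
        data.put("password", "secret123");
        Map<String, Object> spec = new LinkedHashMap<>();
        spec.put("password", "");
        remove(data, spec);
        System.out.println(data); // {id=123}
    }
}
```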

Removing Nested Fields

Input:

{
  "user": {
    "id": 123,
    "profile": {
      "name": "John",
      "internalId": "INT-456",
      "email": "[email protected]"
    },
    "audit": {
      "createdBy": "admin",
      "modifiedBy": "system"
    }
  }
}

Spec:

[
  {
    "operation": "remove",
    "spec": {
      "user": {
        "profile": {
          "internalId": ""
        },
        "audit": ""
      }
    }
  }
]

Output:

{
  "user": {
    "id": 123,
    "profile": {
      "name": "John",
      "email": "[email protected]"
    }
  }
}

Combining Operations

Operations execute in array order. Combine them for complex transformations.

Input:

{
  "raw_data": {
    "user_id": "U123",
    "user_name": "johndoe",
    "temp_token": "abc123",
    "email_address": "[email protected]"
  }
}

Spec:

[
  {
    "operation": "shift",
    "spec": {
      "raw_data": {
        "user_id": "user.id",
        "user_name": "user.username",
        "temp_token": "user.token",
        "email_address": "user.email"
      }
    }
  },
  {
    "operation": "default",
    "spec": {
      "user": {
        "role": "member",
        "verified": false
      }
    }
  },
  {
    "operation": "remove",
    "spec": {
      "user": {
        "token": ""
      }
    }
  }
]

Output:

{
  "user": {
    "id": "U123",
    "username": "johndoe",
    "email": "[email protected]",
    "role": "member",
    "verified": false
  }
}
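
Because each operation receives the output of the one before it, the order of the array changes the result. The sketch below models a chain as a list of functions applied in sequence and uses two stand-in steps to show the effect. Everything in it (ChainSketch, run, DEFAULT_ROLE, REMOVE_ROLE) is hypothetical and only imitates the idea; it does not use the JOLT library.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

final class ChainSketch {
    // Stand-in for a default step: add "role" only when it is missing.
    static final UnaryOperator<Map<String, Object>> DEFAULT_ROLE = m -> {
        m.putIfAbsent("role", "member");
        return m;
    };

    // Stand-in for a remove step: drop "role".
    static final UnaryOperator<Map<String, Object>> REMOVE_ROLE = m -> {
        m.remove("role");
        return m;
    };

    // Feeds the output of each step into the next, in list order.
    static Map<String, Object> run(Map<String, Object> input,
                                   List<UnaryOperator<Map<String, Object>>> steps) {
        Map<String, Object> current = new LinkedHashMap<>(input); // leave the input untouched
        for (UnaryOperator<Map<String, Object>> step : steps) {
            current = step.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> empty = new LinkedHashMap<>();
        System.out.println(run(empty, Arrays.asList(DEFAULT_ROLE, REMOVE_ROLE))); // {}
        System.out.println(run(empty, Arrays.asList(REMOVE_ROLE, DEFAULT_ROLE))); // {role=member}
    }
}
```

The same reasoning applies to a real spec: a default placed before a remove of the same field has no lasting effect, while the reverse order leaves the default in place.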

NiFi Processor Configuration

In NiFi, use the JoltTransformJSON processor:

  1. Add JoltTransformJSON to your flow
  2. Set Jolt Specification to your spec (or reference a file)
  3. Set Jolt Transformation DSL (labeled Jolt Transform in some NiFi versions) to “Chain” for multiple operations

Using Expression Language

Reference FlowFile attributes in specs:

[
  {
    "operation": "default",
    "spec": {
      "processedAt": "${now():format('yyyy-MM-dd HH:mm:ss')}",
      "source": "${filename}"
    }
  }
]

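Transform Cache Size is a number, not a toggle: it sets how many compiled transforms the processor keeps in memory, so FlowFiles that resolve to the same spec can reuse one instead of recompiling it.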

Testing JOLT Specs

Use these approaches to test your specs:

NiFi’s Built-in Tester

The processor’s configuration dialog has an “Advanced” button that opens a JOLT spec tester. Paste input JSON and verify output before running the flow.

Online Tools

Unit Testing in Java

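import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.List;
import org.junit.jupiter.api.Test;

import com.bazaarvoice.jolt.Chainr;
import com.bazaarvoice.jolt.JsonUtils;

// Assumes JUnit 5 and the JOLT jars on the test classpath; the class name is illustrative.
class UserTransformTest {

    @Test
    void testTransformation() {
        // Load the chain spec (a JSON array of operations) from the test classpath.
        List<Object> spec = JsonUtils.classpathToList("/specs/user-transform.json");
        Chainr chainr = Chainr.fromSpec(spec);

        // Input and expected output are parsed into plain Maps and Lists.
        Object input = JsonUtils.classpathToObject("/testdata/input.json");
        Object expected = JsonUtils.classpathToObject("/testdata/expected.json");

        Object actual = chainr.transform(input);

        // Map equality ignores key order, so only structure and values are compared.
        assertEquals(expected, actual);
    }
}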

Summary

This article covered JOLT basics:

  • Shift to restructure and rename fields
  • Default to add missing values
  • Remove to delete unwanted fields
  • Combining operations for complex transformations

The next article explores advanced JOLT operations including cardinality, modify, and wildcard patterns.

Resources

Next: JOLT Transformations Part 2

Check out the next article in this series, Apache NiFi: JOLT Transformations Part 2.

Note: This blog is a collection of personal notes. Making them public encourages me to think beyond the limited scope of the current problem I'm trying to solve or concept I'm implementing, and hopefully provides something useful to my team and others.

This blog post, titled "Apache NiFi: JOLT Transformations Part 1: Apache NiFi Part 3" by Craig Johnston, is licensed under a Creative Commons Attribution 4.0 International License.