Apache NiFi: JOLT Transformations Part 2

Apache NiFi Part 4

Posted by Craig Johnston on Wednesday, February 15, 2023

This article covers advanced JOLT transformations including cardinality, modify operations, wildcards, and chained specifications for complex JSON restructuring.

This continues from Part 3: JOLT Transformations Part 1.

Most of the JSON I deal with comes from APIs that weren’t designed with my database schema in mind. Nested objects three levels deep, arrays that sometimes contain one item and sometimes fifty, field names in camelCase when I need snake_case. The basic JOLT operations from Part 1 handle simple cases, but real-world API responses usually need multiple transformation passes. That’s what the advanced operations below are for.

Cardinality Operation

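Cardinality controls whether a value is a single item or an array. Use it to normalize inconsistent APIs that return an array for some records and a bare value for others. ONE keeps the first element of an array, and MANY wraps a bare value in a one-element array.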

ONE - Force Single Value

Input:

{
  "tags": ["urgent"],
  "category": ["electronics", "gadgets"],
  "priority": "high"
}

Spec:

[
  {
    "operation": "cardinality",
    "spec": {
      "tags": "ONE",
      "category": "ONE",
      "priority": "ONE"
    }
  }
]

Output:

{
  "tags": "urgent",
  "category": "electronics",
  "priority": "high"
}

MANY - Force Array

Input:

{
  "singleItem": "value1",
  "alreadyArray": ["a", "b"],
  "nested": {
    "item": "single"
  }
}

Spec:

[
  {
    "operation": "cardinality",
    "spec": {
      "singleItem": "MANY",
      "alreadyArray": "MANY",
      "nested": {
        "item": "MANY"
      }
    }
  }
]

Output:

{
  "singleItem": ["value1"],
  "alreadyArray": ["a", "b"],
  "nested": {
    "item": ["single"]
  }
}
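
If it helps to think of the two modes in ordinary code, here is a rough Python sketch of the behaviour shown in the examples above. It is not how Jolt implements cardinality; the function name apply_cardinality is made up for illustration, and turning an empty list into None is an assumption on my part.

```python
def apply_cardinality(value, mode):
    """Approximate the ONE and MANY cardinality modes for a single value."""
    if mode == "ONE":
        if isinstance(value, list):
            # Keep only the first element; empty list -> None (assumption).
            return value[0] if value else None
        return value  # already a single value, leave it alone
    if mode == "MANY":
        # Wrap bare values; existing lists pass through unchanged.
        return value if isinstance(value, list) else [value]
    raise ValueError(f"unknown mode: {mode}")


doc = {"tags": ["urgent"], "category": ["electronics", "gadgets"], "priority": "high"}
normalized = {key: apply_cardinality(val, "ONE") for key, val in doc.items()}
print(normalized)
```

Note that ONE silently drops "gadgets" from category, so it is only safe when the extra elements really are disposable.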

Modify Operations

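The modify operation transforms values using built-in functions. The two variants used in this article are:

  • modify-default-beta - Write only when the key is missing or its value is null
  • modify-overwrite-beta - Always write, replacing any existing value

A third variant, modify-define-beta, writes only when the key is missing.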
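
The difference between the variants is only the rule for when a computed value gets written. A small Python sketch of my understanding of those rules follows; it also covers modify-define-beta, a third variant Jolt provides. The helper name modify and the variant labels are hypothetical, not Jolt API.

```python
def modify(doc, key, compute, variant):
    """Write compute(doc) to doc[key] when the variant's rule allows it."""
    exists = key in doc
    if variant == "overwrite":
        should_write = True                            # always write
    elif variant == "default":
        should_write = not exists or doc[key] is None  # missing or null
    elif variant == "define":
        should_write = not exists                      # missing only
    else:
        raise ValueError(f"unknown variant: {variant}")
    if should_write:
        doc[key] = compute(doc)
    return doc


record = {"status": None, "name": "alice"}
modify(record, "status", lambda d: "new", "default")              # null, so filled
modify(record, "name", lambda d: "bob", "default")                # present, so kept
modify(record, "name", lambda d: d["name"].upper(), "overwrite")  # replaced
modify(record, "status", lambda d: "other", "define")             # present, so kept
print(record)
```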

String Functions

Input:

{
  "name": "  john doe  ",
  "email": "John.Doe@Example.COM",
  "code": "abc123"
}

Spec:

[
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "name": "=trim(@(1,name))",
      "email": "=toLower(@(1,email))",
      "code": "=toUpper(@(1,code))",
      "displayName": "=concat(@(1,name),' <',@(1,email),'>')"
    }
  }
]

Output:

{
  "name": "john doe",
  "email": "john.doe@example.com",
  "code": "ABC123",
  "displayName": "john doe <john.doe@example.com>"
}
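
The displayName in the output is built from the already trimmed name, which suggests the entries in a modify spec are applied in order, each one seeing the results of those before it. Here is a minimal Python sketch of that reading. The email value is a placeholder I picked for illustration, and the list of steps is my own stand-in for the spec.

```python
doc = {"name": "  john doe  ", "email": "J.Doe@Example.com", "code": "abc123"}

# Each step reads the document as left by the previous step.
steps = [
    ("name", lambda d: d["name"].strip()),                           # trim
    ("email", lambda d: d["email"].lower()),                         # toLower
    ("code", lambda d: d["code"].upper()),                           # toUpper
    ("displayName", lambda d: d["name"] + " <" + d["email"] + ">"),  # concat
]
for key, fn in steps:
    doc[key] = fn(doc)

print(doc["displayName"])  # john doe <j.doe@example.com>
```

If displayName came first in the list, it would pick up the untrimmed name, so ordering inside the spec matters.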

Available String Functions

  • toLower(string) - Convert to lowercase
  • toUpper(string) - Convert to uppercase
  • trim(string) - Remove leading/trailing whitespace
  • concat(str1, str2, ...) - Concatenate strings
  • substring(string, start, end) - Extract substring
  • split(delimiter, string) - Split into array (the delimiter comes first and is treated as a regular expression)
  • join(delimiter, array) - Join array to string

Math Functions

Input:

{
  "price": 19.99,
  "quantity": 3,
  "discount": 0.1,
  "values": [10, 20, 30, 40]
}

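As far as I can tell, Jolt function calls cannot be nested and there is no multiply function. Intermediate values therefore get their own keys, and multiplication is written as division by a reciprocal. Entries run in order, so later keys can read earlier ones.

Spec:

[
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "subtotal": "=doubleSum(@(1,price),@(1,price),@(1,price))",
      "discountFactor": "=doubleSubtract(1,@(1,discount))",
      "inverseFactor": "=divide(1,@(1,discountFactor))",
      "total": "=divide(@(1,subtotal),@(1,inverseFactor))",
      "average": "=avg(@(1,values))",
      "min": "=min(@(1,values))",
      "max": "=max(@(1,values))",
      "truncated": "=toInteger(@(1,price))"
    }
  }
]

Here subtotal adds price three times to stand in for price × quantity, total applies the 10% discount, and toInteger drops the fractional part of price (19.99 becomes 19) rather than rounding.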

Available Math Functions

  • intSum(n1, n2, ...) - Sum integers
  • doubleSum(n1, n2, ...) - Sum doubles
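  • intSubtract(n1, n2), doubleSubtract(n1, n2) - Subtraction
  • divide(dividend, divisor) - Division
  • multiply - Not built in; emulate a × b by dividing a by the reciprocal 1/b, with the reciprocal computed in its own key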
  • min(array) - Minimum value
  • max(array) - Maximum value
  • avg(array) - Average
  • abs(number) - Absolute value
  • toInteger(value) - Convert to integer
  • toDouble(value) - Convert to double
  • toLong(value) - Convert to long
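
Since the list has no real multiplication, the reciprocal trick is worth checking on paper. The Python sketch below only mirrors the arithmetic; divide here is a plain stand-in for Jolt's function, and multiply_via_divide is a hypothetical helper name. It also shows why I treat toInteger as truncation rather than rounding, assuming it behaves like an integer cast.

```python
def divide(dividend, divisor):
    """Stand-in for Jolt's divide: plain floating-point division."""
    return dividend / divisor


def multiply_via_divide(a, b):
    # a * b == a / (1 / b); only valid when b is non-zero.
    return divide(a, divide(1, b))


price, quantity = 19.99, 3
subtotal = multiply_via_divide(price, quantity)

print(round(subtotal, 2))  # 59.97
print(int(price))          # 19, the fractional part is dropped
```

The result can be off in the last few decimal places because of floating-point rounding, so round or format the value before storing it as money.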

Type Conversion

Input:

{
  "stringNum": "42",
  "stringBool": "true",
  "intValue": 100
}

Spec:

[
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "asInteger": "=toInteger(@(1,stringNum))",
      "asBoolean": "=toBoolean(@(1,stringBool))",
      "asString": "=toString(@(1,intValue))"
    }
  }
]

Output:

{
  "stringNum": "42",
  "stringBool": "true",
  "intValue": 100,
  "asInteger": 42,
  "asBoolean": true,
  "asString": "100"
}
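
Notice that the original fields stay in place because the spec writes to new keys. The same conversions in plain Python look roughly like this; treating toBoolean as a case-insensitive comparison with "true" is my simplification.

```python
doc = {"stringNum": "42", "stringBool": "true", "intValue": 100}

converted = dict(doc)  # the original keys are kept as they were
converted["asInteger"] = int(doc["stringNum"])                 # toInteger
converted["asBoolean"] = doc["stringBool"].lower() == "true"   # toBoolean (simplified)
converted["asString"] = str(doc["intValue"])                   # toString

print(converted)
```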

Reference Syntax

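The @ symbol reads a value from elsewhere in the input. The first argument is how many levels to climb from the current position, and the second is the key to read at that level:

  • @(0,fieldName) - Field inside the current value
  • @(1,fieldName) - Field of the parent object (a sibling of the current key)
  • @(2,fieldName) - Field of the grandparent
  • @ - The current value itself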
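
One way I keep the level numbers straight is to picture the list of nodes visited on the way down to the current position, with the last one being level 0. The Python sketch below is a simplified model of that idea, not Jolt's actual path handling; resolve and path_nodes are names I made up.

```python
def resolve(path_nodes, levels_up, key):
    """Read `key` from the node `levels_up` steps above the current one.

    path_nodes holds every node visited from the root down to the
    current position; the last entry is level 0.
    """
    return path_nodes[-1 - levels_up][key]


# Modify example: the spec is positioned on the "name" leaf.
root = {"name": "  john doe  ", "code": "abc123"}
modify_path = [root, root["name"]]
print(repr(resolve(modify_path, 1, "name")))  # '  john doe  '

# Shift example: the spec matched the value "click" under "type".
event = {"type": "click", "data": "button1"}
doc = {"events": [event]}
shift_path = [doc, doc["events"], event, event["type"], None]
print(resolve(shift_path, 2, "data"))  # button1
```

In the shift case the matched value counts as a level of its own, which is why reaching the array element takes 2 rather than 1.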

Advanced Wildcards

Matching Multiple Patterns

Use | to match multiple keys:

Input:

{
  "firstName": "John",
  "lastName": "Doe",
  "middleName": "William",
  "age": 30
}

Spec:

[
  {
    "operation": "shift",
    "spec": {
      "firstName|lastName|middleName": "names.&",
      "age": "demographics.age"
    }
  }
]

Output:

{
  "names": {
    "firstName": "John",
    "lastName": "Doe",
    "middleName": "William"
  },
  "demographics": {
    "age": 30
  }
}
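
Read as ordinary code, the spec says: for each of the listed keys that exists, copy it under names using its own key name (that is what & expands to). A Python sketch of that reading, with a hypothetical function name and an extra ssn field added to show that keys the spec does not mention are left out of the output:

```python
def shift_names(doc):
    out = {}
    for key in ("firstName", "lastName", "middleName"):  # the a|b|c alternatives
        if key in doc:
            # "&" expands to whichever key matched.
            out.setdefault("names", {})[key] = doc[key]
    if "age" in doc:
        out.setdefault("demographics", {})["age"] = doc["age"]
    return out


person = {"firstName": "John", "lastName": "Doe", "middleName": "William",
          "age": 30, "ssn": "not-copied"}
print(shift_names(person))
```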

Matching Array Elements

Input:

{
  "items": [
    {"id": 1, "name": "Apple", "type": "fruit"},
    {"id": 2, "name": "Carrot", "type": "vegetable"},
    {"id": 3, "name": "Banana", "type": "fruit"}
  ]
}

Spec:

[
  {
    "operation": "shift",
    "spec": {
      "items": {
        "*": {
          "id": "products[&1].productId",
          "name": "products[&1].productName",
          "type": "products[&1].category"
        }
      }
    }
  }
]

Output:

{
  "products": [
    {"productId": 1, "productName": "Apple", "category": "fruit"},
    {"productId": 2, "productName": "Carrot", "category": "vegetable"},
    {"productId": 3, "productName": "Banana", "category": "fruit"}
  ]
}
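
The &1 in products[&1] reuses the array index that * matched one level up, so every field of item N lands in output element N. Here is the same idea as a Python sketch; RENAMES and shift_items are illustrative names, not part of Jolt.

```python
RENAMES = {"id": "productId", "name": "productName", "type": "category"}


def shift_items(doc):
    products = []
    for index, item in enumerate(doc["items"]):  # "*" matches each index
        products.append({})                      # one output slot per index
        for old_key, new_key in RENAMES.items():
            if old_key in item:
                # products[&1]: write into the slot with the matched index.
                products[index][new_key] = item[old_key]
    return {"products": products}


catalog = {"items": [
    {"id": 1, "name": "Apple", "type": "fruit"},
    {"id": 2, "name": "Carrot", "type": "vegetable"},
]}
print(shift_items(catalog))
```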

Conditional Matching by Value

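To branch on a field's value, descend into the field and list the values to match as keys. Inside each branch, @(2,data) climbs two levels back up to the array element and reads its data field: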

Input:

{
  "events": [
    {"type": "click", "data": "button1"},
    {"type": "view", "data": "page1"},
    {"type": "click", "data": "link2"}
  ]
}

Spec:

[
  {
    "operation": "shift",
    "spec": {
      "events": {
        "*": {
          "type": {
            "click": {
              "@(2,data)": "clicks[]"
            },
            "view": {
              "@(2,data)": "views[]"
            }
          }
        }
      }
    }
  }
]

Output:

{
  "clicks": ["button1", "link2"],
  "views": ["page1"]
}
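
In plain code this is a group-by on the type field. A short Python sketch of the equivalent logic follows; TARGETS and route_events are hypothetical names, and the scroll event is added to show that types without a branch in the spec produce no output.

```python
TARGETS = {"click": "clicks", "view": "views"}  # value -> output array


def route_events(doc):
    out = {}
    for event in doc["events"]:
        target = TARGETS.get(event["type"])
        if target is None:
            continue  # no branch for this type, so nothing is written
        out.setdefault(target, []).append(event["data"])
    return out


log = {"events": [
    {"type": "click", "data": "button1"},
    {"type": "view", "data": "page1"},
    {"type": "click", "data": "link2"},
    {"type": "scroll", "data": "ignored"},
]}
print(route_events(log))
```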

Hash (#) for Array Collection

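On the left side of a shift spec, # writes the literal text that follows it to the output path. The [] suffix on the right side is what collects the values into an array: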

Input:

{
  "users": {
    "u1": {"name": "Alice", "role": "admin"},
    "u2": {"name": "Bob", "role": "user"},
    "u3": {"name": "Charlie", "role": "admin"}
  }
}

Spec:

[
  {
    "operation": "shift",
    "spec": {
      "users": {
        "*": {
          "name": "allNames[]",
          "role": {
            "admin": {
              "#admin": "roles.admins[]",
              "@(2,name)": "adminNames[]"
            },
            "user": {
              "#user": "roles.users[]",
              "@(2,name)": "userNames[]"
            }
          }
        }
      }
    }
  }
]

Output:

{
  "allNames": ["Alice", "Bob", "Charlie"],
  "roles": {
    "admins": ["admin", "admin"],
    "users": ["user"]
  },
  "adminNames": ["Alice", "Charlie"],
  "userNames": ["Bob"]
}
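
To make the two left-hand forms concrete: "#admin" contributes the fixed string "admin", while "@(2,name)" contributes a value looked up from the input. A Python sketch of the same shift, with a made-up function name and branch table:

```python
BRANCHES = {
    # role value: (literal to write, roles.* target, names target)
    "admin": ("admin", "admins", "adminNames"),
    "user": ("user", "users", "userNames"),
}


def shift_users(doc):
    out = {}
    for user in doc["users"].values():           # "*" over u1, u2, ...
        out.setdefault("allNames", []).append(user["name"])
        branch = BRANCHES.get(user["role"])
        if branch is None:
            continue
        literal, roles_key, names_key = branch
        out.setdefault("roles", {}).setdefault(roles_key, []).append(literal)  # "#..."
        out.setdefault(names_key, []).append(user["name"])                     # "@(2,name)"
    return out


directory = {"users": {
    "u1": {"name": "Alice", "role": "admin"},
    "u2": {"name": "Bob", "role": "user"},
    "u3": {"name": "Charlie", "role": "admin"},
}}
print(shift_users(directory))
```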

Chained Specifications

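Complex transformations often require multiple passes. Operations in a chain run in order, each receiving the previous operation's output. The spec below shifts fields into place, cleans up the customer values, removes the temporary raw fields, and finally fills in defaults: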

Input:

{
  "order": {
    "id": "ORD-001",
    "items": [
      {"sku": "A1", "qty": 2, "price": 10.00},
      {"sku": "B2", "qty": 1, "price": 25.00}
    ],
    "customer": {
      "name": "  JOHN DOE  ",
      "email": "John.Doe@Example.com"
    }
  }
}

Chained Spec:

[
  {
    "operation": "shift",
    "spec": {
      "order": {
        "id": "orderId",
        "items": {
          "*": {
            "sku": "lineItems[&1].sku",
            "qty": "lineItems[&1].quantity",
            "price": "lineItems[&1].unitPrice"
          }
        },
        "customer": {
          "name": "customer.rawName",
          "email": "customer.rawEmail"
        }
      }
    }
  },
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "customer": {
        "name": "=trim(@(1,rawName))",
        "email": "=toLower(@(1,rawEmail))"
      }
    }
  },
  {
    "operation": "remove",
    "spec": {
      "customer": {
        "rawName": "",
        "rawEmail": ""
      }
    }
  },
  {
    "operation": "default",
    "spec": {
      "status": "pending",
      "currency": "USD",
      "metadata": {
        "version": "1.0"
      }
    }
  }
]

Output:

{
  "orderId": "ORD-001",
  "lineItems": [
    {"sku": "A1", "quantity": 2, "unitPrice": 10.00},
    {"sku": "B2", "quantity": 1, "unitPrice": 25.00}
  ],
  "customer": {
    "name": "JOHN DOE",
    "email": "john.doe@example.com"
  },
  "status": "pending",
  "currency": "USD",
  "metadata": {
    "version": "1.0"
  }
}
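
It can help to picture the chain as four small functions applied one after another, each taking the previous result. The Python sketch below mirrors the shift, modify, remove, and default passes for the customer and line item fields. The function names and the email address are placeholders of mine, and this is an analogy, not how NiFi or Jolt executes the spec.

```python
from functools import reduce


def shift(doc):
    order = doc["order"]
    return {
        "orderId": order["id"],
        "lineItems": [
            {"sku": i["sku"], "quantity": i["qty"], "unitPrice": i["price"]}
            for i in order["items"]
        ],
        "customer": {
            "rawName": order["customer"]["name"],
            "rawEmail": order["customer"]["email"],
        },
    }


def modify(doc):
    customer = doc["customer"]
    customer["name"] = customer["rawName"].strip()    # trim
    customer["email"] = customer["rawEmail"].lower()  # toLower
    return doc


def remove(doc):
    for key in ("rawName", "rawEmail"):
        doc["customer"].pop(key, None)
    return doc


def default(doc):
    # Only fills keys that are still missing.
    doc.setdefault("status", "pending")
    doc.setdefault("currency", "USD")
    doc.setdefault("metadata", {}).setdefault("version", "1.0")
    return doc


order_doc = {"order": {
    "id": "ORD-001",
    "items": [{"sku": "A1", "qty": 2, "price": 10.0},
              {"sku": "B2", "qty": 1, "price": 25.0}],
    "customer": {"name": "  JOHN DOE  ", "email": "John.Doe@Example.com"},
}}

result = reduce(lambda doc, step: step(doc), [shift, modify, remove, default], order_doc)
print(result)
```

Swapping remove and modify would break the chain, because modify reads the raw fields that remove deletes.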

Sort Operation

Alphabetically sort object keys (useful for consistent output):

Spec:

[
  {
    "operation": "sort"
  }
]

This recursively sorts all keys in the document.
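
For comparison, a recursive key sort in Python looks like the sketch below. It assumes plain alphabetical ordering and leaves array element order untouched; sort_keys is a hypothetical helper, not the Jolt implementation.

```python
def sort_keys(node):
    """Return a copy of node with every object's keys in sorted order."""
    if isinstance(node, dict):
        return {key: sort_keys(node[key]) for key in sorted(node)}
    if isinstance(node, list):
        return [sort_keys(item) for item in node]  # element order is kept
    return node


messy = {"status": "pending", "customer": {"name": "JOHN DOE", "email": "x"}, "items": [{"b": 1, "a": 2}]}
tidy = sort_keys(messy)
print(list(tidy))              # ['customer', 'items', 'status']
print(list(tidy["customer"]))  # ['email', 'name']
```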

Summary

This article covered advanced JOLT features:

  • Cardinality for ONE/MANY value normalization
  • Modify operations with string and math functions
  • Advanced wildcards for pattern matching
  • Conditional routing by value
  • Chained specifications for complex transformations
  • Sort operation for consistent output

The next article explores dynamic HTTP listeners with portpxy for webhook handling.

Note: This blog is a collection of personal notes. Making them public encourages me to think beyond the limited scope of the current problem I'm trying to solve or concept I'm implementing, and hopefully provides something useful to my team and others.

This blog post, titled: "Apache NiFi: JOLT Transformations Part 2: Apache NiFi Part 4" by Craig Johnston, is licensed under a Creative Commons Attribution 4.0 International License.