Data - IMTI - Craig Johnston

Advanced Platform Development with Kubernetes

Enabling Data Management, the Internet of Things, Blockchain, and Machine Learning

I’ve been distracted for over a year now, writing a ~500-page end-to-end tutorial on constructing data-centric platforms with Kubernetes. The book is titled “Advanced Platform Development with Kubernetes: Enabling Data Management, the Internet of Things, Blockchain, and Machine Learning.” A little more than a year ago, Apress reached out and asked if I would write a book on Kubernetes for them, mirroring the wide range of projects I develop (and write about) for my clients.

Posted by Craig Johnston Sunday, August 30, 2020

Kafka on Kubernetes

Deploy a highly available Kafka cluster on Kubernetes.

Kafka is a fast, horizontally scalable, fault-tolerant message queue service used for building real-time data pipelines and streaming apps. There are a few Helm-based installers out there, including the official Kubernetes incubator/kafka chart. However, in this article, I walk through applying the surprisingly small set of Kubernetes configuration files needed to stand up a high-performance, highly available Kafka cluster. Manually applying Kubernetes configurations gives you a step-by-step understanding of the system you are deploying and limitless opportunities to customize it.

Posted by Craig Johnston Tuesday, September 25, 2018

Elasticsearch Essential Queries

Getting started with Elasticsearch

The following is an overview of querying Elasticsearch. Over the years I have tried to assemble developer notes for myself and my team on a variety of platforms, languages and frameworks: a cheat-sheet with context, not a comprehensive how-to, but a decent 15-minute overview of the features we are most likely to implement in a given iteration. For more depth, explore the official Elasticsearch documentation: Search in Depth. Contents: Motivation; Following Along with Elasticsearch and Kubernetes; Vocabulary; Basic CRUD API; Delete an Index; Create an Index; Create or Update a Document (Upsert); Get a Document; Mappings, Types and Metadata; Get Mapping; Create a Mapping; Searching; Range Filtering; Aggregations; Counts; Averages, Minimums and Maximums; Percentile; Percent by Rank; Percent by Rank Interval; Resources.
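To give a feel for the range filtering and aggregations listed above, here is a minimal sketch of a search request body built as a Python dict and serialized to JSON. The clause names (`bool`/`filter`, `range`, `value_count`, `avg`, `min`, `max`, `percentiles`, `percentile_ranks`) are standard Elasticsearch query DSL; the field names `timestamp` and `temperature`, the aggregation names, and the helper `build_search_body` are hypothetical, chosen only for illustration. You would POST the resulting JSON to an index's `_search` endpoint.

```python
import json


def build_search_body(range_field, gte, lt, metric_field):
    """Hypothetical helper: a range filter plus metric aggregations in Elasticsearch query DSL."""
    return {
        "size": 0,  # return only aggregation results, no document hits
        "query": {
            "bool": {
                # Filter context: matches documents without computing relevance scores.
                "filter": [{"range": {range_field: {"gte": gte, "lt": lt}}}]
            }
        },
        "aggs": {
            # Aggregation names (keys) are arbitrary labels; the inner key is the aggregation type.
            "readings": {"value_count": {"field": metric_field}},
            "average": {"avg": {"field": metric_field}},
            "minimum": {"min": {"field": metric_field}},
            "maximum": {"max": {"field": metric_field}},
            "pcts": {"percentiles": {"field": metric_field, "percents": [50, 95, 99]}},
            "ranks": {"percentile_ranks": {"field": metric_field, "values": [20, 30]}},
        },
    }


if __name__ == "__main__":
    # Date math: from the start of yesterday up to (not including) the start of today.
    body = build_search_body("timestamp", "now-1d/d", "now/d", "temperature")
    print(json.dumps(body, indent=2))
```

Using `"size": 0` keeps the response small when only the counts, averages, and percentiles matter.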

Posted by Craig Johnston Thursday, July 26, 2018

Remote Query Elasticsearch on Kubernetes

Local workstation-based microservices development

Developing on our local workstations has always been a conceptual challenge for my team when it comes to remote data access. Services developed locally often need to connect to a wide range of remote services, some of which offer no options for external connections. Mirroring the entire development environment is possible in many cases, just not practical. In the days before Kubernetes, writing code in IDEs on our local workstations meant we had only a few options for developing server-side, API-style services that needed to connect to a database.

Posted by Craig Johnston Wednesday, July 25, 2018

High Traffic JSON Data into Elasticsearch on Kubernetes

Instant, reliable, send and forget.

IoT devices, Point-of-Sale systems, application events, or any client that sends data destined for indexing in Elasticsearch often need to send and forget; however, unless that data is of low value, there must be some assurance that it arrives at its final destination. Back-pressure and database outages can pose a considerable threat to data integrity. Contents: Background; Overview; Development Environment; the-project Namespace; The Project: Weather (wx) Data; rxtx for Store-and-Forward; wx-rxtx Service; wx-rxtx StatefulSet; rtBeat to Collect, Buffer and Publish; wx-rtbeat Service; wx-rtbeat ConfigMap; wx-rtbeat Deployment; Client Simulation / Kubernetes Cron; Performance; Conclusion; Port Forwarding / Local Development; Reference.
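The store-and-forward pattern behind "send and forget" can be illustrated with a minimal in-process sketch. This is not how rxtx or rtBeat are implemented; it assumes a hypothetical `send` callable that writes a batch to the destination (for example, a bulk write to Elasticsearch) and raises an exception when the destination is unavailable. The client's message is accepted immediately, buffered locally, and only dropped from the buffer after the destination accepts it.

```python
from collections import deque
from typing import Callable, List


class StoreAndForward:
    """Hypothetical relay: buffer messages locally, forward them in batches when possible."""

    def __init__(self, send: Callable[[List[dict]], None], batch_size: int = 100):
        self._send = send  # assumed sink: raises an exception on failure
        self._batch_size = batch_size
        self._buffer = deque()

    def receive(self, message: dict) -> None:
        """Accept a message immediately so the client can send and forget."""
        self._buffer.append(message)

    def flush(self) -> int:
        """Try to forward buffered messages; return how many were delivered."""
        delivered = 0
        while self._buffer:
            batch = [self._buffer[i] for i in range(min(self._batch_size, len(self._buffer)))]
            try:
                self._send(batch)
            except Exception:
                # Destination unavailable (outage or back-pressure): keep the batch for a later flush.
                break
            # Remove messages only after the destination has accepted them.
            for _ in batch:
                self._buffer.popleft()
            delivered += len(batch)
        return delivered

    def pending(self) -> int:
        return len(self._buffer)


if __name__ == "__main__":
    outage = {"down": True}
    indexed = []

    def send_to_index(batch):
        # Stand-in for a bulk write to the final destination.
        if outage["down"]:
            raise ConnectionError("destination unavailable")
        indexed.extend(batch)

    relay = StoreAndForward(send_to_index, batch_size=2)
    for n in range(5):
        relay.receive({"reading": n})
    print(relay.flush(), relay.pending())  # 0 5 while the destination is down
    outage["down"] = False
    print(relay.flush(), relay.pending())  # 5 0 after it recovers
```

An in-memory buffer like this one is lost if the process dies, which is why a real relay would persist its buffer (the article runs rxtx as a StatefulSet).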

Posted by Craig Johnston Wednesday, July 18, 2018

Kibana on Kubernetes

Visualize your Elasticsearch data.

This guide walks through setting up Kibana within a namespace on a Kubernetes cluster. If you followed along with Production Grade Elasticsearch on Kubernetes, then, aside from personal or corporate preferences, few modifications are necessary for the configurations below. Contents: Project Namespace; Service; Kibana ConfigMap; Deployment; Basic Auth (Optional); TLS Certificate (Optional); Ingress; Conclusion; Port Forwarding / Local Development; Resources.

Posted by Craig Johnston Sunday, July 15, 2018

Production Grade Elasticsearch on Kubernetes

Set up a fast, custom, production-grade Elasticsearch cluster.

Installing a production-ready Elasticsearch 6.2 cluster on Kubernetes requires a handful of simple configurations. The following guide is a high-level overview of an installation process using Elastic’s recommended best practices. The GitHub project kubernetes-elasticsearch-cluster is used for the Elastic Docker container, which is built to operate Elasticsearch with nodes dedicated as Master, Data, and Client/Ingest. The Docker container docker-elasticsearch by pires, a “Ready to use, lean and highly configurable Elasticsearch container image,” is sufficient for use in this guide.

Posted by Craig Johnston Saturday, July 14, 2018

Python Data Essentials - Matplotlib and Seaborn

A beginner's guide.

There is an overwhelming number of options for developers needing to provide data visualization. The most popular library for data visualization in Python is Matplotlib, and built directly on top of Matplotlib is Seaborn. The Seaborn library is “tightly integrated with the PyData stack, including support for numpy and pandas data structures and statistical routines from scipy and statsmodels.” This article is only intended to get you started with Matplotlib and Seaborn.

Posted by Craig Johnston Sunday, July 8, 2018

Webpage to PDF Microservice

Automate PDF Report Generation

I create a lot of data visualizations for clients, many of which are internal, portal-style websites that present data in real time, as well as give options for viewing reports from previous time-frames. PDFs are useful for data such as bank statements or any form of time-snapshot progress reporting. It is common for clients to want PDF versions generated on a regular basis for sharing through email or other technologies.

Posted by Craig Johnston Sunday, July 1, 2018

Python Data Essentials - Pandas

A data type equivalent to super-charged spreadsheets.

Pandas brings Python a data type equivalent to super-charged spreadsheets. It adds two highly expressive data structures to Python: Series and DataFrame. Pandas Series and DataFrames provide performant analysis and manipulation of “relational” or “labeled” data, similar to tables in a relational database like MySQL or the rows and columns of Excel. Pandas is great for working with time series data as well as arbitrary matrix data and unlabeled data. Pandas leverages NumPy, so if you are not familiar with this fundamental library for working with numbers, I suggest you take a look at Python Data Essentials - NumPy to get a decent footing.

Posted by Craig Johnston Sunday, June 17, 2018

Python Data Essentials - NumPy

Powerful N-dimensional array objects.

Python is one of The Most Popular Languages for Data Science, and because of this adoption by the data science community, we have libraries like NumPy, Pandas and Matplotlib. At its core, NumPy provides powerful N-dimensional array objects on which we can perform linear algebra; Pandas gives us data structures and data analysis tools, similar to working with a specialized database or powerful spreadsheets; and finally, Matplotlib generates plots, histograms, power spectra, bar charts, error charts and scatterplots, to name a few.

Posted by Craig Johnston Saturday, June 16, 2018

SQL Foundations

Selects, joins and aliases.

The following is an attempt at explaining the basics of an SQL query and, more importantly, how I believe you can best think them through. All queries can be broken down into the basics of this declarative language. I recently helped a co-worker read through a large SQL query with a few dozen joins and left joins, aliases, and recursion. He is mostly a front-end integrator, and although he has been tinkering with SQL for years, he never really understood the basics.
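A small, self-contained illustration of selects, joins and aliases, using Python's built-in `sqlite3` module and an in-memory database. The `customers` and `orders` tables and their rows are made up for this sketch; the point is the difference between an inner `JOIN`, which keeps only rows with a match on both sides, and a `LEFT JOIN`, which keeps every row from the left table. The aliases `c` and `o` keep the column references short and unambiguous.

```python
import sqlite3


def demo_joins():
    """Run an inner join and a left join over two tiny, made-up tables."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
        INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
        INSERT INTO orders (id, customer_id, total) VALUES (10, 1, 25.0), (11, 1, 5.5), (12, 2, 12.0);
        """
    )

    # Inner join: only customers that have at least one matching order appear.
    inner = cur.execute(
        """
        SELECT c.name, o.total
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        ORDER BY o.id
        """
    ).fetchall()

    # Left join: every customer is kept; unmatched rows have NULL order columns,
    # which COUNT(o.id) ignores, so a customer with no orders gets 0.
    left = cur.execute(
        """
        SELECT c.name, COUNT(o.id) AS order_count
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id
        ORDER BY c.id
        """
    ).fetchall()

    conn.close()
    return inner, left


if __name__ == "__main__":
    inner, left = demo_joins()
    print(inner)  # [('Ada', 25.0), ('Ada', 5.5), ('Grace', 12.0)]
    print(left)   # [('Ada', 2), ('Grace', 1), ('Linus', 0)]
```

Linus disappears from the inner join but survives the left join, which is usually the first thing to check when reading a query with a mix of the two.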

Posted by Craig Johnston Monday, April 2, 2018

Don't Install cqlsh

Containers as utility applications

We live in a world of process isolation and tools that make utilizing it extremely simple. With apps like Docker, we can perform dependency management with dependency isolation. As I slowly become a fanboy of containerization, I look forward to the day when typing ps on my local workstation or remote server is nearly synonymous with commands like docker ps or kubectl get services. Case: Cassandra development and your local workstation.

Posted by Craig Johnston Thursday, March 1, 2018
