
Industrial IoT (IIoT) Project Part 2: Data Analytics, Visualization, and Dynamic Anomaly Detection

In the first phase of our project, we ensured that data was transported securely via the MQTT protocol over a WireGuard VPN tunnel. In industrial automation, however, data only gains value once it is processed: real value emerges through contextualization, trend monitoring, and the system’s ability to react when critical thresholds are breached.

In this article, we detail how the securely transported telemetry data is persisted in the InfluxDB time-series database (TSDB) layer, visualized with Grafana, and turned into a self-monitoring anomaly detection mechanism.

Before we dive into the details, here is the result video of Part 2, followed by a breakdown of the implementation.

1. Data Persistence (Historian) and the InfluxDB Layer

In modern industrial facilities, “Historian” systems are critical for post-mortem analysis and regulatory compliance. For this project, InfluxDB was chosen for its high write throughput and low storage cost.

1.1. Why a Time Series Database (TSDB)?

Classical relational databases (MySQL, PostgreSQL, etc.) can struggle under the weight of thousands of sensor readings arriving every second. InfluxDB stores every record against a timestamp and is optimized for exactly this kind of append-heavy, time-ordered workload. The data structure used in this project is as follows (a sample record is shown after the list):

  • Measurement: sensor_data
  • Tags: node_id (For device-based filtering)
  • Fields: temperature, pressure, vibration, status
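
For illustration, a single reading in this schema maps to one InfluxDB line-protocol record like the following (the node ID, values, and nanosecond timestamp are made up for the example):

    sensor_data,node_id=node_01 temperature=65.2,pressure=20.05,vibration=1.12,status=0 1735689600000000000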

1.2. MQTT-to-InfluxDB Bridge (Middleware)

A custom Python service continuously listens to the MQTT broker (Mosquitto). Every incoming JSON packet is parsed and immediately inserted into InfluxDB. The service is run with Python’s -u (unbuffered) flag so that nothing sits in an output buffer and records flow through to the database without delay.
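
The bridge itself is project-specific, but a minimal sketch of the idea looks like this. It assumes the paho-mqtt and influxdb (1.x) Python packages; the broker address, the iiot database name, and the factory/sensors topic are placeholders rather than the project’s real configuration:

    import json

    import paho.mqtt.client as mqtt          # MQTT client (paho-mqtt package)
    from influxdb import InfluxDBClient      # InfluxDB 1.x client (influxdb package)

    # Placeholder connection settings; adjust to the real broker/database names.
    influx = InfluxDBClient(host="localhost", port=8086, database="iiot")

    def on_message(client, userdata, msg):
        """Parse one incoming JSON payload and write it to InfluxDB as a point."""
        payload = json.loads(msg.payload)
        point = {
            "measurement": "sensor_data",
            "tags": {"node_id": payload["node_id"]},
            "fields": {
                "temperature": float(payload["temperature"]),
                "pressure": float(payload["pressure"]),
                "vibration": float(payload["vibration"]),
                "status": int(payload["status"]),
            },
        }
        influx.write_points([point])

    # paho-mqtt 1.x constructor; version 2.x additionally expects a CallbackAPIVersion argument.
    mqttc = mqtt.Client()
    mqttc.on_message = on_message
    mqttc.connect("localhost", 1883)
    mqttc.subscribe("factory/sensors")        # placeholder topic name
    mqttc.loop_forever()

In the actual deployment the script is launched with python3 -u so that its console output is not buffered (see Section 4.2).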

2. Advanced Data Simulation and Anomaly Logic

To test the robustness of a real system, working only with “ideal” data is insufficient. Therefore, fault-injection logic was built into the industrial_sensors.py script.

2.1. Stabilization vs. Randomness

The system’s base temperature was stabilized between 64.5°C and 65.5°C with minor fluctuations (noise). For operational testing, however, a “Critical Error” mode was added that triggers with a 0.5% probability. When it fires, the temperature spikes above 100°C, the status variable switches to 1, and a “Fault” signal is written to the database.
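
The exact logic lives in industrial_sensors.py; a simplified sketch of the idea, assuming the 0.5% chance applies per reading and using illustrative ranges for pressure and vibration, might look like this:

    import json
    import random
    import time

    def read_sensors():
        """One simulated reading: a stable base band plus a rare critical fault."""
        if random.random() < 0.005:                      # 0.5% chance of a critical error
            temperature = random.uniform(100.0, 115.0)   # spike above 100 °C
            status = 1                                   # fault flag
        else:
            temperature = random.uniform(64.5, 65.5)     # stable band with small noise
            status = 0
        return {
            "node_id": "node_01",                        # placeholder device ID
            "temperature": round(temperature, 2),
            "pressure": round(random.uniform(19.5, 20.1), 2),    # illustrative range
            "vibration": round(random.uniform(0.80, 1.28), 2),   # illustrative range
            "status": status,
        }

    while True:
        # In the real script this payload is published to the MQTT broker.
        print(json.dumps(read_sensors()), flush=True)
        time.sleep(1)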

3. Operational Monitoring Center with Grafana

Grafana is the “showcase” of the project and the operator’s window into the system. Every panel created serves a specific engineering need.

3.1. Temperature Profile (°C) – Trend Analysis

This panel displays the temperature history over the time range selected in the dashboard’s time filter (a representative panel query is sketched after the list below).

  • Thresholds: A red dashed line and fill area added above 90°C provide a visual boundary for the operator.
  • Smoothing: Transitions between data points were “Smoothed” to make the trend easier to read.
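
The underlying query is not reproduced in the write-up; assuming the schema from Section 1.1, an InfluxQL query driving such a panel would look roughly like this (node_01 is a placeholder; $timeFilter and $__interval are Grafana’s built-in macros):

    SELECT mean("temperature")
    FROM "sensor_data"
    WHERE "node_id" = 'node_01' AND $timeFilter
    GROUP BY time($__interval) fill(null)

The 90°C threshold line and the smoothing are then applied purely as panel display options on top of this series.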

3.2. Current Core Temperature (Gauge)

This gauge shows the instantaneous core temperature and uses industry-standard color coding:

  • Green (50-80°C): Safe operating zone.
  • Orange (80-95°C): Warning zone (Maintenance or cooling check may be required).
  • Red (95-120°C): Emergency (System in anomaly mode).

3.3. Critical System Parameters (Pressure & Vibration)

The specific thresholds applied to these graphs report mechanical and hydraulic health in real time:

  • System Pressure (Bar): A yellow threshold at 20.1 Bar and a red area above it monitor hydraulic pressure stability.
  • Mechanical Vibration (mm/s): Color coding defined for values above 1.28 mm/s represents wear in bearings or motor housings.

3.4. Daily Total Anomalies (24h) & Active Time

Strategic counters are placed at the top of the dashboard to monitor overall equipment effectiveness (OEE):

  • Anomaly Counter: Uses time > now() - 24h logic to count errors in the last 24 hours (see the query sketch after this list). It displays “0” when the system is healthy and switches to a red alarm state when an error occurs.
  • System Active Time: Converts total system uptime from seconds into a readable hour-based format (e.g., 2.61 hours).
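
The write-up only quotes the time > now() - 24h condition; assuming faults are marked by status = 1 as in the schema above, a counter query consistent with it could look like this:

    SELECT count("status")
    FROM "sensor_data"
    WHERE "status" = 1 AND time > now() - 24h

The active-time stat is plain arithmetic on top of the uptime value: seconds divided by 3,600 (e.g., 9,396 s / 3,600 = 2.61 hours).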

4. Technical Challenges and Solutions

4.1. Synchronization and Timezone Issues

Initially, Grafana panels showed “No Data” because of a timezone mismatch: InfluxDB stores timestamps in UTC, while the dashboard worked in local time (Munich, CET). Synchronizing the two resolved the issue.
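
One way to make the convention explicit at the source is to attach timezone-aware UTC timestamps to every point in the bridge and let Grafana’s dashboard timezone setting handle the display conversion to CET. A minimal sketch, assuming the influxdb 1.x client used above (values are placeholders):

    from datetime import datetime, timezone

    point = {
        "measurement": "sensor_data",
        "tags": {"node_id": "node_01"},      # placeholder device ID
        "fields": {"temperature": 65.2},     # placeholder value
        # Explicit UTC timestamp; Grafana converts it to the dashboard timezone (e.g. CET).
        "time": datetime.now(timezone.utc),
    }
    # influx.write_points([point])           # "influx" as defined in the bridge sketch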

4.2. Data Gaps (Buffering Issues)

Standard output buffering on Linux caused data to reach the dashboard in 20-30 second blocks. Adding the -u flag to the Python invocation in the service files restored real-time data flow.
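
In a systemd unit this boils down to one change in the ExecStart line; the unit name and paths below are illustrative, not the project’s actual files:

    # /etc/systemd/system/mqtt-influx-bridge.service (excerpt)
    [Service]
    ExecStart=/usr/bin/python3 -u /opt/iiot/mqtt_influx_bridge.py
    Restart=always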

5. System Integration Perspective

The greatest success of this project is the ability of independent open-source tools to work together like an orchestra:

  • Security: Network isolation with WireGuard.
  • Messaging: Lightweight and fast communication with MQTT.
  • Memory: Time-series persistence with InfluxDB.
  • Visibility: Smart monitoring and threshold management with Grafana.

6. Conclusion and Future Plans

Our current infrastructure possesses all the fundamental components required to monitor the heart of a factory. However, the Industry 4.0 journey does not end here.

What’s coming in Part 3?

  • Alerting: Telegram or Email notifications for critical errors.
  • Predictive Maintenance: Predicting failures before they happen using Machine Learning algorithms.