
A Brief Discussion on Real-time Analytics Applications of Streaming Data

Trinity Data Integration Lab
Recently, the TSMC data leak has caused an uproar. Given TSMC's status as Taiwan's "national protector," the incident has escalated into a matter of national security. In cybersecurity, external threats can always be met with countermeasures, even if countering them demands sustained cost and effort; the real challenge is the insider threat. Insiders are legitimate users, so monitoring their activity effectively without hurting productivity or personal dignity is extremely difficult. We do not know how TSMC identified the leaker, but if employee file access behavior could be analyzed in real time, for example to detect patterns such as "frequently accessing an unusually large volume of sensitive data" or "repeatedly retrieving large datasets and closing them within minutes," potential insider threats could be identified far more promptly.

Traditional ETL runs in batches: data analysis typically pulls data from a data warehouse or data lake periodically (daily, weekly, or monthly) rather than in real time. If an enterprise needs real-time analytics, for instance to detect the insider threats described above, it must adopt Streaming Data technologies that capture and analyze data directly within the Data Pipeline. The Pro Perspectives column has already published several articles on Streaming Data and Data Pipelines; this article focuses on industry solutions and applications for real-time analytics built on Data Pipelines.
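To illustrate the difference between the two styles, here is a minimal Python sketch of the first insider-threat pattern mentioned above. The `FileAccess` record, the 10-minute window and the 50-access threshold are hypothetical values chosen for illustration only. The batch function can only report totals after the data has landed, while the streaming function evaluates each event as it arrives and raises an alert the moment a user crosses the threshold.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class FileAccess:
    user: str
    ts: float         # event time in seconds
    sensitive: bool   # True if the file is classified as sensitive


WINDOW_SECONDS = 600.0  # hypothetical: 10-minute sliding window
THRESHOLD = 50          # hypothetical: alert above 50 sensitive accesses per window


def batch_report(events: Iterable[FileAccess]) -> dict[str, int]:
    """Batch style: run periodically (e.g. nightly) over stored data; returns totals per user."""
    totals: dict[str, int] = defaultdict(int)
    for e in events:
        if e.sensitive:
            totals[e.user] += 1
    return dict(totals)


def streaming_alerts(events: Iterable[FileAccess],
                     window: float = WINDOW_SECONDS,
                     threshold: int = THRESHOLD) -> Iterator[tuple[str, float]]:
    """Streaming style: inspect each event as it arrives and yield (user, ts)
    when the user's sensitive-access count in the trailing window rises above threshold."""
    recent: dict[str, deque] = defaultdict(deque)  # per-user timestamps inside the window
    for e in events:  # assumes each user's events arrive in timestamp order
        if not e.sensitive:
            continue
        q = recent[e.user]
        q.append(e.ts)
        # Evict timestamps that have slid out of the trailing window.
        while q and q[0] <= e.ts - window:
            q.popleft()
        # Alert when the count crosses the threshold, not on every event above it.
        if len(q) == threshold + 1:
            yield (e.user, e.ts)
```

With the default parameters, a user who opens one sensitive file per second is flagged at the 51st access, about 50 seconds in, rather than in the next periodic report.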

Real-time Analytics Solutions in Data Pipelines

The industry currently offers three main types of solutions:
  1. Real-time Analytics: Streaming Analytics
    A broad term referring to the real-time computation and analysis of data streams.
    • Emphasizes real-time monitoring and analytics.
    • Typical use cases: monitoring and alerts for website user behavior, financial transaction data, factory machine logs, and smart meters; dashboards and real-time reporting.
       
  2. Event Analytics: Event Stream Processing (ESP)
    Focuses on processing and analyzing “events” in a continuous, low-latency manner.
    • Event-driven processing that reacts to event data streams.
    • Commonly applied in cybersecurity, financial trading, and IoT environment monitoring.
       
  3. Event Pattern Analytics: Complex Event Processing (CEP)
    Goes further by detecting patterns and correlations across multiple event sources.
    • Applied to real-time cybersecurity and business risk monitoring, fraud detection, and detection of abnormal IoT behavior.
    • Enables more advanced real-time analytics, inference, and decision-making.
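As a rough illustration of Streaming Analytics, the sketch below computes a per-meter average over fixed (tumbling) one-minute windows, in the spirit of the smart meter use case above. The function name, the tuple layout and the window length are assumptions, and the sketch assumes readings arrive in timestamp order; stream processing engines such as Apache Flink or Kafka Streams provide windowing together with handling for late and out-of-order data.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Optional


def tumbling_window_avg(readings: Iterable[tuple[float, str, float]],
                        window: float = 60.0) -> Iterator[tuple[float, str, float]]:
    """Emit (window_start, meter_id, average) for every meter once a window closes.

    `readings` are hypothetical (timestamp_s, meter_id, kwh) tuples, assumed to
    arrive in timestamp order. A window closes when the first reading belonging
    to a later window arrives; the last window is flushed at end of stream.
    """
    current_start: Optional[float] = None
    sums: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)

    def flush() -> Iterator[tuple[float, str, float]]:
        # Report the average of each meter seen in the window that just closed.
        for meter in sorted(sums):
            yield (current_start, meter, sums[meter] / counts[meter])

    for ts, meter, value in readings:
        start = ts - (ts % window)  # align the timestamp to its window boundary
        if current_start is not None and start != current_start:
            yield from flush()
            sums.clear()
            counts.clear()
        current_start = start
        sums[meter] += value
        counts[meter] += 1

    if current_start is not None:
        yield from flush()
```

For example, readings at 0 s, 30 s and 45 s all fall into the window starting at 0 s, so that window's averages are emitted as soon as a reading at 61 s arrives.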
       
In principle:
  • If the pipeline requires continuous real-time data stream processing, use Streaming Analytics.
  • If the pipeline requires event detection and event-triggered responses, use ESP.
  • If the pipeline requires complex logic and cross-event correlation, use CEP.
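The difference between ESP and CEP can be sketched with the insider-threat patterns from the introduction. In this hypothetical Python example, `esp_rule` reacts to each event in isolation (any opening of a large file), while `cep_rule` correlates open and close events per user and fires only when "open a large dataset, close it within minutes" repeats several times in a row. The `Event` fields and all thresholds are illustrative assumptions, not values from any real system.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Event:
    ts: float       # event time in seconds
    user: str
    action: str     # "open" or "close"
    file: str
    size_mb: float


LARGE_MB = 500.0       # hypothetical: what counts as a "large dataset"
QUICK_CLOSE_S = 300.0  # hypothetical: "closed within minutes" = within 5 minutes
REPEAT = 3             # hypothetical: how many quick open/close pairs in a row


def esp_rule(events: Iterable[Event]) -> Iterator[Event]:
    """ESP: evaluate each event on its own and react immediately (stateless)."""
    for e in events:
        if e.action == "open" and e.size_mb >= LARGE_MB:
            yield e


def cep_rule(events: Iterable[Event]) -> Iterator[str]:
    """CEP: correlate open/close events per user and yield the user when the
    'open a large dataset, close it quickly' sequence reaches REPEAT in a row."""
    open_at: dict[tuple[str, str], float] = {}  # (user, file) -> open time
    streak: dict[str, int] = {}                 # user -> consecutive quick closes
    for e in events:
        key = (e.user, e.file)
        if e.action == "open" and e.size_mb >= LARGE_MB:
            open_at[key] = e.ts
        elif e.action == "close" and key in open_at:
            quick = (e.ts - open_at.pop(key)) <= QUICK_CLOSE_S
            # A slow close breaks the sequence and resets the streak.
            streak[e.user] = streak.get(e.user, 0) + 1 if quick else 0
            if streak[e.user] == REPEAT:
                yield e.user
```

Because the CEP rule keeps state across events (open times and per-user streaks), it can express sequences that a stateless ESP rule cannot. Dedicated CEP libraries, such as FlinkCEP, let patterns like this be declared rather than hand-coded.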
[Figure: Streaming Data analytics]

Real-time Analytics Applications in Data Pipelines

Like traditional Business Intelligence (BI) and Big Data Analytics, real-time analytics in Data Pipelines covers a wide range of applications. The difference lies in emphasis: BI and Big Data analytics tend to produce summaries of what has already happened, whereas real-time analytics, by its nature, emphasizes alerts and triggered actions.

Below are some representative application scenarios:
  • Smart Grid: Streaming Data technologies calculate power loads in real time; CEP detects anomalies; AI forecasting models optimize power usage and energy storage.
  • Customer Service Centers: APIs receive service messages; AI NLP models analyze customer sentiment in real time; CEP triggers the call-transfer process for VIP customers.
  • Manufacturing: Streaming Data techniques compute features from real-time sensor data; CEP detects anomalies; predictive AI models optimize preventive maintenance.
  • Vaccination Programs: APIs receive vaccination records in real time; Streaming Data calculates vaccination rates; CEP issues alerts; the data feeds AI models that forecast future demand.
  • Financial Services: Streaming Data processes transaction records; CEP detects fraud patterns; AI risk models strengthen fraud prevention and block suspicious transactions in real time.
  • Telecommunications: User traffic is analyzed in real time; AI customer retention models identify at-risk customers; CEP automatically sends promotional offers.
  • Logistics: Streaming Data processes GPS data to calculate routes in real time; reinforcement learning AI models re-optimize routes; APIs notify drivers.
  • Healthcare: IoT sensors monitor patient wards in real time; CEP detects critical conditions; AI models forecast bed demand in real time and notify hospital systems.
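Several of these scenarios (smart grid, manufacturing) share the same two-step shape: Streaming Data computes features, then a pattern rule detects anomalies. The sketch below is a minimal, hypothetical Python version of that shape: a rolling mean and standard deviation serve as the features, and an alert fires only after three consecutive readings deviate by more than three standard deviations. The `window`, `k` and `consecutive` parameters are assumptions for illustration, and the AI forecasting step is out of scope.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, Iterator


def anomaly_alerts(values: Iterable[float], window: int = 20,
                   k: float = 3.0, consecutive: int = 3) -> Iterator[int]:
    """Yield the index of each reading that completes `consecutive` anomalies in a row.

    Feature step: rolling mean and standard deviation of the last `window` normal readings.
    Pattern step: a reading is anomalous if it lies more than k standard deviations
    from that mean; an alert fires only when `consecutive` anomalies occur back to back.
    """
    baseline: deque = deque(maxlen=window)  # recent normal readings only
    run = 0
    for i, x in enumerate(values):
        if len(baseline) == window:
            mu, sigma = mean(baseline), pstdev(baseline)
            # A perfectly flat baseline (sigma == 0) never flags anything in this sketch.
            anomalous = sigma > 0 and abs(x - mu) > k * sigma
        else:
            anomalous = False  # still warming up the baseline
        if anomalous:
            run += 1
            if run == consecutive:
                yield i
        else:
            run = 0
            baseline.append(x)  # only normal readings update the baseline
```

Anomalous readings are deliberately kept out of the baseline so that a run of spikes does not inflate the standard deviation and mask itself; the trade-off is that a genuine, permanent level shift will keep being treated as anomalous until the baseline is reset.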