Network Intrusion Features via Tumbling Time Windows

This article originally used the term "Sliding Time Window". It actually discusses a variant called the "Tumbling Time Window".

Feature creation is one of the first steps toward creating machine learning (ML) models for network monitoring or other stream-oriented data processes. We massage independent variables into a form that ML models or other statistical tools can use. This often involves transforming source data through numerical conversion, bucketing, aggregation, and other techniques.

For this project, we would like to train an ML model to detect intrusion events from network traffic. People sometimes try to consume individual events directly as inputs, but a single network packet does not contain enough context to be useful on its own. A tumbling time window makes it possible to create features with more context than a single message provides.


The GitHub repository listed in the References contains Python code that creates features from Wireshark/tshark packet streams. The program accepts live tshark output or tshark streams created from .pcap files.
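As a rough illustration of consuming a tshark stream from Python, the sketch below builds a tshark command line and parses its tab-separated field output. It is not taken from the repository: the field selection, the PacketRecord type, and the function names are assumptions made for this example, and the sample lines are made up.

```python
import subprocess
from typing import Iterator, List, NamedTuple, Optional

# Wireshark field names requested from tshark (an assumed selection).
FIELDS = ["frame.time_epoch", "ip.src", "ip.dst", "frame.protocols", "frame.len"]


class PacketRecord(NamedTuple):
    """Hypothetical record type holding the fields this sketch asks for."""
    timestamp: float
    src: Optional[str]
    dst: Optional[str]
    protocols: List[str]
    length: int


def tshark_command(pcap_path: Optional[str] = None,
                   interface: Optional[str] = None) -> List[str]:
    """Build a tshark command that prints one tab-separated line per packet."""
    # -l flushes after each packet; occurrence=f keeps the first value of a field.
    command = ["tshark", "-l", "-T", "fields", "-E", "occurrence=f"]
    for field in FIELDS:
        command += ["-e", field]
    if pcap_path is not None:
        command += ["-r", pcap_path]    # read packets from a capture file
    elif interface is not None:
        command += ["-i", interface]    # capture live from an interface
    return command


def parse_line(line: str) -> PacketRecord:
    """Parse one output line; empty address fields (e.g. ARP) become None."""
    timestamp, src, dst, protocols, length = line.rstrip("\n").split("\t")
    return PacketRecord(float(timestamp), src or None, dst or None,
                        protocols.split(":"), int(length))


def stream_packets(command: List[str]) -> Iterator[PacketRecord]:
    """Run tshark and yield a record per packet (requires tshark installed)."""
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as process:
        for line in process.stdout:
            yield parse_line(line)


# Made-up sample lines in the format the command above would produce.
tls_packet = parse_line("1617235200.5\t10.0.0.5\t10.0.0.9\teth:ethertype:ip:tcp:tls\t1500\n")
arp_packet = parse_line("1617235201.0\t\t\teth:ethertype:arp\t42\n")
```

Calling stream_packets(tshark_command(pcap_path="capture.pcap")) would replay a capture file, while passing interface instead would read live traffic. The file name here is only a placeholder.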

Tumbling Time Window

Tumbling time windows are a popular way of slicing up the incoming streams into features that can be fed to ML models for intrusion analysis and other purposes.  We take an incoming stream of messages/logs and slice them into groups at regular intervals.  We stream the data through a time box.  Everything in the box becomes part of that window and is treated as a single input.

We create features based on analysis of the packets that fell in the time box. Those derived/calculated features feed the model, either for training or as part of using the model.
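The slicing step can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: it assumes each packet is a (timestamp, payload) tuple and that windows are aligned to multiples of the window length. The names tumbling_windows and Packet are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A packet is modeled as (timestamp_seconds, payload); this shape is an
# assumption made for the sketch.
Packet = Tuple[float, str]


def tumbling_windows(packets: Iterable[Packet],
                     window_seconds: float) -> Dict[int, List[Packet]]:
    """Group packets into fixed, non-overlapping windows keyed by window index."""
    windows: Dict[int, List[Packet]] = defaultdict(list)
    for timestamp, payload in packets:
        # Floor division places each packet in exactly one window.
        index = int(timestamp // window_seconds)
        windows[index].append((timestamp, payload))
    return dict(windows)


packets = [(0.2, "a"), (1.7, "b"), (4.9, "c"), (5.0, "d"), (12.3, "e")]
windows = tumbling_windows(packets, window_seconds=5.0)
for index in sorted(windows):
    print(index, [payload for _, payload in windows[index]])
# 0 ['a', 'b', 'c']
# 1 ['d']
# 2 ['e']
```

This sketch only returns windows that received at least one packet. A streaming implementation would typically also emit empty windows so that quiet periods show up in the feature set.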



The picture shows a network stream that has been time-sliced into 6 windows.  Each window is fixed in time. 
  • The number of packets in each window varies.  
  • The amount of time represented by the window is constant.

Window Summary Info as Features

The features below represent a transformation of the packets in the window into a single set of variables. The example shows a set of calculated features:
  1. the number of TCP, UDP, and ARP protocol messages
  2. the number of TLS, HTTP, SSDP, and SMB2 service messages
  3. the number of host pairs
  4. the total bytes transferred
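The four feature groups listed above could be computed from one window's packets roughly as follows. This is a sketch under stated assumptions: the dictionary keys, the summarize_window name, and the sample packets are invented for illustration, and treating A-to-B and B-to-A traffic as one host pair is a choice made here, not something the article specifies.

```python
from collections import Counter
from typing import Dict, List

PROTOCOLS = ("tcp", "udp", "arp")
SERVICES = ("tls", "http", "ssdp", "smb2")

# Hypothetical packet records for a single window; field names are assumptions.
window_packets: List[dict] = [
    {"src": "10.0.0.5", "dst": "10.0.0.9", "protocol": "tcp", "service": "tls", "length": 1500},
    {"src": "10.0.0.9", "dst": "10.0.0.5", "protocol": "tcp", "service": "tls", "length": 60},
    {"src": "10.0.0.5", "dst": "239.255.255.250", "protocol": "udp", "service": "ssdp", "length": 200},
    {"src": "10.0.0.7", "dst": "10.0.0.1", "protocol": "arp", "service": None, "length": 42},
]


def summarize_window(packets: List[dict]) -> Dict[str, int]:
    """Collapse all packets in one window into a single row of features."""
    protocol_counts = Counter(p["protocol"] for p in packets)
    service_counts = Counter(p["service"] for p in packets if p["service"])
    # frozenset ignores direction, so A->B and B->A count as one host pair.
    host_pairs = {frozenset((p["src"], p["dst"])) for p in packets}

    features = {f"{name}_count": protocol_counts.get(name, 0) for name in PROTOCOLS}
    features.update({f"{name}_count": service_counts.get(name, 0) for name in SERVICES})
    features["host_pair_count"] = len(host_pairs)
    features["total_bytes"] = sum(p["length"] for p in packets)
    return features


features = summarize_window(window_packets)
print(features)
```

Every window produces a row with the same keys, including zero counts, which keeps the feature vector a fixed width regardless of how many packets arrived.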

ML training adjusts the weights given to the various features, and hyperparameter tuning adjusts the training settings, to create a model that detects network intrusion or other issues in the packet stream.

Alternative Window Strategies

We described the use of non-overlapping time windows (Tumbling Windows). There are other strategies, including time windows that overlap by various amounts, so that a given network packet or event contributes to more than one set of features. Overlap might reduce boundary effects, where a timed attack could hide by straddling the edge between two windows.
  • Tumbling Time Window: Fixed time non-overlapping windows that advance at a fixed rate.  Windows have a fixed length. There may or may not be events in a given window. Data can be in only one window.
  • Hopping Window:  The window advances at a specific rate with a specific width.  The window advances irrespective of events received.  This is an overlapping version of the Tumbling Window. Data can be in multiple windows.
  • Time-Based Sliding Window: Contains the events that happened in the last N seconds. The window is data triggered: it is evaluated every time a new event is received, so each window holds at least one event, the trigger event. Data can be in multiple windows.
  • Eviction-Based Sliding Window: The window contains the last N elements. Window length (time) varies based on the event rate.
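
The difference between these strategies shows up in how an event maps to windows. The sketch below is illustrative only: it assumes hopping windows start at multiples of the hop and cover [k * hop, k * hop + width), and the function name hopping_window_indices is hypothetical. The eviction-based window is modeled with a bounded deque from the standard library.

```python
from collections import deque
from typing import Deque, List


def hopping_window_indices(timestamp: float, width: float, hop: float) -> List[int]:
    """Return every index k whose window [k*hop, k*hop + width) holds timestamp."""
    indices = []
    k = int(timestamp // hop)  # latest window that has started by this time
    while k >= 0 and k * hop + width > timestamp:
        indices.append(k)
        k -= 1
    return sorted(indices)


# Hopping: 10 second windows advancing every 5 seconds overlap by half,
# so one event lands in two windows.
overlapping = hopping_window_indices(12.0, width=10.0, hop=5.0)   # [1, 2]

# Tumbling is the special case where the hop equals the width:
# each event lands in exactly one window.
tumbling = hopping_window_indices(12.0, width=5.0, hop=5.0)       # [2]

# Eviction-based sliding window: keep only the last N events, whatever
# time span they cover.
last_three: Deque[str] = deque(maxlen=3)
for event in ["a", "b", "c", "d"]:
    last_three.append(event)  # the oldest event is evicted once full

print(overlapping, tumbling, list(last_three))
# [1, 2] [2] ['b', 'c', 'd']
```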

References

  • Repository: 
    • Python source code https://github.com/freemansoft/Network-intrusion-dataset-creator This code is 8x faster than the original.
  • Other Blogs and Videos: 
    • Blog: https://joe.blog.freemansoft.com/2021/04/network-intrusion-features-via-sliding.html
    • Blog: https://joe.blog.freemansoft.com/2021/04/creating-features-in-python-using.html
      • Video: https://youtu.be/jKgGh5a5gFA
  • Originating Research 
    • Research paper the original source code was based on. https://www.researchgate.net/profile/Nadun-Rajasinghe/project/A-customizable-Network-Intrusion-Detection-dataset-creating-framework/attachment/5aff08f8b53d2f63c3ccae32/AS:627686015766528@1526663416701/download/1570426776.pdf?context=ProjectUpdatesLog
    • Original Python source repository https://github.com/nrajasin/Network-intrusion-dataset-creator
  • Other
    • https://www.mikulskibartosz.name/difference-between-tumbling-and-sliding-window/
    • https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions
    • https://docs.lenses.io/3.2/sql/streaming/windowing.html
Created 2021/04
Updated 2022/10
