Network Intrusion Features via Sliding Time Windows

Feature creation is one of the first steps toward building machine learning models for network monitoring or other stream-oriented data processing. We massage the independent variables into a form that ML models or other statistical tools can use. This often involves transforming source data through numerical conversion, bucketing, aggregation, and other techniques.

For this project, we'd like to train a machine learning model to detect intrusion events by having it look at network traffic. People sometimes try to feed individual events directly to the model as inputs, but an individual network packet does not contain enough context to be useful on its own. A sliding time window makes it possible to create features with more context than you would get from a single message.


The GitHub repository linked in the References section contains Python code that creates features from Wireshark/tshark packet streams. The program accepts live tshark output or tshark streams created from .pcap files.
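To make the idea of consuming a tshark stream concrete, here is a minimal sketch of turning tshark text output into Python records. It is not the repository's parser. It assumes tshark was run with tab-separated field output, for example `tshark -r capture.pcap -T fields -e frame.time_epoch -e frame.len -e ip.src -e ip.dst -e frame.protocols`, where `capture.pcap` is a placeholder file name. The `PacketRecord` type and `parse_fields` function are hypothetical names used only for illustration.

```python
from typing import Iterable, Iterator, List, NamedTuple


class PacketRecord(NamedTuple):
    """Hypothetical record type; not the repository's data structure."""
    timestamp: float      # frame.time_epoch, seconds since the epoch
    length: int           # frame.len, bytes on the wire
    src: str              # ip.src, empty for non-IP frames such as ARP
    dst: str              # ip.dst, empty for non-IP frames such as ARP
    protocols: List[str]  # frame.protocols split on ":"


def parse_fields(lines: Iterable[str]) -> Iterator[PacketRecord]:
    """Parse tab-separated lines in the assumed five-field order."""
    for line in lines:
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) != 5 or not parts[0]:
            continue  # skip blank or malformed lines
        epoch, length, src, dst, protocols = parts
        yield PacketRecord(float(epoch), int(length), src, dst, protocols.split(":"))


# Sample lines shaped like the assumed tshark output (made-up values).
sample_lines = [
    "1617235200.100000000\t1500\t10.0.0.2\t10.0.0.9\teth:ethertype:ip:tcp:tls\n",
    "1617235200.500000000\t60\t\t\teth:ethertype:arp\n",
    "\n",
]
records = list(parse_fields(sample_lines))
for record in records:
    print(record.timestamp, record.length, record.protocols[-1])
```

The same function could read a live capture by passing `sys.stdin` instead of the sample list, since a pipe from tshark is just another iterable of lines.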

Sliding Time Window

Sliding time windows are a popular way of slicing incoming streams into features that can be fed to ML models for intrusion analysis and other purposes. We take an incoming stream of messages/logs and slice it into groups at regular intervals. In effect, we stream the data through a time box. Everything in the box becomes part of that window and is treated as a single input.

We create a set of features based on analyses of the packets that fell in the time box. Those derived/calculated features feed the model, either for training or when using the trained model.



The picture shows a network stream that has been time-sliced into 6 windows.  Each window is fixed in time. 
  • The number of packets in each window varies.  
  • The amount of time represented by the window is constant.
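
The fixed-width slicing described above can be sketched in a few lines of Python. This is a minimal illustration rather than the repository's implementation: the `(timestamp, payload)` packet tuple, the `tumbling_windows` name, and the 5 second width are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical packet shape: (epoch seconds, payload). Not the repository's format.
Packet = Tuple[float, dict]


def tumbling_windows(packets: Iterable[Packet], width: float) -> Dict[int, List[Packet]]:
    """Group packets into fixed-width, non-overlapping time windows.

    Window k covers the half-open interval [k * width, (k + 1) * width).
    """
    windows: Dict[int, List[Packet]] = defaultdict(list)
    for packet in packets:
        timestamp = packet[0]
        windows[int(timestamp // width)].append(packet)
    return dict(windows)


# Five packets spread over about 13 seconds, sliced into 5 second windows.
packets = [(0.2, {}), (1.7, {}), (4.9, {}), (5.0, {}), (12.3, {})]
windows = tumbling_windows(packets, width=5.0)
for index in sorted(windows):
    print(f"window {index}: {len(windows[index])} packets")
```

Every window spans the same amount of time, but the packet counts differ (3, 1, and 1 here), which matches the two bullet points above.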

Window Summary Info as Features

The features below represent a transformation of the packets in the window into a single set of variables. The example shows a set of calculated features:
  1. the number of TCP, UDP, and ARP protocol messages
  2. the number of TLS, HTTP, SSDP, and SMB2 service messages
  3. the number of host pairs
  4. the total bytes transferred
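
As a rough sketch of how one window of packets could collapse into those four kinds of features, consider the example below. The packet dictionary keys (`src`, `dst`, `protocol`, `service`, `bytes`), the `summarize_window` name, and the choice to treat a host pair as unordered are assumptions for illustration. They are not the repository's or tshark's field names.

```python
from collections import Counter
from typing import Dict, List, Optional, Union

# Hypothetical packet shape with made-up values.
PacketDict = Dict[str, Union[str, int, Optional[str]]]

PROTOCOLS = ("tcp", "udp", "arp")
SERVICES = ("tls", "http", "ssdp", "smb2")


def summarize_window(packets: List[PacketDict]) -> Dict[str, int]:
    """Reduce every packet in one time window to a single feature row."""
    protocols = Counter(p["protocol"] for p in packets)
    services = Counter(p["service"] for p in packets if p["service"])
    # Assumption: A->B and B->A count as the same host pair.
    host_pairs = {frozenset((p["src"], p["dst"])) for p in packets}

    features = {f"{name}_count": protocols[name] for name in PROTOCOLS}
    features.update({f"{name}_count": services[name] for name in SERVICES})
    features["host_pairs"] = len(host_pairs)
    features["total_bytes"] = sum(p["bytes"] for p in packets)
    return features


window = [
    {"src": "10.0.0.2", "dst": "10.0.0.9", "protocol": "tcp", "service": "tls", "bytes": 1500},
    {"src": "10.0.0.9", "dst": "10.0.0.2", "protocol": "tcp", "service": "tls", "bytes": 400},
    {"src": "10.0.0.3", "dst": "239.255.255.250", "protocol": "udp", "service": "ssdp", "bytes": 200},
    {"src": "10.0.0.4", "dst": "10.0.0.1", "protocol": "arp", "service": None, "bytes": 60},
]
features = summarize_window(window)
print(features)
```

Each window produces one row like this, so a capture sliced into six windows would yield six rows of model input.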

Training adjusts the weights the model gives to the various features, and hyperparameter tuning adjusts the training process itself. The result is a model that detects network intrusion or other issues in the packet stream.

Alternative Window Strategies

So far we have described non-overlapping time windows. There are other strategies, including time windows that overlap by various amounts. With overlapping windows, a given network packet or event contributes to more than one set of features. This approach might reduce boundary effects, where a timed attack could otherwise hide by straddling the edge between two windows.
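
One way to picture overlapping windows is to start a new window every `step` seconds while each window still lasts `width` seconds. The sketch below assumes that scheme, with a 10 second width and a 5 second step chosen only for the example. The function names and the `(timestamp, label)` packet tuple are hypothetical and are not taken from the repository.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical packet shape: (epoch seconds, label).
Packet = Tuple[float, str]


def window_indexes(timestamp: float, width: float, step: float) -> List[int]:
    """Return every k where window [k * step, k * step + width) holds timestamp."""
    indexes = []
    k = int(timestamp // step)  # the latest window that has already started
    while k >= 0 and k * step + width > timestamp:
        indexes.append(k)
        k -= 1
    return sorted(indexes)


def overlapping_windows(
    packets: Iterable[Packet], width: float, step: float
) -> Dict[int, List[Packet]]:
    """Assign each packet to all of the windows that cover its timestamp."""
    windows: Dict[int, List[Packet]] = defaultdict(list)
    for packet in packets:
        for k in window_indexes(packet[0], width, step):
            windows[k].append(packet)
    return dict(windows)


packets = [(3.0, "a"), (7.0, "b"), (12.0, "c")]
windows = overlapping_windows(packets, width=10.0, step=5.0)
for index in sorted(windows):
    print(index, [label for _, label in windows[index]])
```

With a step equal to half the width, most packets land in two windows, so an event near the edge of one window sits near the middle of its neighbor. Setting `step` equal to `width` gives back the non-overlapping case.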

References

  • Repository: 
    • Python source code https://github.com/freemansoft/Network-intrusion-dataset-creator This code is 8x faster than the original.
  • Other Blogs and Videos: 
    • Blog: https://joe.blog.freemansoft.com/2021/04/network-intrusion-features-via-sliding.html
    • Blog: https://joe.blog.freemansoft.com/2021/04/creating-features-in-python-using.html
      • Video: https://youtu.be/jKgGh5a5gFA
  • Originating Research 
    • Research paper the original source code was based on. https://www.researchgate.net/profile/Nadun-Rajasinghe/project/A-customizable-Network-Intrusion-Detection-dataset-creating-framework/attachment/5aff08f8b53d2f63c3ccae32/AS:627686015766528@1526663416701/download/1570426776.pdf?context=ProjectUpdatesLog
    • Original Python source repository https://github.com/nrajasin/Network-intrusion-dataset-creator
