Time and Count based Tumbling Windows for Network Packet Statistics

Aggregating and analyzing streaming data is one way to build machine learning datasets. Data is ingested, and records that arrive near each other are grouped into aggregations, or rows. Each aggregation has several attributes, or features. You can think of them as columns in a database or spreadsheet. A dataset is made up of many aggregations, each one representing some subset of the stream data. You can think of the aggregations as rows in a spreadsheet.

One of the challenges is picking the right windowing strategy for aggregating or analyzing streaming data. There are a variety of well-known windowing algorithms: Tumbling, Hopping, Sliding, etc. We are using a Tumbling Windows algorithm because of its relative simplicity and low memory usage. Tumbling windows repeat without overlap. Tumbling windows are either size-limited or time-limited: they contain a maximum amount of data or extend for a maximum amount of time.

Time-based windowing: The top row in the picture shows a series of time-based windows. Windows are time scoped. There can be any number of packets in each window. Every window covers the same length of time.

The diagram above shows three 5-second windows with varying numbers of events.
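A minimal Python sketch of time-based tumbling windows follows. It is an illustration under stated assumptions, not the code from the Dataset Creator repo: the function name `time_windows` is hypothetical, timestamps are plain seconds sorted in ascending order, and the first window is anchored at the first event.

```python
from typing import Iterable, Iterator, List


def time_windows(timestamps: Iterable[float],
                 window_seconds: float) -> Iterator[List[float]]:
    """Group ascending timestamps (seconds) into fixed-length tumbling windows.

    Assumption: the first window starts at the first timestamp. Windows with
    no events are emitted as empty lists so every window spans the same time.
    """
    window: List[float] = []
    window_start = None
    for ts in timestamps:
        if window_start is None:
            window_start = ts
        # Close windows (possibly empty ones) until ts fits in the current one.
        while ts >= window_start + window_seconds:
            yield window
            window = []
            window_start += window_seconds
        window.append(ts)
    if window_start is not None:
        yield window  # flush the final window at end of stream


# Hypothetical event times, not taken from the diagram.
events = [0.5, 1.2, 4.9, 12.3, 13.1]
for w in time_windows(events, 5.0):
    print(len(w), w)
```

With these made-up events the middle window is empty because nothing arrived during that 5-second span.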

Count-based windowing: The middle row in the picture shows a series of count-based windows. Windows are content scoped, filling when they reach their max size. The windows can extend for any amount of time. Every window contains the same number of data points.

The diagram above shows four 3-packet windows that span different amounts of time.
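Count-based windowing can be sketched in a few lines of Python. The name `count_windows` is hypothetical and the items are arbitrary stand-ins for packets. The sketch also assumes that a partially filled window is flushed when the stream ends.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def count_windows(items: Iterable[T], max_count: int) -> Iterator[List[T]]:
    """Group items into tumbling windows of at most max_count items each."""
    if max_count < 1:
        raise ValueError("max_count must be at least 1")
    window: List[T] = []
    for item in items:
        window.append(item)
        if len(window) == max_count:
            yield window  # window is full, start a new one
            window = []
    if window:
        yield window  # assumption: flush the leftover items at end of stream


# Seven hypothetical packets grouped three at a time.
for w in count_windows(range(7), 3):
    print(w)
```

Time never enters the logic here, which is why each window can span any amount of time.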

Time-based and count-based windowing: The bottom row in the picture demonstrates a multi-variant approach. Windows are filled when they reach max size or exceed a time period. This means windows contain either the specified number of packets or span the specified time.

The diagram above shows three 3-packet windows and one 5-second window in which only two packets appeared before the window TTL expired.
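The combined approach can be sketched by adding a time guard to a count-based window. This is one possible policy and not necessarily what the Dataset Creator does: `tumbling_windows` is a hypothetical name, timestamps are ascending seconds, a window that fills by count hands its last timestamp to the next window as the start time, and a window that times out is followed by one starting exactly `max_seconds` later.

```python
from typing import Iterable, Iterator, List


def tumbling_windows(timestamps: Iterable[float], max_count: int,
                     max_seconds: float) -> Iterator[List[float]]:
    """Close a window when it holds max_count events or max_seconds elapse."""
    window: List[float] = []
    window_start = None
    for ts in timestamps:
        if window_start is None:
            window_start = ts
        # Time guard: close (possibly empty) windows that expired before ts.
        while ts >= window_start + max_seconds:
            yield window
            window = []
            window_start += max_seconds
        window.append(ts)
        # Count guard: the window is full, so the next one starts here.
        if len(window) == max_count:
            yield window
            window = []
            window_start = ts
    if window:
        yield window  # assumption: flush leftovers at end of stream


# Hypothetical event times: a burst, a quiet gap, then another burst.
events = [0.0, 1.0, 2.0, 3.0, 9.0, 9.5, 10.0, 10.5]
for w in tumbling_windows(events, max_count=3, max_seconds=5.0):
    print(len(w), w)
```

In this made-up stream the second window closes on the time guard with a single event, while the first and third close on the count guard.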

YouTube Video

to be created

Streaming Tumbling Windows in Dataset Creator

A streaming app that accepts tshark output and generates windowed statistics for machine learning, with windows specified by time or packet count: https://github.com/freemansoft/Network-intrusion-dataset-creator
Tumbling windows vs other window types https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions

Examples

These results were not based on the diagram above. They were generated using `tshark` sample data available from https://wiki.wireshark.org/SampleCaptures .

The sample data set has 38 packets that start at 11:31:42 and end at 11:32:41. The sample data has about a 1-minute time span.

5-Second Tumbling window

The capture buckets are sequential 5-second windows. We have about 60 seconds of data so we will end up with 12 5-second buckets. The first and the last buckets always have data.

There are 5-second periods with no traffic, which result in empty buckets.

~/Network-intrusion-dataset-creator$ python3 main.py --sourcefile smtp-ssl.pcapng -wt 5000
packetCount: 21 startTime: 11:31:42.005000 endTime: 11:31:42.450000
packetCount: 0 startTime: 11:31:47.005000 endTime: 11:31:47.005000
packetCount: 0 startTime: 11:31:52.005000 endTime: 11:31:52.005000
packetCount: 4 startTime: 11:31:57.005000 endTime: 11:31:58.335000
packetCount: 0 startTime: 11:32:02.005000 endTime: 11:32:02.005000
packetCount: 0 startTime: 11:32:07.005000 endTime: 11:32:07.005000
packetCount: 0 startTime: 11:32:12.005000 endTime: 11:32:12.005000
packetCount: 0 startTime: 11:32:17.005000 endTime: 11:32:17.005000
packetCount: 0 startTime: 11:32:22.005000 endTime: 11:32:22.005000
packetCount: 4 startTime: 11:32:27.005000 endTime: 11:32:29.517000
packetCount: 0 startTime: 11:32:32.005000 endTime: 11:32:32.005000
packetCount: 9 startTime: 11:32:37.005000 endTime: 11:32:41.025000 
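The count of 12 buckets can be checked with a little arithmetic on the first start time and the last end time in the output above. The helper names `span_seconds` and `bucket_count` are hypothetical, and the sketch assumes windows run back to back from the first packet.

```python
import math
from datetime import datetime

TIME_FORMAT = "%H:%M:%S.%f"


def span_seconds(first: str, last: str) -> float:
    """Seconds between two time-of-day strings from the same day."""
    start = datetime.strptime(first, TIME_FORMAT)
    end = datetime.strptime(last, TIME_FORMAT)
    return (end - start).total_seconds()


def bucket_count(span: float, window_seconds: float) -> int:
    """Number of back-to-back windows needed to cover the span.

    Uses floor + 1 rather than ceil so a last packet that lands exactly
    on a window boundary still gets its own window.
    """
    return math.floor(span / window_seconds) + 1


span = span_seconds("11:31:42.005000", "11:32:41.025000")
print(span)                    # 59.02
print(bucket_count(span, 5))   # 12
print(bucket_count(span, 10))  # 6
```

The same calculation gives 6 buckets for a 10-second window.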

10-Second Tumbling Window

The capture buckets are sequential 10-second windows. We have about 60 seconds of data so there will be 6 10-second buckets. The first and the last buckets always have data.

There are 10-second periods with no traffic, which result in empty buckets.

~/Network-intrusion-dataset-creator$ python3 main.py --sourcefile smtp-ssl.pcapng -wt 10000
packetCount: 21 startTime: 11:31:42.005000 endTime: 11:31:42.450000
packetCount: 4 startTime: 11:31:52.005000 endTime: 11:31:58.335000
packetCount: 0 startTime: 11:32:02.005000 endTime: 11:32:02.005000
packetCount: 0 startTime: 11:32:12.005000 endTime: 11:32:12.005000
packetCount: 4 startTime: 11:32:22.005000 endTime: 11:32:29.517000
packetCount: 9 startTime: 11:32:32.005000 endTime: 11:32:41.025000

4-Count Tumbling with 10,000 msec guard

The capture buckets are allowed to contain up to 4 packets. A bucket is closed early, empty or partially filled, if 10 seconds pass before it collects 4 packets. The first and the last buckets always have data.

The 4-packet window size means that the window length can vary in time. Some of the windows are very short, which changes the next window's start time. Only one window timed out before collecting its 4 packets.

~/Network-intrusion-dataset-creator$ python3 main.py --sourcefile smtp-ssl.pcapng -wp 4 -wt 10000
packetCount: 4 startTime: 11:31:42.005000 endTime: 11:31:42.089000
packetCount: 4 startTime: 11:31:42.089000 endTime: 11:31:42.132000
packetCount: 4 startTime: 11:31:42.132000 endTime: 11:31:42.212000
packetCount: 4 startTime: 11:31:42.212000 endTime: 11:31:42.309000
packetCount: 4 startTime: 11:31:42.309000 endTime: 11:31:42.450000
packetCount: 1 startTime: 11:31:42.450000 endTime: 11:31:42.450000
packetCount: 4 startTime: 11:31:52.450000 endTime: 11:31:58.335000
packetCount: 4 startTime: 11:32:29.474000 endTime: 11:32:29.517000
packetCount: 4 startTime: 11:32:40.938000 endTime: 11:32:41.025000
packetCount: 4 startTime: 11:32:41.025000 endTime: 11:32:41.025000
packetCount: 1 startTime: 11:32:41.025000 endTime: 11:32:41.025000
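To check claims like these against the output, the lines can be parsed and counted. The sketch below assumes the exact `packetCount: N startTime: T endTime: T` layout shown above. `Window` and `parse_windows` are hypothetical names, not part of the repo, and the sample is three lines copied from the output.

```python
import re
from typing import List, NamedTuple

# Matches one output line in the layout shown above.
LINE_PATTERN = re.compile(
    r"packetCount:\s*(\d+)\s+startTime:\s*(\S+)\s+endTime:\s*(\S+)"
)


class Window(NamedTuple):
    packet_count: int
    start_time: str
    end_time: str


def parse_windows(text: str) -> List[Window]:
    """Extract one Window per matching line, skipping anything else."""
    windows = []
    for line in text.splitlines():
        match = LINE_PATTERN.search(line)
        if match:
            windows.append(
                Window(int(match.group(1)), match.group(2), match.group(3))
            )
    return windows


sample = """\
packetCount: 4 startTime: 11:31:42.309000 endTime: 11:31:42.450000
packetCount: 1 startTime: 11:31:42.450000 endTime: 11:31:42.450000
packetCount: 4 startTime: 11:31:52.450000 endTime: 11:31:58.335000
"""
windows = parse_windows(sample)
short = [w for w in windows if w.packet_count < 4]
print(len(windows), len(short))  # 3 1
```

Running the same parser over the full output and summing `packet_count` is a quick way to confirm that every packet in the capture landed in a window.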

20-Count Tumbling with 10,000 msec guard

The capture buckets are allowed to contain up to 20 packets. A bucket is closed early, empty or partially filled, if 10 seconds pass before it collects 20 packets. The first and the last buckets always have data.

The 20-packet window size means many of the intervals will time out, unlike the 4-packet window in the previous section. There is a lot of initial traffic, which dumps most of the packets into the first bucket and leaves fewer than 20. This means there will be a single window with the remaining packets or multiple windows that timed out because of the low traffic.

~/Network-intrusion-dataset-creator$ python3 main.py --sourcefile smtp-ssl.pcapng -wp 20 -wt 10000
packetCount: 20 startTime: 11:31:42.005000 endTime: 11:31:42.450000
packetCount: 1 startTime: 11:31:42.450000 endTime: 11:31:42.450000
packetCount: 4 startTime: 11:31:52.450000 endTime: 11:31:58.335000
packetCount: 0 startTime: 11:32:02.450000 endTime: 11:32:02.450000
packetCount: 0 startTime: 11:32:12.450000 endTime: 11:32:12.450000
packetCount: 4 startTime: 11:32:22.450000 endTime: 11:32:29.517000
packetCount: 9 startTime: 11:32:32.450000 endTime: 11:32:41.025000

Wrap Up

See the repo for code that demonstrates this.

Created 2022 10

Joe Freeman's Blog