Posts

Showing posts with the label Tumbling Window

Time and Count based Tumbling Windows for Network Packet Statistics

Aggregating and analyzing streaming data is one of the ways people build machine learning datasets. Data is ingested, and records that occur near each other are grouped into aggregations. Each aggregation has several attributes, or features, which you can think of as columns in a database or spreadsheet. A dataset is made up of many aggregations, each one representing some subset of the stream data; you can think of the aggregations as rows in a spreadsheet. One of the challenges is picking the right windowing strategy for aggregating or analyzing streaming data. There are a variety of well-known windowing algorithms: tumbling, hopping, sliding, and others. We are using a tumbling window algorithm because of its relative simplicity and low memory usage. Tumbling windows repeat without overlap. They are either size-limited or time-limited: they contain a maximum amount of data or extend for a maximum amount of time. Time-based windowing: ...
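To make the size-limited and time-limited behavior concrete, here is a minimal Python sketch. It is not the code from the post: the `Packet` tuple shape, the `tumbling_windows` function, and its `max_seconds` and `max_count` parameters are hypothetical names chosen for illustration. It also assumes that a window's clock starts at its first packet rather than on a fixed wall-clock boundary, and that packets arrive in timestamp order.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record shape for illustration: (timestamp_seconds, length_bytes).
Packet = Tuple[float, int]


def tumbling_windows(
    packets: Iterable[Packet], max_seconds: float, max_count: int
) -> Iterator[List[Packet]]:
    """Yield non-overlapping windows closed by elapsed time or packet count."""
    window: List[Packet] = []
    window_start = 0.0
    for packet in packets:
        timestamp = packet[0]
        # Close the current window if it is full or its time span has expired.
        if window and (
            timestamp - window_start >= max_seconds or len(window) >= max_count
        ):
            yield window
            window = []
        # The first packet of a new window sets that window's start time.
        if not window:
            window_start = timestamp
        window.append(packet)
    # Flush the final, possibly partial, window.
    if window:
        yield window


packets = [(0.0, 60), (0.4, 1500), (0.9, 40), (1.2, 60), (1.3, 60), (3.0, 1500)]
windows = list(tumbling_windows(packets, max_seconds=1.0, max_count=2))
for w in windows:
    print(len(w), w)
```

Because each packet lands in exactly one window, only the current window needs to be held in memory, which is the low-memory property mentioned above.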

Creating Features in Python using tumbling windows

This article originally discussed "Sliding Windows" but actually refers to a variant called "Tumbling Windows". The first step to using ML for intrusion analysis and detection is the creation of features that can be used in training and detection. In another blog post, I write about creating features from tumbling-window-bound aggregates of packet streams. Inbound packets are analyzed and then grouped with other packets that happen near each other. We can walk through the steps. The GitHub repository contains Python code that creates features from Wireshark/tshark packet streams. The program accepts live tshark output or tshark streams generated from captured .pcap files.

Network Traffic into Tumbling Windows

The example program requires Python and Wireshark/tshark. The Python code uses 4 multiprocessing tasks, making this essentially a 5-core process. It is 100% CPU-bound on a 4-core machine, so I suspect it ...

Network Intrusion Features via Tumbling Time Windows

This article originally used the term "Sliding Time Window". It actually discusses a variant called the "Tumbling Time Window". Feature creation is one of the first steps toward creating machine learning models that apply to network monitoring or other stream-oriented data processes. We massage independent variables into a form that can be used by ML models or other statistical tools. This often involves transforming source data through numerical conversion, bucketing, aggregation, and other techniques. For this project, we'd like to try to train a machine learning model to detect intrusion events by having it look at network traffic. People sometimes try to directly consume events as inputs, but an individual network packet does not contain enough context to be useful on its own. A tumbling time window makes it possible to create features with more context than you would get with a single message. This GitHub repository contains Pyth...
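As a rough illustration of how one window of packets can become a single feature row, consider the sketch below. It is an assumption-laden example, not the repository's code: the `Packet` tuple shape, the `window_features` function, and the specific feature names are all hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical record shape for illustration: (timestamp_seconds, protocol, length_bytes).
Packet = Tuple[float, str, int]


def window_features(window: List[Packet]) -> Dict[str, float]:
    """Collapse one tumbling window of packets into a single feature row."""
    if not window:
        raise ValueError("window must contain at least one packet")
    timestamps = [ts for ts, _, _ in window]
    lengths = [length for _, _, length in window]
    return {
        "packet_count": len(window),
        "total_bytes": sum(lengths),
        "mean_length": mean(lengths),
        "duration_seconds": max(timestamps) - min(timestamps),
        "tcp_count": sum(1 for _, proto, _ in window if proto == "tcp"),
        "udp_count": sum(1 for _, proto, _ in window if proto == "udp"),
    }


window = [(0.0, "tcp", 60), (0.5, "udp", 100), (0.75, "tcp", 1500)]
row = window_features(window)
print(row)
```

Each returned dictionary plays the role of one row in the training dataset, with its keys acting as the feature columns. Counts and averages like these carry context that no single packet provides.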