Compressing large amounts of netflow data using a pattern classification scheme
Abstract
The storage of large amounts of network data is a challenging problem, particularly when the data must remain available for active consultation, as in network forensics. We propose a method that compresses NetFlow data while simultaneously adding domain knowledge. The method is based on a pattern classification scheme that considers all flows from a single source IP address together. Each pattern is described by at most 19 attributes that give a good statistical description of the original NetFlow data while minimising information loss. We estimate that, on average, storage space is reduced by a factor of about 300. The process is explained using a real-world dataset from a large, high-speed network, and a formal rationale is provided.
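The core idea of considering all flows from one source IP address together can be illustrated with a minimal Python sketch. It is not the classification scheme of the paper: the flow tuple layout, the sample records, the function name `summarise_by_source`, and the seven summary attributes are all assumptions made for illustration, since the abstract does not list the 19 attributes or the pattern classes.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical flow records: (src_ip, dst_ip, dst_port, packets, bytes).
# Real NetFlow records carry more fields; this layout is an assumption.
flows = [
    ("10.0.0.1", "192.0.2.10", 80, 10, 1500),
    ("10.0.0.1", "192.0.2.11", 80, 12, 1800),
    ("10.0.0.1", "192.0.2.12", 443, 8, 1200),
    ("10.0.0.2", "192.0.2.10", 22, 100, 64000),
]


def summarise_by_source(flows):
    """Collapse all flows of each source IP into one statistical summary."""
    grouped = defaultdict(list)
    for flow in flows:
        grouped[flow[0]].append(flow)

    summaries = {}
    for src, group in grouped.items():
        byte_counts = [f[4] for f in group]
        # Illustrative attributes only, not the attributes used in the paper.
        summaries[src] = {
            "n_flows": len(group),
            "n_dst_ips": len({f[1] for f in group}),
            "n_dst_ports": len({f[2] for f in group}),
            "total_packets": sum(f[3] for f in group),
            "min_bytes": min(byte_counts),
            "max_bytes": max(byte_counts),
            "mean_bytes": mean(byte_counts),
        }
    return summaries


summaries = summarise_by_source(flows)
```

In this toy input, four flow records collapse into two summary records, one per source address. The space saved in practice depends on how many flows each source produces, so the example says nothing about the compression factor reported above.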