Anomaly detection on dynamic (time-evolving) graphs in a real-time, streaming manner. Detects intrusions (DoS and DDoS attacks), fraud, and fake-rating anomalies.

Stream-AD/MIDAS

MIDAS

Microcluster-Based Detector of Anomalies in Edge Streams

Features

  • Finds Anomalies in Dynamic/Time-Evolving Graphs
  • Detects Microcluster Anomalies (suddenly arriving groups of suspiciously similar edges e.g. DoS attack)
  • Theoretical Guarantees on False Positive Probability
  • Constant Memory (independent of graph size)
  • Constant Update Time (real-time anomaly detection to minimize harm)
  • Up to 48% more accurate and 644 times faster than state-of-the-art approaches
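
To make the constant-memory and constant-update-time claims concrete, here is a minimal Python sketch of how a detector of this kind can be organised. It is an illustration under stated assumptions, not the code in this repository: the class and function names are hypothetical, the hash function is an arbitrary choice, timestamps are assumed to start at 1, and the score is a chi-squared style comparison of the current-tick count against the running average, written from our reading of the paper. It resets per-tick counts at each new timestamp and does not model the temporal decay factor.

```python
import random


class CountMinSketch:
    """Fixed-size counter table: `rows` hash functions by `buckets` cells."""

    def __init__(self, rows, buckets, seed=0):
        rng = random.Random(seed)
        self.buckets = buckets
        # One (a, b) pair per row for a simple linear hash (illustrative choice).
        self.params = [(rng.randrange(1, 2**31), rng.randrange(0, 2**31))
                       for _ in range(rows)]
        self.table = [[0.0] * buckets for _ in range(rows)]

    def _cells(self, src, dst):
        key = src * 1_000_003 + dst  # fold both endpoints into a single key
        return [((a * key + b) % (2**31 - 1)) % self.buckets
                for a, b in self.params]

    def add(self, src, dst, amount=1.0):
        for row, col in enumerate(self._cells(src, dst)):
            self.table[row][col] += amount

    def estimate(self, src, dst):
        # Collisions can only inflate a counter, so take the minimum over rows.
        return min(self.table[row][col]
                   for row, col in enumerate(self._cells(src, dst)))

    def clear(self):
        for row in self.table:
            for i in range(len(row)):
                row[i] = 0.0


def score_stream(edges, rows=2, buckets=769):
    """Yield one anomaly score per (src, dst, time) edge, in stream order."""
    current = CountMinSketch(rows, buckets)  # counts within the current tick
    total = CountMinSketch(rows, buckets)    # counts since the stream began
    now = None
    for src, dst, t in edges:
        if t != now:
            current.clear()  # a new timestamp starts a fresh per-tick count
            now = t
        current.add(src, dst)
        total.add(src, dst)
        a = current.estimate(src, dst)
        s = total.estimate(src, dst)
        if s == 0 or t <= 1:
            yield 0.0
        else:
            # Large when the current-tick count a far exceeds the mean s / t.
            yield (a - s / t) ** 2 * t ** 2 / (s * (t - 1))
```

Memory is fixed by rows and buckets regardless of how many nodes or edges appear, and each edge touches only one cell per row in each table, which is why the per-edge work does not grow with the graph.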

For more details, please read the paper - MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams. Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin, Christos Faloutsos. AAAI 2020.

Use Cases

  1. Intrusion Detection
  2. Fake Ratings
  3. Financial Fraud

Getting Started

  1. Run make to compile the code and create the executable.
  2. Run ./midas -i followed by the path to the input file.

Demo

  1. Run ./demo.sh to compile the code and run it on an example dataset.

Command-Line Options

  • -h --help: produce help message
  • -i --input: input file name
  • -o --output: output file name (default: scores.txt)
  • -r --rows: number of hash functions (default: 2)
  • -b --buckets: number of buckets (default: 769)
  • -a --alpha: temporal decay factor (default: 0.6)
  • --norelations: run MIDAS instead of MIDAS-R
  • --undirected: treat graph as undirected instead of directed

Input File Format

MIDAS expects the input edge stream to be stored in a single file containing the following three columns in order:

  1. source (int): source ID of the edge
  2. destination (int): destination ID of the edge
  3. time (int): timestamp of the edge

Thus, each line represents one edge. Edges should be sorted in non-decreasing order of their timestamps, and the column delimiter should be a comma (,).
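
The format rules above are easy to check before running the detector. This Python sketch is one possible pre-flight check, not part of MIDAS itself: the function name check_edge_lines and the sample data are hypothetical, and treating blank lines as errors is an assumption, since the text does not say how they are handled.

```python
def check_edge_lines(lines):
    """Return a list of (line_number, message) problems; empty means valid."""
    problems = []
    previous_time = None
    for number, line in enumerate(lines, start=1):
        fields = line.strip().split(",")
        if len(fields) != 3:
            problems.append((number, "expected 3 comma-separated columns"))
            continue
        try:
            src, dst, time = (int(field) for field in fields)
        except ValueError:
            problems.append((number, "all columns must be integers"))
            continue
        # Timestamps may repeat but must never go backwards.
        if previous_time is not None and time < previous_time:
            problems.append((number, "timestamp decreases"))
        previous_time = time
    return problems


if __name__ == "__main__":
    sample = ["1,2,1", "2,3,1", "1,3,2"]  # source, destination, time
    print(check_edge_lines(sample))
```

Because the function accepts any iterable of strings, an open file object can be passed directly, so the file is checked line by line without loading it into memory.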

Datasets

  1. DARPA: Original Format, MIDAS format
  2. TwitterWorldCup2014
  3. TwitterSecurity

Online Articles

  1. KDnuggets: Introducing MIDAS: A New Baseline for Anomaly Detection in Graphs
  2. Towards Data Science: Controlling Fake News using Graphs and Statistics
  3. Towards Data Science: Anomaly detection in dynamic graphs using MIDAS
  4. Towards AI: Anomaly Detection with MIDAS
  5. AIhub Interview

MIDAS in other Languages

  1. Golang by Steve Tan
  2. Ruby by Andrew Kane
  3. Rust by Scott Steele
  4. R by Tobias Heidler
  5. Python by Ritesh Kumar

Citation

If you use this code for your research, please consider citing our paper.

@article{bhatia2019midas,
  title={MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams},
  author={Bhatia, Siddharth and Hooi, Bryan and Yoon, Minji and Shin, Kijung and Faloutsos, Christos},
  journal={arXiv preprint arXiv:1911.04464},
  year={2019}
}


Webpage https://www.comp.nus.edu.sg/~sbhatia/  ·  Email siddharth@comp.nus.edu.sg  ·  Twitter @siddharthb_