Case Study

Introduction

In this lecture's case study, we demonstrate the data mining methodology and its application using a real-world example from the banking sector.

msbank is a young, digital-first direct bank focusing on students and young adults. It combines mobile banking with simple, transparent card products and fair pricing.

Figure 1: The msbank headquarters in Münster

msbank StudentCard

  • Target group: Students (up to 30 years old) enrolled at German universities
  • Card type: Debit card (optional small student credit line available)
  • Monitoring: Real-time fraud scoring for every transaction; risk flagging for unusual purchase behavior
  • Intervention: Automatic 3-D Secure challenge or temporary block when risk is high; manual review if needed
  • Customer experience: Push notification via the msbank app during fraud verification to avoid unnecessary frustration
  • Chargeback policy: The bank reimburses confirmed fraud cases; unjustified chargebacks are reviewed manually
Figure 2: The msbank StudentCard
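The monitoring and intervention bullets above describe a score-then-act policy. A minimal sketch of such a rule follows. The two thresholds, the score range of 0 to 1, and the names `Action` and `choose_action` are hypothetical illustrations, not msbank's actual configuration; manual review is not modeled here.

```python
from enum import Enum


class Action(Enum):
    APPROVE = "approve"
    CHALLENGE_3DS = "3-D Secure challenge"
    TEMPORARY_BLOCK = "temporary block"


# Hypothetical thresholds on a risk score in [0, 1]; chosen for illustration only.
CHALLENGE_THRESHOLD = 0.5
BLOCK_THRESHOLD = 0.9


def choose_action(risk_score: float) -> Action:
    """Map a fraud risk score to an intervention."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must lie in [0, 1]")
    if risk_score >= BLOCK_THRESHOLD:
        return Action.TEMPORARY_BLOCK
    if risk_score >= CHALLENGE_THRESHOLD:
        return Action.CHALLENGE_3DS
    return Action.APPROVE


low_risk_action = choose_action(0.10)
high_risk_action = choose_action(0.95)
```

In practice the thresholds would be tuned against the cost of missed fraud versus the cost of interrupting legitimate customers.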

Dataset

  • msbank provides a dataset of credit card transactions [1].
  • The data will help us illustrate the different steps of the machine learning process.
  • Goal: Detect fraudulent credit card transactions
  • Data: 100,000 credit card transactions with information on each transaction (date, location, merchant, cardholder, …) and a fraud label
    • 22 predictor variables
    • 1 binary target variable: fraud (yes/no)

Download the dataset here.

Features of the Dataset

This dataset contains 100,000 credit card transactions, including details about the cardholder, merchant, location, and fraud status. The column is_fraud is the dependent variable (fraud flag). All others are predictors.
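The split into predictors and target described above could be sketched in pandas as follows. The toy DataFrame holds made-up values and only two of the 22 predictors; the helper name `split_features_target` is an assumption for illustration.

```python
import pandas as pd

TARGET = "is_fraud"


def split_features_target(df: pd.DataFrame, target: str = TARGET):
    """Return (X, y): all predictor columns and the binary target."""
    X = df.drop(columns=[target])
    y = df[target].astype(bool)
    return X, y


# Toy data with made-up values, standing in for the real transactions.
toy = pd.DataFrame(
    {
        "amt": [12.5, 250.0, 7.2, 99.9],
        "category": ["grocery_pos", "gas_transport", "grocery_pos", "gas_transport"],
        "is_fraud": [False, True, False, False],
    }
)

X, y = split_features_target(toy)
fraud_share = y.mean()  # share of transactions flagged as fraud in the toy data
```

Checking the share of fraudulent rows early is useful because fraud labels are typically rare, which affects how a model should be evaluated later.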

Table 1: Description of variables in the Credit Card Transactions dataset
| Column | Meaning | Python type | General type |
| --- | --- | --- | --- |
| trans_date_trans_time | Timestamp of the transaction (date and time) | datetime64[ns] | Date/Time |
| cc_num | Credit card number (anonymized) | string | Text |
| merchant | Merchant name where the transaction occurred | category | Categorical |
| category | Transaction category (e.g., grocery_pos, gas_transport) | category | Categorical |
| amt | Transaction amount (USD) | float32 | Numeric |
| first | Cardholder’s first name | string | Text |
| last | Cardholder’s last name | string | Text |
| gender | Cardholder’s gender | category | Categorical |
| street | Cardholder’s street address | string | Text |
| city | Cardholder’s city | category | Categorical |
| state | Cardholder’s state (US) | category | Categorical |
| zip | Cardholder’s ZIP code (string to preserve leading zeros) | string | Text |
| lat | Latitude of the transaction location | float32 | Numeric |
| long | Longitude of the transaction location | float32 | Numeric |
| city_pop | Population of the city where the transaction occurred | float32 | Numeric |
| job | Cardholder’s occupation | category | Categorical |
| dob | Cardholder’s date of birth | datetime64[ns] | Date/Time |
| trans_num | Unique transaction identifier | string | Text |
| unix_time | Unix timestamp of the transaction (seconds since epoch) | int32 | Numeric |
| merch_lat | Latitude of the merchant location | float32 | Numeric |
| merch_long | Longitude of the merchant location | float32 | Numeric |
| merch_zipcode | Merchant ZIP code (nullable integer) | Int32 | Numeric |
| is_fraud | Target variable: indicates whether the transaction is fraudulent (`True`/`False`) | bool | Categorical (Binary) |
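One way to apply the types from the table above when loading the data with pandas is sketched below. The CSV format is an assumption, since the file format of the download is not stated here; the two sample rows are made up so that the example runs without the real file, and `load_transactions` is a hypothetical helper name.

```python
import io

import pandas as pd

# Dtypes as listed in the table; the two datetime columns go through parse_dates.
DTYPES = {
    "cc_num": "string", "merchant": "category", "category": "category",
    "amt": "float32", "first": "string", "last": "string",
    "gender": "category", "street": "string", "city": "category",
    "state": "category", "zip": "string", "lat": "float32",
    "long": "float32", "city_pop": "float32", "job": "category",
    "trans_num": "string", "unix_time": "int32", "merch_lat": "float32",
    "merch_long": "float32", "merch_zipcode": "Int32", "is_fraud": "bool",
}
DATE_COLUMNS = ["trans_date_trans_time", "dob"]


def load_transactions(source) -> pd.DataFrame:
    """Read transactions from a CSV path or file-like object with explicit dtypes."""
    return pd.read_csv(source, dtype=DTYPES, parse_dates=DATE_COLUMNS)


# Two made-up rows for illustration; the second has no merchant ZIP code.
SAMPLE_CSV = "\n".join([
    "trans_date_trans_time,cc_num,merchant,category,amt,first,last,gender,"
    "street,city,state,zip,lat,long,city_pop,job,dob,trans_num,unix_time,"
    "merch_lat,merch_long,merch_zipcode,is_fraud",
    "2019-01-01 00:00:18,1234567890123456,Example Shop,grocery_pos,12.50,Ana,"
    "Muster,F,1 Example St,Springfield,MA,01103,42.10,-72.59,150000,Engineer,"
    "1999-05-17,t0001,1546300818,42.20,-72.50,1104,False",
    "2019-01-01 00:05:42,6543210987654321,Sample Fuel,gas_transport,250.00,Ben,"
    "Beispiel,M,2 Sample Ave,Albany,NY,12207,42.65,-73.75,98000,Teacher,"
    "2001-11-02,t0002,1546301142,42.70,-73.80,,True",
])

df = load_transactions(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
```

Reading `zip` as a string keeps leading zeros such as in `01103`, and the nullable `Int32` type lets `merch_zipcode` hold missing values without falling back to floats.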

Footnotes

  1. Originally sourced from Hugging Face.