| Column | Meaning | pandas dtype | General type |
|---|---|---|---|
| trans_date_trans_time | Timestamp of the transaction (date and time) | datetime64[ns] | Date/Time |
| cc_num | Credit card number (anonymized) | string | Text |
| merchant | Merchant name where the transaction occurred | category | Categorical |
| category | Transaction category (e.g., grocery_pos, gas_transport) | category | Categorical |
| amt | Transaction amount (USD) | float32 | Numeric |
| first | Cardholder’s first name | string | Text |
| last | Cardholder’s last name | string | Text |
| gender | Cardholder’s gender | category | Categorical |
| street | Cardholder’s street address | string | Text |
| city | Cardholder’s city | category | Categorical |
| state | Cardholder’s state (US) | category | Categorical |
| zip | Cardholder’s ZIP code (string to preserve leading zeros) | string | Text |
| lat | Latitude of the transaction location | float32 | Numeric |
| long | Longitude of the transaction location | float32 | Numeric |
| city_pop | Population of the city where the transaction occurred | float32 | Numeric |
| job | Cardholder’s occupation | category | Categorical |
| dob | Cardholder’s date of birth | datetime64[ns] | Date/Time |
| trans_num | Unique transaction identifier | string | Text |
| unix_time | Unix timestamp of the transaction (seconds since epoch) | int32 | Numeric |
| merch_lat | Latitude of the merchant location | float32 | Numeric |
| merch_long | Longitude of the merchant location | float32 | Numeric |
| merch_zipcode | Merchant ZIP code (nullable integer) | Int32 | Numeric |
| is_fraud | Target variable: indicates whether the transaction is fraudulent (`True`/`False`) | bool | Categorical (Binary) |
Case Study
Introduction
In this lecture's case study, we demonstrate the data mining methodology and its application using a real-world example from the banking sector.
msbank is a young, digital-first direct bank focusing on students and young adults. It combines mobile banking with simple, transparent card products and fair pricing.
msbank StudentCard
- Target group: Students (up to 30 years old) enrolled at German universities
- Card type: Debit card (optional small student credit line available)
- Monitoring: Real-time fraud scoring for every transaction; risk flagging for unusual purchase behavior
- Intervention: Automatic 3-D Secure challenge or temporary block when risk is high; manual review if needed
- Customer experience: Push notification via the msbank app during fraud verification to avoid unnecessary frustration
- Chargeback policy: The bank reimburses confirmed fraud cases; unjustified chargebacks are reviewed manually
Dataset
- msbank provides a dataset of credit card transactions [1].
- The data will help us illustrate the different steps of the machine learning process.
- Goal: Detect fraudulent credit card transactions
- Data: 100,000 credit card transactions with information on each transaction (date, location, merchant, owner, …) and a fraud label
  - 22 predictor variables
  - 1 binary target variable: fraud (yes/no)
Download the dataset here.
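The split into predictors and a binary target described above can be sketched in pandas as follows. This is a minimal illustration, not the lecture's own code: the miniature DataFrame is made up and only stands in for the real transaction table, and the names `TARGET`, `X`, and `y` are chosen here for convenience.

```python
import pandas as pd

# Made-up miniature stand-in for the transaction table (illustrative values only).
df = pd.DataFrame(
    {
        "amt": [4.97, 107.23, 220.11, 45.00],
        "category": ["grocery_pos", "gas_transport", "shopping_net", "grocery_pos"],
        "is_fraud": [False, False, True, False],
    }
)

TARGET = "is_fraud"

# Everything except the target column is treated as a predictor.
X = df.drop(columns=[TARGET])
y = df[TARGET]

# The mean of a boolean column is the share of True values, i.e. the fraud rate.
fraud_rate = y.mean()
print(f"{X.shape[1]} predictors, fraud rate: {fraud_rate:.1%}")
```

On the full dataset the same pattern should yield 22 predictor columns. Checking the fraud rate early is useful because fraud data is typically heavily imbalanced.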
Features of the Dataset
This dataset contains 100,000 credit card transactions, including details about the cardholder, merchant, location, and fraud status. The column `is_fraud` is the dependent variable (fraud flag); all other columns are predictors.
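The dtypes listed in the column table can be applied directly when loading the data. The sketch below assumes the dataset is available as a CSV file with these column names; since the actual file name and layout are not stated here, it reads two invented sample rows from memory instead and covers only a subset of the columns. `DTYPES` and `DATE_COLUMNS` are names introduced for this example.

```python
import io

import pandas as pd

# Subset of the dtypes from the column table; date columns are parsed separately.
DTYPES = {
    "cc_num": "string",
    "merchant": "category",
    "category": "category",
    "amt": "float32",
    "zip": "string",           # string keeps leading zeros
    "merch_zipcode": "Int32",  # nullable integer, tolerates missing values
    "is_fraud": "bool",
}
DATE_COLUMNS = ["trans_date_trans_time", "dob"]

# Two invented rows standing in for the real file (assumption: CSV with a header row).
sample_csv = io.StringIO(
    "trans_date_trans_time,cc_num,merchant,category,amt,zip,dob,merch_zipcode,is_fraud\n"
    "2019-01-01 00:00:18,1234567890123456,Example Shop,grocery_pos,4.97,01067,1988-03-09,,False\n"
    "2019-01-01 00:05:08,6543210987654321,Another Shop,gas_transport,107.23,80331,1978-06-21,28705,True\n"
)

df = pd.read_csv(sample_csv, dtype=DTYPES, parse_dates=DATE_COLUMNS)
print(df.dtypes)
```

Reading `zip` as a string preserves values such as `01067`, and the nullable `Int32` dtype lets `merch_zipcode` hold missing entries without falling back to floats.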
Footnotes
[1] Originally sourced from Hugging Face.