Assignment

Predicting Customer Response to Marketing Campaigns

1 Context

Riverstone Bank (a young fintech) has run several marketing campaigns to acquire new customers and to cross-sell term deposit accounts to existing customers. To increase the efficiency of future campaigns, the bank wants to build a model that predicts whether a customer will respond positively and open a term deposit.

Your team, acting as external consultants, will develop a classification model that estimates the probability of a successful term-deposit opening based on customer attributes and past campaign interactions.

2 Data

A detailed description of the modeling data is given in Section 5. The training dataset (dataset.parquet) includes information on executed marketing campaigns, customer demographics, and campaign outcomes.

When designing and evaluating your classifier, use the following cost–benefit matrix:

Prediction	Truth: No Subscription	Truth: Subscription
No Subscription	0 EUR	16 EUR
Subscription	-30 EUR	80 EUR

Interpretation

Each correctly identified subscriber yields an average additional revenue of 80 EUR.
A customer who would subscribe but is predicted as “no subscription” (and thus not targeted) still results in a gain of 16 EUR, reflecting that about 20% will buy without targeted outreach.
A customer who is predicted to subscribe but does not actually subscribe incurs a cost of 30 EUR (e.g., calls/ads).
Correctly predicted non-subscribers yield neither gain nor loss (0 EUR).
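The matrix implies a break-even probability for targeting a customer. If p is the probability that a customer subscribes, predicting "subscription" has an expected payoff of 80p - 30(1 - p), while predicting "no subscription" has an expected payoff of 16p. The two are equal at p = 30/94. The sketch below works this out from the four cells; the names PAYOFF, expected_value, and break_even_probability are illustrative, and the result is only a useful starting threshold if the model's probabilities are reasonably well calibrated.

```python
# Payoffs from the cost-benefit matrix (EUR), keyed by (prediction, truth).
PAYOFF = {
    ("no", "no"): 0.0,
    ("no", "yes"): 16.0,
    ("yes", "no"): -30.0,
    ("yes", "yes"): 80.0,
}


def expected_value(p_subscribe: float, prediction: str) -> float:
    """Expected payoff of a prediction given P(customer subscribes)."""
    return (p_subscribe * PAYOFF[(prediction, "yes")]
            + (1.0 - p_subscribe) * PAYOFF[(prediction, "no")])


def break_even_probability() -> float:
    """Probability at which predicting 'yes' and 'no' pay the same."""
    # Extra gain from targeting a true subscriber: 80 - 16 = 64.
    gain_if_subscriber = PAYOFF[("yes", "yes")] - PAYOFF[("no", "yes")]
    # Loss from targeting a non-subscriber: 0 - (-30) = 30.
    loss_if_non_subscriber = PAYOFF[("no", "no")] - PAYOFF[("yes", "no")]
    return loss_if_non_subscriber / (gain_if_subscriber + loss_if_non_subscriber)


threshold = break_even_probability()
print(round(threshold, 3))  # 0.319
```

Under these assumptions, a customer is worth targeting once the predicted probability exceeds roughly 0.32, which is well below the conventional 0.5 cutoff.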

3 Task

Train a model using dataset.parquet that predicts the likelihood of a term deposit subscription. Final predictions will be assessed on a hidden dataset; your model’s business value will be computed from total costs / total revenue using the matrix above.
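To compare models on the same scale as the final assessment, it can help to score validation predictions with the matrix directly. The following is a minimal sketch, assuming the business value is the sum of the matching matrix cell over all customers; the exact evaluation script for the hidden dataset is not specified here, and the names PAYOFF_EUR and business_value are illustrative.

```python
import numpy as np

# Rows: prediction (0 = no, 1 = yes); columns: truth (0 = no, 1 = yes).
PAYOFF_EUR = np.array([[0.0, 16.0],
                       [-30.0, 80.0]])


def business_value(y_true, y_pred) -> float:
    """Total payoff in EUR for 'yes'/'no' labels under the matrix above."""
    truth = (np.asarray(y_true) == "yes").astype(int)
    pred = (np.asarray(y_pred) == "yes").astype(int)
    # Look up one matrix cell per customer and add them up.
    return float(PAYOFF_EUR[pred, truth].sum())


y_true = ["yes", "yes", "no", "no", "no"]
y_pred = ["yes", "no", "yes", "no", "no"]
print(business_value(y_true, y_pred))  # 80 + 16 - 30 + 0 + 0 = 66.0
```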

Key requirements

Explore & understand the data with descriptive analyses and graphics.
Clean & engineer features if needed (derive meaningful variables for prediction).
Model with suitable classification algorithms; tune hyperparameters where sensible.
Evaluate & select a final model using statistical metrics and total cost/revenue.
Interpret the results (which features drive performance, why).
Communicate clearly with tables and figures.

Avoid overfitting via proper train/validation splits or resampling.
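One possible way to meet the resampling requirement is stratified k-fold cross-validation with all preprocessing kept inside a scikit-learn pipeline, so that scalers and encoders are refit on the training part of every fold. The sketch below uses a small synthetic table in place of dataset.parquet; the generated values, the category labels, the choice of logistic regression, and the seed are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

SEED = 42
rng = np.random.default_rng(SEED)

# Synthetic stand-in for the real data (hypothetical values).
n = 400
X = pd.DataFrame({
    "age": rng.integers(18, 80, size=n),
    "contact": rng.choice(["cellular", "telephone"], size=n),
})
# Toy target: older customers subscribe slightly more often.
y = (rng.random(n) < 0.1 + 0.004 * (X["age"].to_numpy() - 18)).astype(int)

# Preprocessing lives inside the pipeline, so it is refit per fold.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["contact"]),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Stratification keeps the share of subscribers similar across folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
# Each probability comes from a model that never saw that row in training.
proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
print(proba.shape)
```

The out-of-fold probabilities can then be used both for statistical metrics and for choosing a decision threshold under the cost-benefit matrix.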

4 Deliverables

4.1 Jupyter Notebook (Python) — Code & Report (single file)

Submit one Jupyter Notebook (.ipynb) that contains:

Clear markdown narrative (executive summary, methods, results, discussion, references if any).
All Python code to reproduce your analysis and figures.
Executable cells end-to-end (use relative paths, set a random seed, keep runtime reasonable).
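A first notebook cell along the following lines is one way to cover the seed and relative-path requirements. It assumes the data files sit in the same folder as the notebook; the seed value and the names SEED, DATA_DIR, TRAIN_PATH, and PREDICT_PATH are arbitrary choices.

```python
import random
from pathlib import Path

import numpy as np

SEED = 42  # arbitrary; any fixed integer works
random.seed(SEED)
np.random.seed(SEED)

# Relative to the notebook's working directory, so no machine-specific paths.
DATA_DIR = Path(".")
TRAIN_PATH = DATA_DIR / "dataset.parquet"
PREDICT_PATH = DATA_DIR / "prediction.parquet"
```

scikit-learn estimators and splitters take their own random_state argument, so passing SEED there as well keeps results stable between runs.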

Tip

Prefer pandas, scikit-learn, numpy, matplotlib/seaborn, and a tidy project structure. If you export a separate PDF of the report, include it in addition to – not instead of – the notebook.

4.2 Predictions

Submit your final predictions for the prediction dataset prediction.parquet.

4.2.1 File format

Two columns named customer_id and prediction; prediction takes the values {no, yes}.
Submit the file as either .csv or .parquet; both formats are accepted.

4.2.2 CSV example

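import pandas as pd

# One row per customer: the ID from prediction.parquet and the predicted label.
out = pd.DataFrame({
    "customer_id": ["14567833","14567834","14567835","14567836","14567837"],
    "prediction":  ["no","no","no","yes","no"]
})
# index=False keeps the file to exactly the two required columns.
out.to_csv("prediction1.csv", index=False)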

Caution

Make sure your IDs and column names match exactly.
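A small check before export can catch most format mistakes. The sketch below turns predicted probabilities into yes/no labels and verifies the column names, the allowed values, and the set of IDs. The function names and the threshold of 0.3 are placeholders, not prescribed values.

```python
import pandas as pd


def check_submission(out: pd.DataFrame, expected_ids) -> None:
    """Raise AssertionError if the table violates the stated file format."""
    assert list(out.columns) == ["customer_id", "prediction"], "wrong column names"
    assert set(out["prediction"]) <= {"no", "yes"}, "values must be 'no' or 'yes'"
    assert out["customer_id"].is_unique, "duplicate customer IDs"
    assert set(out["customer_id"]) == set(expected_ids), "IDs differ from input"


def make_submission(customer_ids, proba, threshold: float) -> pd.DataFrame:
    """Turn predicted probabilities into the two-column submission table."""
    out = pd.DataFrame({
        "customer_id": list(customer_ids),
        "prediction": ["yes" if p >= threshold else "no" for p in proba],
    })
    check_submission(out, customer_ids)
    return out


ids = [14567834, 14567835, 14567836]
proba = [0.10, 0.45, 0.20]
submission = make_submission(ids, proba, threshold=0.3)  # placeholder threshold
print(submission["prediction"].tolist())  # ['no', 'yes', 'no']
```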

5 Features of the Dataset

The provided dataset (dataset.parquet) includes the following columns. The column y is the dependent variable (whether the customer subscribed to the term deposit). All other columns are candidate predictors, except customer_id, which only identifies the customer.

Column	Meaning	Type
customer_id	Customer ID	Numeric
age	Age of customer	Numeric
job	Type of job (e.g., admin, technician, entrepreneur)	Categorical
marital	Marital status	Categorical
education	Education level	Categorical
default	Has credit in default (yes/no)	Categorical
housing	Has a housing loan (yes/no)	Categorical
loan	Has a personal loan (yes/no)	Categorical
contact	Type of contact	Categorical
month	Month of last contact in the year	Categorical
day_of_week	Weekday of last contact	Categorical
duration	Duration of last contact	Numeric
campaign	Number of contacts during the campaign	Numeric
pdays	Days since last contact	Numeric
previous	Number of contacts before this campaign	Numeric
poutcome	Outcome of the previous campaign	Categorical
emp.var.rate	Employment variation rate	Numeric
cons_price_idx	Consumer price index	Numeric
cons_conf_idx	Consumer confidence index	Numeric
euribor3m	Euribor 3-month rate	Numeric
nr_employed	Number of employees	Numeric
y	Target: term deposit subscription (yes/no)	{yes, no}
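As a starting point for preprocessing, the columns can be separated into target, numeric predictors, and categorical predictors. The sketch below assumes that the dtypes stored in the parquet file match the Type column above and that customer_id is left out of the feature set; split_features and the three-row sample are hypothetical. In the notebook, the input would come from pd.read_parquet("dataset.parquet") instead.

```python
import pandas as pd

TARGET = "y"
ID_COLUMN = "customer_id"


def split_features(df: pd.DataFrame):
    """Return (X, y, numeric_cols, categorical_cols) from the raw table."""
    # Encode the target as 0/1 for scikit-learn classifiers.
    y = df[TARGET].map({"no": 0, "yes": 1})
    # Assumption: the ID carries no predictive information and is dropped.
    X = df.drop(columns=[TARGET, ID_COLUMN])
    numeric_cols = X.select_dtypes(include="number").columns.tolist()
    categorical_cols = [c for c in X.columns if c not in numeric_cols]
    return X, y, numeric_cols, categorical_cols


# Tiny hypothetical sample using four of the documented columns.
sample = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "age": [34, 51, 27],
    "marital": ["married", "single", "married"],
    "y": ["no", "yes", "no"],
})
X, y, numeric_cols, categorical_cols = split_features(sample)
print(numeric_cols, categorical_cols)  # ['age'] ['marital']
```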

6 Submission

Deadline: (TBD; see Teams).
What to submit:
1. teamname_case_study.ipynb (Python code + report in one file).
2. Final prediction file (teamname_prediction_final.csv or teamname_prediction_final.parquet).
3. (Optional) A PDF export of your notebook for easier reading.

Ensure your notebook runs top-to-bottom without errors on a standard environment.

7 Grading Criteria

Structure, systematics, formal requirements / documentation — 15%
Code executability — 10%
Correctness of analyses — 20%
Scope of analyses — 20%
Completeness of case study processing — 10%
Independence, initiative, creativity — 5%
Economic / problem-oriented thinking — 10%
Critical reflection, discussion / outlook — 10%

Note

Extra credit may be awarded for top leaderboard performance (business value).

8 Academic Integrity & Reproducibility

Cite external sources and libraries as needed.
Keep your work reproducible (fixed seeds if necessary).
Clearly mark any external templates or code you adapt.
Use of AI tools (e.g., ChatGPT, Copilot) must be transparent: declare where and how such tools were used (e.g., code generation, text editing, idea exploration).