Assignment

Predicting Customer Response to Marketing Campaigns

1 Context

Riverstone Bank (a young fintech) has run several marketing campaigns to acquire new customers and to cross-sell term deposit accounts to existing customers. To increase the efficiency of future campaigns, the bank wants to build a model that predicts whether a customer will respond positively and open a term deposit.

Figure 1: The headquarters of Riverstone Bank

Your team, acting as external consultants, will develop a classification model that estimates the probability of a successful term-deposit opening based on customer attributes and past campaign interactions.

2 Data

A detailed description of the modeling data is given in Section 5. The training dataset (dataset.parquet) includes information on executed marketing campaigns, customer demographics, and campaign outcomes.

When designing and evaluating your classifier, use the following cost–benefit matrix:

Prediction \ Truth    No Subscription    Subscription
No Subscription       0 EUR              16 EUR
Subscription          -30 EUR            80 EUR

Interpretation

  • Each correctly identified subscriber yields an average additional revenue of 80 EUR.
  • A customer who would subscribe but is predicted as “no subscription” (and is therefore not targeted) still yields an expected gain of 16 EUR, because about 20% of such customers subscribe without targeted outreach (0.2 × 80 EUR = 16 EUR).
  • A customer who is predicted to subscribe but does not actually subscribe incurs a cost of 30 EUR (e.g., for calls or ads).
  • Correctly predicted non-subscribers yield neither gain nor loss (0 EUR).
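The matrix translates directly into a scoring function. This is a minimal sketch, assuming labels are encoded as the strings "yes" and "no" as in column y; the names PAYOFF and business_value are hypothetical.

```python
# Cost-benefit matrix above, keyed as (prediction, truth), values in EUR.
PAYOFF = {
    ("no", "no"): 0,     # correctly predicted non-subscriber
    ("no", "yes"): 16,   # missed subscriber who still buys with ~20% probability
    ("yes", "no"): -30,  # targeted non-subscriber: contact cost
    ("yes", "yes"): 80,  # correctly targeted subscriber
}


def business_value(y_true, y_pred) -> float:
    """Total net value in EUR of predictions y_pred given true labels y_true.

    Both arguments are equal-length sequences of "yes"/"no" strings
    (lists, NumPy arrays or pandas Series).
    """
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # str() turns NumPy/pandas string scalars into plain keys for the lookup
    return float(sum(PAYOFF[(str(p), str(t))] for p, t in zip(y_pred, y_true)))


y_true = ["yes", "no", "yes", "no"]
y_pred = ["yes", "yes", "no", "no"]
print(business_value(y_true, y_pred))  # 80 - 30 + 16 + 0 = 66.0
```

Predicting "no" for everyone still earns 16 EUR per actual subscriber, so that, rather than 0 EUR, is the baseline a model has to beat.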

3 Task

Train a model on dataset.parquet that predicts the likelihood of a term deposit subscription. Final predictions will be assessed on a hidden dataset; your model’s business value will be computed from the total costs and revenues that result from applying the matrix above to those predictions.
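Because the matrix is asymmetric, the default cut-off of 0.5 on predicted probabilities is not necessarily the most profitable one. A short derivation, assuming the model's predicted probability p of subscribing is well calibrated, compares the expected value of predicting "subscription" with that of predicting "no subscription":

```latex
\[
\underbrace{80\,p - 30\,(1-p)}_{\text{predict subscription}}
\;>\;
\underbrace{16\,p + 0\,(1-p)}_{\text{predict no subscription}}
\iff 94\,p > 30
\iff p > \tfrac{30}{94} \approx 0.32
\]
```

Under this assumption, targeting pays off for customers whose predicted probability exceeds roughly 0.32. Real predicted probabilities are rarely perfectly calibrated, so this value is best treated as a starting point and tuned on validation data.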

Key requirements

  1. Explore & understand the data with descriptive analyses and graphics.
  2. Clean & engineer features if needed (derive meaningful variables for prediction).
  3. Model with suitable classification algorithms; tune hyperparameters where sensible.
  4. Evaluate & select a final model using statistical metrics and total cost/revenue.
  5. Interpret the results (which features drive performance, why).
  6. Communicate clearly with tables and figures.
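For steps 2 to 4, one common scikit-learn pattern is to bundle preprocessing and the classifier in a single Pipeline, so the same transformations are applied during validation and at prediction time. The sketch below is an illustrative baseline, not a prescribed solution: it assumes logistic regression, uses only a handful of the Section 5 columns, and builds a synthetic DataFrame as a stand-in for dataset.parquet; build_model and THRESHOLD are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

SEED = 42

# Illustrative subset of the Section 5 columns; extend both lists for the real data.
NUMERIC = ["age", "campaign", "euribor3m"]
CATEGORICAL = ["job", "contact"]


def build_model() -> Pipeline:
    """Scale numeric columns, one-hot encode categoricals, then fit a logistic regression."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        # handle_unknown="ignore" avoids errors on categories unseen during training
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    return Pipeline([
        ("prep", preprocess),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


# Synthetic stand-in for dataset.parquet (hypothetical values) so the cell runs alone.
rng = np.random.default_rng(SEED)
n = 200
X = pd.DataFrame({
    "age": rng.integers(18, 90, n),
    "campaign": rng.integers(1, 10, n),
    "euribor3m": rng.uniform(0.5, 5.0, n),
    "job": rng.choice(["admin", "technician", "entrepreneur"], n),
    "contact": rng.choice(["cellular", "telephone"], n),
})
y = rng.choice(["no", "yes"], n, p=[0.85, 0.15])

model = build_model().fit(X, y)
# Column of predict_proba that belongs to the "yes" class.
yes_col = list(model.classes_).index("yes")
proba_yes = model.predict_proba(X)[:, yes_col]

# Break-even probability implied by the Section 2 matrix, assuming calibrated
# probabilities: 80p - 30(1 - p) > 16p  <=>  p > 30/94.
THRESHOLD = 30 / 94
pred = np.where(proba_yes > THRESHOLD, "yes", "no")
```

The probabilities here are in-sample and only demonstrate the mechanics; model quality has to be judged on held-out data.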

Avoid overfitting via proper train/validation splits or resampling.
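One way to combine resampling with the business objective, sketched under stated assumptions: obtain out-of-fold probabilities with stratified 5-fold cross-validation (stratification keeps the share of subscribers similar in every fold), then scan candidate thresholds for the highest total value under the cost-benefit matrix. The data come from make_classification as a stand-in for the encoded training set (1 = "yes"), and RandomForestClassifier, value_at, and the threshold grid are illustrative choices; any estimator with predict_proba would work.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict

SEED = 42

# Net value per outcome in EUR, from the cost-benefit matrix in Section 2.
TP, FN, FP, TN = 80, 16, -30, 0


def value_at(y_true: np.ndarray, proba: np.ndarray, threshold: float) -> float:
    """Total EUR value when targeting every customer with proba > threshold."""
    target = proba > threshold
    tp = np.sum(target & (y_true == 1))
    fp = np.sum(target & (y_true == 0))
    fn = np.sum(~target & (y_true == 1))
    tn = np.sum(~target & (y_true == 0))
    return float(TP * tp + FP * fp + FN * fn + TN * tn)


# Synthetic, imbalanced stand-in for the encoded training data.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.88],
                           random_state=SEED)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
model = RandomForestClassifier(n_estimators=100, min_samples_leaf=5,
                               random_state=SEED)

# Out-of-fold probabilities: each row is scored by a model that never saw it.
oof = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]

thresholds = np.linspace(0.05, 0.95, 91)
values = [value_at(y, oof, t) for t in thresholds]
best_t = thresholds[int(np.argmax(values))]
print(f"best threshold {best_t:.2f}: {max(values):.0f} EUR "
      f"(targeting nobody: {value_at(y, oof, 1.0):.0f} EUR)")
```

Because the threshold is chosen on the same out-of-fold predictions it is evaluated on, the reported value is slightly optimistic; confirming it on a separate hold-out split gives a fairer estimate.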

4 Deliverables

4.1 Jupyter Notebook (Python) — Code & Report (single file)

Submit one Jupyter Notebook (.ipynb) that contains:

  • Clear markdown narrative (executive summary, methods, results, discussion, references if any).
  • All Python code to reproduce your analysis and figures.
  • Executable cells end-to-end (use relative paths, set a random seed, keep runtime reasonable).
Tip

Prefer pandas, scikit-learn, numpy, matplotlib/seaborn, and a tidy project structure. If you export a separate PDF of the report, include it in addition to the notebook, not instead of it.
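A possible first notebook cell covering the reproducibility points above is sketched here. The seed value, the DATA_DIR layout, and the helper load_parquet are assumptions for illustration; reading parquet files with pandas requires pyarrow or fastparquet to be installed.

```python
import random
from pathlib import Path
from typing import Optional

import numpy as np
import pandas as pd

SEED = 42  # arbitrary fixed value; also pass it as random_state to scikit-learn
random.seed(SEED)
np.random.seed(SEED)  # legacy global NumPy RNG, still used by some older code

# Relative paths keep the notebook portable. ASSUMPTION: the data files sit
# next to the notebook; adjust DATA_DIR to your own project layout.
DATA_DIR = Path(".")
TRAIN_PATH = DATA_DIR / "dataset.parquet"
PREDICT_PATH = DATA_DIR / "prediction.parquet"


def load_parquet(path: Path) -> Optional[pd.DataFrame]:
    """Read a parquet file, or return None (with a message) if it is missing."""
    if not path.exists():
        print(f"{path} not found; check the working directory")
        return None
    return pd.read_parquet(path)


train = load_parquet(TRAIN_PATH)
```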

4.2 Predictions

Submit your final predictions for the prediction dataset (prediction.parquet).

4.2.1 File format

  • Exactly two columns, named customer_id and prediction; prediction takes the values no or yes.
  • Either .csv or .parquet is accepted (.parquet keeps continuity with the provided data files).

4.2.2 CSV example

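import pandas as pd

# Illustrative values only; use the real customer_id values from prediction.parquet
out = pd.DataFrame({
    "customer_id": ["14567833","14567834","14567835","14567836","14567837"],
    "prediction":  ["no","no","no","yes","no"],  # allowed values: "no" or "yes"
})
# index=False keeps the pandas row index out of the CSV file
out.to_csv("prediction1.csv", index=False)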
Caution

Make sure your IDs and column names match exactly.
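Before writing the final file, it can help to build it directly from predicted probabilities and check it programmatically. The helpers make_submission and check_submission below are hypothetical; the 30/94 threshold is the break-even probability implied by the cost-benefit matrix if predictions are calibrated, and the IDs are toy values in the style of the example above.

```python
import numpy as np
import pandas as pd


def make_submission(customer_ids, proba_yes, threshold: float) -> pd.DataFrame:
    """Turn predicted probabilities into the required customer_id/prediction frame."""
    return pd.DataFrame({
        "customer_id": list(customer_ids),
        "prediction": np.where(np.asarray(proba_yes) > threshold, "yes", "no"),
    })


def check_submission(sub: pd.DataFrame, expected_ids) -> None:
    """Raise ValueError if sub does not match the required submission format."""
    if list(sub.columns) != ["customer_id", "prediction"]:
        raise ValueError(f"expected columns customer_id, prediction; got {list(sub.columns)}")
    if not sub["prediction"].isin(["no", "yes"]).all():
        raise ValueError("prediction must contain only 'no' or 'yes'")
    if not sub["customer_id"].is_unique:
        raise ValueError("duplicate customer_id values")
    # Exact comparison: IDs must match the prediction file value for value.
    got, want = set(sub["customer_id"]), set(expected_ids)
    if got != want:
        raise ValueError(f"{len(want - got)} IDs missing, {len(got - want)} unexpected")


ids = ["14567833", "14567834", "14567835"]
sub = make_submission(ids, [0.10, 0.55, 0.32], threshold=30 / 94)
check_submission(sub, ids)
print(sub)
```

In the notebook, passing the customer_id column read from prediction.parquet as both the IDs to predict and the expected IDs keeps values and dtypes identical to the source file.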

5 Features of the Dataset

The provided dataset (dataset.parquet) includes the following columns. The column y is the dependent variable (whether the customer subscribed to the term deposit). All others are predictors.

Column          Meaning                                              Type
customer_id     Customer ID                                          Numeric
age             Age of customer                                      Numeric
job             Type of job (e.g., admin, technician, entrepreneur)  Categorical
marital         Marital status                                       Categorical
education       Education level                                      Categorical
default         Has credit in default (yes/no)                       Categorical
housing         Has a housing loan (yes/no)                          Categorical
loan            Has a personal loan (yes/no)                         Categorical
contact         Type of contact                                      Categorical
month           Month of last contact in the year                    Categorical
day_of_week     Weekday of last contact                              Categorical
duration        Duration of last contact                             Numeric
campaign        Number of contacts during the campaign               Numeric
pdays           Days since last contact                              Numeric
previous        Number of contacts before this campaign              Numeric
poutcome        Outcome of the previous campaign                     Categorical
emp.var.rate    Employment variation rate                            Numeric
cons_price_idx  Consumer price index                                 Numeric
cons_conf_idx   Consumer confidence index                            Numeric
euribor3m       Euribor 3-month rate                                 Numeric
nr_employed     Number of employees                                  Numeric
y               Target: term deposit subscription (yes/no)           {yes, no}
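The table translates directly into column groups for preprocessing. The sketch below copies the names from Section 5, including emp.var.rate, which uses dots unlike the other names, so it is worth checking against df.columns. It also shows one possible engineered feature; the 999 sentinel for pdays is an ASSUMPTION borrowed from the public UCI Bank Marketing data, which this table resembles, and must be verified during exploration. add_features and split_xy are hypothetical helper names.

```python
import numpy as np
import pandas as pd

# Column groups as listed in Section 5; customer_id is an identifier, not a predictor.
TARGET = "y"
ID_COL = "customer_id"
NUMERIC = ["age", "duration", "campaign", "pdays", "previous", "emp.var.rate",
           "cons_price_idx", "cons_conf_idx", "euribor3m", "nr_employed"]
CATEGORICAL = ["job", "marital", "education", "default", "housing", "loan",
               "contact", "month", "day_of_week", "poutcome"]

# ASSUMPTION: 999 marks "not previously contacted" (as in the public UCI
# Bank Marketing data); verify this in your exploratory analysis.
PDAYS_NEVER = 999


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a contact-history flag and the pdays sentinel set to NaN."""
    out = df.copy()
    never = out["pdays"] == PDAYS_NEVER
    out["previously_contacted"] = (~never).astype(int)
    # NaN must be handled downstream, e.g. with SimpleImputer in a pipeline.
    out["pdays"] = out["pdays"].where(~never, np.nan)
    return out


def split_xy(df: pd.DataFrame):
    """Split into predictors X and a 0/1 target y (1 = "yes")."""
    X = df.drop(columns=[ID_COL, TARGET])
    y = (df[TARGET] == "yes").astype(int)
    return X, y


toy = pd.DataFrame({"customer_id": [1, 2], "age": [30, 45],
                    "pdays": [999, 6], "y": ["no", "yes"]})
X, y = split_xy(add_features(toy))
print(X)
```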

6 Submission

  • Deadline: (TBD; see Teams).
  • What to submit:
    1. teamname_case_study.ipynb (Python code + report in one file).
    2. Final prediction file (teamname_prediction_final.csv or teamname_prediction_final.parquet).
    3. (Optional) A PDF export of your notebook for easier reading.

Ensure your notebook runs top-to-bottom without errors on a standard environment.

7 Grading Criteria

  • Structure, systematics, formal requirements / documentation — 15%
  • Code executability — 10%
  • Correctness of analyses — 20%
  • Scope of analyses — 20%
  • Completeness of case study processing — 10%
  • Independence, initiative, creativity — 5%
  • Economic / problem-oriented thinking — 10%
  • Critical reflection, discussion / outlook — 10%
Note

Extra credit may be awarded for top leaderboard performance (business value).

8 Academic Integrity & Reproducibility

  • Cite external sources and libraries as needed.
  • Keep your work reproducible (fixed seeds if necessary).
  • Clearly mark any external templates or code you adapt.
  • Use of AI tools (e.g., ChatGPT, Copilot, etc.) must be transparent: declare where and how such tools were used (e.g., code generation, text editing, idea exploration).