| Predicted (rows) / Actual (columns) | Actual: No Subscription | Actual: Subscription |
|---|---|---|
| Predicted: No Subscription | 0 EUR | 16 EUR |
| Predicted: Subscription | -30 EUR | 80 EUR |
Assignment
Predicting Customer Response to Marketing Campaigns
1 Context
Riverstone Bank (a young fintech) has run several marketing campaigns to acquire new customers and to cross-sell term deposit accounts to existing customers. To increase the efficiency of future campaigns, the bank wants to build a model that predicts whether a customer will respond positively and open a term deposit.
Your team, acting as external consultants, will develop a classification model that estimates the probability of a successful term-deposit opening based on customer attributes and past campaign interactions.
2 Data
A detailed description of the modeling data is given in Section 5. The training dataset (dataset.parquet) includes information on executed marketing campaigns, customer demographics, and campaign outcomes.
When designing and evaluating your classifier, use the cost–benefit matrix at the top of this document, in which rows are predicted classes and columns are actual outcomes.
Interpretation
- Each correctly identified subscriber yields an average additional revenue of 80 EUR.
- A customer who would subscribe but is predicted as “no subscription” (and thus not targeted) still yields an average gain of 16 EUR, because about 20% of these customers subscribe without targeted outreach (20% of 80 EUR).
- A customer predicted to subscribe who does not actually subscribe incurs 30 EUR cost (e.g., calls/ads).
- Correctly predicted non-subscribers yield neither gain nor loss (0 EUR).
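The matrix can also be read as a decision rule on the predicted probability. The sketch below assumes the model outputs a reasonably calibrated probability p of subscription and that the four payoffs are the only costs and gains involved. The names `PAYOFF`, `expected_value`, and `break_even_threshold` are illustrative and not part of the assignment.

```python
# Payoffs in EUR, keyed by (predicted, actual); values taken from the matrix.
PAYOFF = {
    ("no", "no"): 0.0,
    ("no", "yes"): 16.0,
    ("yes", "no"): -30.0,
    ("yes", "yes"): 80.0,
}


def expected_value(predicted: str, p: float) -> float:
    """Expected payoff of predicting `predicted` when the customer subscribes with probability p."""
    return p * PAYOFF[(predicted, "yes")] + (1.0 - p) * PAYOFF[(predicted, "no")]


def break_even_threshold() -> float:
    """Probability at which predicting 'yes' and 'no' have the same expected payoff.

    Solves 80p - 30(1 - p) = 16p, i.e. 94p = 30.
    """
    gain_if_subscriber = PAYOFF[("yes", "yes")] - PAYOFF[("no", "yes")]    # 64
    loss_if_nonsubscriber = PAYOFF[("no", "no")] - PAYOFF[("yes", "no")]   # 30
    return loss_if_nonsubscriber / (gain_if_subscriber + loss_if_nonsubscriber)


threshold = break_even_threshold()
print(f"predict 'yes' when p > {threshold:.3f}")
```

Under these assumptions the break-even point is 30/94, roughly 0.32, which is well below the default 0.5 cut-off. If the probabilities are poorly calibrated, a threshold tuned on validation data may work better than this analytical value.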
3 Task
Train a model using dataset.parquet that predicts the likelihood of a term deposit subscription. Final predictions will be assessed on a hidden dataset; your model’s business value will be computed from the total costs and revenue implied by the matrix above.
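To make the business-value criterion concrete, here is a minimal sketch that sums the matrix payoffs over paired predicted and actual labels. The exact scoring procedure on the hidden dataset is not specified here, so treat this as one plausible reading; `PAYOFF_EUR`, `business_value`, and the toy labels are hypothetical.

```python
from collections import Counter

# Payoffs in EUR, keyed by (predicted, actual); values taken from the matrix.
PAYOFF_EUR = {
    ("no", "no"): 0,
    ("no", "yes"): 16,
    ("yes", "no"): -30,
    ("yes", "yes"): 80,
}


def business_value(predicted, actual):
    """Total payoff in EUR for paired sequences of 'yes'/'no' labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    # Count how many customers fall into each cell of the matrix.
    cell_counts = Counter(zip(predicted, actual))
    return sum(PAYOFF_EUR[cell] * count for cell, count in cell_counts.items())


# Toy labels, invented for illustration only.
y_true = ["no", "yes", "no", "yes", "no"]
y_pred = ["no", "yes", "yes", "no", "no"]
total = business_value(y_pred, y_true)
print(total)
```

Computing the same quantity on a validation split for several candidate models or thresholds gives a selection criterion that matches the stated objective more closely than accuracy alone.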
Key requirements
- Explore & understand the data with descriptive analyses and graphics.
- Clean & engineer features if needed (derive meaningful variables for prediction).
- Model with suitable classification algorithms; tune hyperparameters where sensible.
- Evaluate & select a final model using statistical metrics and total cost/revenue.
- Interpret the results (which features drive performance, why).
- Communicate clearly with tables and figures.
Avoid overfitting via proper train/validation splits or resampling.
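As an illustration of the resampling requirement, the following sketch wraps preprocessing and a classifier in one scikit-learn pipeline and obtains out-of-fold probabilities with stratified cross-validation. The data is synthetic, and only the column names mirror Section 5. The choice of logistic regression, five folds, and the seed value are assumptions, not recommendations from the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

SEED = 42  # arbitrary fixed seed for reproducibility
rng = np.random.default_rng(SEED)

# Synthetic stand-in for dataset.parquet (random values, no real signal).
n = 400
df = pd.DataFrame({
    "age": rng.integers(18, 90, size=n),
    "campaign": rng.integers(1, 10, size=n),
    "job": rng.choice(["admin", "technician", "entrepreneur"], size=n),
    "y": rng.choice(["no", "yes"], size=n, p=[0.85, 0.15]),
})

numeric_cols = ["age", "campaign"]
categorical_cols = ["job"]

# Preprocessing lives inside the pipeline so it is re-fitted on each training fold.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X = df[numeric_cols + categorical_cols]
y = (df["y"] == "yes").astype(int)

# Stratification keeps the share of subscribers similar across folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
print(proba[:5])
```

Because every probability comes from a model that never saw that row during fitting, these out-of-fold predictions can be used to compare models or tune a decision threshold without leaking training information.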
4 Deliverables
4.1 Jupyter Notebook (Python) — Code & Report (single file)
Submit one Jupyter Notebook (.ipynb) that contains:
- Clear markdown narrative (executive summary, methods, results, discussion, references if any).
- All Python code to reproduce your analysis and figures.
- Executable cells end-to-end (use relative paths, set a random seed, keep runtime reasonable).
Prefer pandas, scikit-learn, numpy, matplotlib/seaborn, and a tidy project structure. If you export a separate PDF of the report, include it in addition to – not instead of – the notebook.
4.2 Predictions
Submit final predictions for the prediction dataset prediction.parquet.
4.2.1 File format
- Two columns named customer_id and prediction, where prediction takes values in {no, yes}.
- Use either .csv or, for continuity, .parquet; both are accepted.
4.2.2 CSV example
```python
import pandas as pd

# One row per customer: the ID and the predicted label.
out = pd.DataFrame({
    "customer_id": ["4567833", "14567834", "14567835", "14567836", "14567837"],
    "prediction": ["no", "no", "no", "yes", "no"],
})
out.to_csv("prediction1.csv", index=False)
```
Make sure your IDs and column names match exactly.
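Before writing the file, a quick format check can catch mismatches early. The sketch below is a hypothetical helper, not an official validator: `check_submission` is an invented name, and the column-order and duplicate-ID checks are assumptions that go beyond what the format rules state.

```python
import pandas as pd


def check_submission(out: pd.DataFrame) -> list[str]:
    """Return a list of format problems; an empty list means these checks passed."""
    problems = []
    # Assumption: columns must appear in exactly this order.
    if list(out.columns) != ["customer_id", "prediction"]:
        problems.append(f"unexpected columns: {list(out.columns)}")
        return problems
    # Assumption: each customer should appear only once.
    if out["customer_id"].duplicated().any():
        problems.append("duplicate customer_id values")
    # Labels must be exactly 'no' or 'yes'.
    bad_labels = set(out["prediction"].unique()) - {"no", "yes"}
    if bad_labels:
        problems.append(f"invalid prediction values: {sorted(map(str, bad_labels))}")
    return problems


good = pd.DataFrame({"customer_id": ["1", "2"], "prediction": ["no", "yes"]})
bad = pd.DataFrame({"customer_id": ["1", "1"], "prediction": ["no", "maybe"]})
print(check_submission(good))
print(check_submission(bad))
```

Running the check on the frame right before calling to_csv or to_parquet keeps the validation close to the export step.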
5 Features of the Dataset
The provided dataset (dataset.parquet) includes the following columns. The column y is the dependent variable (whether the customer subscribed to the term deposit). All others are predictors.
| Column | Meaning | Type |
|---|---|---|
| customer_id | Customer ID | Numeric |
| age | Age of customer | Numeric |
| job | Type of job (e.g., admin, technician, entrepreneur) | Categorical |
| marital | Marital status | Categorical |
| education | Education level | Categorical |
| default | Has credit in default (yes/no) | Categorical |
| housing | Has a housing loan (yes/no) | Categorical |
| loan | Has a personal loan (yes/no) | Categorical |
| contact | Type of contact | Categorical |
| month | Month of last contact in the year | Categorical |
| day_of_week | Weekday of last contact | Categorical |
| duration | Duration of last contact | Numeric |
| campaign | Number of contacts during the campaign | Numeric |
| pdays | Days since last contact | Numeric |
| previous | Number of contacts before this campaign | Numeric |
| poutcome | Outcome of the previous campaign | Categorical |
| emp.var.rate | Employment variation rate | Numeric |
| cons_price_idx | Consumer price index | Numeric |
| cons_conf_idx | Consumer confidence index | Numeric |
| euribor3m | Euribor 3-month rate | Numeric |
| nr_employed | Number of employees | Numeric |
| y | Target: term deposit subscription (yes/no) | {yes, no} |
6 Submission
- Deadline: (TBD; see Teams).
- What to submit:
  - teamname_case_study.ipynb (Python code + report in one file).
  - Final prediction files (teamname_prediction_final.csv or teamname_prediction_final.parquet).
  - (Optional) A PDF export of your notebook for easier reading.
Ensure your notebook runs top-to-bottom without errors on a standard environment.
7 Grading Criteria
- Structure, systematics, formal requirements / documentation — 15%
- Code executability — 10%
- Correctness of analyses — 20%
- Scope of analyses — 20%
- Completeness of case study processing — 10%
- Independence, initiative, creativity — 5%
- Economic / problem-oriented thinking — 10%
- Critical reflection, discussion / outlook — 10%
Extra credit may be awarded for top leaderboard performance (business value).
8 Academic Integrity & Reproducibility
- Cite external sources and libraries as needed.
- Keep your work reproducible (fixed seeds if necessary).
- Clearly mark any external templates or code you adapt.
- Use of AI tools (e.g., ChatGPT, Copilot, etc.) must be transparent: declare where and how such tools were used (e.g., code generation, text editing, idea exploration).