Bank Term Deposit Analysis

Logistic Regression Model

Built a binary classification model to predict whether a customer will subscribe to a bank's term deposit using their age and account balance

Prediction:

Customer Information

Confusion Matrix for model without using SMOTE

cm without smote

Model score = 88.5%

- The model predicts everything as “no” (0)

- Because the data are imbalanced and most labels are actually “no,” the model still gets a deceptively high accuracy by doing almost nothing meaningful.

Confusion Matrix for model built with SMOTE

cm with smote

Model score = 65%

- I used SMOTE to balance the dataset by oversampling the minority class

- The model achieves a 65% accuracy, lower than before applying SMOTE.

- However, this is acceptable because the new model improved recall for the minority class & actually "learned" to predict instead of assigning 0 ("no") to all cases for the sake of high accuracy score.

Precision & Recall

pr without smote

- All of them are 0 because the model never predicts 1 (no true positives).

- Meanwhile: For class 0, the metrics are high because everything is predicted as 0, and the vast majority of actual labels are 0.

Precision & Recall

pr with smote

- The new model correctly predicts 41% of the positive class (recall)

- Only 14% of all of the model's predicted positives are correct (precision).

=> With SMOTE, the model is now able to detect the minority class (higher recall than a model that predicts “0” always)

Visualization Dashboard


I built the dashboard with Power BI to visualize the Bank Term Deposit data. Due to Microsoft License, a live, interactive web version of the dashboard with tooltips is not available, only the static pdf export is.



Static snapshot of my Power BI dashboard

SQL Analysis

View Code

SQL Queries

        SELECT * FROM bank_schema.bank_targets;
        
        -- Total number of rows
        SELECT COUNT(age) FROM bank_schema.bank_targets;
        
        -- Age distribution statistics
        SELECT MIN(age) AS min_age, 
               MAX(age) AS max_age, 
               AVG(age) AS mean_age, 
               MAX(age) - MIN(age) AS range_age 
        FROM bank_schema.bank_targets;
        
        -- Occupation distribution statistics
        SELECT job, 
               COUNT(job) AS number_of_people_with_this_job 
        FROM bank_schema.bank_targets 
        GROUP BY job 
        ORDER BY COUNT(job) DESC;
        
        -- Retrieve the number of unique occupations
        SELECT COUNT(DISTINCT job) 
        FROM bank_schema.bank_targets;
        
        -- Calculate percentages of each occupation
        SELECT (969 / COUNT(*)) * 100, 
               (946 / COUNT(*)) * 100, 
               (768 / COUNT(*)) * 100, 
               (478 / COUNT(*)) * 100, 
               (417 / COUNT(*)) * 100, 
               (230 / COUNT(*)) * 100, 
               (183 / COUNT(*)) * 100, 
               (168 / COUNT(*)) * 100, 
               (128 / COUNT(*)) * 100, 
               (112 / COUNT(*)) * 100, 
               (84 / COUNT(*)) * 100, 
               (38 / COUNT(*)) * 100 
        FROM bank_schema.bank_targets;
        
        -- Marital status statistics
        SELECT marital, 
               COUNT(marital) AS number_of_people 
        FROM bank_schema.bank_targets 
        GROUP BY marital;
        
        -- Calculate percentages of customers in each marital status
        SELECT (2797 / COUNT(*)) * 100, 
               (1196 / COUNT(*)) * 100, 
               (528 / COUNT(*)) * 100 
        FROM bank_schema.bank_targets;
        
        -- Education statistics
        UPDATE bank_schema.bank_targets 
        SET education = 'secondary' 
        WHERE education = 'unknown';
        
        SELECT education, 
               COUNT(education) AS number_of_people 
        FROM bank_schema.bank_targets 
        GROUP BY education;
        
        -- Calculate percentages of customers in each education level
        SELECT (678 / COUNT(*)) * 100, 
               (2493 / COUNT(*)) * 100, 
               (1350 / COUNT(*)) * 100 
        FROM bank_schema.bank_targets;
        
        -- Default rate statistics
        SELECT default_rate, 
               COUNT(default_rate) AS number_of_people 
        FROM bank_schema.bank_targets 
        GROUP BY default_rate;
        
        -- Calculate percentages of customers for each default rate
        SELECT (4445 / COUNT(*)) * 100 AS `No_%`, 
               (76 / COUNT(*)) * 100 AS `Yes_%` 
        FROM bank_schema.bank_targets;
        
        -- Balance statistics
        SELECT MIN(balance) AS min, 
               MAX(balance) AS max, 
               AVG(balance) AS mean, 
               (MAX(balance) - MIN(balance)) AS `range` 
        FROM bank_schema.bank_targets;
        
        -- Housing statistics
        SELECT housing, 
               COUNT(housing) AS num_people 
        FROM bank_schema.bank_targets 
        GROUP BY housing;
        
        -- Calculate percentages of customers for each housing category
        SELECT (1962 / COUNT(*)) * 100 AS `No_%`, 
               (2559 / COUNT(*)) * 100 AS `Yes_%` 
        FROM bank_schema.bank_targets;
        
        -- Loan statistics
        SELECT loan, 
               COUNT(loan) AS num_loan 
        FROM bank_schema.bank_targets 
        GROUP BY loan;
        
        -- Calculate percentages of customers for each loan category
        SELECT (3830 / COUNT(*)) * 100 AS `No_%`, 
               (691 / COUNT(*)) * 100 AS `Yes_%` 
        FROM bank_schema.bank_targets;
        
        -- Last contact month analysis
        SELECT `month`, 
               COUNT(`month`) 
        FROM bank_schema.bank_targets 
        GROUP BY `month` 
        ORDER BY COUNT(`month`);
        
        -- Calculate percentages of customers for each month
        SELECT (20 / COUNT(*)) * 100 AS december, 
               (49 / COUNT(*)) * 100 AS march, 
               (52 / COUNT(*)) * 100 AS september, 
               (80 / COUNT(*)) * 100 AS october, 
               (148 / COUNT(*)) * 100 AS january, 
               (222 / COUNT(*)) * 100 AS february, 
               (293 / COUNT(*)) * 100 AS april, 
               (389 / COUNT(*)) * 100 AS november, 
               (531 / COUNT(*)) * 100 AS june, 
               (633 / COUNT(*)) * 100 AS august, 
               (706 / COUNT(*)) * 100 AS july, 
               (1398 / COUNT(*)) * 100 AS may 
        FROM bank_schema.bank_targets;
        
        -- Duration of phone call analysis
        SELECT MIN(duration) AS min_duration, 
               MAX(duration) AS max_duration, 
               AVG(duration) AS average_duration, 
               MAX(duration) - MIN(duration) AS range_duration 
        FROM bank_schema.bank_targets;
        
        -- Number of contacts performed for each client
        SELECT MIN(campaign) AS min_contacts, 
               MAX(campaign) AS max_contacts, 
               AVG(campaign) AS average_contacts 
        FROM bank_schema.bank_targets;
        
        -- Number of days that passed by after the client
        was last contacted from a previous campaign
        -- ALTER TABLE bank_schema.bank_targets
       RENAME COLUMN pdays TO days_passed;

        SELECT COUNT(days_passed) 
        FROM bank_schema.bank_targets 
        WHERE days_passed = -1;
        
        SELECT AVG(days_passed) 
        FROM bank_schema.bank_targets 
        WHERE days_passed != -1;
        
        -- Number of contacts performed
        before this campaign for each customer 
        -- ALTER TABLE bank_schema.bank_targets
        RENAME COLUMN previous TO previous_contacts;

        SELECT AVG(previous_contacts) 
        FROM bank_schema.bank_targets;
        
        SELECT previous_contacts, 
               COUNT(previous_contacts) 
        FROM bank_schema.bank_targets 
        GROUP BY previous_contacts 
        ORDER BY COUNT(previous_contacts) DESC;
        
        -- Outcome of the previous campaign
        -- ALTER TABLE bank_schema.bank_targets
        RENAME COLUMN poutcome TO previous_campaign_outcome;
        SELECT previous_campaign_outcome, 
               COUNT(previous_campaign_outcome) AS count 
        FROM bank_schema.bank_targets 
        GROUP BY previous_campaign_outcome 
        ORDER BY COUNT(previous_campaign_outcome);
        
        -- Calculate percentages
        SELECT (129 / COUNT(*)) * 100 AS success, 
               (490 / COUNT(*)) * 100 AS failure, 
               (197 / COUNT(*)) * 100 AS other, 
               (3705 / COUNT(*)) * 100 AS `unknown` 
        FROM bank_schema.bank_targets;
        
        -- Customers' responses to term deposit
        SELECT y, 
               COUNT(y) AS number_of_customers 
        FROM bank_schema.bank_targets 
        GROUP BY y;
        
        -- Housing ownership
        SELECT housing, 
               COUNT(housing) 
        FROM bank_schema.bank_targets 
        WHERE y = 'yes' 
        GROUP BY housing;
        
        SELECT (301 / 521) * 100 AS `no`, 
               (220 / 521) * 100 AS `yes`;
        
        SELECT (93 / 521) * 100 AS may, 
               (61 / 521) * 100 AS july, 
               (79 / 521) * 100 AS aug, 
               (55 / 521) * 100 AS june, 
               (39 / 521) * 100 AS november, 
               (56 / 521) * 100 AS april, 
               (38 / 521) * 100 AS february, 
               (16 / 521) * 100 AS january, 
               (37 / 521) * 100 AS october, 
               (17 / 521) * 100 AS september, 
               (21 / 521) * 100 AS march, 
               (9 / 521) * 100 AS december;
        
        -- Analyzing customers who responded "Yes" to term deposit
        
        -- Age distribution
        SELECT MIN(age), 
               MAX(age), 
               AVG(age), 
               MAX(age) - MIN(age) AS `range` 
        FROM bank_schema.bank_targets 
        WHERE y = 'yes';
        
        -- Occupation
        SELECT COUNT(DISTINCT job) 
        FROM bank_schema.bank_targets 
        WHERE y = 'yes';
        
        SELECT job, 
               COUNT(job) 
        FROM bank_schema.bank_targets 
        WHERE y = 'yes' 
        GROUP BY job 
        ORDER BY COUNT(job) DESC;
        
        SELECT (131 / 521) * 100 AS management, 
               (83 / 521) * 100 AS blue_collar, 
               (69 / 521) * 100 AS technician, 
               (58 / 521) * 100 AS administration, 
               (54 / 521) * 100 AS services, 
               (38 / 521) * 100 AS retired, 
               (20 / 521) * 100 AS sel_employed, 
               (19 / 521) * 100 AS entrepreneur, 
               (15 / 521) * 100 AS housemaid, 
               (14 / 521) * 100 AS student, 
               (10 / 521) * 100 AS unemployed, 
               (11 / 521) * 100 AS unknown;
            

SQL results

Information regarding the entire dataset Customers who responded "Yes" to the term deposit Customers who responded "Yes" to the term deposit.1
Total number 4521 521 4000
------------------- --------------------------------------------------------- Age distribution------------------------------------------------ --------------------------------------------------------------------------
Min 19 19 19
Max 87 87 86
Range 41 42 41
Average 68 68 67
------------------- --------------------------------------------------------- Occupation distribution-------------------------------------- --------------------------------------------------------------------------
Number of unique jobs 12 12 12
Management 969 (21.43%) 131 (25%) 838 (21%)
Blue-collar 946 (21%) 83 (16%) 877 (22%)
Technician 768 (17%) 69 (13%) 685 (17%)
Admin 478 (10.57%) 58 (11%) 420 (11%)
Services 417 (9.22%) 54 (10%) 379 (9%)
Retired 230 (5.09%) 38 (7%) 176 (4%)
Self-employed 183 (4.05%) 20 (4%) 163 (4%)
Entrepreneur 168 (3.72%) 19 (4%) 153 (4%)
Unemployed 128 (2.83%) 15 (3%) 115 (3%)
Housemaid 112 (2.48%) 14 (3%) 98 (2%)
Students 84 (1.86%) 13 (2%) 65 (2%)
Unknown 38 (0.84%) 7 (1%) 31 (1%)
------------------- --------------------------------------------------------- Marital Status--------------------------------------------------- --------------------------------------------------------------------------
Married 2797 (62%) 277 (53%) 2520 (63%)
Single 1196 (26%) 167 (32%) 1029 (26%)
Divorced 528 (12%) 77 (15%) 451 (11%)
------------------- --------------------------------------------------------- Education--------------------------------------------------------- --------------------------------------------------------------------------
Primary 678 (15%) 64 (12%) 614 (15%)
Secondary 2493 (55%) 264 (51%) 2229 (56%)
Tertiary 1350 (30%) 193 (37%) 1157 (29%)
------------------- --------------------------------------------------------- Default rate-------------------------------------------------------- --------------------------------------------------------------------------
No 4445 (98%) 512 (98%) 3933 (98%)
Yes 76 (2%) 9 (2%) 67 (2%)
------------------- --------------------------------------------------------- Balance------------------------------------------------------------- --------------------------------------------------------------------------
Min -3313 -1206 -3313
Max 71188 26965 71188
Mean 1423 158 1403
Range 74501 28171 74501
------------------- --------------------------------------------------------- Housing Ownership----------------------------------------- --------------------------------------------------------------------------
No 1962 (43%) 301 (58%) 1661 (42%)
Yes 2559 (57%) 220 (42%) 2339 (58%)
------------------- --------------------------------------------------------- Loan------------------------------------------------------------- --------------------------------------------------------------------------
No 3830 (85%) 478 (92%) 3352 (84%)
Yes 691 (15%) 43 (8%) 648 (16%)
------------------- --------------------------------------------------------- Month of previous contact---------------------------------- --------------------------------------------------------------------------
May 1398 (31%) 93 (18%) 1305 (3%)
July 706 (16%) 61 (12%) 645 (16%)
August 633 (14%) 79 (15%) 554 (14%)
June 531 (12%) 55 (11%) 476 (12%)
November 389 (9%) 39(7%) 350 (9%)
April 293 (6%) 56 (11%) 237 (6%)
February 222 (5%) 38 (7%) 184 (5%)
January 148 (3%) 16 (3%) 132 (3%)
October 80 (2%) 37 (7%) 43 (1%)
September 52 (1%) 17 (3%) 35 (1%)
March 49 (1%) 21 (4%) 28 (1%)
December 20 (0.4%) 9 (1%) 11 (0.3%)
------------------- --------------------------------------------------------- Duration of phone call---------------------------------------- --------------------------------------------------------------------------
Min 4 seconds 30 seconds 4 seconds
Max 3025 seconds 2769 seconds 3025 seconds
Average 264 seconds 553 seconds 226 seconds
Range 301 seconds 2739 seconds 3021 seconds
------------------- --------------------------------------------------------- Number of contacts------------------------------------------- --------------------------------------------------------------------------
Min 1 time 1 1
Max 50 times 24 50
Average 3 times 2 3
------------------- --------------------------------------------------------- Number of days passed by after last contact------------ --------------------------------------------------------------------------
Average 225 days 196 days 233 days
Not previously contacted 3705 customers 337 customers 3368 customers
------------------- --------------------------------------------------------- Number of contacts performed before this campaign --------------------------------------------------------------------------
0 times 3705 customers 337 customers 3368 customers
Average 1 time 3 times 3 times
------------------- --------------------------------------------------------- Outcome of the previous campaign------------------------ --------------------------------------------------------------------------
Success 129 (3%) 83 (16%) 46 (1%)
Failure 490 (11%) 63 (12%) 427 (11%)
Other 197 (5%) 38 (7%) 159 (4%)
Unknown 3705 (82%) 337 (65%) 3368 (84%)