
Logistic Regression in Scikit-learn (Worked Example)


Why Logistic Regression?

Logistic Regression is a strong baseline for binary classification.

It models the probability of class 1 using the sigmoid function:

$$P(y=1\mid x) = \sigma(z) = \frac{1}{1+e^{-z}}, \quad z = w^T x + b$$
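To make the formula concrete, here is a small worked computation of $z$ and $\sigma(z)$ for a single sample. The weights `w`, bias `b`, and feature vector `x` are made-up illustrative values, not learned from any data, and `sigmoid` is a hypothetical helper name used only for this sketch.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: maps any real score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, bias, and one feature vector (illustrative values only)
w = np.array([0.8, -0.5, 1.2])
b = -0.3
x = np.array([1.0, 2.0, 0.5])

z = w @ x + b      # 0.8*1.0 - 0.5*2.0 + 1.2*0.5 - 0.3 = 0.1
p = sigmoid(z)     # P(y=1 | x), roughly 0.525 for z = 0.1
print(f"z = {z:.4f}, P(y=1|x) = {p:.4f}")
```

A score of exactly 0 maps to a probability of 0.5, which is why the sign of $z$ decides the predicted class.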

Worked Example: Customer Churn Style Data

The example below creates synthetic binary classification data, trains a Logistic Regression model, and prints the predicted labels, class-1 probabilities, and raw decision scores.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, roc_auc_score

# 1) Create synthetic data
X, y = make_classification(
    n_samples=1000,
    n_features=8,
    n_informative=5,
    n_redundant=1,
    class_sep=1.2,
    random_state=42,
)

# 2) Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# 3) Build a pipeline: scale features then fit model
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, random_state=42),
)

# 4) Fit
model.fit(X_train, y_train)

# 5) Predictions
pred_labels = model.predict(X_test)                 # hard class labels (0/1)
pred_probs = model.predict_proba(X_test)[:, 1]      # probability for class 1
pred_scores = model.decision_function(X_test)       # raw score before sigmoid

print('Accuracy:', round(accuracy_score(y_test, pred_labels), 4))
print('ROC-AUC :', round(roc_auc_score(y_test, pred_probs), 4))
print('\nClassification report:\n')
print(classification_report(y_test, pred_labels))

# Show first 10 predictions with scores/probabilities
print('First 10 predictions:')
print('idx  score      prob(class=1)  pred  true')
for i in range(10):
    print(f"{i:>3}  {pred_scores[i]:>8.4f}   {pred_probs[i]:>12.4f}   {pred_labels[i]:>4}  {y_test[i]:>4}")

Understanding Prediction Scores

  • decision_function returns the log-odds score $z = w^T x + b$.
  • predict_proba converts that score into a probability with the sigmoid $\sigma(z)$.
  • predict applies a fixed 0.5 probability threshold to produce class labels.

A positive decision score corresponds to a probability above 0.5, so the model predicts class 1; a zero or negative score gives class 0. The further the score is from 0, the more confident the prediction.
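These relationships can be checked directly. The sketch below rebuilds the same pipeline as the worked example and confirms that applying the sigmoid to `decision_function` reproduces `predict_proba`, and that thresholding the score at 0 reproduces `predict`. It assumes scikit-learn's default settings for a binary target; other configurations may compute probabilities differently.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Rebuild the worked example so this snippet runs on its own
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           n_redundant=1, class_sep=1.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)
model = make_pipeline(StandardScaler(),
                      LogisticRegression(max_iter=1000, random_state=42))
model.fit(X_train, y_train)

scores = model.decision_function(X_test)    # log-odds z = w^T x + b (on scaled features)
probs = model.predict_proba(X_test)[:, 1]   # P(y=1 | x)
labels = model.predict(X_test)              # hard 0/1 labels

# predict_proba should equal the sigmoid of the decision score
manual_probs = 1.0 / (1.0 + np.exp(-scores))
print("sigmoid(score) == predict_proba:", np.allclose(manual_probs, probs))

# predict should equal thresholding the score at 0 (probability 0.5)
manual_labels = (scores > 0).astype(int)
print("(score > 0) == predict:", np.array_equal(manual_labels, labels))
```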

Optional: Custom Thresholding

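If your use case needs fewer false negatives or fewer false positives, change the threshold. Lowering it below 0.5 (0.35 here) labels more samples as class 1, which cuts false negatives at the cost of more false positives; raising it does the opposite.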

custom_threshold = 0.35
pred_custom = (pred_probs >= custom_threshold).astype(int)

print('Custom-threshold accuracy:', round(accuracy_score(y_test, pred_custom), 4))
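Accuracy alone hides the trade-off that moving the threshold creates. As a sketch, the snippet below rebuilds the worked example and reports precision and recall at a few thresholds; the values 0.35, 0.5, and 0.65 are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Rebuild the worked example so this snippet runs on its own
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           n_redundant=1, class_sep=1.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)
model = make_pipeline(StandardScaler(),
                      LogisticRegression(max_iter=1000, random_state=42))
model.fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]   # probability for class 1

# Compare precision/recall at several illustrative thresholds
results = {}
for threshold in (0.35, 0.5, 0.65):
    preds = (probs >= threshold).astype(int)
    precision = precision_score(y_test, preds, zero_division=0)
    recall = recall_score(y_test, preds)
    results[threshold] = (precision, recall)
    print(f"threshold={threshold:.2f}  precision={precision:.4f}  recall={recall:.4f}")
```

Lowering the threshold can never reduce recall, because every sample flagged at a higher threshold is still flagged at a lower one; precision usually moves the other way, though not strictly monotonically.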

When To Use Logistic Regression

  • You need a fast, explainable baseline.
  • You want reasonably calibrated probabilities without extra tuning.
  • You have tabular features and binary targets.
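One reason the model counts as explainable is that each coefficient has a direct reading: a one-unit increase in a feature adds that coefficient to the log-odds, so `exp(coef)` is the multiplicative change in the odds of class 1. Because the pipeline standardizes features first, "one unit" here means one standard deviation of the original feature. The sketch below assumes the same synthetic setup as the worked example; the `feature_i` names are placeholders, since the synthetic features carry no real meaning.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Rebuild the worked example so this snippet runs on its own
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           n_redundant=1, class_sep=1.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)
model = make_pipeline(StandardScaler(),
                      LogisticRegression(max_iter=1000, random_state=42))
model.fit(X_train, y_train)

# make_pipeline names each step after its lowercased class name
clf = model.named_steps["logisticregression"]
coefs = clf.coef_.ravel()        # one weight per standardized feature
odds_ratios = np.exp(coefs)      # multiplicative change in odds per +1 std

# List features from most to least influential (by absolute coefficient)
for i in np.argsort(-np.abs(coefs)):
    print(f"feature_{i}: coef={coefs[i]:+.3f}  odds ratio={odds_ratios[i]:.3f}")
print("intercept (log-odds at the feature means):", round(float(clf.intercept_[0]), 3))
```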

For many real projects, it is the first model worth training before moving to more complex methods.