Feature Engineering That Actually Improves Models

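Better features often beat better algorithms: a well-chosen feature can add more signal than switching models or tuning hyperparameters. The techniques below are widely used because they tend to improve model performance across domains.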

Key Insights

  • Target encoding often outperforms one-hot encoding for high-cardinality categoricals, provided it is computed out-of-fold to avoid target leakage
  • Time-based features (day of week, hour, recency) add predictive power to temporal data
  • Feature interactions capture relationships that linear models cannot learn from the raw columns alone

Target Encoding

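import pandas as pd
from sklearn.model_selection import KFold

def target_encode(df, col, target, n_splits=5):
    """Out-of-fold target (mean) encoding: each row is encoded with category
    means computed on the other folds, so its own target never leaks in."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    encoded = pd.Series(index=df.index, dtype=float)
    for train_idx, val_idx in kf.split(df):
        means = df.iloc[train_idx].groupby(col)[target].mean()
        # .to_numpy() makes the positional assignment explicit
        encoded.iloc[val_idx] = df.iloc[val_idx][col].map(means).to_numpy()
    # Categories missing from a training fold fall back to the global target mean
    return encoded.fillna(df[target].mean())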
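The function above produces encodings for training rows only. For validation or test rows, a common approach is to compute category means on the full training set and apply them directly. The sketch below does that and also swaps in additive smoothing toward the global mean, so rare categories are not encoded from a handful of rows. The helper names `fit_target_encoder` / `apply_target_encoder`, the smoothing weight `m`, and the toy data are illustrative assumptions, not part of the original technique.

```python
import pandas as pd


def fit_target_encoder(train: pd.DataFrame, col: str, target: str, m: float = 10.0) -> dict:
    """Hypothetical helper: smoothed category means learned from the full training set.

    encoding = (n * category_mean + m * global_mean) / (n + m)
    A larger m pulls rare categories harder toward the global mean.
    """
    global_mean = train[target].mean()
    stats = train.groupby(col)[target].agg(["mean", "count"])
    smoothed = (stats["count"] * stats["mean"] + m * global_mean) / (stats["count"] + m)
    return {"mapping": smoothed, "default": global_mean}


def apply_target_encoder(df: pd.DataFrame, col: str, encoder: dict) -> pd.Series:
    """Hypothetical helper: map categories to encodings; unseen ones get the global mean."""
    return df[col].map(encoder["mapping"]).fillna(encoder["default"])


# Made-up data for illustration
train = pd.DataFrame({
    "city": ["a", "a", "a", "b", "c", "c"],
    "y":    [1,   1,   0,   1,   0,   0],
})
test = pd.DataFrame({"city": ["a", "b", "z"]})  # "z" never appears in training

enc = fit_target_encoder(train, "city", "y", m=2.0)
test["city_te"] = apply_target_encoder(test, "city", enc)
print(test)
```

With `m=2.0`, category "b" has a raw mean of 1.0 from a single row but is encoded as about 0.667, and the unseen "z" gets the global mean of 0.5.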

Time Features

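import pandas as pd

df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.sort_values(["user_id", "timestamp"])  # diff() needs chronological order per user
df["hour"] = df["timestamp"].dt.hour
df["day_of_week"] = df["timestamp"].dt.dayofweek  # Monday=0 ... Sunday=6
df["days_since_last"] = df.groupby("user_id")["timestamp"].diff().dt.days  # NaN for a user's first event
df["is_weekend"] = df["day_of_week"].isin([5, 6]).astype(int)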
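To see what these columns look like, here is a small self-contained run on a made-up event log (the `user_id` and `timestamp` values are invented for illustration). `days_since_last` is only meaningful when rows are in time order within each user, so the log is sorted first; each user's first event has no predecessor and comes out as NaN.

```python
import pandas as pd

# Toy event log, deliberately out of order (values are made up)
df = pd.DataFrame({
    "user_id": [1, 1, 2, 1],
    "timestamp": pd.to_datetime([
        "2024-03-09 14:00",  # Saturday
        "2024-03-04 09:30",  # Monday
        "2024-03-05 18:15",  # Tuesday
        "2024-03-11 08:00",  # Monday
    ]),
})

# Sort so diff() compares each event with the same user's previous event
df = df.sort_values(["user_id", "timestamp"]).reset_index(drop=True)

df["hour"] = df["timestamp"].dt.hour
df["day_of_week"] = df["timestamp"].dt.dayofweek  # Monday=0 ... Sunday=6
df["days_since_last"] = df.groupby("user_id")["timestamp"].diff().dt.days
df["is_weekend"] = df["day_of_week"].isin([5, 6]).astype(int)

print(df)
```

`.dt.days` keeps only the whole-day component of each gap, so the 1 day 18 hours between user 1's last two events becomes 1.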

Interaction Features

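import numpy as np

# Ratios often carry more signal than the raw columns on their own
df["price_per_sqft"] = df["price"] / df["sqft"].replace(0, np.nan)  # avoid inf when sqft is 0
df["income_to_debt"] = df["income"] / (df["debt"] + 1)  # +1 keeps zero-debt rows finite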
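Ratios are one kind of interaction; products are another. The toy experiment below (synthetic data, invented for illustration, assuming NumPy and scikit-learn are installed) illustrates the Key Insights point about linear models: when the target depends on `x1 * x2`, a linear regression on the raw columns explains almost none of the variance, while the same model with the product added as an extra column fits it closely.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1_000
x1 = rng.uniform(-1, 1, n)
x2 = rng.uniform(-1, 1, n)
# Synthetic target driven purely by the product of the two inputs, plus small noise
y = x1 * x2 + rng.normal(0, 0.05, n)

X_raw = np.column_stack([x1, x2])
X_inter = np.column_stack([x1, x2, x1 * x2])  # add the explicit interaction term

# score() returns the in-sample R^2 of each fitted model
r2_raw = LinearRegression().fit(X_raw, y).score(X_raw, y)
r2_inter = LinearRegression().fit(X_inter, y).score(X_inter, y)
print(f"R^2 without interaction: {r2_raw:.3f}")
print(f"R^2 with interaction:    {r2_inter:.3f}")
```

The model class is identical in both fits; only the feature set changes, which is the sense in which the feature, not the algorithm, does the work here.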
