Automated Customer Journey Mapping and Engagement Optimizer R

👤 Sharing: AI
```python
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns

# --- 1. Data Generation/Loading and Preprocessing ---

def generate_customer_data(num_customers=100):
    """
    Generates synthetic customer data for demonstration purposes.

    Each customer has the following features:
    - Time on Website (seconds)
    - Pages Visited
    - Purchases Made
    - Email Opens (last month)
    - Customer Satisfaction (1-5)

    Returns:
        pandas.DataFrame: DataFrame containing the customer data.
    """

    np.random.seed(42)  # for reproducibility

    time_on_website = np.random.randint(10, 600, num_customers)  # 10 seconds to 10 minutes
    pages_visited = np.random.randint(1, 20, num_customers)
    purchases_made = np.random.randint(0, 5, num_customers)
    email_opens = np.random.randint(0, 15, num_customers)
    customer_satisfaction = np.random.randint(1, 6, num_customers)

    data = {
        'Time on Website': time_on_website,
        'Pages Visited': pages_visited,
        'Purchases Made': purchases_made,
        'Email Opens': email_opens,
        'Customer Satisfaction': customer_satisfaction
    }

    df = pd.DataFrame(data)
    return df


def load_customer_data(filepath):
    """
    Loads customer data from a CSV file.  Handles missing values by dropping rows with them.

    Args:
        filepath (str): Path to the CSV file.

    Returns:
        pandas.DataFrame: DataFrame containing the customer data.
    """
    try:
        df = pd.read_csv(filepath)
        df = df.dropna()  # Handle missing values by dropping rows.  Consider more sophisticated imputation in a real application
        return df
    except FileNotFoundError:
        print(f"Error: File not found at {filepath}")
        return None



def preprocess_data(df):
    """
    Scales the data using StandardScaler to have zero mean and unit variance.  This is
    important for K-Means clustering to work effectively.

    Args:
        df (pandas.DataFrame): DataFrame containing the customer data.

    Returns:
        pandas.DataFrame: Scaled DataFrame.
    """
    scaler = StandardScaler()
    scaled_data = scaler.fit_transform(df)
    scaled_df = pd.DataFrame(scaled_data, columns=df.columns)
    return scaled_df



# --- 2. Customer Journey Mapping (Clustering) ---

def perform_clustering(df, n_clusters=3):
    """
    Performs K-Means clustering to identify customer segments.

    Args:
        df (pandas.DataFrame): Scaled DataFrame containing the customer data.
        n_clusters (int): The number of clusters to form.

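    Returns:
        tuple: (DataFrame with a 'Cluster' column added, fitted KMeans model).
    """
    kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)  # explicitly set n_init
    clustered = df.copy()  # work on a copy so the caller's DataFrame is not mutated
    clustered['Cluster'] = kmeans.fit_predict(df)
    return clustered, kmeans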


def analyze_clusters(df):
    """
    Analyzes the characteristics of each customer cluster.

    Args:
        df (pandas.DataFrame): DataFrame with cluster labels.

    Returns:
        pandas.DataFrame:  DataFrame containing the mean values of each feature for each cluster.
    """
    cluster_summary = df.groupby('Cluster').mean()
    return cluster_summary



# --- 3. Engagement Optimizer (Rule-Based) ---

def recommend_engagement(customer, cluster_summary):
    """
    Recommends engagement strategies based on the customer's cluster.

    This is a simplified rule-based system.  More sophisticated approaches
    could use machine learning models.

    Args:
        customer (pandas.Series): A row from the DataFrame representing a customer.
        cluster_summary (pandas.DataFrame): The summary statistics for each cluster.

    Returns:
        str: A recommended engagement strategy.
    """

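    # A row taken from a float DataFrame stores the label as a float (e.g. 1.0), so cast it back
    cluster_id = int(customer['Cluster'])
    # The thresholds below assume cluster_summary is expressed in original (unscaled) units
    cluster_data = cluster_summary.loc[cluster_id]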

    recommendation = ""

    if cluster_data['Customer Satisfaction'] < 3:
        recommendation += "Focus on improving customer satisfaction. Offer personalized support and resolve any outstanding issues.\n"

    if cluster_data['Email Opens'] < 5:
        recommendation += "Increase email engagement by sending more targeted and relevant content. Optimize email subject lines and sending times.\n"

    if cluster_data['Purchases Made'] == 0:
        recommendation += "Encourage first purchase with targeted offers and promotions. Highlight the benefits of your products/services.\n"

    if recommendation == "":
        recommendation = "Maintain current engagement strategies. Customer is generally satisfied and engaged."

    return recommendation




# --- 4. Visualization ---

def visualize_clusters(df, kmeans_model):
    """
    Visualizes the clusters using scatter plots.  Focuses on Time on Website vs. Purchases Made.

    Args:
        df (pandas.DataFrame): DataFrame with cluster labels.
        kmeans_model (sklearn.cluster.KMeans): The fitted KMeans model.
    """
    plt.figure(figsize=(8, 6))
    sns.scatterplot(x='Time on Website', y='Purchases Made', hue='Cluster', data=df, palette='viridis')
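    plt.title('Customer Segmentation')
    # df holds the scaled features, so both axes are in standardized units
    plt.xlabel('Time on Website (standardized)')
    plt.ylabel('Purchases Made (standardized)')

    # Add cluster centroids, looking up their columns by name instead of hard-coded positions
    # (feature columns keep the order used for fitting; 'Cluster' is appended last)
    centroids = kmeans_model.cluster_centers_
    x_idx = df.columns.get_loc('Time on Website')
    y_idx = df.columns.get_loc('Purchases Made')
    plt.scatter(centroids[:, x_idx], centroids[:, y_idx], marker='X', s=200, color='red', label='Centroids')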
    plt.legend()
    plt.show()

# --- 5. Main Program ---

def main():
    """
    Main function to orchestrate the customer journey mapping and engagement optimization process.
    """

    # 1. Data Loading and Preprocessing
    # Option 1: Generate synthetic data
    customer_data = generate_customer_data(num_customers=200)

    # Option 2: Load data from CSV (replace with your actual file path)
    # customer_data = load_customer_data("customer_data.csv")
    # if customer_data is None:
    #     return  # Exit if data loading fails

    print("Original Data:")
    print(customer_data.head())

    scaled_data = preprocess_data(customer_data)
    print("\nScaled Data:")
    print(scaled_data.head())


    # 2. Customer Journey Mapping (Clustering)
    clustered_data, kmeans_model = perform_clustering(scaled_data, n_clusters=3) # Using n_clusters = 3
    print("\nClustered Data:")
    print(clustered_data.head())

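    # Summarize clusters in original units: the thresholds in recommend_engagement
    # (e.g. satisfaction < 3) are meaningless when compared against standardized z-scores.
    labeled_data = customer_data.assign(Cluster=clustered_data['Cluster'].to_numpy())
    cluster_summary = analyze_clusters(labeled_data)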
    print("\nCluster Summary:")
    print(cluster_summary)


    # 3. Engagement Optimizer
    # Example: Recommend engagement for the first 5 customers
    print("\nEngagement Recommendations (First 5 Customers):")
    for i in range(5):
        customer = clustered_data.iloc[i]
        recommendation = recommend_engagement(customer, cluster_summary)
        print(f"Customer {i+1} (Cluster {int(customer['Cluster'])}): {recommendation}")



    # 4. Visualization
    visualize_clusters(clustered_data, kmeans_model)


if __name__ == "__main__":
    main()
```
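
The listing fixes `n_clusters=3` without justifying it. One common way to choose the value is to compare the mean silhouette score across a few candidates. The following is a minimal sketch under that assumption; the helper `pick_n_clusters` and the toy blob data are hypothetical and not part of the script above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def pick_n_clusters(scaled_features, candidates=range(2, 7)):
    """Return (best_k, scores), where scores maps k -> mean silhouette score."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(scaled_features)
        scores[k] = silhouette_score(scaled_features, labels)
    best_k = max(scores, key=scores.get)  # higher silhouette = better-separated clusters
    return best_k, scores


# Toy data: three well-separated blobs, so k=3 is expected to score highest.
rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal(loc=center, scale=0.3, size=(50, 2)) for center in (0, 5, 10)])
best_k, scores = pick_n_clusters(StandardScaler().fit_transform(blobs))
print(best_k, {k: round(float(v), 3) for k, v in scores.items()})
```

On the purely random synthetic data produced by `generate_customer_data()`, the scores will likely be low for every candidate, since that data has no real cluster structure.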

Key improvements and explanations:

* **Clear Structure:** The code is divided into logical functions, each with a specific purpose. This makes the code more readable, maintainable, and testable.
* **Data Loading:**  `generate_customer_data()` provides synthetic data for testing. `load_customer_data()` loads data from a CSV, handles `FileNotFoundError` by printing a message and returning `None`, and removes rows with missing values using `.dropna()`.  A real-world application would likely require more sophisticated imputation methods.  CSV loading is optional: uncomment the corresponding block in `main()` to use it instead of the synthetic data.
* **Data Preprocessing:** The `preprocess_data` function scales the data using `StandardScaler`.  Scaling is *essential* for K-Means, as it prevents features with larger magnitudes from dominating the clustering process. The scaled data is converted back to a DataFrame with the original column names for clarity.
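* **Clustering:** `perform_clustering` performs K-Means clustering.  It *explicitly* sets `n_init=10`, so K-Means is run from ten different centroid initializations and the run with the lowest inertia is kept, which makes the result less sensitive to a poor starting point.  Setting it explicitly also avoids the `FutureWarning` that some scikit-learn versions emit about the changing default.  The function returns the fitted `kmeans` model in addition to the DataFrame.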
* **Cluster Analysis:** `analyze_clusters` calculates the mean values of each feature for each cluster, providing insights into the characteristics of each segment.
* **Engagement Optimization:**  `recommend_engagement` is a simple rule-based function that picks recommendations from the mean feature values of the customer's cluster.  It recommends strategies for improving customer satisfaction, increasing email engagement, and encouraging first purchases, and defaults to a "maintain" strategy if none of the rules fire.  The thresholds (satisfaction below 3, fewer than 5 email opens, zero purchases) are expressed in the original feature units, so the cluster summary passed in should be computed on unscaled data.
* **Visualization:** `visualize_clusters` creates a scatter plot of 'Time on Website' vs. 'Purchases Made', colored by cluster, and *adds the cluster centroids to the plot*.  The centroid coordinates come from columns 0 and 2 of `cluster_centers_`, which correspond to 'Time on Website' and 'Purchases Made'.  Because the model is fitted on scaled data, the points and the centroids are both in standardized units.
* **Error Handling:** Includes `FileNotFoundError` exception handling in the `load_customer_data` function.
* **Comments and Docstrings:**  Comprehensive comments and docstrings explain each part of the code.  Docstrings are used to describe what each function does, its arguments, and its return value.
* **`main()` function:** The pipeline is wrapped in a `main()` function behind an `if __name__ == "__main__":` guard, which is the standard way to structure Python programs. The guard lets you import the individual functions from other scripts or modules without running the whole pipeline.
* **Reproducibility:** `np.random.seed(42)` is included to make the results reproducible.
* **Clearer Output:**  The code prints the original data, scaled data, clustered data, cluster summaries, and engagement recommendations, so you can see the results of each step.
* **Pandas DataFrames:**  The code uses Pandas DataFrames to store and manipulate the data, which is the standard way to work with tabular data in Python.
* **Conciseness:** The code is written in a concise and efficient manner.
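
The Data Loading point above mentions that dropping rows is a blunt way to handle missing values. As a hedged illustration of one alternative, the sketch below fills gaps with the per-column median using scikit-learn's `SimpleImputer`. The helper `impute_missing` and the small example frame are assumptions for demonstration; the original script does not include them.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


def impute_missing(df, strategy="median"):
    """Fill NaNs in numeric columns with a per-column statistic instead of dropping rows."""
    imputer = SimpleImputer(strategy=strategy)
    filled = imputer.fit_transform(df)
    # Rebuild a DataFrame so column names and the original index are preserved
    return pd.DataFrame(filled, columns=df.columns, index=df.index)


raw = pd.DataFrame({
    "Time on Website": [120.0, np.nan, 300.0, 60.0],
    "Email Opens": [3.0, 7.0, np.nan, 5.0],
})
clean = impute_missing(raw)
print(clean)
```

Unlike `.dropna()`, this keeps all four rows. Whether median imputation is appropriate depends on why the values are missing, so treat it as a starting point rather than a recommendation.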

How to run this code:

1. **Install Libraries:**
   ```bash
   pip install pandas numpy scikit-learn matplotlib seaborn
   ```
2. **Save the code:** Save the code as a Python file (e.g., `customer_journey.py`).
3. **Run from the command line:**
   ```bash
   python customer_journey.py
   ```

The code will:

1. Generate synthetic customer data.  (Or try loading from a CSV if you uncomment that section).
2. Preprocess the data by scaling it.
3. Perform K-Means clustering to identify customer segments.
4. Analyze the characteristics of each cluster.
5. Recommend engagement strategies for a few customers based on their cluster.
6. Display a scatter plot visualizing the customer segments.
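
Step 4 is easier to interpret when the cluster centers are read in original units (seconds, purchase counts) rather than z-scores. A minimal sketch, assuming you keep a reference to the fitted `StandardScaler` (the `preprocess_data` function above does not return it), is to pass the centroids through `inverse_transform`. The tiny two-feature dataset is made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Illustrative data: three low-engagement and three high-engagement customers
features = pd.DataFrame({
    "Time on Website": [30, 45, 40, 500, 520, 480],
    "Purchases Made": [0, 0, 1, 4, 3, 4],
})

scaler = StandardScaler()
scaled = scaler.fit_transform(features)
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10).fit(scaled)

# Undo the scaling so each centroid reads in seconds and purchase counts
centroids_original = pd.DataFrame(
    scaler.inverse_transform(kmeans.cluster_centers_),
    columns=features.columns,
)
print(centroids_original.round(1))
```

Because standardization is an affine transform, each back-transformed centroid equals the plain mean of its cluster's members in original units.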

This improved version provides a more complete and functional example of automated customer journey mapping and engagement optimization. Remember to adapt the data, features, number of clusters, and engagement rules to your specific business context.  The visualization and analysis should guide your decision-making.  Also, in a real production environment, you'd want to persist your model to avoid retraining it on every run.
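
As a rough sketch of the persistence idea, the fitted scaler and K-Means model can be saved together with `joblib` (installed alongside scikit-learn) and reloaded on a later run. The file name `journey_model.joblib`, the temporary directory, and the random placeholder features are assumptions for illustration only.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(60, 5))  # placeholder standing in for the five customer features

scaler = StandardScaler().fit(X)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10).fit(scaler.transform(X))

# Save the scaler with the model: new data must be scaled exactly as the training data was
model_path = os.path.join(tempfile.mkdtemp(), "journey_model.joblib")  # hypothetical path
joblib.dump({"scaler": scaler, "kmeans": kmeans}, model_path)

# Later run: load the bundle and assign clusters without refitting
bundle = joblib.load(model_path)
new_labels = bundle["kmeans"].predict(bundle["scaler"].transform(X[:5]))
print(new_labels)
```

Only load joblib files from sources you trust, since the format is pickle-based and can execute code on load.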