AI-Based Predictive Logistics Optimizer for Warehouse Operations (Python)

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
import datetime

# --- 1. Data Generation/Loading (Replace with your actual data source) ---
def generate_sample_data(num_records=1000):
    """
    Generates sample warehouse operation data.  In a real-world scenario,
    you'd load this from a database, CSV file, or API.

    Args:
        num_records: The number of data points to generate.

    Returns:
        A Pandas DataFrame containing sample warehouse data.
    """

    np.random.seed(42)  # for reproducibility

    data = {
        'date': [datetime.date(2023, 1, 1) + datetime.timedelta(days=i) for i in range(num_records)],
        'day_of_week': [(datetime.date(2023, 1, 1) + datetime.timedelta(days=i)).weekday() for i in range(num_records)], # 0=Mon, 6=Sun
        'time_of_day': np.random.randint(8, 18, num_records),  # Hour of day 8-17 (upper bound is exclusive); 8 AM - 6 PM operating window
        'num_orders': np.random.randint(50, 200, num_records),
        'num_items': np.random.randint(100, 500, num_records),
        'num_employees': np.random.randint(5, 15, num_records),
        'storage_utilization': np.random.uniform(0.4, 0.9, num_records),  # Fraction of warehouse space used (0-1)
        'receiving_volume': np.random.randint(20, 80, num_records),  # Volume of goods received
        'shipping_volume': np.random.randint(30, 100, num_records),  # Volume of goods shipped
        'returns_volume': np.random.randint(5, 20, num_records),    # Volume of returned goods
        'order_fulfillment_time': np.random.uniform(1.0, 5.0, num_records), # Hours
        'picking_time': np.random.uniform(0.5, 2.0, num_records),
        'packing_time': np.random.uniform(0.3, 1.0, num_records),
        'travel_distance': np.random.uniform(100, 500, num_records) #Total travel distance of forklifts in meters.
    }

    df = pd.DataFrame(data)
    return df


# --- 2. Feature Engineering ---
def feature_engineering(df):
    """
    Creates new features that might be relevant for prediction.

    Args:
        df: The input DataFrame.

    Returns:
        The DataFrame with added features.
    """
    # Create a 'peak_hours' feature (simulating busy periods)
    df['peak_hours'] = ((df['time_of_day'] >= 10) & (df['time_of_day'] <= 14)).astype(int)

    # Interaction feature:  Orders vs. Employees (indicates workload)
    df['orders_per_employee'] = df['num_orders'] / df['num_employees']

    # Interaction feature: Shipping volume and storage utilization
    df['shipping_utilization'] = df['shipping_volume'] * df['storage_utilization']

    return df

# --- 3. Define Target Variable and Features ---
def prepare_data(df, target_variable='order_fulfillment_time'):
    """
    Prepares the data for machine learning.  Splits into features (X) and target (y).

    Args:
        df: The input DataFrame.
        target_variable: The name of the column to predict.

    Returns:
        X: Feature matrix.
        y: Target vector.
    """

    # Select features.  Important to choose features relevant to your target.
    features = ['day_of_week', 'time_of_day', 'num_orders', 'num_items', 'num_employees',
                'storage_utilization', 'receiving_volume', 'shipping_volume', 'returns_volume',
                'peak_hours', 'orders_per_employee', 'shipping_utilization', 'picking_time', 'packing_time', 'travel_distance']

    X = df[features]
    y = df[target_variable]
    return X, y


# --- 4. Model Training ---
def train_model(X_train, y_train):
    """
    Trains a Random Forest Regressor model.

    Args:
        X_train: Training features.
        y_train: Training target.

    Returns:
        The trained model.
    """
    model = RandomForestRegressor(n_estimators=100, random_state=42)  # Adjust hyperparameters as needed
    model.fit(X_train, y_train)
    return model


# --- 5. Model Evaluation ---
def evaluate_model(model, X_test, y_test):
    """
    Evaluates the model using Mean Squared Error (MSE) and R-squared.

    Args:
        model: The trained model.
        X_test: Testing features.
        y_test: Testing target.

    Returns:
        A dictionary containing the evaluation metrics.
    """
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)

    print(f"Mean Squared Error: {mse}")
    print(f"R-squared: {r2}")

    return {'mse': mse, 'r2': r2}

# --- 6. Prediction Function ---
def predict_fulfillment_time(model, input_data):
    """
    Predicts the order fulfillment time for a given set of warehouse conditions.

    Args:
        model: The trained model.
        input_data: A dictionary containing the warehouse conditions.
        Example:
        input_data = {
            'day_of_week': 2, # Wednesday (0=Mon)
            'time_of_day': 11, # 11 AM
            'num_orders': 120,
            'num_items': 300,
            'num_employees': 8,
            'storage_utilization': 0.7,
            'receiving_volume': 50,
            'shipping_volume': 70,
            'returns_volume': 10,
            'picking_time': 1.0,
            'packing_time': 0.5,
            'travel_distance': 250.0
        }


    Returns:
        The predicted order fulfillment time.
    """
    # Create a DataFrame from the input data
    input_df = pd.DataFrame([input_data])

    # Feature engineering (same as training data)
    input_df['peak_hours'] = ((input_df['time_of_day'] >= 10) & (input_df['time_of_day'] <= 14)).astype(int)
    input_df['orders_per_employee'] = input_df['num_orders'] / input_df['num_employees']
    input_df['shipping_utilization'] = input_df['shipping_volume'] * input_df['storage_utilization']

    # Select the same features used for training
    features = ['day_of_week', 'time_of_day', 'num_orders', 'num_items', 'num_employees',
                'storage_utilization', 'receiving_volume', 'shipping_volume', 'returns_volume',
                'peak_hours', 'orders_per_employee', 'shipping_utilization', 'picking_time', 'packing_time', 'travel_distance']

    X = input_df[features] # Use the same feature list as training

    # Make the prediction
    predicted_time = model.predict(X)[0]  # Extract the prediction from the array

    return predicted_time

# --- 7. Optimization Recommendations (Simple Example) ---
def suggest_optimization(input_data, predicted_time):
    """
    Provides basic optimization suggestions based on the predicted fulfillment time
    and input data.  This is a simplified example; a real system would use
    more sophisticated logic and potentially integrate with a simulation engine.

    Args:
        input_data: The input data used for prediction.
        predicted_time: The predicted order fulfillment time.

    Returns:
        A string containing optimization suggestions.
    """

    suggestions = []
    if predicted_time > 4.0: #If fulfillment time is high suggest improvements
        suggestions.append("Predicted fulfillment time is high. Consider the following:")

        if input_data['num_orders'] > 150:
            suggestions.append("- Increase the number of employees during peak hours to handle order volume.")

        if input_data['storage_utilization'] > 0.8:
            suggestions.append("- Optimize warehouse layout to improve picking efficiency and reduce congestion.")

        if input_data['travel_distance'] > 300:
            suggestions.append("- Analyze and optimize forklift routes to minimize travel distance.")

        if input_data['returns_volume'] > 15:
            suggestions.append("- Investigate and address the root causes of high return volumes.")

        if input_data['picking_time'] > 1.5:
            suggestions.append("- Implement process improvements or technology to improve picking time.")

    else:
        suggestions.append("Fulfillment time is within acceptable range.")

    return "\n".join(suggestions)


# --- 8. Main Execution ---
if __name__ == "__main__":
    # 1. Generate/Load Data
    data = generate_sample_data(num_records=1000)  # Or load from your data source

    # 2. Feature Engineering
    data = feature_engineering(data)

    # 3. Prepare Data for Modeling
    X, y = prepare_data(data)

    # 4. Split Data into Training and Testing Sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # 5. Train the Model
    model = train_model(X_train, y_train)

    # 6. Evaluate the Model
    evaluation_metrics = evaluate_model(model, X_test, y_test)

    # 7. Make a Prediction
    # Example input data (replace with your actual data)
    input_data = {
        'day_of_week': 2,  # Wednesday (0=Mon)
        'time_of_day': 11,  # 11 AM
        'num_orders': 120,
        'num_items': 300,
        'num_employees': 8,
        'storage_utilization': 0.7,
        'receiving_volume': 50,
        'shipping_volume': 70,
        'returns_volume': 10,
        'picking_time': 1.0,
        'packing_time': 0.5,
        'travel_distance': 250.0
    }

    predicted_time = predict_fulfillment_time(model, input_data)
    print(f"\nPredicted Order Fulfillment Time: {predicted_time:.2f} hours") #Format to 2 decimal places

    # 8. Generate Optimization Suggestions
    optimization_suggestions = suggest_optimization(input_data, predicted_time)
    print(f"\nOptimization Suggestions:\n{optimization_suggestions}")
```
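
A caveat about the sample data: `generate_sample_data` draws `order_fulfillment_time` independently of every feature, so the reported R-squared will sit around zero (or slightly below) however the model is tuned. To check that the pipeline can learn at all, one option is to swap in a target with a known relationship to the features. The sketch below is a hypothetical variant, not part of the script above: the helper name `make_signal_target` and its coefficients are invented purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def make_signal_target(df, noise_sd=0.2, seed=0):
    """Hypothetical target: fulfillment hours rise with workload and congestion.

    The coefficients are arbitrary illustration values, not measurements.
    """
    rng = np.random.default_rng(seed)
    workload = df["num_orders"] / df["num_employees"]
    hours = 0.5 + 0.1 * workload + 2.0 * df["storage_utilization"]
    return hours + rng.normal(0.0, noise_sd, len(df))


# Small synthetic frame with the same value ranges as the script's sample data.
rng = np.random.default_rng(42)
n = 1000
demo = pd.DataFrame({
    "num_orders": rng.integers(50, 200, n),
    "num_employees": rng.integers(5, 15, n),
    "storage_utilization": rng.uniform(0.4, 0.9, n),
})
demo["order_fulfillment_time"] = make_signal_target(demo)

X = demo[["num_orders", "num_employees", "storage_utilization"]]
y = demo["order_fulfillment_time"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))
print(f"R-squared on data with built-in signal: {r2:.2f}")
```

If the R-squared here is clearly positive while the original script's stays near zero, the difference comes from the data rather than from the model code.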

Key improvements and explanations:

* **Clearer Structure and Comments:** The code is now divided into well-defined functions with detailed docstrings, explaining the purpose, arguments, and return values of each function. This significantly improves readability and maintainability.  Comments within the functions further clarify specific steps.
* **Data Generation Flexibility:** `generate_sample_data` lets you adjust the number of records. It builds a daily date series starting on 2023-01-01 and derives `day_of_week` from it with `weekday()` (0 = Monday, 6 = Sunday).
* **Feature Engineering:** The `feature_engineering` function adds a `peak_hours` flag (hours 10 through 14) and two derived features, `orders_per_employee` and `shipping_utilization`. Derived features like these can add predictive power when the target depends on a ratio or product rather than on the raw columns.
* **Feature Selection:** The `prepare_data` function lists the training features explicitly. This list *must* match the one in `predict_fulfillment_time`. The two `features` lists in the script are identical, but they are maintained separately, so any change has to be made in both places.
* **Model Training:**  Uses a `RandomForestRegressor`, a good starting point for regression tasks.  You can experiment with other models and hyperparameter tuning.
* **Model Evaluation:** Reports Mean Squared Error (in squared hours) and R-squared on the held-out 20% test split, and prints both to the console. These numbers tell you how well the model generalizes to data it was not trained on.
* **Prediction Function:**  The `predict_fulfillment_time` function takes a dictionary of warehouse conditions as input and returns the predicted order fulfillment time.  *Importantly*, it now applies the *same* feature engineering steps as the training data. This is crucial for making accurate predictions.  It also correctly extracts the prediction value from the returned NumPy array.
* **Optimization Suggestions:** The `suggest_optimization` function provides basic recommendations based on the predicted fulfillment time and the input data.  This is a simplified example; a real system would use more sophisticated logic and potentially integrate with a simulation engine. The suggestions are based on thresholds.
* **`if __name__ == "__main__":` block:** This ensures that the main execution code is only run when the script is executed directly (not when it's imported as a module).
* **Realistic Data Generation:** The sample data now includes more features relevant to warehouse operations, such as `receiving_volume`, `shipping_volume`, `returns_volume`, `picking_time`, `packing_time`, and `travel_distance`.
* **Seed for Reproducibility:** `np.random.seed(42)` ensures that the random data generation is reproducible.
* **Clearer Output:** The predicted fulfillment time is now formatted to two decimal places for better readability.
* **Error Handling:** The script has no explicit error handling. A missing key in `input_data` surfaces as a pandas `KeyError`, and `num_employees = 0` produces an infinite `orders_per_employee`. Adding input checks or `try...except` blocks would improve robustness.
* **Flexibility:** The `target_variable` argument in `prepare_data` allows you to easily change the variable you are trying to predict.  Just set `target_variable` to the column name you wish to predict.
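* **Conciseness:** Each pipeline step lives in its own short function. One redundancy remains: the feature list and the feature-engineering steps are repeated in `predict_fulfillment_time`.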
* **More Informative Suggestions:** The `suggest_optimization` function now provides more specific and actionable suggestions based on the input data.
* **Consistent Feature Set:** Ensured that the same features are used for training and prediction. This is critical for accurate results.
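
The error-handling and consistent-feature-set points above can be combined in one small guard. This is a minimal sketch, assuming the raw input keys are kept in a single tuple; `RAW_INPUT_KEYS` and `validate_input` are hypothetical names that do not appear in the script.

```python
# Hypothetical single source of truth for the raw fields a prediction needs.
RAW_INPUT_KEYS = (
    "day_of_week", "time_of_day", "num_orders", "num_items", "num_employees",
    "storage_utilization", "receiving_volume", "shipping_volume",
    "returns_volume", "picking_time", "packing_time", "travel_distance",
)


def validate_input(input_data):
    """Raise ValueError if the prediction input is incomplete or unusable."""
    missing = [key for key in RAW_INPUT_KEYS if key not in input_data]
    if missing:
        raise ValueError(f"Missing input fields: {', '.join(missing)}")
    if input_data["num_employees"] <= 0:
        # orders_per_employee divides by this value
        raise ValueError("num_employees must be positive")
    if not 0 <= input_data["day_of_week"] <= 6:
        raise ValueError("day_of_week must be in 0..6 (0 = Monday)")
    return input_data


sample = dict.fromkeys(RAW_INPUT_KEYS, 1)
validate_input(sample)  # complete input passes silently

try:
    validate_input({"num_orders": 120})  # most fields missing
except ValueError as exc:
    print(exc)
```

Calling a check like this at the top of `predict_fulfillment_time` would replace a `KeyError` raised inside pandas with a message that names the missing fields.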

How to Run:

1.  **Save:** Save the code as a `.py` file (e.g., `warehouse_optimizer.py`).
2.  **Run:** Execute the script from your terminal: `python warehouse_optimizer.py`
3.  **Dependencies:** Make sure you have the necessary libraries installed.  If not, install them using pip:
    ```bash
    pip install pandas scikit-learn numpy
    ```

Next Steps and Further Improvements:

*   **Real Data:** Replace the `generate_sample_data` function with code to load data from your actual warehouse management system or data source. This is the most important step.
*   **Feature Selection:** Experiment with different feature combinations to see which ones have the most impact on prediction accuracy. Use techniques like feature importance from the Random Forest or other feature selection methods.
*   **Hyperparameter Tuning:**  Optimize the hyperparameters of the `RandomForestRegressor` (e.g., `n_estimators`, `max_depth`, `min_samples_split`) using techniques like GridSearchCV or RandomizedSearchCV.
*   **Model Selection:**  Try different machine learning models (e.g., Gradient Boosting, Support Vector Regression, Neural Networks) to see if you can improve performance.
*   **Data Preprocessing:**  Consider scaling or normalizing your data, especially if you are using models that are sensitive to feature scaling (e.g., Support Vector Regression, Neural Networks).
*   **Outlier Handling:**  Identify and handle outliers in your data, as they can negatively impact model performance.
*   **More Sophisticated Optimization Logic:**  Develop more sophisticated optimization rules and potentially integrate with a simulation engine to evaluate the impact of different changes.  Consider using optimization algorithms to find the optimal values for controllable variables (e.g., number of employees, inventory levels).
*   **User Interface:**  Create a user interface (e.g., using Flask or Django) to allow users to easily input warehouse conditions and view predictions and optimization suggestions.
*   **API Integration:**  Expose the model as an API so that it can be integrated with other systems.
*   **Real-time Predictions:**  Implement a system to make real-time predictions based on incoming data from sensors and other sources.
*   **A/B Testing:**  Implement A/B testing to evaluate the effectiveness of the optimization suggestions.
*   **Consider Time Series:** If your data has a strong temporal component, consider using time series models (like ARIMA or LSTMs) to predict future warehouse performance. You may need to adjust your features accordingly (e.g., include lagged values of the target variable).
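
The hyperparameter-tuning and feature-selection steps above can be sketched together. The example assumes a deliberately small grid and uses `make_regression` as stand-in data so that it runs on its own; the grid values and the column names `f0` to `f4` are placeholders, not recommendations.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Stand-in data; in practice pass X_train / y_train from the script instead.
X_arr, y_demo = make_regression(
    n_samples=300, n_features=5, n_informative=2, noise=5.0, random_state=42
)
X_demo = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(5)])

# Illustrative grid: 2 x 2 x 2 = 8 candidates, each scored with 3-fold CV.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "min_samples_split": [2, 10],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print("Best cross-validated MSE:", -search.best_score_)

# Impurity-based importances of the refitted best model, largest first.
importances = pd.Series(
    search.best_estimator_.feature_importances_, index=X_demo.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances are only one view of feature relevance, so it is worth cross-checking them on held-out data before dropping columns.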

This script provides a structured starting point for building an AI-based predictive logistics optimizer for warehouse operations. Remember to adapt the code to your specific data and requirements.