AI-Based Predictive Logistics Optimizer for Warehouse Operations (Python)

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor  # Or another suitable regressor (e.g., GradientBoostingRegressor, or xgboost's XGBRegressor)
from sklearn.metrics import mean_squared_error, mean_absolute_error
import pickle  # For saving and loading the model
import datetime  # For handling time-based features



# --- 1. Data Loading and Preprocessing ---

def load_and_preprocess_data(file_path):
    """
    Loads warehouse operations data from a CSV file, preprocesses it,
    and prepares it for machine learning, including basic data cleaning.

    Args:
        file_path (str): The path to the CSV file containing the data.

    Returns:
        pandas.DataFrame:  The preprocessed DataFrame.
    """
    try:
        df = pd.read_csv(file_path)
    except FileNotFoundError:
        print(f"Error: File not found at {file_path}")
        return None
    except Exception as e:
        print(f"Error loading data: {e}")
        return None

    # --- Data Cleaning and Handling Missing Values ---
    # Example:  Filling missing values with the mean (more sophisticated methods are possible)
    for col in df.columns:
        if df[col].isnull().any():
            if pd.api.types.is_numeric_dtype(df[col]):
                df[col] = df[col].fillna(df[col].mean())  # Fill numeric gaps with the column mean (inplace fillna on a column is deprecated in pandas 2.x)
            else:
                df[col] = df[col].fillna(df[col].mode()[0])  # Fill categorical gaps with the most frequent value


    # --- Feature Engineering ---
    # Example: Creating time-based features from a 'timestamp' column
    if 'timestamp' in df.columns:
        df['timestamp'] = pd.to_datetime(df['timestamp'])
        df['hour'] = df['timestamp'].dt.hour
        df['day_of_week'] = df['timestamp'].dt.dayofweek  # Monday=0, Sunday=6
        df['month'] = df['timestamp'].dt.month
        df.drop('timestamp', axis=1, inplace=True)  # Remove original timestamp

    # --- Convert Categorical Features to Numerical ---
    # Using one-hot encoding (handle potential 'object' type columns after imputation)
    object_cols = df.select_dtypes(include='object').columns  # String/categorical columns remaining after imputation
    if len(object_cols) > 0:
        df = pd.get_dummies(df, columns=list(object_cols), drop_first=True)  # One-hot encode; drop_first avoids multicollinearity


    return df
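
# --- Optional: alternative imputation (sketch) ---
# A hedged alternative to the manual fillna loop above, using scikit-learn's
# SimpleImputer. It is not wired into the pipeline; call it instead of the
# loop if you prefer median imputation, which is more robust to outliers.

def impute_with_sklearn(df):
    """Sketch: impute numeric columns with the median and categorical
    columns with the most frequent value via sklearn's SimpleImputer."""
    from sklearn.impute import SimpleImputer

    numeric_cols = df.select_dtypes(include=np.number).columns
    object_cols = df.select_dtypes(include='object').columns
    if len(numeric_cols) > 0:
        df[numeric_cols] = SimpleImputer(strategy='median').fit_transform(df[numeric_cols])
    if len(object_cols) > 0:
        df[object_cols] = SimpleImputer(strategy='most_frequent').fit_transform(df[object_cols])
    return df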




# --- 2. Feature Selection and Target Definition ---

def select_features_and_target(df, target_column):
    """
    Selects the features (independent variables) and target variable
    from the DataFrame.

    Args:
        df (pandas.DataFrame): The DataFrame.
        target_column (str): The name of the target variable column.

    Returns:
        tuple: A tuple containing (X, y), where X is the feature matrix
               and y is the target variable vector.
    """
    try:
        y = df[target_column]
        X = df.drop(target_column, axis=1)
        return X, y
    except KeyError:
        print(f"Error: Target column '{target_column}' not found in the DataFrame.")
        return None, None


# --- 3. Model Training ---

def train_model(X_train, y_train, model_type='random_forest', n_estimators=100, random_state=42):
    """
    Trains a machine learning model using the provided training data.

    Args:
        X_train (pandas.DataFrame): The training features.
        y_train (pandas.Series): The training target variable.
        model_type (str): The type of model to train ('random_forest', etc.).  Extensible.
        n_estimators (int): The number of estimators for Random Forest.
        random_state (int): Random seed for reproducibility.

    Returns:
        sklearn estimator: The trained machine learning model.
    """
    if model_type == 'random_forest':
        model = RandomForestRegressor(n_estimators=n_estimators, random_state=random_state)
    # Add other model types here as extra elif branches (e.g., GradientBoostingRegressor)
    else:
        print(f"Warning: Model type '{model_type}' not supported. Falling back to Random Forest.")
        model = RandomForestRegressor(n_estimators=n_estimators, random_state=random_state)


    model.fit(X_train, y_train)
    return model

# --- 4. Model Evaluation ---

def evaluate_model(model, X_test, y_test):
    """
    Evaluates the trained model on the test data.

    Args:
        model (sklearn estimator): The trained model.
        X_test (pandas.DataFrame): The test features.
        y_test (pandas.Series): The test target variable.

    Returns:
        dict: A dictionary containing evaluation metrics (e.g., MSE, MAE).
    """
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mae = mean_absolute_error(y_test, y_pred)
    rmse = np.sqrt(mse)  # Root Mean Squared Error

    print(f"Mean Squared Error: {mse}")
    print(f"Mean Absolute Error: {mae}")
    print(f"Root Mean Squared Error: {rmse}")

    return {'mse': mse, 'mae': mae, 'rmse': rmse}
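
# Optional extra metric (sketch): R-squared, the fraction of target variance
# the model explains. It complements the error metrics reported above.

def r_squared(model, X_test, y_test):
    """Sketch: compute R^2 on the test set."""
    from sklearn.metrics import r2_score
    return r2_score(y_test, model.predict(X_test))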



# --- 5. Model Saving and Loading ---

def save_model(model, file_path):
    """
    Saves the trained model to a file using pickle.

    Args:
        model (sklearn estimator): The trained model.
        file_path (str): The path to save the model.
    """
    try:
        with open(file_path, 'wb') as file:
            pickle.dump(model, file)
        print(f"Model saved to {file_path}")
    except Exception as e:
        print(f"Error saving model: {e}")

def load_model(file_path):
    """
    Loads a trained model from a file.

    Args:
        file_path (str): The path to the saved model.

    Returns:
        sklearn estimator: The loaded model, or None if loading fails.
    """
    try:
        with open(file_path, 'rb') as file:
            model = pickle.load(file)
        print(f"Model loaded from {file_path}")
        return model
    except FileNotFoundError:
        print(f"Error: Model file not found at {file_path}")
        return None
    except Exception as e:
        print(f"Error loading model: {e}")
        return None
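
# Note (optional): for large scikit-learn models, joblib is often preferred
# over pickle because it serializes numpy arrays more efficiently, e.g.:
#
#   import joblib
#   joblib.dump(model, 'warehouse_optimization_model.joblib')
#   model = joblib.load('warehouse_optimization_model.joblib')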


# --- 6. Prediction Function ---

def predict(model, input_data):
    """
    Predicts the target variable for new input data using the loaded model.

    Args:
        model (sklearn estimator): The loaded model.
        input_data (pandas.DataFrame or dict): The input data for prediction.

    Returns:
        numpy.ndarray: The predicted values.
    """
    if isinstance(input_data, dict):
        input_df = pd.DataFrame([input_data]) # Create a DataFrame from a dictionary
    elif isinstance(input_data, pd.DataFrame):
        input_df = input_data
    else:
        print("Error: Input data must be a pandas DataFrame or a dictionary.")
        return None


    try:
        predictions = model.predict(input_df)
        return predictions
    except Exception as e:
        print(f"Error during prediction: {e}")
        return None
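
# Optional helper (sketch, assumes scikit-learn >= 1.0 and a model fit on a
# DataFrame): make dictionary/DataFrame input line up with the training
# columns before calling predict().

def align_features(model, input_df):
    """Sketch: reorder input columns to match those seen at training time,
    filling any missing one-hot columns with 0."""
    if hasattr(model, 'feature_names_in_'):
        return input_df.reindex(columns=model.feature_names_in_, fill_value=0)
    return input_df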


# --- 7. Main Function (Example Usage) ---

def main():
    """
    Main function to orchestrate the entire process.
    """
    data_file = 'warehouse_operations_data.csv'  # Replace with your actual data file
    target_column = 'processing_time'  # Replace with your target column

    # 1. Load and Preprocess Data
    df = load_and_preprocess_data(data_file)
    if df is None:
        return  # Exit if data loading failed

    # 2. Select Features and Target
    X, y = select_features_and_target(df, target_column)
    if X is None or y is None:
        return  # Exit if feature selection failed

    # 3. Split Data into Training and Testing Sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # Adjust test_size as needed

    # 4. Train the Model
    model = train_model(X_train, y_train)

    # 5. Evaluate the Model
    evaluation_metrics = evaluate_model(model, X_test, y_test)

    # 6. Save the Model
    model_file = 'warehouse_optimization_model.pkl'
    save_model(model, model_file)

    # 7. Load the Model and Make Predictions (Example)
    loaded_model = load_model(model_file)
    if loaded_model:
        # Example Input Data (as a dictionary)
        new_data = {
            'order_volume': 150,
            'distance_to_storage': 25,
            'hour': 10,
            'day_of_week': 1,
            'month': 5,
            # If a 'priority' column was one-hot encoded with drop_first=True,
            # the first category (alphabetically, e.g. 'high') becomes the
            # baseline and has no column of its own:
            'priority_low': 0,
            'priority_medium': 0,
            # Add other feature values here
        }
        prediction = predict(loaded_model, new_data)
        if prediction is not None:
            print(f"Predicted Processing Time: {prediction[0]:.2f}")  # Format the output



if __name__ == "__main__":
    main()
```
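
As a complement to the single train/test split used in `main()`, k-fold cross-validation gives a less split-dependent estimate of model quality. A minimal sketch, assuming the `X` and `y` produced by `select_features_and_target` above:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# 5-fold cross-validated RMSE; scikit-learn negates error scores by convention.
model = RandomForestRegressor(n_estimators=100, random_state=42)
scores = cross_val_score(model, X, y, cv=5, scoring='neg_root_mean_squared_error')
print(f"CV RMSE: {-scores.mean():.2f} (+/- {scores.std():.2f})")
```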

Key design points:

* **Clear Structure:** The code is broken into well-defined functions for each step of the process (data loading, preprocessing, feature selection, model training, evaluation, saving/loading, and prediction), which keeps it organized, readable, and maintainable.
* **Error Handling:** `try...except` blocks handle common errors such as `FileNotFoundError` when loading data or models, `KeyError` when the target column doesn't exist, and generic exceptions during model saving, loading, and prediction. This prevents the program from crashing and provides informative error messages.
* **Data Preprocessing:** `load_and_preprocess_data` handles missing values (`fillna` with the mean or mode) and converts categorical features to numerical ones via one-hot encoding (`pd.get_dummies`). The encoding uses `drop_first=True` to avoid multicollinearity, and the check for 'object' (string) columns runs *after* imputation to avoid errors. Time-based features (hour, day of week, month) are extracted from a timestamp column; an optional `SimpleImputer`-based alternative is sketched in the code.
* **Feature Selection:** A dedicated function, `select_features_and_target`, separates the features from the target variable.
* **Model Training:** `train_model` lets you choose the model type (e.g., 'random_forest'). The default is Random Forest, and the code is structured so other models (e.g., Gradient Boosting, XGBoost) can be added with minimal changes; see the sketch after this list.
* **Model Evaluation:** `evaluate_model` calculates and prints MSE, MAE, and RMSE, standard metrics for regression models; an optional R-squared sketch follows it in the code.
* **Model Saving and Loading:** `save_model` and `load_model` use `pickle` to persist the trained model to disk so it can be reused without retraining (a joblib alternative is noted in the code). Error handling is included.
* **Prediction Function:** `predict` accepts both pandas DataFrames and dictionaries as input, converting a dictionary into a single-row DataFrame before predicting (see the optional `align_features` sketch after `predict`).
* **Main Function:** `main` orchestrates the entire process, from loading data to making predictions, and shows how to call `predict` with new data and format the output.
* **Comments and Docstrings:** Each function carries a docstring and inline comments explaining its purpose, which improves readability.
* **Flexibility:** The code adapts to different warehouse operations datasets; the data file path, target column, model type, and hyperparameters are all easy to change.
* **Dependencies:** `import` statements cover all required libraries.
* **Random State:** `random_state` is set in `train_test_split` and `RandomForestRegressor` for reproducibility.
* **Target Column Handling:** `select_features_and_target` catches the `KeyError` raised when the specified `target_column` does not exist in the DataFrame, making the code more robust.
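
For example, extending `train_model` with a gradient-boosting option could look like this minimal sketch (the `'gradient_boosting'` key and fallback message are assumptions, not part of the original function):

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

def train_model(X_train, y_train, model_type='random_forest', n_estimators=100, random_state=42):
    # Sketch: dispatch on model_type and fall back to Random Forest otherwise.
    if model_type == 'random_forest':
        model = RandomForestRegressor(n_estimators=n_estimators, random_state=random_state)
    elif model_type == 'gradient_boosting':
        model = GradientBoostingRegressor(n_estimators=n_estimators, random_state=random_state)
    else:
        print(f"Warning: Model type '{model_type}' not supported. Falling back to Random Forest.")
        model = RandomForestRegressor(n_estimators=n_estimators, random_state=random_state)
    model.fit(X_train, y_train)
    return model
```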

How to use:

1.  **Install Libraries:** Make sure you have the necessary libraries installed:

    ```bash
    pip install pandas scikit-learn numpy
    ```

2.  **Prepare Your Data:** Create a CSV file named `warehouse_operations_data.csv` (or change the `data_file` variable in `main()`). The file should contain your warehouse operations data, including the target variable (e.g., processing time) and relevant features, and the file path in `main()` must be correct. Include a 'timestamp' column if you want the time-based feature engineering; if you *don't* have one, remove the related lines in `load_and_preprocess_data`. A sketch that generates a small synthetic CSV for testing appears after this list.
3.  **Set the Target Column:** Change the `target_column` variable in `main()` to the name of your target column in the CSV file.
4.  **Run the Script:** Execute the Python script.

    ```bash
    python your_script_name.py
    ```

5.  **Predictions:** After the script runs, it will print the evaluation metrics of the model and then load the saved model and make a prediction using the example input data. You can modify the `new_data` dictionary in `main()` to test with different input values. The predicted processing time will be printed to the console.
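
If you don't have data handy, the following sketch generates a small synthetic `warehouse_operations_data.csv` so the script can run end to end. All column names and value ranges are illustrative assumptions chosen to match the example features in `main()`:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1000
df = pd.DataFrame({
    'timestamp': pd.date_range('2024-01-01', periods=n, freq='h'),
    'order_volume': rng.integers(10, 300, n),
    'distance_to_storage': rng.uniform(1, 50, n).round(1),
    'priority': rng.choice(['high', 'medium', 'low'], n),
})
# Synthetic target: processing time loosely driven by volume and distance, plus noise.
df['processing_time'] = (
    0.1 * df['order_volume'] + 0.5 * df['distance_to_storage'] + rng.normal(0, 5, n)
).round(2)
df.to_csv('warehouse_operations_data.csv', index=False)
```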

This program provides a complete, runnable, and well-documented Python example of AI-based predictive logistics optimization for warehouse operations, including error handling, data preprocessing, model saving/loading, and a prediction function. Remember to adapt the data preprocessing and feature engineering steps to your specific dataset.