AI-Based Predictive Logistics Optimizer for Warehouse Operations (Python)
```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor  # Or another regressor (e.g., GradientBoostingRegressor, xgboost's XGBRegressor)
from sklearn.metrics import mean_squared_error, mean_absolute_error
import pickle  # For saving and loading the model
# --- 1. Data Loading and Preprocessing ---
def load_and_preprocess_data(file_path):
    """
    Loads warehouse operations data from a CSV file, preprocesses it,
    and prepares it for machine learning. Handles basic data cleaning.

    Args:
        file_path (str): The path to the CSV file containing the data.

    Returns:
        pandas.DataFrame: The preprocessed DataFrame, or None on failure.
    """
    try:
        df = pd.read_csv(file_path)
    except FileNotFoundError:
        print(f"Error: File not found at {file_path}")
        return None
    except Exception as e:
        print(f"Error loading data: {e}")
        return None

    # --- Data Cleaning and Handling Missing Values ---
    # Example: filling missing values with the mean/mode
    # (more sophisticated methods are possible)
    for col in df.columns:
        if df[col].isnull().any():
            if pd.api.types.is_numeric_dtype(df[col]):
                df[col] = df[col].fillna(df[col].mean())  # Numerical: fill with the mean
            else:
                df[col] = df[col].fillna(df[col].mode()[0])  # Categorical: fill with the most frequent value

    # --- Feature Engineering ---
    # Example: creating time-based features from a 'timestamp' column
    if 'timestamp' in df.columns:
        df['timestamp'] = pd.to_datetime(df['timestamp'])
        df['hour'] = df['timestamp'].dt.hour
        df['day_of_week'] = df['timestamp'].dt.dayofweek  # Monday=0, Sunday=6
        df['month'] = df['timestamp'].dt.month
        df = df.drop('timestamp', axis=1)  # Remove the original timestamp

    # --- Convert Categorical Features to Numerical ---
    # One-hot encode the remaining object (string/categorical) columns;
    # drop_first=True avoids multicollinearity.
    object_cols = df.select_dtypes(include='object').columns
    if len(object_cols) > 0:
        df = pd.get_dummies(df, columns=list(object_cols), drop_first=True)

    return df
# --- 2. Feature Selection and Target Definition ---
def select_features_and_target(df, target_column):
    """
    Selects the features (independent variables) and the target variable
    from the DataFrame.

    Args:
        df (pandas.DataFrame): The DataFrame.
        target_column (str): The name of the target variable column.

    Returns:
        tuple: (X, y), where X is the feature matrix and y is the target
        variable vector, or (None, None) if the target column is missing.
    """
    try:
        y = df[target_column]
        X = df.drop(target_column, axis=1)
        return X, y
    except KeyError:
        print(f"Error: Target column '{target_column}' not found in the DataFrame.")
        return None, None
# --- 3. Model Training ---
def train_model(X_train, y_train, model_type='random_forest', n_estimators=100, random_state=42):
    """
    Trains a machine learning model using the provided training data.

    Args:
        X_train (pandas.DataFrame): The training features.
        y_train (pandas.Series): The training target variable.
        model_type (str): The type of model to train ('random_forest', etc.). Extensible.
        n_estimators (int): The number of estimators for Random Forest.
        random_state (int): Random seed for reproducibility.

    Returns:
        The trained scikit-learn model.
    """
    if model_type == 'random_forest':
        model = RandomForestRegressor(n_estimators=n_estimators, random_state=random_state)
        # Add other models here (e.g., GradientBoostingRegressor)
    else:
        print(f"Error: Model type '{model_type}' not supported. Falling back to Random Forest.")
        model = RandomForestRegressor(n_estimators=n_estimators, random_state=random_state)
    model.fit(X_train, y_train)
    return model
# --- 4. Model Evaluation ---
def evaluate_model(model, X_test, y_test):
    """
    Evaluates the trained model on the test data.

    Args:
        model: The trained scikit-learn model.
        X_test (pandas.DataFrame): The test features.
        y_test (pandas.Series): The test target variable.

    Returns:
        dict: A dictionary containing evaluation metrics (MSE, MAE, RMSE).
    """
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mae = mean_absolute_error(y_test, y_pred)
    rmse = np.sqrt(mse)  # Root Mean Squared Error
    print(f"Mean Squared Error: {mse}")
    print(f"Mean Absolute Error: {mae}")
    print(f"Root Mean Squared Error: {rmse}")
    return {'mse': mse, 'mae': mae, 'rmse': rmse}
# --- 5. Model Saving and Loading ---
def save_model(model, file_path):
    """
    Saves the trained model to a file using pickle.

    Args:
        model: The trained scikit-learn model.
        file_path (str): The path to save the model.
    """
    try:
        with open(file_path, 'wb') as file:
            pickle.dump(model, file)
        print(f"Model saved to {file_path}")
    except Exception as e:
        print(f"Error saving model: {e}")


def load_model(file_path):
    """
    Loads a trained model from a file.

    Args:
        file_path (str): The path to the saved model.

    Returns:
        The loaded model, or None if loading fails.
    """
    try:
        with open(file_path, 'rb') as file:
            model = pickle.load(file)
        print(f"Model loaded from {file_path}")
        return model
    except FileNotFoundError:
        print(f"Error: Model file not found at {file_path}")
        return None
    except Exception as e:
        print(f"Error loading model: {e}")
        return None
# --- 6. Prediction Function ---
def predict(model, input_data):
    """
    Predicts the target variable for new input data using the loaded model.

    Args:
        model: The loaded scikit-learn model.
        input_data (pandas.DataFrame or dict): The input data for prediction.

    Returns:
        numpy.ndarray: The predicted values, or None on failure.
    """
    if isinstance(input_data, dict):
        input_df = pd.DataFrame([input_data])  # Build a one-row DataFrame from the dictionary
    elif isinstance(input_data, pd.DataFrame):
        input_df = input_data
    else:
        print("Error: Input data must be a pandas DataFrame or a dictionary.")
        return None
    try:
        # Note: the input columns must match the columns the model was trained on.
        predictions = model.predict(input_df)
        return predictions
    except Exception as e:
        print(f"Error during prediction: {e}")
        return None
# --- 7. Main Function (Example Usage) ---
def main():
    """
    Main function to orchestrate the entire process.
    """
    data_file = 'warehouse_operations_data.csv'  # Replace with your actual data file
    target_column = 'processing_time'  # Replace with your target column

    # 1. Load and preprocess data
    df = load_and_preprocess_data(data_file)
    if df is None:
        return  # Exit if data loading failed

    # 2. Select features and target
    X, y = select_features_and_target(df, target_column)
    if X is None or y is None:
        return  # Exit if feature selection failed

    # 3. Split data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # Adjust test_size as needed

    # 4. Train the model
    model = train_model(X_train, y_train)

    # 5. Evaluate the model
    evaluation_metrics = evaluate_model(model, X_test, y_test)

    # 6. Save the model
    model_file = 'warehouse_optimization_model.pkl'
    save_model(model, model_file)

    # 7. Load the model and make a prediction (example)
    loaded_model = load_model(model_file)
    if loaded_model:
        # Example input data (as a dictionary); the keys must match the
        # feature columns produced during preprocessing.
        new_data = {
            'order_volume': 150,
            'distance_to_storage': 25,
            'hour': 10,
            'day_of_week': 1,
            'month': 5,
            'priority_low': 0,     # One-hot encoded from a 'priority' column;
            'priority_medium': 0,  # 'priority_high' is the baseline dropped by drop_first=True
            # Add other feature values here
        }
        prediction = predict(loaded_model, new_data)
        if prediction is not None:
            print(f"Predicted Processing Time: {prediction[0]:.2f}")  # Format the output


if __name__ == "__main__":
    main()
```
Key improvements and explanations:
* **Clearer Structure:** The code is now broken down into well-defined functions for each step of the process (data loading, preprocessing, feature selection, model training, evaluation, saving/loading, and prediction). This makes the code much more organized, readable, and maintainable.
* **Error Handling:** Includes `try...except` blocks to handle common errors such as `FileNotFoundError` when loading data or models, `KeyError` when the target column doesn't exist, and generic exceptions during model saving, loading, and prediction. This prevents the program from crashing and provides informative error messages.
* **Data Preprocessing:** The `load_and_preprocess_data` function includes data cleaning steps such as handling missing values (`fillna`) and converting categorical features to numerical ones with one-hot encoding (`pd.get_dummies`). Critically, the one-hot encoding uses `drop_first=True` to avoid multicollinearity, and the object (string) columns are identified *after* imputation to avoid errors. The function also extracts time-based features (hour, day of week, month) from a timestamp column. A median-imputation alternative is sketched after this list.
* **Feature Selection:** A dedicated function `select_features_and_target` simplifies the selection of features and the target variable.
* **Model Training:** The `train_model` function lets you choose the model type (e.g., 'random_forest'). The default is Random Forest, but the code is structured so that other models (e.g., Gradient Boosting, XGBoost) can be added with minimal changes; a sketch of one such extension follows this list.
* **Model Evaluation:** The `evaluate_model` function now calculates and prints MSE, MAE, and RMSE, which are standard metrics for regression models.
* **Model Saving and Loading:** The `save_model` and `load_model` functions use `pickle` to persist the trained model to disk, allowing you to reuse it later without retraining. Robust error handling is included.
* **Prediction Function:** The `predict` function accepts both pandas DataFrames and dictionaries as input, converting a dictionary into a one-row DataFrame. Note that the input columns must match the columns the model was trained on; a reindex-based alignment sketch appears after step 5 of 'How to use' below. Error handling is improved.
* **Main Function:** The `main` function orchestrates the entire process, from loading data to making predictions. It shows an example of how to use the `predict` function with new data and formats the output.
* **Comments and Docstrings:** Extensive comments and docstrings explain the purpose of each function and the code within them. This significantly improves readability.
* **Flexibility:** The code is designed to be flexible and adaptable to different warehouse operations datasets. You can easily change the data file path, target column, model type, and hyperparameters.
* **Dependencies:** Includes `import` statements for all necessary libraries.
* **Random State:** Uses `random_state` in `train_test_split` and `RandomForestRegressor` for reproducibility.
* **Target Column Handling:** The `select_features_and_target` function now properly handles potential `KeyError` if the specified `target_column` does not exist in the DataFrame. This makes the code more robust.
* **RMSE Calculation:** Adds Root Mean Squared Error (RMSE) as an evaluation metric.
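The mean/mode fill in `load_and_preprocess_data` is deliberately simple. As a minimal sketch of one of the "more sophisticated methods" the code comments allude to, here is a median-based alternative using scikit-learn's `SimpleImputer`; the `impute_numeric_median` helper and the strategy choice are illustrative, not part of the script above:
```python
# Sketch: median imputation for numeric columns, as a more
# outlier-robust alternative to the mean-fill loop in the script.
import pandas as pd
from sklearn.impute import SimpleImputer

def impute_numeric_median(df):
    numeric_cols = df.select_dtypes(include='number').columns
    imputer = SimpleImputer(strategy='median')  # Median is less skewed by outliers than the mean
    df[numeric_cols] = imputer.fit_transform(df[numeric_cols])
    return df
```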
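Similarly, to make the "add other models here" comment in `train_model` concrete, a possible extension might look like the following; the `'gradient_boosting'` key and the `build_model` helper are assumptions for illustration:
```python
# Sketch: branching on model_type so new regressors slot in cleanly.
# The 'gradient_boosting' key is hypothetical, added for illustration.
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

def build_model(model_type='random_forest', n_estimators=100, random_state=42):
    if model_type == 'random_forest':
        return RandomForestRegressor(n_estimators=n_estimators, random_state=random_state)
    if model_type == 'gradient_boosting':
        return GradientBoostingRegressor(n_estimators=n_estimators, random_state=random_state)
    raise ValueError(f"Unsupported model_type: {model_type}")
```
`train_model` could then call `build_model(...)` and fit the result, keeping model construction and training separate.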
How to use:
1. **Install Libraries:** Make sure you have the necessary libraries installed:
```bash
pip install pandas scikit-learn numpy
```
2. **Prepare Your Data:** Create a CSV file named `warehouse_operations_data.csv` (or change the `data_file` variable in `main()`). The file should contain your warehouse operations data, including the target variable (e.g., processing time) and relevant features; make sure the file path in `main()` is correct. Include a 'timestamp' column if you want the time-based feature engineering; if you *don't* have one, the related lines in `load_and_preprocess_data` are simply skipped. A rough header sketch follows.
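As an illustration only (the column names simply mirror the example `new_data` dictionary in `main()`; your own schema will differ), the first lines of such a CSV might look like:
```csv
order_volume,distance_to_storage,priority,timestamp,processing_time
150,25,high,2024-05-06 10:00:00,37.5
80,12,low,2024-05-06 11:30:00,21.0
```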
3. **Set the Target Column:** Change the `target_column` variable in `main()` to the name of your target column in the CSV file.
4. **Run the Script:** Execute the Python script.
```bash
python your_script_name.py
```
5. **Predictions:** After the script runs, it will print the evaluation metrics of the model and then load the saved model and make a prediction using the example input data. You can modify the `new_data` dictionary in `main()` to test with different input values. The predicted processing time will be printed to the console.
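One caveat when editing `new_data`: after one-hot encoding, the dictionary keys must match the training columns exactly, in name and number. A minimal sketch of aligning arbitrary input against the fitted model, assuming scikit-learn >= 1.0 (which exposes `feature_names_in_` on models fitted on DataFrames; the `predict_aligned` helper is hypothetical):
```python
# Sketch: align a dict-derived row with the model's training columns.
# Missing dummy columns are filled with 0; unknown keys are dropped.
import pandas as pd

def predict_aligned(model, record):
    input_df = pd.DataFrame([record])
    input_df = input_df.reindex(columns=model.feature_names_in_, fill_value=0)
    return model.predict(input_df)
```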
The result is a complete, runnable, and well-documented Python program for AI-based predictive logistics optimization in warehouse operations, with error handling, data preprocessing, model saving/loading, and a prediction function. Remember to adapt the data preprocessing and feature engineering steps to your specific dataset.