AI-Based Predictive Maintenance Tool for Wind Turbines R

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

# --- 1. Data Generation (Simulated Wind Turbine Data) ---
#  In a real application, this would read data from sensors or a database.
def generate_wind_turbine_data(num_samples=1000):
    """
    Generates simulated wind turbine data for predictive maintenance.

    Args:
        num_samples: The number of data points to generate.

    Returns:
        A Pandas DataFrame containing the simulated data.
    """
    np.random.seed(42)  # for reproducibility

    # Features
    turbine_id = np.random.randint(1, 11, num_samples)  # 10 turbines
    ambient_temperature = np.random.normal(20, 5, num_samples)  # Celsius
    wind_speed = np.random.normal(12, 3, num_samples)  # m/s
    generator_speed = np.random.normal(1500, 100, num_samples)  # RPM
    blade_pitch_angle = np.random.normal(5, 2, num_samples)  # degrees
    bearing_temperature = np.random.normal(70, 10, num_samples)  # Celsius
    vibration = np.random.normal(0.5, 0.1, num_samples) # mm/s RMS vibration

    # Target Variable (Rotor Bearing Remaining Useful Life - RUL)
    # RUL is simulated based on a combination of the factors. This is a simplified model.
    rul = 100 - (0.5 * wind_speed + 0.2 * bearing_temperature + 0.1 * vibration + np.random.normal(0, 5, num_samples))
    rul = np.clip(rul, 0, 100)  # Ensure RUL is within 0-100 range

    data = pd.DataFrame({
        'turbine_id': turbine_id,
        'ambient_temperature': ambient_temperature,
        'wind_speed': wind_speed,
        'generator_speed': generator_speed,
        'blade_pitch_angle': blade_pitch_angle,
        'bearing_temperature': bearing_temperature,
        'vibration': vibration,
        'rul': rul  # Remaining Useful Life (Target Variable)
    })

    return data

# --- 2. Data Loading and Preprocessing ---
data = generate_wind_turbine_data()
print("Sample Data:")
print(data.head())
print("\nData Summary:")
print(data.describe()) #Quick statistics

# --- 3. Feature Engineering (Optional, but often helpful) ---
# Example: Creating a combined feature
data['power_output'] = data['wind_speed'] * data['generator_speed']  #Simple proxy for power.  A real model would be more sophisticated.

# --- 4. Data Splitting ---
X = data.drop('rul', axis=1)  # Features
y = data['rul']  # Target variable

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# --- 5. Model Training ---
# Choose a suitable model for regression (predicting a continuous value)
# RandomForestRegressor is a good starting point
model = RandomForestRegressor(n_estimators=100, random_state=42)  # Adjust hyperparameters as needed

model.fit(X_train, y_train)

# --- 6. Model Evaluation ---
y_pred = model.predict(X_test)

# Calculate metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("\nModel Evaluation:")
print(f"Mean Squared Error: {mse:.3f}")
print(f"R-squared: {r2:.3f}")


# --- 7. Visualization (Optional) ---
# Scatter plot of actual vs. predicted RUL
plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred, alpha=0.5)
plt.xlabel("Actual RUL")
plt.ylabel("Predicted RUL")
plt.title("Actual vs. Predicted RUL")
plt.plot([0, 100], [0, 100], 'r--')  # Diagonal line for perfect prediction
plt.show()

# Feature Importance (helps understand which features are most predictive)
feature_importances = model.feature_importances_
feature_names = X.columns
importance_df = pd.DataFrame({'Feature': feature_names, 'Importance': feature_importances})
importance_df = importance_df.sort_values('Importance', ascending=False)
print("\nFeature Importance:")
print(importance_df)

plt.figure(figsize=(10, 6))
plt.bar(importance_df['Feature'], importance_df['Importance'])
plt.xticks(rotation=45, ha="right")
plt.xlabel("Feature")
plt.ylabel("Importance")
plt.title("Feature Importance")
plt.tight_layout()
plt.show()

# --- 8. Prediction on New Data (Example) ---
def predict_rul(model, turbine_data):
  """
  Predicts the RUL for a new turbine based on its sensor data.

  Args:
      model: The trained machine learning model.
      turbine_data: A dictionary or Pandas Series containing the turbine's sensor data.

  Returns:
      The predicted RUL for the turbine.
  """

  #Convert the turbine data into a dataframe that can be used for predicting
  turbine_df = pd.DataFrame([turbine_data])

  #Ensure that the input data has the same features as the training data
  #This might need to be adapted if feature engineering is part of the data preprocessing
  turbine_df = turbine_df[X.columns]

  rul_prediction = model.predict(turbine_df)[0] #Returns a numpy array, we want the first value
  return rul_prediction

# Example usage:
new_turbine_data = {
    'turbine_id': 3,
    'ambient_temperature': 22,
    'wind_speed': 14,
    'generator_speed': 1600,
    'blade_pitch_angle': 6,
    'bearing_temperature': 75,
    'vibration': 0.6,
    'power_output': 14 * 1600  # Engineered feature, computed as in training (wind_speed * generator_speed). It is part of X.columns, so the model uses it.
}

predicted_rul = predict_rul(model, new_turbine_data)
print(f"\nPredicted RUL for new turbine: {predicted_rul}")

# --- 9. Saving the Model (Optional) ---
# import joblib #scikit-learn's recommended way to save models

# filename = 'wind_turbine_rul_model.joblib'
# joblib.dump(model, filename) # save the model to disk
# print(f"\nModel saved to {filename}")

# Later load like this:
# loaded_model = joblib.load(filename)
```
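Because `power_output` is created before `X` is built, it is one of `X.columns`, and the model expects it at prediction time. One way to keep it consistent is to derive engineered features in a single shared helper. The following is a minimal sketch under that assumption; `add_engineered_features` and `build_model_input` are hypothetical names, not part of the script above.

```python
import pandas as pd

# Hypothetical helper (not in the original script): derives engineered
# features in one place so training and prediction cannot drift apart.
def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Same simple proxy the script uses: wind_speed * generator_speed
    out["power_output"] = out["wind_speed"] * out["generator_speed"]
    return out

# Hypothetical helper: turns one raw sensor reading into a single-row
# DataFrame whose columns match the training feature order.
def build_model_input(turbine_data: dict, feature_columns) -> pd.DataFrame:
    row = add_engineered_features(pd.DataFrame([turbine_data]))
    missing = [c for c in feature_columns if c not in row.columns]
    if missing:
        raise KeyError(f"Missing features: {missing}")
    return row[list(feature_columns)]

# Feature order as produced by the script (raw columns, then power_output).
feature_columns = [
    "turbine_id", "ambient_temperature", "wind_speed", "generator_speed",
    "blade_pitch_angle", "bearing_temperature", "vibration", "power_output",
]

# Raw reading only: no placeholder for power_output is needed.
reading = {
    "turbine_id": 3,
    "ambient_temperature": 22,
    "wind_speed": 14,
    "generator_speed": 1600,
    "blade_pitch_angle": 6,
    "bearing_temperature": 75,
    "vibration": 0.6,
}

model_input = build_model_input(reading, feature_columns)
print(model_input)
```

With helpers like these, training could call `add_engineered_features(data)` and prediction could call `model.predict(build_model_input(reading, X.columns))`, so callers only ever supply raw sensor values.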

The comments in the code explain each step. The key points are summarized here:

* **Structure:** The code is divided into numbered sections: data generation, loading and preprocessing, feature engineering, splitting, model training, evaluation, visualization, prediction, and saving the model. This makes it easier to understand and maintain.

* **Data Generation:** The `generate_wind_turbine_data` function simulates data.  In a real-world application, this would be replaced with loading data from a database, sensors, or files.  Important:  The RUL simulation is simplified; a real predictive maintenance system requires much more sophisticated models of failure modes. The random seed ensures repeatability.

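* **Feature Engineering:** A `power_output` feature (`wind_speed * generator_speed`) illustrates how existing features can be combined into potentially more informative ones. It is only a rough proxy for power; a real model would be more sophisticated.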

* **Data Splitting:** Splits the data into training and testing sets to evaluate the model's performance on unseen data.

* **Model Selection:** Uses `RandomForestRegressor`, a good general-purpose algorithm for regression problems. Hyperparameters like `n_estimators` can be tuned for better performance.

* **Model Evaluation:** Calculates Mean Squared Error (MSE) and R-squared to assess the model's accuracy.

* **Visualization:**  Includes a scatter plot of actual vs. predicted RUL and a bar chart of feature importances. This helps to understand the model's performance and identify the most important factors.

* **Prediction on New Data:** The `predict_rul` function demonstrates how to use the trained model to predict RUL from a new turbine's sensor data. It selects `X.columns` so that the input has the same features, in the same order, as the training data. Because `power_output` is one of those columns, it must be computed the same way as in training (`wind_speed * generator_speed`) rather than filled with a placeholder.

* **Saving the Model:** The commented-out code in section 9 shows how to save the trained model to a file with `joblib` and load it again later. This lets you reuse the model without retraining, which matters for deployment.

* **Comments and Explanations:**  Extensive comments are added to explain each step of the process.

* **Error Handling (Important Consideration):**  While the code runs, it *lacks* explicit error handling.  In a real system, you'd need to add `try...except` blocks to handle potential issues such as missing data, invalid data types, and model errors.

* **Scalability:** This is a single-machine implementation.  For a large number of wind turbines, consider using distributed computing frameworks like Spark or Dask to handle the data and training.

* **Model Deployment:**  This code shows how to train and evaluate a model.  Deploying it to a real-time system would require additional steps, such as creating an API endpoint or integrating it with a monitoring system.

* **Data Validation:**  Before feeding data to the model, you should validate it to ensure it's within acceptable ranges and formats. This prevents unexpected errors.
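The error-handling and data-validation points can be sketched together as follows. This is a minimal illustration, not a definitive implementation: the limits in `ASSUMED_LIMITS` are placeholder assumptions rather than real turbine specifications, and `validate_reading` and `safe_predict` are hypothetical helpers that do not appear in the script.

```python
import math

# Illustrative limits only (assumptions, not turbine specifications).
ASSUMED_LIMITS = {
    "ambient_temperature": (-40.0, 60.0),   # Celsius
    "wind_speed": (0.0, 40.0),              # m/s
    "generator_speed": (0.0, 2500.0),       # RPM
    "blade_pitch_angle": (-5.0, 95.0),      # degrees
    "bearing_temperature": (-40.0, 150.0),  # Celsius
    "vibration": (0.0, 10.0),               # mm/s RMS
}

def validate_reading(reading: dict) -> list:
    """Return a list of problems; an empty list means the reading passed."""
    problems = []
    for name, (low, high) in ASSUMED_LIMITS.items():
        if name not in reading:
            problems.append(f"{name}: missing")
            continue
        value = reading[name]
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name}: not numeric ({value!r})")
        elif math.isnan(value):
            problems.append(f"{name}: NaN")
        elif not low <= value <= high:
            problems.append(f"{name}: {value} outside [{low}, {high}]")
    return problems

def safe_predict(predict_fn, reading: dict):
    """Validate first, then call predict_fn(reading); return None on failure."""
    problems = validate_reading(reading)
    if problems:
        print("Rejected reading:", "; ".join(problems))
        return None
    try:
        return predict_fn(reading)
    except (KeyError, ValueError) as exc:
        print(f"Prediction failed: {exc}")
        return None

good = {
    "ambient_temperature": 22, "wind_speed": 14, "generator_speed": 1600,
    "blade_pitch_angle": 6, "bearing_temperature": 75, "vibration": 0.6,
}
# Negative wind speed and no vibration value: two problems expected.
bad = {
    "ambient_temperature": 22, "wind_speed": -3.0, "generator_speed": 1600,
    "blade_pitch_angle": 6, "bearing_temperature": 75,
}

print(validate_reading(good))
print(validate_reading(bad))
```

In the script, `predict_fn` could be something like `lambda r: predict_rul(model, r)`, so that invalid readings are rejected before they reach the model.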

This code provides a starting point for building an AI-based predictive maintenance tool for wind turbines. Adapt and extend it to the specific requirements of your application.
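As one way to act on the hyperparameter-tuning and evaluation points in the summary, the sketch below runs a small grid search with cross-validation and reports the test error as RMSE, which is in the same units as RUL. The data here is a reduced synthetic stand-in (two features, 300 samples), and the grid values are arbitrary choices for illustration, not recommended settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Reduced synthetic stand-in for the simulated turbine data (assumption).
rng = np.random.default_rng(0)
n = 300
wind_speed = rng.normal(12, 3, n)
bearing_temperature = rng.normal(70, 10, n)
X = np.column_stack([wind_speed, bearing_temperature])
y = 100 - (0.5 * wind_speed + 0.2 * bearing_temperature + rng.normal(0, 5, n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Small, arbitrary grid to keep the example fast.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=3,                              # 3-fold cross-validation on the training set
    scoring="neg_mean_squared_error",  # scikit-learn maximizes, hence the negated MSE
)
search.fit(X_train, y_train)

# With the default refit=True, predict() uses the best model refit on all training data.
rmse = np.sqrt(mean_squared_error(y_test, search.predict(X_test)))
print("Best parameters:", search.best_params_)
print(f"Test RMSE: {rmse:.2f}")
```

The test set is held out of the search entirely, so the reported RMSE is not biased by the tuning itself.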