AI-Based Predictive Maintenance Tool for Wind Turbines

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
import matplotlib.pyplot as plt

# --- 1. Data Simulation (Replace with actual data loading) ---

def simulate_wind_turbine_data(n_samples=1000):
    """
    Simulates wind turbine sensor data for demonstration purposes.

    Returns:
        pandas.DataFrame: A DataFrame containing simulated sensor data and a 'failure' column.
    """
    np.random.seed(42)  # for reproducibility

    data = {
        'rotor_speed': np.random.normal(15, 2, n_samples),
        'generator_temp': np.random.normal(70, 5, n_samples),
        'ambient_temp': np.random.normal(20, 3, n_samples),
        'wind_speed': np.random.normal(12, 4, n_samples),
        'power_output': np.random.normal(1500, 200, n_samples),
        'vibration': np.random.normal(0.5, 0.1, n_samples)
    }

    df = pd.DataFrame(data)

    # Simulate failures based on certain conditions (this is a simplified model)
    # In a real scenario, this would be based on historical failure data and domain expertise.
    failure_probability = np.zeros(n_samples)
    failure_probability[(df['rotor_speed'] < 8) | (df['generator_temp'] > 80) | (df['vibration'] > 0.7)] = 0.8
    failure_probability[(df['wind_speed'] > 20) & (df['power_output'] < 500)] = 0.7
    failure_probability = np.clip(failure_probability + np.random.normal(0, 0.05, n_samples), 0, 1)  # Add some noise

    df['failure'] = np.random.binomial(1, failure_probability)

    return df


# --- 2. Data Loading and Preprocessing (Adapt to your data source) ---

# Load data (replace with your actual data loading mechanism)
df = simulate_wind_turbine_data(n_samples=1000)
print("Sample Data:")
print(df.head())


# Data Exploration (Optional, but highly recommended for understanding your data)
print("\nData Summary:")
print(df.describe())

print("\nFailure Class Distribution:")
print(df['failure'].value_counts())


# Feature Selection (Choose relevant features)
features = ['rotor_speed', 'generator_temp', 'ambient_temp', 'wind_speed', 'power_output', 'vibration']
X = df[features]
y = df['failure']  # 'failure' column as the target variable


# --- 3. Data Splitting ---

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("\nTraining set size:", len(X_train))
print("Test set size:", len(X_test))


# --- 4. Model Training (Random Forest Classifier) ---

# Initialize the Random Forest Classifier
model = RandomForestClassifier(n_estimators=100, random_state=42)  # Adjust hyperparameters as needed
# n_estimators: Number of trees in the forest.  More trees generally improve performance, but increase training time.
# random_state:  For reproducibility.  Sets the seed for the random number generator used by the algorithm.

# Train the model
model.fit(X_train, y_train)

print("\nModel Training Complete!")


# --- 5. Model Evaluation ---

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"\nAccuracy: {accuracy:.4f}")

# Detailed classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred))


# --- 6. Feature Importance (Optional) ---

feature_importances = model.feature_importances_

# Create a DataFrame to display feature importances
feature_importance_df = pd.DataFrame({'Feature': features, 'Importance': feature_importances})
feature_importance_df = feature_importance_df.sort_values('Importance', ascending=False)

print("\nFeature Importances:")
print(feature_importance_df)

# Plot feature importances
plt.figure(figsize=(10, 6))
plt.barh(feature_importance_df['Feature'], feature_importance_df['Importance'])
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.title('Feature Importances')
plt.show()


# --- 7.  Prediction on New Data (Simulated here, replace with real-time data) ---

def predict_failure(model, rotor_speed, generator_temp, ambient_temp, wind_speed, power_output, vibration):
    """
    Predicts the probability of failure based on sensor inputs.

    Args:
        model: Trained machine learning model.
        rotor_speed: Rotor speed reading.
        generator_temp: Generator temperature reading.
        ambient_temp: Ambient temperature reading.
        wind_speed: Wind speed reading.
        power_output: Power output reading.
        vibration: Vibration reading.

    Returns:
        float: Probability of failure (between 0 and 1).
    """

    # Create a DataFrame from the input data
    input_data = pd.DataFrame({
        'rotor_speed': [rotor_speed],
        'generator_temp': [generator_temp],
        'ambient_temp': [ambient_temp],
        'wind_speed': [wind_speed],
        'power_output': [power_output],
        'vibration': [vibration]
    })

    # Make a prediction using the model
    probability = model.predict_proba(input_data)[0, 1]  # Probability of class 1 (failure)
    return probability



# Simulate new data for prediction
new_rotor_speed = 14
new_generator_temp = 75
new_ambient_temp = 22
new_wind_speed = 15
new_power_output = 1600
new_vibration = 0.6

# Predict the probability of failure
failure_probability = predict_failure(model, new_rotor_speed, new_generator_temp, new_ambient_temp, new_wind_speed, new_power_output, new_vibration)

print(f"\nPredicted Failure Probability for new data: {failure_probability:.4f}")

# Based on the predicted probability, you can trigger maintenance alerts:
if failure_probability > 0.7:  # Adjust the threshold as needed based on risk tolerance
    print("High probability of failure detected!  Recommend immediate inspection and potential maintenance.")
else:
    print("Failure probability is within acceptable limits.")



# --- 8. Saving the Model (Optional, but recommended for reuse) ---
import joblib  # or pickle
# Save the trained model to a file
model_filename = 'wind_turbine_failure_model.joblib'
joblib.dump(model, model_filename)
print(f"\nModel saved to {model_filename}")


# --- 9.  Loading the Model (Example of how to reuse the saved model) ---
# loaded_model = joblib.load(model_filename)
# # You can now use loaded_model to make predictions without retraining.

```

Key design choices and explanations:

* **Clear Structure:** The code is organized into logical sections (Data Simulation, Data Loading, Data Splitting, Model Training, Model Evaluation, etc.), making it easier to read and understand.
* **Data Simulation:** The `simulate_wind_turbine_data` function generates plausible sensor readings and a simplified failure model: the failure probability is raised when combinations of readings exceed or fall below certain thresholds (conditions that might lead to failure), and random noise is added on top. This makes the simulation useful for demonstrating the workflow, but it is only a stand-in; remember to replace it with your actual data and domain expertise.
* **Data Exploration:** `print(df.describe())` and `print(df['failure'].value_counts())` show descriptive statistics and the class distribution (watch for class imbalance). This is crucial for understanding the data before modeling. You'll want to do much more exploration with real data, including visualizations.
* **Feature Selection:** Explicitly defines the features used for training.  This is important for maintainability and clarity.  In a real project, you'd use more sophisticated feature selection techniques.
* **Data Splitting:** Explicitly prints the size of the training and test sets.  Good practice for verifying your data split.
* **Hyperparameter Tuning:** Comments suggest how to adjust hyperparameters (e.g., `n_estimators` in `RandomForestClassifier`) and explain the purpose of the `random_state` parameter for reproducibility.
* **Classification Report:**  Includes a `classification_report` which provides detailed metrics (precision, recall, F1-score) for each class, giving a more complete picture of model performance than just accuracy.
* **Feature Importance:** Calculates and displays feature importances.  This helps understand which sensors are most predictive of failure.  A plot is also added to visualize the feature importances.
* **Prediction Function:** Encapsulates the prediction logic into a `predict_failure` function, making it reusable and testable.  It also prepares the input data in the correct format for the model.  The function returns the probability of failure, which is much more useful than just a binary prediction.
* **Failure Threshold:** Includes an example of how to use the predicted probability to trigger maintenance alerts based on a threshold. Crucially, the threshold must be adjusted to your specific risk tolerance and to the costs associated with false positives (unnecessary maintenance) and false negatives (missed failures).
* **Model Saving and Loading:** Demonstrates how to save the trained model to a file using `joblib` and load it later for reuse.  This is essential for deploying the model.
* **Comments and Explanations:** Added many more comments to explain the purpose of each step in the code.  This makes the code easier to understand and modify.
* **Error Handling (Implicit):** While there's no explicit error handling, the code is structured in a way that minimizes the risk of errors (e.g., by explicitly defining features and using a function for prediction).  In a production environment, you would add more robust error handling.
* **Uses pandas for data handling:** Pandas is the industry standard for data manipulation and analysis in Python.
* **Uses scikit-learn for machine learning:** Scikit-learn is the most popular and well-documented machine learning library in Python.

How to use it:

1.  **Install Libraries:**
    ```bash
    pip install pandas scikit-learn matplotlib joblib
    ```

2.  **Replace Simulated Data:** The most crucial step!  You MUST replace the `simulate_wind_turbine_data` function with code that loads your *actual* wind turbine data.  This could be from a CSV file, a database, or an API.  Make sure the data is in a pandas DataFrame format.  The columns must correspond to the sensor readings you want to use as features.  You will need to adapt the column names used in the code to match your data.
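
    For example, a minimal loading sketch, assuming the readings live in a CSV file (the file name and column rename below are placeholders you would adapt):
    ```python
    import pandas as pd

    # Hypothetical file and column names; replace with your own data source.
    df = pd.read_csv('turbine_sensor_log.csv', parse_dates=['timestamp'])
    df = df.rename(columns={'gen_temp_c': 'generator_temp'})  # align names with the feature list
    df = df.sort_values('timestamp').reset_index(drop=True)
    print(df.head())
    ```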
3.  **Feature Engineering:**  Consider creating new features from your existing sensor data, for example (a short sketch follows this list):
    *   Rolling averages of sensor readings (e.g., the average generator temperature over the last hour).  This can smooth out noise and highlight trends.
    *   Rate of change of sensor readings (e.g., the rate at which the rotor speed is changing).  This can detect sudden changes that might indicate a problem.
    *   Combinations of sensor readings (e.g., the ratio of power output to wind speed).
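
    A minimal sketch of these three ideas, assuming the DataFrame `df` from above and regular sampling intervals (the window size is illustrative):
    ```python
    # Rolling average to smooth noise (a 6-sample window is illustrative).
    df['generator_temp_rolling_mean'] = df['generator_temp'].rolling(window=6, min_periods=1).mean()

    # Rate of change between consecutive readings.
    df['rotor_speed_delta'] = df['rotor_speed'].diff().fillna(0)

    # Ratio feature; the small epsilon avoids division by zero in calm conditions.
    df['power_per_wind'] = df['power_output'] / (df['wind_speed'] + 1e-6)
    ```
    Any engineered columns also need to be appended to the `features` list before training.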
4.  **Data Exploration:**  Thoroughly explore your data to understand its characteristics, identify missing values, and look for outliers.  Use visualizations (histograms, scatter plots, box plots) to gain insights.
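
    A quick first-look sketch with pandas and matplotlib, assuming the DataFrame `df` from above:
    ```python
    import matplotlib.pyplot as plt

    print(df.isna().sum())        # missing values per column
    df.hist(figsize=(12, 8))      # histogram of every numeric column
    plt.tight_layout()
    plt.show()

    df.boxplot(column=['generator_temp', 'vibration'])  # quick outlier check
    plt.show()
    ```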
5.  **Data Cleaning:**  Handle missing values (e.g., by imputation) and remove or correct outliers.
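
    One simple approach, sketched with median imputation and quantile clipping (the 1%/99% bounds are illustrative, not a recommendation):
    ```python
    # Fill missing sensor readings with each column's median.
    df[features] = df[features].fillna(df[features].median())

    # Clip extreme outliers to the 1st/99th percentile of each column.
    lower = df[features].quantile(0.01)
    upper = df[features].quantile(0.99)
    df[features] = df[features].clip(lower=lower, upper=upper, axis=1)
    ```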
6.  **Feature Scaling:**  Scale your features to a common range (e.g., using `StandardScaler` or `MinMaxScaler` from scikit-learn).  This can improve the performance of some machine learning algorithms.
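
    A sketch with `StandardScaler`; note that tree-based models such as the random forest above do not need scaling, but linear models and SVMs usually benefit. Fit the scaler on the training split only to avoid leaking test information:
    ```python
    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
    X_test_scaled = scaler.transform(X_test)        # reuse the same parameters
    ```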
7.  **Hyperparameter Tuning:**  Experiment with different hyperparameters for the `RandomForestClassifier` (e.g., `n_estimators`, `max_depth`, `min_samples_leaf`).  Use techniques like cross-validation to find the best hyperparameters for your data.
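
    A minimal `GridSearchCV` sketch; the parameter grid below is just a starting point:
    ```python
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    param_grid = {
        'n_estimators': [100, 300],
        'max_depth': [None, 10, 20],
        'min_samples_leaf': [1, 5],
    }
    grid = GridSearchCV(RandomForestClassifier(random_state=42),
                        param_grid, cv=5, scoring='f1', n_jobs=-1)
    grid.fit(X_train, y_train)
    print("Best parameters:", grid.best_params_)
    print("Best CV F1 score:", grid.best_score_)
    model = grid.best_estimator_  # use the tuned model downstream
    ```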
8.  **Model Selection:**  Consider other machine learning algorithms besides `RandomForestClassifier`, such as:
    *   Logistic Regression
    *   Support Vector Machines (SVMs)
    *   Gradient Boosting Machines (e.g., XGBoost, LightGBM)
    *   Neural Networks
    Compare the performance of different models using appropriate evaluation metrics.
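
    A quick comparison sketch using cross-validated F1 scores (XGBoost/LightGBM require separate installs and are omitted here):
    ```python
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    candidates = {
        'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
        'Logistic Regression': make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        'SVM': make_pipeline(StandardScaler(), SVC()),
        'Gradient Boosting': GradientBoostingClassifier(random_state=42),
    }
    for name, clf in candidates.items():
        scores = cross_val_score(clf, X_train, y_train, cv=5, scoring='f1')
        print(f"{name}: mean F1 = {scores.mean():.3f} (std {scores.std():.3f})")
    ```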
9.  **Threshold Optimization:**  Carefully choose the threshold for triggering maintenance alerts based on the predicted failure probability.  Consider the costs of false positives (unnecessary maintenance) and false negatives (missed failures) when setting the threshold.
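
    A sketch that inspects the precision/recall trade-off on the test set to guide the choice; in practice you would weight it by the actual cost of each error type:
    ```python
    from sklearn.metrics import precision_recall_curve

    y_proba = model.predict_proba(X_test)[:, 1]
    precision, recall, thresholds = precision_recall_curve(y_test, y_proba)

    # Print a sample of candidate thresholds with their precision and recall.
    for t, p, r in zip(thresholds[::10], precision[::10], recall[::10]):
        print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
    ```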
10. **Real-time Data Integration:**  Integrate the model with a real-time data stream from your wind turbines.  This will allow you to make predictions and trigger alerts in real-time.  This typically involves using a message queue or other data streaming technology.
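
    The mechanics depend entirely on your infrastructure (SCADA historian, message queue, REST API, etc.). As a placeholder, here is a simple polling sketch built around the hypothetical function `get_latest_readings()`, which you would implement for your data source:
    ```python
    import time

    def get_latest_readings():
        """Hypothetical stub: fetch the newest sensor readings from your data source."""
        raise NotImplementedError("Connect this to your SCADA system, queue, or API.")

    while True:
        readings = get_latest_readings()           # expected: dict keyed by feature name
        prob = predict_failure(model, **readings)  # reuse the prediction function above
        if prob > 0.7:                             # same threshold logic as above
            print(f"ALERT: failure probability {prob:.2f}; schedule an inspection.")
        time.sleep(600)                            # poll every 10 minutes (adjust as needed)
    ```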
11. **Monitoring and Retraining:**  Continuously monitor the performance of the model and retrain it periodically with new data to ensure that it remains accurate.  The characteristics of your wind turbines may change over time, so it's important to adapt the model accordingly.
12. **Deployment:**  Deploy the model to a production environment where it can be used to make predictions and trigger alerts.  This may involve containerizing the model (e.g., using Docker) and deploying it to a cloud platform or on-premise server.

This comprehensive example provides a solid foundation for building an AI-based predictive maintenance tool for wind turbines. Remember to adapt it to your specific data and requirements.  Good luck!