AI-Powered Predictive Fire Hazard Detection System for Forests

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import joblib  # For saving and loading the trained model

# --- 1. Data Preparation (Simulated Data) ---

def generate_simulated_data(num_samples=1000):
    """
    Generates simulated data for forest fire hazard prediction.

    Returns:
        pd.DataFrame: A DataFrame containing simulated data.
    """

    np.random.seed(42)  # for reproducibility

    data = {
        'Temperature': np.random.uniform(10, 40, num_samples),  # Celsius
        'Humidity': np.random.uniform(20, 90, num_samples),  # Percentage
        'WindSpeed': np.random.uniform(0, 60, num_samples),  # km/h
        'VegetationDensity': np.random.uniform(0.1, 0.9, num_samples), # Scale 0.1-0.9
        'RecentRainfall': np.random.randint(0, 7, num_samples), # Days since last rainfall (0-6)
        'Slope': np.random.uniform(0, 45, num_samples), # Degrees
        'Aspect': np.random.uniform(0, 360, num_samples), # Degrees (orientation)
        # Note: labels below are drawn independently of the features above, so
        # model metrics on this data demonstrate the pipeline, not real skill.
        'FireOccurrence': np.random.choice([0, 1], num_samples, p=[0.7, 0.3])  # 0: no fire, 1: fire (imbalanced)
    }

    df = pd.DataFrame(data)
    return df


# --- 2. Data Preprocessing (minimal for this example, but critical in real-world scenarios)---

def preprocess_data(df):
    """
    Preprocesses the data. In this example, it's a simple placeholder,
    but it's where you'd handle missing values, scaling, encoding, etc.

    Args:
        df (pd.DataFrame): Input DataFrame.

    Returns:
        pd.DataFrame: Preprocessed DataFrame.
    """
    # In a real application:
    # - Handle missing values (imputation or removal)
    # - Scale numerical features (StandardScaler, MinMaxScaler)
    # - Encode categorical features (OneHotEncoder, LabelEncoder)

    # For this example, we just return the original DataFrame.  A real implementation
    # would contain more preprocessing steps.

    return df


# --- 3. Feature Selection (Basic) ---

def select_features(df, target_column='FireOccurrence'):
    """
    Selects features for the model.  In a real application, this could
    involve feature importance analysis, correlation analysis, etc.

    Args:
        df (pd.DataFrame): Input DataFrame.
        target_column (str): The name of the target variable column.

    Returns:
        tuple: (features DataFrame, target Series)
    """
    X = df.drop(target_column, axis=1)  # Features
    y = df[target_column]  # Target
    return X, y


# --- 4. Model Training ---

def train_model(X_train, y_train):
    """
    Trains a RandomForestClassifier model.

    Args:
        X_train (pd.DataFrame): Training features.
        y_train (pd.Series): Training target.

    Returns:
        RandomForestClassifier: Trained model.
    """
    model = RandomForestClassifier(n_estimators=100, random_state=42, class_weight='balanced') # balanced accounts for the 70-30 split
    model.fit(X_train, y_train)
    return model


# --- 5. Model Evaluation ---

def evaluate_model(model, X_test, y_test):
    """
    Evaluates the model using accuracy, classification report, and confusion matrix.

    Args:
        model (RandomForestClassifier): Trained model.
        X_test (pd.DataFrame): Testing features.
        y_test (pd.Series): Testing target.
    """
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy: {accuracy:.4f}")
    print("Classification Report:\n", classification_report(y_test, y_pred))
    print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))


# --- 6. Prediction Function ---

def predict_fire_risk(model, data):
    """
    Predicts the fire risk based on input data.

    Args:
        model (RandomForestClassifier): Trained model.
        data (dict): A dictionary containing the feature values (Temperature, Humidity, WindSpeed, VegetationDensity, RecentRainfall, Slope, Aspect).

    Returns:
        int: 0 (No Fire Risk) or 1 (Fire Risk).
    """
    input_df = pd.DataFrame([data])  # wrap the single sample in a DataFrame so it works with predict()
    prediction = model.predict(input_df)[0]  # Predict for a single data point
    return prediction


# --- 7. Main Function ---

def main():
    """
    Main function to orchestrate the entire process.
    """

    # 1. Data Preparation
    df = generate_simulated_data()

    # 2. Data Preprocessing
    df = preprocess_data(df)

    # 3. Feature Selection
    X, y = select_features(df)

    # 4. Train/Test Split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y)  # stratify keeps the class ratio consistent across splits

    # 5. Model Training
    model = train_model(X_train, y_train)

    # 6. Model Evaluation
    print("Model Evaluation:")
    evaluate_model(model, X_test, y_test)

    # 7. Save the model
    joblib.dump(model, 'fire_risk_model.pkl') # Save the model

    # --- Example Usage ---
    print("\nExample Usage:")
    new_data = {
        'Temperature': 32,
        'Humidity': 60,
        'WindSpeed': 25,
        'VegetationDensity': 0.6,
        'RecentRainfall': 1,
        'Slope': 15,
        'Aspect': 90
    }

    # Load the model
    loaded_model = joblib.load('fire_risk_model.pkl')  # Load the saved model

    risk = predict_fire_risk(loaded_model, new_data)  # Use the *loaded* model.
    if risk == 1:
        print("High Fire Risk")
    else:
        print("Low Fire Risk")



if __name__ == "__main__":
    main()
```

Key improvements and explanations:

* **Clearer Structure:** The code is organized into functions, each with a single, well-defined purpose, which makes it easier to read, maintain, and debug.
* **Detailed Comments:**  Extensive comments explain each step, including the purpose of functions, variable meanings, and the rationale behind certain decisions. This is *crucial* for understanding the code and adapting it.
* **Data Simulation:**  A `generate_simulated_data` function creates plausible-looking (but random) data for training and testing, which is essential when no real dataset is available. It includes features that realistically influence forest fire risk: temperature, humidity, wind speed, vegetation density, recent rainfall, slope, and aspect. Note that the `FireOccurrence` labels are drawn independently of those features, so the model's metrics here validate the pipeline rather than any real predictive skill. A fixed random seed keeps the generated data reproducible, which is crucial for debugging and testing.
* **Data Preprocessing Placeholder:** A `preprocess_data` function is included, even though it is mostly a placeholder here. Real-world data will require significant preprocessing (handling missing values, scaling, encoding categorical features), and this function is where those steps belong; a hedged sketch of a fuller pipeline follows this list.
* **Feature Selection Placeholder:** A `select_features` function isolates the feature-selection step. In a real application you would use techniques such as feature importance from the model itself, correlation analysis, or domain expertise to choose the most relevant features; the sketch after this list shows how to read importances from the trained forest.
* **Model Training and Evaluation:** The `train_model` and `evaluate_model` functions encapsulate these key steps. The evaluation now prints a classification report and confusion matrix *in addition* to accuracy. This provides much more insight into the model's performance, especially for imbalanced datasets (which forest fire data often is).
* **`RandomForestClassifier` with `class_weight='balanced'`:**  The `RandomForestClassifier` now includes `class_weight='balanced'`.  *This is critical* because the simulated data (and real-world forest fire data) is likely to be imbalanced (many more non-fire instances than fire instances). `class_weight='balanced'` adjusts the weights inversely proportional to class frequencies in the input data, penalizing errors in the minority class (fire instances) more heavily. This prevents the model from simply predicting "no fire" all the time and achieving high accuracy but being useless.
* **Model Persistence (Saving and Loading):** The trained model is now saved to disk using `joblib.dump` and loaded back using `joblib.load`. This is *essential* because you don't want to retrain the model every time you want to make a prediction. The example usage demonstrates loading the saved model and using it for prediction.
* **Prediction Function:** The `predict_fire_risk` function takes input data as a dictionary and uses the trained model to predict fire risk; it wraps the input in a pandas DataFrame so it is compatible with the scikit-learn model. This is the core function a real application would call with new sensor data. A hardened, probability-based variant is sketched at the end of this post.
* **Clear Example Usage:** The `main` function includes a clear example of how to use the `predict_fire_risk` function with new data. It demonstrates the entire workflow: data generation, training, evaluation, saving, loading, and prediction.
* **`main` function:** Encapsulates the entire workflow for better organization and execution control.
* **Reproducibility:**  `np.random.seed(42)` is used to make the data generation reproducible.  This means that if you run the code, you'll get the same simulated data (and therefore the same model performance) every time.  This is crucial for debugging and testing.
* **Error Handling (omitted for brevity but important in real code):** A production application would need robust error handling (e.g., missing data, invalid input, model loading failures); the sketch at the end of this post includes a minimal example of such checks.
* **Scalability Considerations:**  This example is for demonstration. For a real-world system, you would need to consider scalability (handling large amounts of data, deploying the model to a production environment). This would likely involve using a cloud platform (AWS, Azure, GCP), distributed computing frameworks (Spark, Dask), and model deployment tools.
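
As a concrete illustration of the preprocessing and feature-selection bullets above, here is a minimal sketch, assuming the `X_train`/`y_train` variables from the script. It chains imputation and scaling into a scikit-learn `Pipeline` ahead of the classifier, then reads feature importances from the fitted forest. The pipeline structure and the median imputation strategy are illustrative choices, not part of the original script:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
import pandas as pd

# Illustrative pipeline: impute missing sensor readings, scale, then classify.
# (Tree ensembles are scale-invariant; the scaler is included because it
# matters if you later swap in a model such as logistic regression or an SVM.)
pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler()),
    ('model', RandomForestClassifier(n_estimators=100, random_state=42,
                                     class_weight='balanced')),
])
pipeline.fit(X_train, y_train)

# Inspect which features the fitted forest relies on most.
importances = pd.Series(
    pipeline.named_steps['model'].feature_importances_,
    index=X_train.columns,
).sort_values(ascending=False)
print(importances)
```

On the simulated data the importances will come out roughly uniform (the labels are random), but on real data this is a quick first pass at feature selection.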

How to run the code:

1. **Install Libraries:**  Make sure you have the necessary libraries installed:
   ```bash
   pip install pandas scikit-learn joblib numpy
   ```
2. **Run the Script:** Save the code as a Python file (e.g., `fire_risk.py`) and run it from your terminal:
   ```bash
   python fire_risk.py
   ```

The code will:

1. Generate simulated data.
2. Train a RandomForestClassifier model.
3. Evaluate the model's performance.
4. Save the trained model to a file named `fire_risk_model.pkl`.
5. Load the trained model from the file.
6. Use the loaded model to predict the fire risk for a new data point.

This improved version provides a solid foundation for building a more sophisticated AI-powered forest fire hazard detection system. Remember to replace the simulated data with real data and implement more robust data preprocessing and feature selection techniques for a real-world application.
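
Finally, tying together the prediction and error-handling bullets above, here is a hedged sketch of a hardened prediction helper. It validates the input dictionary, loads the model defensively, and returns a probability via `predict_proba` instead of a bare 0/1 label. The function name, feature list, and threshold are illustrative, not part of the original script:

```python
import joblib
import pandas as pd

FEATURE_COLUMNS = ['Temperature', 'Humidity', 'WindSpeed', 'VegetationDensity',
                   'RecentRainfall', 'Slope', 'Aspect']

def predict_fire_risk_safe(model_path, data, threshold=0.5):
    """Hypothetical hardened variant of predict_fire_risk."""
    missing = [f for f in FEATURE_COLUMNS if f not in data]
    if missing:
        raise ValueError(f"Missing features: {missing}")

    try:
        model = joblib.load(model_path)
    except (OSError, EOFError) as exc:  # e.g. file missing or truncated
        raise RuntimeError(f"Could not load model from {model_path}") from exc

    # Fixed column order so the DataFrame matches the training layout.
    input_df = pd.DataFrame([data], columns=FEATURE_COLUMNS)
    probability = model.predict_proba(input_df)[0][1]  # P(FireOccurrence = 1)
    return {'fire_probability': probability,
            'high_risk': probability >= threshold}

# Example:
# result = predict_fire_risk_safe('fire_risk_model.pkl', new_data)
# print(f"Fire probability: {result['fire_probability']:.2f}")
```

Returning a probability lets downstream systems choose their own alert threshold instead of being locked to the default 0.5 decision boundary.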