AI-Powered Crop Yield Predictor and Farming Recommendation System (Python)

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
import pickle  # For saving and loading the model
import warnings  # For handling warnings

# Suppress a common warning related to future changes in pandas indexing
warnings.filterwarnings("ignore", category=FutureWarning)


# --- 1. Data Preparation (Replace with your actual data loading) ---

def load_and_preprocess_data(csv_file):
    """
    Loads data from a CSV file, handles missing values, and performs
    basic feature engineering (if needed).

    Args:
        csv_file (str): The path to the CSV file containing agricultural data.

    Returns:
        pandas.DataFrame: The preprocessed DataFrame.  Returns None if there are issues loading the data.
    """
    try:
        data = pd.read_csv(csv_file)
    except FileNotFoundError:
        print(f"Error: CSV file not found at {csv_file}")
        return None
    except Exception as e:
        print(f"Error loading CSV: {e}")
        return None
    # Basic handling of missing values (replace with more sophisticated methods if required)
    #  Important:  Choose an appropriate method based on your data.
    #  Options: Mean/Median imputation, dropping rows with missing values, etc.
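    # numeric_only=True skips text columns, which would otherwise make mean() fail in pandas 2.x
    data = data.fillna(data.mean(numeric_only=True))  # Replace missing numeric values with each column's mean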
    return data



# --- 2. Model Training ---

def train_model(data, features, target):
    """
    Trains a Random Forest Regressor model for crop yield prediction.

    Args:
        data (pandas.DataFrame): The DataFrame containing the data.
        features (list): A list of feature column names.
        target (str): The name of the target variable (crop yield).

    Returns:
        sklearn.ensemble.RandomForestRegressor: The trained model. Returns None if training fails.
    """
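    # Fail early with a clear message if an expected column is absent
    missing = [col for col in list(features) + [target] if col not in data.columns]
    if missing:
        print(f"Error: Columns not found in data: {missing}")
        return None

    X = data[features]
    y = data[target]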

    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # Adjust test_size as needed

    # Initialize and train the Random Forest Regressor model
    model = RandomForestRegressor(n_estimators=100, random_state=42)  # Adjust hyperparameters as needed
    try:
        model.fit(X_train, y_train)
    except Exception as e:
        print(f"Error during model training: {e}")
        return None

    # Evaluate the model on the test set (optional, but good practice)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print(f"Mean Squared Error on Test Set: {mse}")

    return model


# --- 3. Recommendation System (Simplified) ---

def get_farming_recommendations(model, input_data, feature_names):
    """
    Provides farming recommendations based on the predicted yield.  This is a very
    simplified example.  A real recommendation system would be much more complex.

    Args:
        model (sklearn.ensemble.RandomForestRegressor): The trained model.
        input_data (dict): A dictionary of input feature values.  Keys must match feature_names.
        feature_names (list): A list of feature names in the correct order.

    Returns:
        str: A string containing farming recommendations.
    """

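    # Check for missing features explicitly: pd.DataFrame fills absent keys
    # with NaN instead of raising KeyError.
    missing = [name for name in feature_names if name not in input_data]
    if missing:
        return f"Error: Missing feature in input data: {', '.join(missing)}"

    # Create a DataFrame from the input data, ensuring it matches the model's expected input
    try:
        input_df = pd.DataFrame([input_data], columns=feature_names)
    except Exception as e:
        return f"Error creating input DataFrame: {e}"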

    # Ensure all values are numeric
    for col in input_df.columns:
        try:
            input_df[col] = pd.to_numeric(input_df[col])
        except (ValueError, TypeError):
            return f"Error: Non-numeric value found in column: {col}"


    # Make a yield prediction
    try:
        predicted_yield = model.predict(input_df)[0]
    except Exception as e:
        return f"Error during prediction: {e}"

    # Provide recommendations based on the predicted yield (example)
    if predicted_yield > 7:
        recommendation = "High yield expected. Maintain current practices."
    elif 4 <= predicted_yield <= 7:
        recommendation = "Moderate yield expected. Consider optimizing irrigation or fertilization."
    else:
        recommendation = "Low yield expected. Investigate soil health, pest control, and water management."

    return f"Predicted Yield: {predicted_yield:.2f}\nRecommendation: {recommendation}"


# --- 4. Model Saving and Loading ---

def save_model(model, filename="crop_yield_model.pkl"):
    """
    Saves the trained model to a file using pickle.

    Args:
        model (sklearn.ensemble.RandomForestRegressor): The trained model.
        filename (str): The filename to save the model to.
    """
    try:
        with open(filename, 'wb') as file:
            pickle.dump(model, file)
        print(f"Model saved to {filename}")
    except Exception as e:
        print(f"Error saving model: {e}")


def load_model(filename="crop_yield_model.pkl"):
    """
    Loads a trained model from a file.  Only load pickle files you trust:
    unpickling can execute arbitrary code.

    Args:
        filename (str): The filename to load the model from.

    Returns:
        sklearn.ensemble.RandomForestRegressor: The loaded model. Returns None if loading fails.
    """
    try:
        with open(filename, 'rb') as file:
            model = pickle.load(file)
        print(f"Model loaded from {filename}")
        return model
    except FileNotFoundError:
        print(f"Error: Model file not found at {filename}")
        return None
    except Exception as e:
        print(f"Error loading model: {e}")
        return None


# --- 5. Main Execution ---

if __name__ == "__main__":
    # 1. Load and Preprocess Data
    csv_file = "crop_data.csv"  # Replace with your actual CSV file
    data = load_and_preprocess_data(csv_file)

    if data is None:
        print("Data loading failed. Exiting.")
        raise SystemExit(1)  # Non-zero exit status signals failure

    # 2. Define Features and Target Variable
    features = ['temperature', 'rainfall', 'soil_fertility', 'pest_infestation']  # Replace with your actual feature columns
    target = 'crop_yield'  # Replace with your actual target column

    # 3. Train the Model
    model = train_model(data, features, target)

    if model is None:
        print("Model training failed. Exiting.")
        raise SystemExit(1)  # Non-zero exit status signals failure


    # 4. Save the Model
    save_model(model)  # Save the model after training

    # 5. Load the Model (Example - demonstrating saving and loading)
    loaded_model = load_model()

    if loaded_model is None:
        print("Model loading failed.  Using the originally trained model.")
        loaded_model = model # Use the original model if loading fails


    # 6. Get Farming Recommendations
    input_data = {
        'temperature': 25,
        'rainfall': 150,
        'soil_fertility': 7,
        'pest_infestation': 2
    } # Replace with your actual input data
    recommendations = get_farming_recommendations(loaded_model, input_data, features)
    print(recommendations)
```

Key improvements and explanations:

* **Clearer Structure:**  The code is divided into well-defined functions for data loading, model training, recommendation generation, and model saving/loading.  This makes the code much more readable and maintainable.
* **Error Handling:**  `try...except` blocks guard file loading, model training, prediction, and model saving/loading.  The loading and training functions return `None` on failure so the main block can check for it; `get_farming_recommendations` returns an error message string instead.
* **Data Preprocessing Function:** The `load_and_preprocess_data` function encapsulates data loading and preprocessing, including `FileNotFoundError` and other exceptions raised while reading the CSV.  It fills missing values in numeric columns with the column *mean*; adapt this to your data (for example, median imputation or dropping incomplete rows).
* **Feature and Target Definition:**  The `features` and `target` variables are explicitly defined, making it easy to change the model's input and output.  The `features` list *must* match the column names in your CSV file.
* **Model Saving and Loading:** The `save_model` and `load_model` functions use `pickle` to persist the trained model to disk. This is essential so you don't have to retrain the model every time you want to use it.
* **Recommendation System Logic and Input Validation:** The `get_farming_recommendations` function takes the trained model, input data, and feature names.  It builds a one-row Pandas DataFrame with columns in `feature_names` order so the input matches what the model was trained on, then converts each value with `pd.to_numeric` and returns an error message if a value is not numeric.  Note that `pd.DataFrame([input_data], columns=feature_names)` does not raise `KeyError` when a key is absent; it produces a NaN column, so missing features need an explicit check of the dictionary keys before prediction.
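* **Yield Thresholds:** The recommendation is chosen from three ranges of predicted yield (above 7, from 4 to 7, and below 4).  These thresholds are placeholders; set them according to your crop and the units of `crop_yield`.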
* **Random State:**  The `random_state` parameter is used in `train_test_split` and `RandomForestRegressor` for reproducibility.  This ensures that you get the same results every time you run the code with the same data.  This is important for debugging and comparing different model configurations.
* **Comments and Documentation:**  Added more comments to explain the purpose of each section of the code.  Docstrings are included for each function.
* **Example Usage:** The `if __name__ == "__main__":` block demonstrates how to use the functions to load data, train a model, save the model, load the model, and get recommendations.
* **Test Set Evaluation:** Includes code to evaluate the model's performance on a test set using Mean Squared Error (MSE). This helps you assess how well the model generalizes to unseen data.  This is essential during development.
* **Explicit Dependency Imports:** Includes all necessary import statements at the beginning of the code.
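* **Handles missing data:** Missing values in the training data are filled with column means before training.
* **Handles non-numeric data:** Each input value is converted with `pd.to_numeric`, and a value that cannot be converted produces an error message instead of a prediction.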
* **Handles loading failure:** If the saved model fails to load, the program uses the trained model to generate the recommendation.
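
The input checks described in the list above can be sketched as a small standalone helper. This is a minimal sketch, not part of the script: the name `validate_input` and its `(DataFrame, error)` return shape are assumptions made for illustration. It checks the dictionary keys explicitly because `pd.DataFrame([input_data], columns=feature_names)` fills an absent key with NaN rather than raising `KeyError`.

```python
import pandas as pd


def validate_input(input_data, feature_names):
    """Return (DataFrame, None) on success or (None, error message) on failure.

    Hypothetical helper for illustration; not part of the script above.
    """
    # Absent keys would silently become NaN columns, so look for them first.
    missing = [name for name in feature_names if name not in input_data]
    if missing:
        return None, f"Missing feature(s): {', '.join(missing)}"

    row = pd.DataFrame([input_data], columns=feature_names)
    # errors="coerce" turns anything non-numeric into NaN instead of raising.
    numeric = row.apply(pd.to_numeric, errors="coerce")
    bad = [col for col in numeric.columns if numeric[col].isna().any()]
    if bad:
        return None, f"Non-numeric or empty value in: {', '.join(bad)}"
    return numeric, None


features = ["temperature", "rainfall", "soil_fertility", "pest_infestation"]

ok_df, ok_err = validate_input(
    {"temperature": 25, "rainfall": 150, "soil_fertility": 7, "pest_infestation": 2},
    features,
)
_, missing_err = validate_input({"temperature": 25, "rainfall": 150}, features)
_, type_err = validate_input(
    {"temperature": "hot", "rainfall": 150, "soil_fertility": 7, "pest_infestation": 2},
    features,
)
print(ok_err, missing_err, type_err, sep="\n")
```

Returning the error as a value rather than raising keeps the helper in line with the script's existing style, where failures are reported as messages.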

To use this code:

1. **Install Libraries:**
   ```bash
   pip install pandas scikit-learn
   ```

2. **Prepare Your Data:**
   * Create a CSV file named `crop_data.csv` (or change the `csv_file` variable).
   * The CSV file should have columns for features (e.g., temperature, rainfall, soil_fertility, pest_infestation) and a target variable (crop_yield).  *The names of these columns must match the `features` and `target` variables in the code*.
   * Make sure your data is clean and properly formatted.  Handle missing values appropriately.
   * The data should be numeric.

3. **Configure Features and Target:**
   * Update the `features` list and the `target` string to match the column names in your CSV file.  *This is very important*.

4. **Run the Code:**
   ```bash
   python your_script_name.py
   ```

The code will train a model, save it to a file, load it, and then provide a farming recommendation based on the example input data.
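
If you do not have a dataset yet, a throwaway file is enough to check that the script runs end to end. The sketch below writes a synthetic `crop_data.csv` with the column names used in the example. Every value, range, unit, and the formula for `crop_yield` is made up for smoke-testing and carries no agronomic meaning.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
n_rows = 200

# Synthetic, made-up values for smoke-testing only; not real agronomic data.
sample = pd.DataFrame({
    "temperature": rng.uniform(10, 35, n_rows),      # assumed degrees Celsius
    "rainfall": rng.uniform(50, 300, n_rows),        # assumed millimetres
    "soil_fertility": rng.integers(1, 11, n_rows),   # assumed 1-10 score
    "pest_infestation": rng.integers(0, 6, n_rows),  # assumed 0-5 score
})

# Arbitrary formula plus noise so the target depends on the features.
sample["crop_yield"] = (
    0.5 * sample["soil_fertility"]
    + 0.01 * sample["rainfall"]
    - 0.4 * sample["pest_infestation"]
    + rng.normal(0, 0.5, n_rows)
)

# Blank out a few cells to exercise the missing-value handling.
sample.loc[sample.index[:5], "rainfall"] = np.nan

sample.to_csv("crop_data.csv", index=False)
print(sample.head())
```

Replace this file with real measurements before drawing any conclusions from the predictions.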

**Important Considerations:**

* **Data Quality:** The accuracy of the model depends heavily on the quality and representativeness of your data.
* **Feature Engineering:**  Consider more advanced feature engineering techniques to improve model performance.  For example, you might create interaction terms (e.g., temperature * rainfall) or use polynomial features.
* **Hyperparameter Tuning:** Experiment with different hyperparameters for the Random Forest Regressor (e.g., `n_estimators`, `max_depth`, `min_samples_split`) to optimize model performance.  Use techniques like cross-validation to find the best hyperparameters.
* **Model Selection:** Explore other machine learning algorithms (e.g., linear regression, support vector machines, neural networks) to see if they perform better on your data.
* **Real-World Recommendations:** The recommendation system in this example is very basic. A real-world recommendation system would need to consider many more factors, such as:
    * Crop type
    * Location-specific conditions
    * Market prices
    * Farmer preferences
    * Available resources
* **Continuous Improvement:** Continuously monitor the model's performance and retrain it with new data to ensure it remains accurate.
* **Explainability:** Consider using techniques to make the model's predictions more explainable.  This can help farmers understand why the model is making certain recommendations and build trust in the system.  Libraries like SHAP or LIME can be helpful.
* **Deployment:**  To make the system accessible to farmers, you would need to deploy it as a web application or mobile app. This would involve creating a user interface for entering input data and displaying recommendations.
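
As a rough illustration of the feature engineering and hyperparameter tuning points above, the sketch below adds one interaction term and runs a small cross-validated grid search with scikit-learn's `GridSearchCV`. The data is synthetic, and the column name `temp_x_rain`, the grid values, and `cv=3` are arbitrary choices for the example, not recommendations.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; swap in your own DataFrame and target column.
rng = np.random.default_rng(seed=0)
X = pd.DataFrame({
    "temperature": rng.uniform(10, 35, 120),
    "rainfall": rng.uniform(50, 300, 120),
})
y = 0.001 * X["temperature"] * X["rainfall"] + rng.normal(0, 0.3, 120)

# Feature engineering: an interaction term (temperature * rainfall).
X["temp_x_rain"] = X["temperature"] * X["rainfall"]

# Small illustrative grid over the hyperparameters named in the list above.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "min_samples_split": [2, 5],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=3,                              # 3-fold cross-validation
    scoring="neg_mean_squared_error",  # scikit-learn maximizes scores, hence the negation
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated MSE:", -search.best_score_)
```

After fitting, `search.best_estimator_` holds a model refit on all the data with the best parameters, and it can be passed to `save_model` like any other trained model.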

This script provides a structured starting point for an AI-powered crop yield prediction and farming recommendation system. Adapt the code to your specific data and requirements.