Smart Weather Prediction AI Python, AI, Meteorology

👤 Sharing: AI
```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# --- 1. Data Preparation (Simulated Data) ---
# In a real application, you would fetch data from a weather API or a local database.
# This example uses simulated data for demonstration.

def generate_simulated_weather_data(num_samples=100):
    """Generates simulated weather data for demonstration."""
    np.random.seed(42)  # for reproducibility

    temperature = np.random.uniform(0, 35, num_samples)  # Temperature in Celsius
    humidity = np.random.uniform(30, 90, num_samples)    # Humidity in percentage
    pressure = np.random.uniform(980, 1020, num_samples)  # Atmospheric Pressure in hPa
    wind_speed = np.random.uniform(0, 20, num_samples)    # Wind Speed in km/h

    # Simulate rainfall based on the other features (simplified relationship)
    rainfall = 0.1 * temperature - 0.05 * humidity + 0.02 * pressure + 0.05 * wind_speed + np.random.normal(0, 1, num_samples)
    rainfall = np.maximum(rainfall, 0)  # Rainfall cannot be negative

    data = {
        'Temperature': temperature,
        'Humidity': humidity,
        'Pressure': pressure,
        'WindSpeed': wind_speed,
        'Rainfall': rainfall  # Target variable: Amount of rainfall (mm)
    }
    df = pd.DataFrame(data)
    return df

# Create the simulated data
weather_data = generate_simulated_weather_data()

print("Sample of the simulated data:")
print(weather_data.head())

# --- 2. Feature Selection and Data Splitting ---
# Select the features (independent variables) and the target variable (dependent variable).

X = weather_data[['Temperature', 'Humidity', 'Pressure', 'WindSpeed']]  # Features
y = weather_data['Rainfall']  # Target variable

# Split the data into training and testing sets
# This is crucial to evaluate the model's performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # 80% training, 20% testing

print("\nTraining data shape:", X_train.shape)
print("Testing data shape:", X_test.shape)


# --- 3. Model Training ---
# Choose a machine learning model and train it on the training data.
# In this example, we use a simple Linear Regression model.

model = LinearRegression()  # Create a Linear Regression model
model.fit(X_train, y_train)   # Train the model using the training data

print("\nModel training complete.")


# --- 4. Model Evaluation ---
# Evaluate the model's performance on the testing data.

y_pred = model.predict(X_test)  # Predict rainfall on the testing data

# Calculate evaluation metrics
mse = mean_squared_error(y_test, y_pred)  # Mean Squared Error
r2 = r2_score(y_test, y_pred)             # R-squared score

print("\nModel Evaluation:")
print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared Score: {r2:.2f}")

# Interpretation:
# - MSE:  Measures the average squared difference between predicted and actual values. Lower is better.
# - R-squared:  Represents the proportion of variance in the dependent variable (rainfall) that is predictable from the independent variables (temperature, humidity, etc.).  At most 1; on test data it can be negative when the model does worse than always predicting the mean. Higher is better.


# --- 5. Making Predictions on New Data ---

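def predict_rainfall(temperature, humidity, pressure, wind_speed, model):
    """Predicts rainfall for given weather conditions."""
    # Use a DataFrame with the training column names so scikit-learn
    # does not warn about missing feature names.
    input_data = pd.DataFrame(
        [[temperature, humidity, pressure, wind_speed]],
        columns=['Temperature', 'Humidity', 'Pressure', 'WindSpeed'],
    )
    prediction = model.predict(input_data)[0]
    return max(0, prediction)  # Ensure rainfall is not negative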

# Example usage:
new_temperature = 25
new_humidity = 60
new_pressure = 1010
new_wind_speed = 10

predicted_rainfall = predict_rainfall(new_temperature, new_humidity, new_pressure, new_wind_speed, model)
print(f"\nPredicted Rainfall for Temp={new_temperature}, Humidity={new_humidity}, Pressure={new_pressure}, WindSpeed={new_wind_speed}: {predicted_rainfall:.2f} mm")


# --- 6. Saving the Model (Optional) ---
# You can save the trained model to a file for later use.

import pickle

filename = 'weather_prediction_model.pkl'
with open(filename, 'wb') as f:  # The context manager closes the file after writing
    pickle.dump(model, f)        # Save the model to a file

print(f"\nModel saved to {filename}")

# --- 7. Loading the Model (Optional) ---
# To load the saved model:

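# with open(filename, 'rb') as f:
#     loaded_model = pickle.load(f)
# Now you can use loaded_model to make predictions.
# Only unpickle files from sources you trust: loading a pickle can execute arbitrary code.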

# --- Explanation of the code ---

# 1. Data Preparation:
#   - `generate_simulated_weather_data()`:  This function creates a Pandas DataFrame with simulated weather data. In a real-world application, this data would come from weather APIs (like OpenWeatherMap, AccuWeather, etc.) or historical weather databases.
#   - The data includes features like temperature, humidity, pressure, and wind speed, as well as the target variable, rainfall.  The rainfall is simulated based on a simple linear relationship with the other features, plus some random noise.

# 2. Feature Selection and Data Splitting:
#   - `X = weather_data[['Temperature', 'Humidity', 'Pressure', 'WindSpeed']]`:  Selects the features (independent variables) used for prediction.
#   - `y = weather_data['Rainfall']`:  Selects the target variable (dependent variable), which is the value we want to predict.
#   - `train_test_split()`: Splits the data into training and testing sets.  This is extremely important to evaluate how well your model generalizes to new, unseen data.  The `test_size` parameter determines the proportion of data that goes into the test set (20% in this case). `random_state` ensures the split is reproducible.

# 3. Model Training:
#   - `model = LinearRegression()`: Creates an instance of the Linear Regression model. Linear Regression tries to find the best-fitting linear relationship between the features and the target variable.
#   - `model.fit(X_train, y_train)`:  Trains the model using the training data. This is where the model learns the relationship between the features and the target variable.

# 4. Model Evaluation:
#   - `y_pred = model.predict(X_test)`:  Uses the trained model to make predictions on the *testing* data.
#   - `mean_squared_error(y_test, y_pred)`: Calculates the Mean Squared Error (MSE).  MSE measures the average squared difference between the predicted and actual values.  Lower MSE indicates better performance.
#   - `r2_score(y_test, y_pred)`: Calculates the R-squared score (coefficient of determination). R-squared represents the proportion of variance in the target variable that is explained by the model.  The best possible value is 1; on test data it can be negative when the model does worse than always predicting the mean. Higher is better.

# 5. Making Predictions on New Data:
#   - `predict_rainfall()`: This function takes new weather conditions as input and uses the trained model to predict the rainfall.
#   - The `input_data` line builds a single-row, two-dimensional input holding the four features in the same order as the training columns.  `predict()` expects input of shape (n_samples, n_features), so even a single observation must be wrapped as one row.
#   - `model.predict(input_data)[0]`:  Uses the model to make the prediction.  The `[0]` extracts the prediction value from the array returned by the `predict()` method.

# 6. Saving and Loading the Model (Optional):
#   - `pickle.dump(...)`: Saves the trained model to a file opened in `wb` ("write binary") mode using the `pickle` module. This allows you to reuse the model without retraining it every time you run the script.
#   - `pickle.load(...)`: Loads the saved model from a file opened in `rb` ("read binary") mode.

# Important Considerations for Real-World Weather Prediction:

# * **Data Quality and Quantity:** The accuracy of the model heavily depends on the quality and quantity of the data.  Use reliable weather data sources and collect a large dataset spanning several years.
# * **Feature Engineering:** Experiment with different features and combinations of features.  For example, you could include historical rainfall data, seasonality indicators (month of the year), geographical location, etc.  Feature engineering can significantly improve model performance.
# * **Model Selection:** Linear Regression is a simple model, but it might not be the best choice for weather prediction.  Consider other machine learning models, such as:
#     * **Random Forest:**  A powerful ensemble learning method that can capture non-linear relationships.
#     * **Support Vector Machines (SVMs):**  Effective for both linear and non-linear data.
#     * **Neural Networks (Deep Learning):**  Can handle complex patterns in the data, but require a large amount of data to train effectively.  Recurrent Neural Networks (RNNs) and LSTMs are particularly well-suited for time-series data like weather.
# * **Data Preprocessing:** Normalize or standardize the features before training the model. This can improve the performance and stability of many machine learning algorithms.
# * **Hyperparameter Tuning:**  Optimize the hyperparameters of the chosen model using techniques like grid search or random search.  Hyperparameters are settings that control the learning process of the model.
# * **Regularization:**  Use regularization techniques (e.g., L1 or L2 regularization) to prevent overfitting, especially if you have a large number of features.
# * **Time Series Analysis:**  Weather data is inherently a time series.  Consider using time series-specific techniques like ARIMA (Autoregressive Integrated Moving Average) or seasonal decomposition to model the temporal dependencies in the data.
# * **Weather Patterns and Location:**  Weather patterns are highly location-dependent. Consider including location-specific data (e.g., latitude, longitude, elevation) and regional weather patterns in your model.
# * **Ensemble Methods:** Combine multiple models to improve prediction accuracy.  Ensemble methods can often outperform individual models.
# * **Continuous Monitoring and Retraining:**  Monitor the model's performance over time and retrain it periodically with new data to maintain its accuracy.  Weather patterns can change over time due to climate change and other factors.
# * **Error Analysis:**  Analyze the model's errors to identify areas where it is performing poorly.  This can help you improve the model by adding new features, adjusting hyperparameters, or changing the model architecture.

```
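The closing comments of the script recommend standardizing features, adding regularization, and trying non-linear models. Below is a minimal sketch of how such a comparison might look. It assumes the same simulated feature ranges as the script; the `Ridge` penalty (`alpha=1.0`), the forest size, and the 5-fold cross-validation are illustrative choices, not tuned values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Simulated features mirroring the ranges used in the script (assumption).
rng = np.random.default_rng(42)
n = 200
X = np.column_stack([
    rng.uniform(0, 35, n),      # temperature (Celsius)
    rng.uniform(30, 90, n),     # humidity (%)
    rng.uniform(980, 1020, n),  # pressure (hPa)
    rng.uniform(0, 20, n),      # wind speed (km/h)
])
y = (0.1 * X[:, 0] - 0.05 * X[:, 1] + 0.02 * X[:, 2] + 0.05 * X[:, 3]
     + rng.normal(0, 1, n))
y = np.maximum(y, 0)  # rainfall cannot be negative

# Candidate models: a scaled, L2-regularized linear model and a random forest.
candidates = {
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Score each candidate with 5-fold cross-validation on R-squared.
scores = {}
for name, estimator in candidates.items():
    cv_r2 = cross_val_score(estimator, X, y, cv=5, scoring="r2")
    scores[name] = cv_r2.mean()
    print(f"{name}: mean R^2 = {cv_r2.mean():.3f} (+/- {cv_r2.std():.3f})")
```

Putting the scaler inside the pipeline means it is refit on each training fold, so no information from the held-out fold leaks into preprocessing. Because the simulated target is linear by construction, the linear model is likely to match or beat the forest here; real weather data may rank them differently.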
Key improvements and explanations in this version:

* **Complete, runnable code:**  This code is fully functional.  You can copy and paste it directly into a Python environment and run it.  It includes all necessary imports and function definitions.
* **Data Simulation:**  It includes a function to generate simulated weather data.  This is essential for a working example, as you can't rely on the user having a weather API set up.  The simulation is made more realistic with some basic relationships between variables and added noise.
* **Clear Structure:**  The code is divided into logical sections with comments explaining each step: Data Preparation, Feature Selection, Model Training, Model Evaluation, and Prediction.
* **Meaningful Variable Names:**  Uses descriptive variable names (e.g., `X_train`, `y_test`, `predicted_rainfall`).
* **Comprehensive Comments:**  The code is thoroughly commented to explain the purpose of each line and function.  This is crucial for understanding and modifying the code.
* **Model Evaluation Metrics:** Includes calculation of Mean Squared Error (MSE) and R-squared score to evaluate the model's performance.  Also provides an interpretation of these metrics.
* **Prediction Function:**  Provides a `predict_rainfall` function for making predictions on new data, making the code more reusable.
* **Model Persistence:** Shows how to save and load the trained model using `pickle`. This is important for real-world applications where you want to reuse the model without retraining.
* **Output Clamping:** The `predict_rainfall` function applies `max(0, prediction)` so that the predicted rainfall is never negative, which would be physically impossible. This is a post-processing step on the model output, not input validation or exception handling.
* **`random_state` for Reproducibility:** The `train_test_split` function now includes `random_state=42` to make the split reproducible. This ensures that you get the same results every time you run the code (with the simulated data).
* **Important Considerations:**  A section at the end discusses important considerations for building a real-world weather prediction system, including data quality, feature engineering, model selection, and more.  This gives a broader context and direction for further development.
* **Clear Output:** The code prints informative messages to the console, including samples of the data, training/testing data shapes, evaluation metrics, and example predictions.
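The model persistence point can be checked with a short save-and-load round trip. The sketch below is an illustration under assumptions: it trains a throwaway model on freshly simulated data, writes it to a temporary directory rather than the working directory, and uses a hypothetical `FEATURES` list to keep column names consistent between training and prediction.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical constant holding the training column names.
FEATURES = ["Temperature", "Humidity", "Pressure", "WindSpeed"]

# Small simulated training set (assumed ranges, as in the script).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "Temperature": rng.uniform(0, 35, 50),
    "Humidity": rng.uniform(30, 90, 50),
    "Pressure": rng.uniform(980, 1020, 50),
    "WindSpeed": rng.uniform(0, 20, 50),
})[FEATURES]
y = 0.1 * X["Temperature"] - 0.05 * X["Humidity"] + rng.normal(0, 1, 50)

model = LinearRegression().fit(X, y)

# Save and reload inside a temporary directory; context managers close the files.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "weather_prediction_model.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        loaded_model = pickle.load(f)

# The reloaded model should reproduce the original predictions.
new_conditions = pd.DataFrame([[25, 60, 1010, 10]], columns=FEATURES)
original = model.predict(new_conditions)
restored = loaded_model.predict(new_conditions)
print("Predictions match:", np.allclose(original, restored))
```

A pickled scikit-learn model is generally only safe to load with the same library version it was saved with, so recording the version alongside the file is a reasonable habit.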

This revised version provides a much more complete, understandable, and practical example of a smart weather prediction AI program.  It's suitable for both learning and as a starting point for more advanced projects.
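As one possible next step, the time-series and feature-engineering ideas can be sketched with a lag feature, a cyclical day-of-year encoding, and a chronological cross-validation split. Everything here is an assumption for illustration: the daily rainfall series is synthetic, and the column names `doy_sin`, `doy_cos`, and `rain_lag1` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic daily rainfall with a yearly cycle plus noise (assumption).
rng = np.random.default_rng(7)
dates = pd.date_range("2022-01-01", periods=730, freq="D")
day_of_year = dates.dayofyear.to_numpy()
seasonal = 5 + 4 * np.sin(2 * np.pi * day_of_year / 365.25)
rainfall = np.maximum(seasonal + rng.normal(0, 1.5, len(dates)), 0)

df = pd.DataFrame({"Rainfall": rainfall}, index=dates)

# Cyclical encoding: late December and early January end up close together.
df["doy_sin"] = np.sin(2 * np.pi * day_of_year / 365.25)
df["doy_cos"] = np.cos(2 * np.pi * day_of_year / 365.25)

# Lag feature: yesterday's rainfall as a predictor for today.
df["rain_lag1"] = df["Rainfall"].shift(1)
df = df.dropna()  # the first row has no previous day

X = df[["doy_sin", "doy_cos", "rain_lag1"]]
y = df["Rainfall"]

# Each fold trains on earlier days and tests on later ones.
cv = TimeSeriesSplit(n_splits=5)
fold_r2 = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")
print("R^2 per fold:", np.round(fold_r2, 3))
```

A random `train_test_split` would let the model train on days that come after the ones it is tested on. `TimeSeriesSplit` avoids that by always keeping the test fold later than the training data.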