Real-Time Weather Forecasting and Alert System Using Historical Data Python

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import datetime
import time
import warnings
warnings.filterwarnings('ignore')

# 1. Data Acquisition (Simulated - Replace with Actual Data Source)
# Simulate weather data for the past few years.  In a real application,
# you would fetch this data from a weather API (e.g., OpenWeatherMap, AccuWeather)
# or a database.

def generate_simulated_data(start_date, end_date, location):
    """Generates a pandas DataFrame with simulated weather data.

    Args:
        start_date (str): Start date in 'YYYY-MM-DD' format.
        end_date (str): End date in 'YYYY-MM-DD' format.
        location (str): Location for the weather data (e.g., 'New York').

    Returns:
        pandas.DataFrame: DataFrame containing simulated weather data.
    """
    date_range = pd.date_range(start=start_date, end=end_date)
    num_days = len(date_range)

    data = {
        'date': date_range,
        'location': [location] * num_days,
        'temperature': np.random.uniform(low=-10, high=35, size=num_days),  # Temperature in Celsius
        'humidity': np.random.uniform(low=20, high=100, size=num_days),   # Humidity in %
        'wind_speed': np.random.uniform(low=0, high=50, size=num_days),   # Wind speed in km/h
        'precipitation': np.random.choice([0, 1, 2, 5, 10], size=num_days, p=[0.7, 0.1, 0.1, 0.05, 0.05])  # Precipitation in mm
    }

    df = pd.DataFrame(data)
    return df


# Generate simulated data for the last 3 years
start_date = (datetime.datetime.now() - datetime.timedelta(days=3 * 365)).strftime('%Y-%m-%d')
end_date = datetime.datetime.now().strftime('%Y-%m-%d')
location = 'ExampleCity'  # Replace with the actual location
weather_data = generate_simulated_data(start_date, end_date, location)

# Print the first few rows of the DataFrame
print("Sample Weather Data:")
print(weather_data.head())


# 2. Data Preprocessing
def preprocess_data(df):
    """Preprocesses the weather data.

    Args:
        df (pandas.DataFrame): Input DataFrame.

    Returns:
        pandas.DataFrame: Processed DataFrame.
    """
    # Convert 'date' to datetime objects
    df['date'] = pd.to_datetime(df['date'])

    # Extract features: year, month, day of year, day of week
    df['year'] = df['date'].dt.year
    df['month'] = df['date'].dt.month
    df['dayofyear'] = df['date'].dt.dayofyear
    df['dayofweek'] = df['date'].dt.dayofweek  # Monday=0 ... Sunday=6


    # One-Hot Encode location.  Handles future scenarios if you have multiple locations
    df = pd.get_dummies(df, columns=['location'])

    # Drop the original 'date' column
    df = df.drop('date', axis=1)

    return df

weather_data_processed = preprocess_data(weather_data.copy()) #create a copy so original data is untouched

print("\nProcessed Weather Data:")
print(weather_data_processed.head())


# 3. Model Training
def train_model(df, target_variable):
    """Trains a linear regression model.

    Args:
        df (pandas.DataFrame): DataFrame containing the data.
        target_variable (str): The name of the column to predict.

    Returns:
        sklearn.linear_model.LinearRegression: Trained model.
        pandas.DataFrame: Training data features.
        pandas.DataFrame: Testing data features.
        pandas.Series: Training data target.
        pandas.Series: Testing data target.
    """

    X = df.drop(target_variable, axis=1)
    y = df[target_variable]

    # Split data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Create a linear regression model
    model = LinearRegression()

    # Train the model
    model.fit(X_train, y_train)

    # Make predictions on the test set
    y_pred = model.predict(X_test)

    # Evaluate the model
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    print(f"Root Mean Squared Error (RMSE): {rmse:.2f}")

    return model, X_train, X_test, y_train, y_test



# Train one model per target variable. Each model uses the other three weather
# variables plus the date and location columns as features.
temperature_model, X_train_temp, X_test_temp, y_train_temp, y_test_temp = train_model(weather_data_processed.copy(), 'temperature')  #train using a copy
humidity_model, X_train_hum, X_test_hum, y_train_hum, y_test_hum = train_model(weather_data_processed.copy(), 'humidity')
wind_speed_model, X_train_wind, X_test_wind, y_train_wind, y_test_wind = train_model(weather_data_processed.copy(), 'wind_speed')
precipitation_model, X_train_precip, X_test_precip, y_train_precip, y_test_precip = train_model(weather_data_processed.copy(), 'precipitation')


# 4. Real-Time Forecasting and Alert System
def get_current_weather(location):
    """Simulates fetching current weather data.

    In a real application, this would fetch data from a weather API.
    """

    # Simulate current weather conditions
    current_temperature = np.random.uniform(low=-5, high=30)
    current_humidity = np.random.uniform(low=30, high=90)
    current_wind_speed = np.random.uniform(low=0, high=40)
    current_precipitation = np.random.choice([0, 1, 3, 7], p=[0.8, 0.1, 0.05, 0.05])

    current_weather = {
        'location': location,
        'temperature': current_temperature,
        'humidity': current_humidity,
        'wind_speed': current_wind_speed,
        'precipitation': current_precipitation
    }

    return current_weather


def create_prediction_data(current_weather, prediction_time, feature_columns):
  """
  Creates a DataFrame suitable for making predictions using a trained model.

  Args:
      current_weather (dict): A dictionary containing the current weather conditions.
      prediction_time (datetime.datetime): The datetime object representing the time for which the forecast is desired.
      feature_columns (list-like of str): Feature names, in training order, that the model expects.
          Each model was trained on a different set because its own target column was dropped.

  Returns:
      pandas.DataFrame: A single-row DataFrame with exactly the columns in feature_columns, in that order.
  """

  # Create a DataFrame with a single row
  prediction_data = pd.DataFrame([current_weather])

  # Extract features: year, month, day of year, day of week
  prediction_data['year'] = prediction_time.year
  prediction_data['month'] = prediction_time.month
  # datetime.datetime has no .dayofyear attribute (that is pandas-only), so use the time tuple
  prediction_data['dayofyear'] = prediction_time.timetuple().tm_yday
  prediction_data['dayofweek'] = prediction_time.weekday()  # Monday=0, same convention as pandas

  # One-Hot Encode location.  Crucially important to match the training data!
  # This creates a column for each location
  prediction_data = pd.get_dummies(prediction_data, columns=['location'])

  # Handle Missing Location Columns:  Ensures the prediction data has the same columns as the training data.
  #  Important for when you're predicting for a location *not* present in your training data.
  for col in feature_columns:
      if col.startswith('location_') and col not in prediction_data.columns:
          prediction_data[col] = 0

  # Ensure correct column order:  Match the column order of the training data.
  #  Selecting by feature_columns also drops the model's own target column
  #  and any location column the model never saw during training.
  prediction_data = prediction_data[list(feature_columns)]

  return prediction_data


def forecast_weather(location, prediction_time, temperature_model, humidity_model, wind_speed_model, precipitation_model):
    """Forecasts weather conditions for a given location and time using the trained models."""

    current_weather = get_current_weather(location)  # Get current weather (simulated)

    def predict_with(model):
        # Each model was fit on a different feature set (its own target was dropped),
        # so build a matching input per model.  feature_names_in_ is set by
        # scikit-learn >= 1.0 when a model is fit on a DataFrame with string column names.
        prediction_data = create_prediction_data(current_weather, prediction_time, model.feature_names_in_)
        return model.predict(prediction_data)[0]  # Extract the single prediction

    # Make predictions
    predicted_temperature = predict_with(temperature_model)
    predicted_humidity = predict_with(humidity_model)
    predicted_wind_speed = predict_with(wind_speed_model)
    predicted_precipitation = predict_with(precipitation_model)

    forecast = {
        'location': location,
        'time': prediction_time.strftime('%Y-%m-%d %H:%M:%S'),
        'temperature': predicted_temperature,
        'humidity': predicted_humidity,
        'wind_speed': predicted_wind_speed,
        'precipitation': predicted_precipitation
    }

    return forecast


def check_alerts(forecast):
    """Checks for weather alerts based on the forecast."""
    alerts = []
    if forecast['temperature'] > 30:
        alerts.append("Heatwave Warning: High temperatures expected.")
    if forecast['wind_speed'] > 40:
        alerts.append("High Wind Warning: Possible damage.")
    if forecast['precipitation'] > 5:
        alerts.append("Heavy Rain Warning: Potential flooding.")
    return alerts


def run_real_time_forecast(location, temperature_model, humidity_model, wind_speed_model, precipitation_model):
    """Runs the real-time forecasting and alert system in a loop."""
    while True:
        # Get the current time
        now = datetime.datetime.now()

        # Forecast weather for the next hour
        forecast_time = now + datetime.timedelta(hours=1)
        forecast = forecast_weather(location, forecast_time, temperature_model, humidity_model, wind_speed_model, precipitation_model)

        # Print the forecast
        print("\n--- Forecast for {location} at {time} ---".format(location=forecast['location'], time=forecast['time']))
        print("Temperature: {:.2f} °C".format(forecast['temperature']))
        print("Humidity: {:.2f} %".format(forecast['humidity']))
        print("Wind Speed: {:.2f} km/h".format(forecast['wind_speed']))
        print("Precipitation: {:.2f} mm".format(forecast['precipitation']))

        # Check for alerts
        alerts = check_alerts(forecast)
        if alerts:
            print("\n--- Weather Alerts ---")
            for alert in alerts:
                print(alert)
        else:
            print("\nNo weather alerts at this time.")


        # Wait for a specified time before the next forecast (e.g., 10 minutes)
        time.sleep(600)  # Wait for 10 minutes


# 5. Main Execution
if __name__ == "__main__":
    # Set the location for forecasting
    forecast_location = 'ExampleCity'

    # Run the real-time forecasting system
    try:
        run_real_time_forecast(forecast_location, temperature_model, humidity_model, wind_speed_model, precipitation_model)
    except KeyboardInterrupt:
        print("\nForecasting stopped.")
```

Key points and explanations:

* **Structure:** The code is organized into well-defined functions with docstrings, which keeps it readable and maintainable.
* **Data Simulation:** `generate_simulated_data` creates synthetic data that includes a `location` column. The values are drawn independently at random, so they carry no seasonal or cross-variable signal, and the printed RMSE mostly reflects the spread of the noise. This function *must* be replaced with a real API or database connection in a real application.
* **Data Preprocessing:** `preprocess_data` extracts date features (year, month, day of year, day of week), one-hot encodes `location` with `pd.get_dummies`, and drops the original `date` column. It is called on a copy so the original DataFrame is not modified.
* **Model Training:** `train_model` fits a linear regression model, prints its RMSE on a held-out 20% test split, and returns the model together with the training and test splits. Each call receives a copy of the processed data, and each target is predicted from the other three weather variables plus the date and location columns.
* **Real-Time Forecasting:**
    * **`get_current_weather`:** Simulates fetching current weather data.  Again, this *must* be replaced with a real API call.
    * **`create_prediction_data`:** This is the *most important* step.  It takes the current weather conditions and the desired prediction time and transforms them into a DataFrame that the trained model can accept.  This includes:
        * Extracting date features (year, month, day of year, day of week).
        * Performing one-hot encoding on the `location`.
        * **Handling Missing Columns:**  This is essential for robustness.  If the location you are predicting for wasn't present in the training data, the one-hot encoding will create a different set of columns.  The code adds any missing location columns to the prediction data and sets their values to 0.
        * **Ensuring Column Order:** This is *absolutely critical*. The model expects the features in the order it saw during training.  If the columns are in the wrong order, the model either rejects the input (recent scikit-learn versions validate feature names) or pairs its coefficients with the wrong features and produces garbage predictions.  The code explicitly orders the columns in the prediction data to match the training data.
    * **`forecast_weather`:** Uses the trained model to predict weather conditions for a given location and time. It calls `create_prediction_data` to prepare the data.
    * **`check_alerts`:** Checks for weather alerts based on the forecast.  These are example alerts; you should customize them based on your needs.
    * **`run_real_time_forecast`:**  This function runs the forecasting system in a loop.  It gets the current time, forecasts the weather for the next hour, prints the forecast, checks for alerts, and then waits for a specified time before repeating the process.
* **Main Execution (`if __name__ == "__main__":`)**:  The forecasting loop only starts when the script is executed directly, not when it is imported as a module.  Data generation and model training sit at module level, so they still run on import.  A `try...except KeyboardInterrupt` block lets the user stop the loop cleanly with Ctrl+C.
* **Output:** Each forecast is printed with labeled values and units, followed by any alerts.
* **Comments and Docstrings:**  Every function has a docstring, and the less obvious steps are commented.
* **Modularity:** Data acquisition is isolated in `generate_simulated_data` and `get_current_weather`, so the simulated data can be swapped for a real weather API or database without touching the rest of the pipeline.
* **Handles New Locations**: A location that was absent from the training data gets zeros in every location column, so a prediction is still produced.
* **Four Models:** Separate models are trained for temperature, humidity, wind_speed and precipitation.
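
The column-matching step described above can also be written with `DataFrame.reindex`, which adds missing columns, drops unexpected ones, and fixes the order in one call. The sketch below swaps that call in for the script's explicit loop; it is not what the script itself does. The column names and the helper `align_to_training` are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical feature layout captured at training time (names are illustrative).
training_columns = ["humidity", "wind_speed", "year", "month",
                    "location_CityA", "location_CityB"]


def align_to_training(row: pd.DataFrame, columns) -> pd.DataFrame:
    """Return `row` with exactly `columns`, in that order; absent columns become 0."""
    return row.reindex(columns=list(columns), fill_value=0)


# A single observation for a location the model never saw during training,
# with its columns deliberately in a different order.
raw = pd.DataFrame([{"location": "CityC", "month": 7, "year": 2024,
                     "wind_speed": 12.0, "humidity": 55.0}])
encoded = pd.get_dummies(raw, columns=["location"])  # creates location_CityC

aligned = align_to_training(encoded, training_columns)
print(list(aligned.columns))
print(aligned.iloc[0].to_dict())
```

The unseen `location_CityC` column is dropped and both known location columns are zero, which is the same outcome the loop-based version aims for.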

**How to Use:**

1. **Install Libraries:**
   ```bash
   pip install pandas scikit-learn numpy
   ```
2. **Replace Simulated Data:** The most important step is to replace the `generate_simulated_data` and `get_current_weather` functions with code that fetches data from a real weather API (e.g., OpenWeatherMap, AccuWeather) or a database.
3. **Customize Alerts:** Modify the `check_alerts` function to define the alert conditions that are relevant to your application.
4. **Run the Script:** Execute the Python script. It will start the real-time forecasting loop and print the forecasts and alerts to the console.  Press Ctrl+C to stop the forecasting.
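
For step 2, one way to keep the rest of the script unchanged is to convert whatever the API returns into the same dictionary that `get_current_weather` produces. The sketch below assumes a JSON payload shaped like the current-weather response of OpenWeatherMap when metric units are requested (`main.temp`, `main.humidity`, `wind.speed` in m/s, optional `rain.1h` in mm). Treat that shape as an assumption and check it against your provider's documentation. `parse_current_weather` is a hypothetical helper, and no network call is made here.

```python
def parse_current_weather(payload: dict, location: str) -> dict:
    """Map an API payload to the dict shape returned by get_current_weather."""
    main = payload.get("main", {})
    wind = payload.get("wind", {})
    rain = payload.get("rain", {})  # assumed to be absent when it is not raining
    return {
        "location": location,
        "temperature": float(main["temp"]),                  # assumed Celsius
        "humidity": float(main["humidity"]),                 # percent
        "wind_speed": float(wind.get("speed", 0.0)) * 3.6,   # assumed m/s -> km/h
        "precipitation": float(rain.get("1h", 0.0)),         # assumed mm in the last hour
    }


# Hand-written sample payload standing in for a decoded JSON response.
sample_payload = {"main": {"temp": 21.5, "humidity": 60}, "wind": {"speed": 5.0}}
current = parse_current_weather(sample_payload, "ExampleCity")
print(current)
```

Because the returned keys match the simulated version, the output can be passed straight to `create_prediction_data`.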

**Important Considerations:**

* **Data Source:** Choosing a reliable and accurate data source is crucial for the success of the forecasting system.  Pay attention to the API's rate limits and terms of service.
* **Model Selection:** Linear regression is a simple model that may not be accurate enough for all weather conditions. Consider using more sophisticated models, such as:
    * **Random Forest:** A powerful ensemble learning method.
    * **Gradient Boosting:** Another ensemble learning method that can achieve high accuracy.
    * **Neural Networks:**  Can capture complex patterns in the data.
* **Feature Engineering:** Experiment with different features to improve the model's accuracy.  For example, you could include:
    * **Lagged Variables:**  Past weather conditions (e.g., temperature from the previous day).
    * **Seasonal Features:**  Indicators for different seasons (e.g., spring, summer, autumn, winter).
    * **Geographic Features:**  Elevation, latitude, longitude.
* **Model Evaluation:**  Thoroughly evaluate the model's performance using appropriate metrics (e.g., RMSE, MAE, R-squared).
* **Regular Retraining:**  Retrain the model periodically with new data to keep it up-to-date.
* **Scalability:**  If you need to forecast weather for multiple locations, consider using a distributed computing framework (e.g., Apache Spark) to scale the system.
* **Error Handling:** Implement robust error handling to deal with issues such as API outages, invalid data, and unexpected weather conditions.
* **User Interface:**  Develop a user interface (e.g., a web application) to display the forecasts and alerts in a user-friendly way.
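
Several of these points can be tried together on a small scale. The following is a minimal sketch, not a recommended pipeline: it builds one lagged variable and a cyclic day-of-year encoding, splits the data chronologically instead of at random, and compares linear regression with a random forest using MAE and R-squared. The synthetic temperature series and all column names (`temp_lag1`, `doy_sin`, `doy_cos`) are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical stand-in data: a yearly temperature cycle plus Gaussian noise.
rng = np.random.default_rng(0)
dates = pd.date_range("2022-01-01", periods=730, freq="D")
day = dates.dayofyear.to_numpy()
temperature = 12 + 10 * np.sin(2 * np.pi * (day - 100) / 365.25) + rng.normal(0, 2, len(dates))
df = pd.DataFrame({"date": dates, "temperature": temperature})

# Lagged variable: yesterday's temperature.
df["temp_lag1"] = df["temperature"].shift(1)

# Seasonal features: place day of year on a circle so 31 Dec sits next to 1 Jan.
angle = 2 * np.pi * df["date"].dt.dayofyear / 365.25
df["doy_sin"] = np.sin(angle)
df["doy_cos"] = np.cos(angle)

df = df.dropna().reset_index(drop=True)  # the first row has no lag value

# Chronological split: train on the earliest 80%, test on the most recent 20%.
feature_cols = ["temp_lag1", "doy_sin", "doy_cos"]
split = int(len(df) * 0.8)
train, test = df.iloc[:split], df.iloc[split:]

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
results = {}
for name, model in models.items():
    model.fit(train[feature_cols], train["temperature"])
    pred = model.predict(test[feature_cols])
    results[name] = {
        "mae": mean_absolute_error(test["temperature"], pred),
        "r2": r2_score(test["temperature"], pred),
    }
    print(f"{name}: MAE={results[name]['mae']:.2f}  R2={results[name]['r2']:.2f}")
```

A chronological split is used because a random split lets the model train on days that come after the ones it is tested on, which tends to flatter forecasting accuracy.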

This script is a starting point for building a real-time weather forecasting and alert system. Replace the simulated data with real data and customize the alerts to meet your requirements.  The most critical part is ensuring that the prediction data is preprocessed and formatted to match the training data.
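
As one possible way to customize the alerts, the thresholds can be moved out of the `if` statements and into a table that is easy to edit or load from configuration. This is a sketch under that assumption; `DEFAULT_THRESHOLDS` and `check_alerts_with` are hypothetical names, and the default limits simply mirror the example values in `check_alerts`.

```python
# Hypothetical threshold table: field -> (limit, message shown when exceeded).
DEFAULT_THRESHOLDS = {
    "temperature": (30.0, "Heatwave Warning: High temperatures expected."),
    "wind_speed": (40.0, "High Wind Warning: Possible damage."),
    "precipitation": (5.0, "Heavy Rain Warning: Potential flooding."),
}


def check_alerts_with(forecast: dict, thresholds: dict = DEFAULT_THRESHOLDS) -> list:
    """Return the message for every forecast field that exceeds its limit."""
    return [
        message
        for field, (limit, message) in thresholds.items()
        # A field missing from the forecast never triggers an alert.
        if forecast.get(field, float("-inf")) > limit
    ]


forecast = {"temperature": 33.2, "wind_speed": 12.0, "precipitation": 7.5}
alerts = check_alerts_with(forecast)
print(alerts)

# Stricter, location-specific limits can be passed in without editing the function.
coastal = {"wind_speed": (10.0, "Wind Advisory: Secure loose objects.")}
print(check_alerts_with(forecast, coastal))
```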