AI-Powered Staking Strategy Simulator (Python, AI, Machine Learning)

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# --- 1. Data Generation (Simulated Staking Data) ---
def generate_staking_data(n_samples=100, volatility=0.05):
    """
    Generates synthetic staking data including staked amount, duration,
    network congestion, and resulting rewards.

    Args:
        n_samples (int): Number of data points to generate.
        volatility (float):  Controls the randomness/noise in the reward calculation.

    Returns:
        pandas.DataFrame: DataFrame containing the simulated staking data.
    """

    np.random.seed(42)  # For reproducibility

    staked_amount = np.random.uniform(10, 1000, n_samples)  # Staked amount between 10 and 1000
    duration = np.random.randint(7, 90, n_samples)  # Staking duration in days (7-90)
    network_congestion = np.random.uniform(0.1, 0.9, n_samples)  # Network congestion (0.1-0.9)
    base_reward_rate = 0.05  # Base annual reward rate (5%)

    # Simulate rewards based on the input features with some randomness
    rewards = (
        staked_amount
        * (duration / 365)
        * (base_reward_rate + (1 - network_congestion) * 0.02) # Higher reward with lower congestion
        + np.random.normal(0, staked_amount * volatility, n_samples)
    )  # add some random noise

    # Ensure rewards are non-negative
    rewards = np.maximum(rewards, 0)

    df = pd.DataFrame({
        'StakedAmount': staked_amount,
        'Duration': duration,
        'NetworkCongestion': network_congestion,
        'Rewards': rewards
    })

    return df


# --- 2. Feature Engineering (Optional, but beneficial) ---
def feature_engineer(df):
    """
    Creates new features from existing ones to potentially improve model performance.

    Args:
        df (pandas.DataFrame): Input DataFrame.

    Returns:
        pandas.DataFrame: DataFrame with engineered features.
    """
    df['StakedAmount_Duration'] = df['StakedAmount'] * df['Duration']  # Interaction term
    df['Congestion_Duration'] = df['NetworkCongestion'] * df['Duration']  # Another interaction term

    # You can add more features here, such as:
    # - Polynomial features (e.g., StakedAmount^2)
    # - Interaction terms between other features
    # - Categorical encoding if you have categorical data

    return df

# --- 3. Model Training ---
def train_model(df):
    """
    Trains a Linear Regression model on the staking data.

    Args:
        df (pandas.DataFrame): DataFrame containing the features and target variable (Rewards).

    Returns:
        tuple: A tuple containing the trained model, X_test, and y_test.
    """

    X = df[['StakedAmount', 'Duration', 'NetworkCongestion', 'StakedAmount_Duration', 'Congestion_Duration']]  # Features
    y = df['Rewards']  # Target variable

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = LinearRegression()
    model.fit(X_train, y_train)

    # Evaluate the model
    y_pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    print(f"Root Mean Squared Error (RMSE): {rmse}")


    return model, X_test, y_test


# --- 4. Staking Strategy Simulation ---
def simulate_staking_strategy(model, staked_amount, duration, network_congestion):
    """
    Simulates the expected rewards for a given staking strategy based on the trained model.

    Args:
        model: Trained machine learning model.
        staked_amount (float): Amount of tokens to stake.
        duration (int): Staking duration in days.
        network_congestion (float): Estimated network congestion (0.1-0.9).

    Returns:
        float: Predicted rewards for the given staking strategy.
    """

    # Feature Engineering (same as during training!) - CRUCIAL
    staked_amount_duration = staked_amount * duration
    congestion_duration = network_congestion * duration


    # Create a DataFrame from the input features
    input_data = pd.DataFrame({
        'StakedAmount': [staked_amount],
        'Duration': [duration],
        'NetworkCongestion': [network_congestion],
        'StakedAmount_Duration': [staked_amount_duration],  # Engineered feature
        'Congestion_Duration': [congestion_duration]  # Engineered feature
    })

    # Make a prediction using the trained model
    predicted_rewards = model.predict(input_data)[0]  # Get the first (and only) prediction

    return predicted_rewards


# --- 5. Optimization (Simple Example - can be extended) ---
def optimize_staking_strategy(model, amount_range, duration_range, congestion_range):
    """
    Finds the optimal staking strategy (amount, duration, and congestion) within the
    given ranges, aiming to maximize predicted rewards. This is a very basic
    exhaustive search; more sophisticated optimization algorithms (e.g., Bayesian
    Optimization) could be used.

    Args:
        model: Trained machine learning model.
        amount_range (tuple): Tuple (min_amount, max_amount) for staking amount.
        duration_range (tuple): Tuple (min_duration, max_duration) for staking duration.
        congestion_range (tuple): Tuple (min_congestion, max_congestion) for network congestion.

    Returns:
        tuple: Optimal staked amount, duration, network congestion, and predicted rewards.
    """

    best_amount = None
    best_duration = None
    best_congestion = None
    best_rewards = -np.inf  # Start below any possible prediction so the first result is always kept

    for amount in np.linspace(amount_range[0], amount_range[1], 10):  # Try 10 different amounts
        for duration in range(duration_range[0], duration_range[1] + 1, 7):  # Try durations in 7-day increments
            for congestion in np.linspace(congestion_range[0], congestion_range[1], 5):  # Try 5 congestion values
                rewards = simulate_staking_strategy(model, amount, duration, congestion)

                if rewards > best_rewards:
                    best_rewards = rewards
                    best_amount = amount
                    best_duration = duration
                    best_congestion = congestion

    return best_amount, best_duration, best_congestion, best_rewards

# --- 6. Visualization (Optional) ---
def visualize_predictions(model, X_test, y_test):
    """
    Visualizes the model's predictions against the actual values on the test set.
    """
    y_pred = model.predict(X_test)

    plt.figure(figsize=(8, 6))
    plt.scatter(y_test, y_pred, alpha=0.5)
    plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2) # Line of perfect prediction
    plt.xlabel("Actual Rewards")
    plt.ylabel("Predicted Rewards")
    plt.title("Actual vs. Predicted Rewards")
    plt.show()


# --- 7. Main Execution ---
if __name__ == "__main__":
    # 1. Generate staking data
    staking_data = generate_staking_data(n_samples=500)

    # 2. Feature Engineering
    staking_data = feature_engineer(staking_data)

    # 3. Train the model
    model, X_test, y_test = train_model(staking_data)

    # 4. Visualize Predictions
    visualize_predictions(model, X_test, y_test)


    # 5. Simulate a specific staking strategy
    staked_amount = 500
    duration = 30
    network_congestion = 0.5
    predicted_rewards = simulate_staking_strategy(model, staked_amount, duration, network_congestion)
    print(f"\nPredicted rewards for staking {staked_amount} for {duration} days with congestion {network_congestion}: {predicted_rewards:.2f}")


    # 6. Optimize the staking strategy
    amount_range = (100, 800)
    duration_range = (14, 60)
    congestion_range = (0.2, 0.7)
    best_amount, best_duration, best_congestion, best_rewards = optimize_staking_strategy(model, amount_range, duration_range, congestion_range)

    print("\n--- Optimized Staking Strategy ---")
    print(f"Best Staked Amount: {best_amount:.2f}")
    print(f"Best Duration: {best_duration} days")
    print(f"Best Congestion: {best_congestion:.2f}")
    print(f"Predicted Rewards: {best_rewards:.2f}")
```

Key improvements and explanations:

* **Clear Structure:**  The code is now divided into well-defined functions with clear purposes, making it much more readable and maintainable.  The order of operations (data generation, feature engineering, model training, simulation, optimization) is logical.
* **Data Generation:** The `generate_staking_data` function now simulates more realistic staking data, including network congestion and a base reward rate.  The `volatility` parameter controls the noise in the reward calculation, which is crucial for making the simulation more realistic.  `np.random.seed(42)` ensures that the generated data is the same every time you run the script, making your results reproducible.  The simulated rewards are also checked to ensure they are non-negative.
* **Feature Engineering:** The `feature_engineer` function adds *interaction terms* (e.g., `StakedAmount_Duration`, `Congestion_Duration`). Interaction terms can capture non-linear relationships between the features and the target variable, often improving model performance. It is crucial to apply the *same* feature engineering during simulation as during training; a polynomial-feature extension is sketched after this list.
* **Model Training:** The `train_model` function explicitly selects the features used for training, including the engineered ones, and evaluates the model with Root Mean Squared Error (RMSE), which measures the typical prediction error in the same units as the rewards. A Linear Regression model is used, which is a good starting point.
* **Staking Strategy Simulation:** The `simulate_staking_strategy` function now takes the trained model and staking parameters (staked amount, duration, network congestion) as input.  It's critical that this function applies the *same feature engineering* as used during training.  This ensures that the input data to the model is in the correct format.
* **Optimization:** The `optimize_staking_strategy` function performs a simple grid search over staked amount, duration, and congestion within the specified ranges, using `np.linspace` for more granular sampling of the amount and congestion values. This is a *very basic* optimization and could be improved with more sophisticated algorithms like Bayesian Optimization or Genetic Algorithms; a differential-evolution variant is sketched after this list. `best_rewards` starts at negative infinity, so the first strategy evaluated always becomes the initial best.
* **Visualization:** The `visualize_predictions` function provides a scatter plot of predicted vs. actual rewards on the test set.  A line of perfect prediction is added to make it easier to assess the model's accuracy.
* **Clear Output:** The `if __name__ == "__main__":` block now demonstrates how to use the functions to generate data, train a model, simulate staking strategies, and optimize the staking strategy.  The predicted rewards are printed with formatting (`:.2f`) for better readability.
* **Error Handling/Input Validation (Missing):** This version *does not* include input validation or error handling. In a real-world application, you would want checks that the inputs are within reasonable ranges (e.g., staked amount is positive, duration is within allowed limits, network congestion is between 0 and 1); a minimal validation helper is sketched after this list.
* **Scalability:** The current optimization strategy is very basic and would not scale well to more complex models or larger search spaces. Consider using more efficient optimization algorithms for real-world applications.
* **Model Choice:** Linear Regression is a good starting point, but models such as Random Forests, Gradient Boosting Machines, or Neural Networks may capture non-linear relationships between the features and rewards more accurately; a Random Forest variant is sketched after this list.
* **Comments and Documentation:**  Comprehensive comments and docstrings are included to explain the purpose of each function and the code's logic.
* **Reproducibility:** `np.random.seed(42)` is used to ensure that the results are reproducible.
* **Realistic Data:** The reward calculation now incorporates the impact of network congestion, making the simulation more realistic.
* **Feature Importance:** After training, you can inspect the Linear Regression coefficients to gauge each feature's influence (bearing in mind the features are on very different scales, so raw coefficients are not directly comparable). Tree-based models in `scikit-learn` expose `feature_importances_` instead; a small helper covering both cases is sketched after this list.
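
If you want to try the polynomial features mentioned under *Feature Engineering*, a minimal sketch is shown below. It assumes scikit-learn 1.0+ (for `get_feature_names_out`) and reuses the column names from the script above; the helper name `add_polynomial_features` is illustrative, not part of the original code.

```python
# Sketch only: append squared and pairwise-interaction terms for the base features.
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

def add_polynomial_features(df, degree=2):  # hypothetical helper
    base_cols = ['StakedAmount', 'Duration', 'NetworkCongestion']
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    values = poly.fit_transform(df[base_cols])
    names = poly.get_feature_names_out(base_cols)
    poly_df = pd.DataFrame(values, columns=names, index=df.index)
    new_cols = [c for c in names if c not in base_cols]  # keep only the new columns
    return pd.concat([df, poly_df[new_cols]], axis=1)
```

If you adopt this, remember to apply the same transformation to the single-row DataFrame built inside `simulate_staking_strategy`, just as with the existing interaction terms.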
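
For a smarter search than the grid in `optimize_staking_strategy`, one option is SciPy's differential evolution. The sketch below is an assumption: it requires `pip install scipy` and expects `model` and `simulate_staking_strategy` from the script above to be in scope.

```python
# Sketch only: global optimization of predicted rewards with SciPy.
from scipy.optimize import differential_evolution

def optimize_with_de(model, amount_range, duration_range, congestion_range):
    def negative_rewards(x):
        amount, duration, congestion = x
        # The optimizer works in continuous space; round duration to whole days.
        return -simulate_staking_strategy(model, amount, int(round(duration)), congestion)

    bounds = [amount_range, duration_range, congestion_range]
    result = differential_evolution(negative_rewards, bounds, seed=42, maxiter=50)
    amount, duration, congestion = result.x
    return amount, int(round(duration)), congestion, -result.fun

# Usage (with the ranges from the main block):
# best = optimize_with_de(model, (100, 800), (14, 60), (0.2, 0.7))
```

With the Linear Regression used here, the optimum will typically sit at a corner of the ranges anyway; a global optimizer only really pays off once you switch to a non-linear model.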
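
A minimal sketch of the missing input validation, assuming the same ranges used to generate the synthetic data (adjust to your protocol's real limits):

```python
# Sketch only: basic sanity checks before simulating a strategy.
def validate_strategy_inputs(staked_amount, duration, network_congestion):
    if staked_amount <= 0:
        raise ValueError("staked_amount must be positive")
    if not (7 <= duration <= 90):
        raise ValueError("duration must be between 7 and 90 days")
    if not (0.0 <= network_congestion <= 1.0):
        raise ValueError("network_congestion must be between 0 and 1")
```

Calling this at the top of `simulate_staking_strategy` keeps the model from extrapolating far outside the data it was trained on.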
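
Swapping in a non-linear model only changes the line that constructs the estimator inside `train_model`; a sketch using a Random Forest (the hyperparameters are placeholder values):

```python
# Sketch only: a drop-in alternative to LinearRegression() inside train_model().
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

def make_model(use_random_forest=True):
    """Returns an untrained regressor; pass it to .fit(X_train, y_train) as before."""
    if use_random_forest:
        return RandomForestRegressor(n_estimators=200, random_state=42)
    return LinearRegression()
```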
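
To inspect what the trained model learned, you can print the coefficients (or, for tree models, `feature_importances_`); a small hypothetical helper:

```python
# Sketch only: print per-feature coefficients or importances, whichever the model exposes.
def print_feature_influence(model, feature_names):
    if hasattr(model, "coef_"):                    # linear models, e.g. LinearRegression
        for name, coef in zip(feature_names, model.coef_):
            print(f"{name}: {coef:.4f}")
    elif hasattr(model, "feature_importances_"):   # tree ensembles, e.g. RandomForestRegressor
        for name, imp in zip(feature_names, model.feature_importances_):
            print(f"{name}: {imp:.4f}")

# Usage: print_feature_influence(model, X_test.columns)
```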

How to run the code:

1. **Install Libraries:**
   ```bash
   pip install numpy pandas scikit-learn matplotlib
   ```
2. **Save:** Save the code as a Python file (e.g., `staking_simulator.py`).
3. **Run:** Execute the file from your terminal:
   ```bash
   python staking_simulator.py
   ```

This improved version provides a solid foundation for building a more sophisticated AI-powered staking strategy simulator. Remember to tailor the code to your specific needs and data.