Simulates synthetic market data to stress-test trading algorithms

👤 Sharing: AI
```python
import numpy as np
import pandas as pd
import random
import time
import matplotlib.pyplot as plt

# --- Configuration ---
SYMBOL = "BTCUSD"  # The trading symbol (e.g., Bitcoin against US Dollar)
START_PRICE = 30000.0  # Starting price of the asset
VOLATILITY = 0.01  # Daily volatility (standard deviation of daily log returns); raise this for stress tests
DRIFT = 0.0001  # Daily drift (mean of daily log returns); may be 0
NUM_DAYS = 365  # Number of days of data to generate
NUM_TRADES_PER_DAY = 100  # Simulate this many trades per day (more trades, finer-grained data)
OUTLIER_PROBABILITY = 0.005 # Chance that a big spike/drop occurs
OUTLIER_MAGNITUDE = 5 * VOLATILITY  # Outlier size as a fraction of the path price (here 5x daily volatility)

# --- Helper Functions ---

def generate_price_path(start_price, volatility, drift, num_days):
    """
    Generates a synthetic price path using a geometric Brownian motion model.

    Args:
        start_price (float): The initial price.
        volatility (float): The daily volatility.
        drift (float): The daily drift.
        num_days (int): The number of days to simulate.

    Returns:
        numpy.ndarray: An array of prices representing the price path.
    """
    daily_returns = np.random.normal(drift, volatility, num_days)
    price_path = np.zeros(num_days)
    price_path[0] = start_price
    for i in range(1, num_days):
        price_path[i] = price_path[i-1] * np.exp(daily_returns[i])
    return price_path


def generate_trade_data(price_path, num_trades_per_day, outlier_probability, outlier_magnitude):
    """
    Generates synthetic trade data based on a price path.

    Args:
        price_path (numpy.ndarray): The price path.
        num_trades_per_day (int): The number of trades to simulate per day.
        outlier_probability (float): Probability of a large price swing.
        outlier_magnitude (float): Magnitude of the large price swing.

    Returns:
        pandas.DataFrame: A DataFrame containing the trade data.
    """
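    # Anchor each day at midnight so the random minute offsets below stay within that day
    start = pd.Timestamp.today().normalize() - pd.Timedelta(days=len(price_path))
    dates = pd.date_range(start=start, periods=len(price_path))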
    trade_data = []

    for i, date in enumerate(dates):
        for _ in range(num_trades_per_day):
            # Simulate a trade around the 'true' price at this point in the path
            # Noise std is one tenth of the module-level VOLATILITY times the path price
            price = price_path[i] + np.random.normal(0, VOLATILITY * price_path[i] / 10)

            # Introduce outlier trades (spikes/drops)
            if random.random() < outlier_probability:
                price += random.choice([-1, 1]) * outlier_magnitude * price_path[i]

            quantity = random.randint(1, 10) # Random quantity traded

            trade_data.append({
                'timestamp': date + pd.Timedelta(minutes=random.randint(0, 1439)),  # Add random minutes
                'symbol': SYMBOL,
                'price': price,
                'quantity': quantity,
                'side': random.choice(['buy', 'sell']) # Randomly buy or sell
            })

    df = pd.DataFrame(trade_data)
    df = df.sort_values('timestamp').reset_index(drop=True) # Sort by timestamp
    return df


# --- Main Execution ---

if __name__ == "__main__":
    # 1. Generate Price Path
    price_path = generate_price_path(START_PRICE, VOLATILITY, DRIFT, NUM_DAYS)

    # 2. Generate Trade Data
    trade_data = generate_trade_data(price_path, NUM_TRADES_PER_DAY, OUTLIER_PROBABILITY, OUTLIER_MAGNITUDE)


    # 3.  Save to CSV (optional)
    filename = f"synthetic_market_data_{SYMBOL}_{NUM_DAYS}days.csv"
    trade_data.to_csv(filename, index=False)
    print(f"Synthetic market data saved to {filename}")

    # 4.  Basic Visualization (optional)
    plt.figure(figsize=(12, 6))
    plt.plot(trade_data['timestamp'], trade_data['price'], marker='.', linestyle='-', markersize=2)
    plt.title(f"Synthetic Market Data for {SYMBOL}")
    plt.xlabel("Time")
    plt.ylabel("Price")
    plt.grid(True)
    plt.show()


    # 5.  Example Usage (Stress-Testing Trading Algorithm)
    #  This is where you would load the 'trade_data' DataFrame and feed it
    #  to your trading algorithm.  You could simulate:
    #   - High Volatility periods
    #   - Flash crashes
    #   - Unexpected news events (simulated by outlier_probability/magnitude)
    #   - Different trading volumes (NUM_TRADES_PER_DAY)
    #
    # Example:  Loop through the trade data and feed each trade to your algorithm.
    #           Measure the algorithm's performance (profit/loss, risk metrics)
    #           under different market conditions simulated by the parameters.
    # print("\nSimulating a basic trading algorithm...")
    # for index, row in trade_data.iterrows():
    #    # Simulate your trading logic here based on the current trade data
    #    # (row['timestamp'], row['symbol'], row['price'], row['quantity'], row['side'])
    #    print(f"Processing trade: {row['timestamp']}, Price: {row['price']}, Side: {row['side']}")
    #    time.sleep(0.001) # Simulate processing time
    #    # Add your trading algorithm logic here.
    print("\nTrade data head:")
    print(trade_data.head())
```

Key improvements and explanations:

* **Clearer Configuration:**  The `SYMBOL`, `START_PRICE`, `VOLATILITY`, `DRIFT`, `NUM_DAYS`, `NUM_TRADES_PER_DAY`, `OUTLIER_PROBABILITY`, and `OUTLIER_MAGNITUDE` are now defined at the top of the script, which makes the simulation parameters easy to modify.  Setting `OUTLIER_PROBABILITY` and `OUTLIER_MAGNITUDE` is one way to simulate flash crashes or unexpected events.
* **Geometric Brownian Motion:**  The `generate_price_path` function uses geometric Brownian motion, a standard model for asset prices.  It treats each day's random draw as a log return and exponentiates it, so prices stay positive and are log-normally distributed.  Note that `drift` and `volatility` are therefore the mean and standard deviation of the daily log returns.
* **Outlier Handling:** The `generate_trade_data` function includes logic to simulate outlier trades (spikes and crashes).  `OUTLIER_PROBABILITY` controls how often they occur, and `OUTLIER_MAGNITUDE` controls how big they are as a fraction of the path price (by default five times `VOLATILITY`, i.e. 5%).  This is important for stress-testing algorithms against extreme events.
* **Trade Data Generation:**  The trade data generation is improved:
    * **Random Timing:**  Trade timestamps are now randomly distributed throughout each day.
    * **Realistic Pricing:**  Trade prices are simulated with noise *around* the "true" price from the price path.  The noise is zero-mean Gaussian with a standard deviation of one tenth of the daily volatility times the path price, a rough stand-in for the bid/ask spread and other market microstructure effects.
    * **Buy/Sell Sides:** The `side` (buy or sell) is randomly assigned to each trade.
    * **Quantity:**  Includes a randomly generated `quantity` for each trade.
* **DataFrames:** The code uses pandas DataFrames to store and manipulate the data, which is convenient for downstream analysis and backtesting in Python.
* **Saving to CSV:**  The generated trade data can now be saved to a CSV file.  This allows you to easily load the data into other tools or trading platforms.
* **Visualization:** A basic visualization using matplotlib is included.  This allows you to quickly check the generated data and make sure it looks reasonable.
* **Stress-Testing Example (Commented Out):** The code includes a commented-out section that shows how to use the generated data to stress-test a trading algorithm: loop over the trades, apply your logic to each row, and measure performance.  Replace the placeholder comments with your actual trading logic, then rerun with different parameters to simulate different market conditions.
* **Clearer Comments and Docstrings:** The code is well-commented, and each function has a docstring that explains what it does and how to use it.
* **`if __name__ == "__main__":` block:**  The main execution logic is wrapped in this block, which is good practice for Python scripts.
* **Sorted Trade Data:** The generated trade data is now sorted by timestamp.  This is important for many trading algorithms that rely on the order of events.
* **More Realistic Volatility:** The noise added to the trade prices is scaled by `price_path[i]` itself, so its size is proportional to the price, which is more realistic than a fixed absolute noise level.
* **Error Handling:**  While this example is relatively simple, in a production environment you'd want to add error handling (e.g., `try...except` blocks) to deal with potential issues like invalid parameters or a failed CSV write.
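
The geometric Brownian motion step described in the list above can also be written without an explicit loop. The following is a minimal sketch, not the script's own implementation: the function name `gbm_path_vectorized` and the `seed` parameter are assumptions introduced here for illustration. It treats `drift` and `volatility` as the mean and standard deviation of daily log returns, as the script does.

```python
import numpy as np

def gbm_path_vectorized(start_price, volatility, drift, num_days, seed=None):
    """Hypothetical loop-free variant of generate_price_path."""
    rng = np.random.default_rng(seed)  # seeded generator for reproducible runs
    # One log return per step after the first day
    log_returns = rng.normal(drift, volatility, num_days - 1)
    # Cumulative log return relative to day 0 (day 0 itself has return 0)
    cumulative = np.concatenate(([0.0], np.cumsum(log_returns)))
    # Exponentiating keeps every price strictly positive
    return start_price * np.exp(cumulative)

path = gbm_path_vectorized(30000.0, 0.01, 0.0001, 365, seed=42)
print(path[:5])
```

Passing a fixed `seed` makes a stress scenario repeatable, which helps when comparing two versions of an algorithm on identical data.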

How to Use This Code for Stress-Testing:

1. **Customize the Configuration:**  Adjust the parameters at the top of the script to simulate different market conditions. For example:
   - Increase `VOLATILITY` to simulate a period of high volatility.
   - Increase `OUTLIER_PROBABILITY` and `OUTLIER_MAGNITUDE` to simulate flash crashes or unexpected news events.
   - Change `DRIFT` to simulate a bull or bear market.
   - Increase `NUM_TRADES_PER_DAY` to simulate higher trading volume.

2. **Implement Your Trading Algorithm:**  Replace the commented-out section in the `if __name__ == "__main__":` block with your actual trading algorithm.  This is the code that will make decisions based on the market data.

3. **Feed the Data to Your Algorithm:**  Loop through the `trade_data` DataFrame and feed each trade to your algorithm.

4. **Measure Performance:**  Track the performance of your algorithm (profit/loss, risk metrics, order execution times) under different market conditions.

5. **Analyze Results:**  Analyze the results to identify weaknesses in your algorithm and areas for improvement.
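
Steps 2 to 4 can be sketched roughly as follows. This is an illustrative harness under stated assumptions, not a recommended strategy: `run_stress_test`, its `window` parameter, and the toy rule (hold one unit while the price is above its rolling mean) are all hypothetical, and the demo DataFrame only mimics the `timestamp` and `price` columns of the generated trade data.

```python
import numpy as np
import pandas as pd

def run_stress_test(trades, window=50):
    """Feed trades in timestamp order to a toy moving-average rule.

    Hypothetical example: holds 1 unit while the price is above its
    rolling mean and is flat otherwise.
    """
    prices = trades.sort_values("timestamp")["price"].reset_index(drop=True)
    rolling_mean = prices.rolling(window).mean()
    # The position decided at trade t is applied to the move from t to t+1
    position = (prices > rolling_mean).astype(float).shift(1).fillna(0.0)
    pnl = (position * prices.diff().fillna(0.0)).cumsum()
    # Drawdown: distance of cumulative P&L below its running peak
    drawdown = pnl.cummax() - pnl
    return {"final_pnl": float(pnl.iloc[-1]), "max_drawdown": float(drawdown.max())}

# Small stand-in for the generated trade data (same column names)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=500, freq="min"),
    "price": 100.0 * np.exp(np.cumsum(rng.normal(0, 0.001, 500))),
})
result = run_stress_test(demo)
print(result)
```

Shifting the position by one trade avoids look-ahead: the rule never profits from a price move it has already seen.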

This comprehensive example provides a good starting point for building a more sophisticated market simulation and stress-testing environment.  Remember to tailor the parameters and the trading algorithm to your specific needs.
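
One way to tailor the parameters is to define a few named scenarios and check that each produces the conditions you expect before running an algorithm against it. The sketch below is an assumption-laden illustration: the scenario names and values, and the helpers `simulate_path` and `realized_volatility`, are hypothetical and not part of the script above.

```python
import numpy as np

# Hypothetical scenario names and values, chosen only for illustration
SCENARIOS = {
    "calm": {"volatility": 0.005, "drift": 0.0},
    "baseline": {"volatility": 0.01, "drift": 0.0001},
    "crisis": {"volatility": 0.05, "drift": -0.002},
}

def simulate_path(start_price, volatility, drift, num_days, rng):
    """Daily price path with normally distributed log returns."""
    log_returns = rng.normal(drift, volatility, num_days - 1)
    return start_price * np.exp(np.concatenate(([0.0], np.cumsum(log_returns))))

def realized_volatility(path):
    """Standard deviation of the daily log returns along a path."""
    return float(np.std(np.diff(np.log(path))))

rng = np.random.default_rng(7)
summary = {}
for name, params in SCENARIOS.items():
    path = simulate_path(30000.0, params["volatility"], params["drift"], 365, rng)
    summary[name] = realized_volatility(path)

for name, vol in summary.items():
    print(f"{name:>8}: realized daily volatility = {vol:.4f}")
```

Comparing the realized volatility with the configured value is a quick sanity check that a scenario is as harsh as intended.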