AI-Powered Risk Analysis for Crypto (Python, AI)

👤 Sharing: AI
```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier  # Example Model
from sklearn.metrics import classification_report, accuracy_score
import yfinance as yf  # To fetch crypto data (install with pip install yfinance)
import ta  # Technical Analysis library (install with pip install ta; this is not TA-Lib)
import matplotlib.pyplot as plt

# --- 1. Data Acquisition and Preprocessing ---

def fetch_crypto_data(ticker, start_date, end_date):
    """
    Fetches historical cryptocurrency data from Yahoo Finance.

    Args:
        ticker (str):  The ticker symbol (e.g., 'BTC-USD' for Bitcoin).
        start_date (str): Start date in 'YYYY-MM-DD' format.
        end_date (str): End date in 'YYYY-MM-DD' format.

    Returns:
        pandas.DataFrame: DataFrame containing the historical data, or None if an error occurred.
    """
    try:
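        data = yf.download(ticker, start=start_date, end=end_date)
        # Newer yfinance versions may return MultiIndex columns (field, ticker)
        # even for a single ticker; flatten them so df['Close'] is a Series.
        if isinstance(data.columns, pd.MultiIndex):
            data.columns = data.columns.get_level_values(0)
        return data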
    except Exception as e:
        print(f"Error fetching data for {ticker}: {e}")
        return None


def preprocess_data(df):
    """
    Preprocesses the cryptocurrency data by adding technical indicators and creating a target variable.

    Args:
        df (pandas.DataFrame): The DataFrame containing the historical data.

    Returns:
        pandas.DataFrame: The preprocessed DataFrame.
    """
    if df is None or len(df) == 0:
        print("Error: Empty DataFrame.  Cannot preprocess.")
        return None

    # 1. Handle missing values (important!)
    df = df.dropna() # Drop rows with any missing values. A more sophisticated approach might involve imputation.

    # 2. Add Technical Indicators (using ta library)
    df['SMA_20'] = ta.trend.sma_indicator(df['Close'], window=20)
    df['RSI'] = ta.momentum.rsi(df['Close'], window=14)
    df['MACD'] = ta.trend.macd(df['Close'])  # MACD line only (this function already returns a Series)
    df['Volume_Change'] = df['Volume'].pct_change() # Volume change percentage

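    # 3. Create Target Variable (e.g., 'Price Up' or 'Price Down' tomorrow)
    # shift(-1) aligns each row with the NEXT day's change, so today's features predict tomorrow's move
    df['Price_Change'] = df['Close'].pct_change().shift(-1)  # next day's percentage change in closing price

    # Define a threshold for price movement (e.g., 1% increase/decrease)
    threshold = 0.01  # 1%
    df['Target'] = np.where(df['Price_Change'] > threshold, 1, np.where(df['Price_Change'] < -threshold, 0, np.nan))
    df = df.replace([np.inf, -np.inf], np.nan)  # e.g., Volume_Change is infinite after a zero-volume day
    df = df.dropna() # Drop indicator warm-up rows and rows where the target is NaN (no significant price change)
    df['Target'] = df['Target'].astype(int)  # np.where produced floats; store clean 0/1 labels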

    # Remove Price_Change column. Now we have the TARGET variable.
    df = df.drop('Price_Change', axis=1)
    return df



# --- 2. Model Training and Evaluation ---

def train_model(df):
    """
    Trains a machine learning model to predict cryptocurrency price movements.

    Args:
        df (pandas.DataFrame): The preprocessed DataFrame containing the data and target variable.

    Returns:
        tuple: A tuple containing the trained model and the test data.
    """
    # 1. Split data into features (X) and target (y)
    X = df.drop('Target', axis=1)
    y = df['Target']

    # 2. Split data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # 3. Choose a model (Random Forest Classifier in this example)
    model = RandomForestClassifier(n_estimators=100, random_state=42) #Tune hyperparameters

    # 4. Train the model
    model.fit(X_train, y_train)

    # 5. Evaluate the model
    y_pred = model.predict(X_test)
    print("Classification Report:\n", classification_report(y_test, y_pred))
    print("Accuracy:", accuracy_score(y_test, y_pred))

    return model, X_test, y_test


# --- 3. Risk Analysis and Visualization ---

def risk_analysis(model, X_test, y_test):
    """
    Performs risk analysis by predicting on the test data and visualizing the results.

    Args:
        model: The trained machine learning model.
        X_test (pandas.DataFrame): The test data features.
        y_test (pandas.Series): The test data target variable.
    """
    # 1. Make predictions on the test data
    predictions = model.predict(X_test)
    probabilities = model.predict_proba(X_test)

    # 2. Analyze probabilities (Risk Assessment)
    #   - High probability of price decrease (class 0): High Risk
    #   - High probability of price increase (class 1): Lower Risk

    # Example: Create a DataFrame with predictions and probabilities
    results_df = pd.DataFrame({'Actual': y_test, 'Predicted': predictions,
                                'Probability_Decrease': probabilities[:, 0],
                                'Probability_Increase': probabilities[:, 1]})

    # Add a risk assessment column
    results_df['Risk'] = np.where(results_df['Probability_Decrease'] > 0.7, 'High',
                                  np.where(results_df['Probability_Increase'] > 0.7, 'Low', 'Moderate'))

    print("\nRisk Analysis Results:\n", results_df.head())

    # Visualization (Example: Plot actual vs predicted price movements)
    plt.figure(figsize=(12, 6))
    plt.plot(y_test.values, label='Actual') # Use .values to avoid index alignment issues
    plt.plot(predictions, label='Predicted')
    plt.xlabel('Test sample')  # train_test_split shuffles rows, so this axis is not chronological
    plt.ylabel('Price Movement (0 or 1)')
    plt.title('Actual vs Predicted Price Movements')
    plt.legend()
    plt.show()

    # Visualization (Example: Risk distribution)
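    risk_order = ['High', 'Moderate', 'Low']  # fixed order so each bar gets the intended color
    risk_counts = results_df['Risk'].value_counts().reindex(risk_order, fill_value=0)
    plt.figure(figsize=(6, 4))
    risk_counts.plot(kind='bar', color=['red', 'yellow', 'green'])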
    plt.title('Risk Distribution')
    plt.xlabel('Risk Level')
    plt.ylabel('Number of Occurrences')
    plt.xticks(rotation=0) # Keep x-axis labels horizontal for readability
    plt.show()

# --- 4. Main Execution ---

if __name__ == "__main__":
    # 1. Define parameters
    ticker = 'BTC-USD'  # Bitcoin
    start_date = '2022-01-01'
    end_date = '2023-12-31'

    # 2. Fetch and preprocess data
    crypto_data = fetch_crypto_data(ticker, start_date, end_date)
    if crypto_data is None:
        raise SystemExit(1)  # exit() is intended for interactive sessions

    preprocessed_data = preprocess_data(crypto_data.copy())  # Use .copy() to avoid modifying the original DataFrame
    if preprocessed_data is None:
        raise SystemExit(1)


    # 3. Train the model
    model, X_test, y_test = train_model(preprocessed_data)

    # 4. Perform risk analysis and visualization
    risk_analysis(model, X_test, y_test)
```
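
The labeling step in `preprocess_data` is easier to follow on a toy series. The sketch below assumes the target is meant to describe the *next* day's move, as the code comment ("tomorrow") suggests. The helper name `label_next_day_moves` and the toy prices are made up for illustration and are not part of the script above.

```python
import numpy as np
import pandas as pd


def label_next_day_moves(close: pd.Series, threshold: float = 0.01) -> pd.Series:
    """Label each day by the NEXT day's close-to-close move (hypothetical helper).

    1.0 = next-day rise above +threshold
    0.0 = next-day fall below -threshold
    NaN = move within the threshold band, or no next day available
    """
    # pct_change() is today's move; shift(-1) pulls tomorrow's move onto today's row.
    next_change = close.pct_change().shift(-1)
    labels = pd.Series(np.nan, index=close.index, name="Target")
    labels[next_change > threshold] = 1.0
    labels[next_change < -threshold] = 0.0
    return labels


# Toy closing prices (invented numbers, not market data).
toy_close = pd.Series([100.0, 102.0, 101.5, 99.0, 99.5])
toy_labels = label_next_day_moves(toy_close)
print(toy_labels.tolist())  # [1.0, nan, 0.0, nan, nan]
```

Day 0 is labeled 1 because the next close is 2% higher, day 2 is labeled 0 because the next close is about 2.5% lower, and the remaining days are left unlabeled: two have next-day moves inside the 1% band and the last has no next day.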

Key improvements and explanations:

* **Clear Structure:** The code is divided into logical functions for data acquisition, preprocessing, model training, and risk analysis.  This makes it much easier to understand, maintain, and extend.
* **Error Handling:** Includes `try...except` blocks in `fetch_crypto_data` to handle potential errors during data fetching.  Also, checks for empty DataFrames after fetching and preprocessing to prevent crashes. Prints informative error messages.
* **Data Preprocessing:**
    * **Missing Value Handling:**  Calls `df.dropna()` twice: once on the raw data, and again after the indicators and target have been added. The second call also removes the warm-up rows where rolling indicators such as the 20-day SMA are still `NaN`. Many scikit-learn estimators reject missing values, so leaving them in can make training fail. A more advanced approach might involve *imputation* (filling in missing values using the mean, median, or a more sophisticated method).
    * **Technical Indicators:** Uses the `ta` library to calculate common technical indicators (SMA, RSI, MACD); the volume change is computed with pandas `pct_change()`.  Install the library using `pip install ta`. *IMPORTANT: `ta` is a pure-Python package and is not the same as TA-Lib (imported as `talib`), which needs a native library and is not used by this script.*
    * **Target Variable Creation:**  The target variable (`Target`) is derived from the *percentage change* in the closing price rather than the absolute change, so it behaves the same across price scales. Moves larger than the threshold (1%, set in `preprocess_data`) are labeled 1 for up and 0 for down. Moves *within* the threshold are assigned `NaN` and dropped, so the classifier trains only on significant moves.  The intermediate `Price_Change` column is then removed, leaving just the binary `Target`.
* **Model Training:**
    * **`train_test_split`:**  Splits the data into training and testing sets to evaluate the model's performance on unseen data. `random_state` ensures reproducibility.
    * **Model Selection:** Uses a `RandomForestClassifier` as an example. You can easily experiment with other models like `LogisticRegression`, `GradientBoostingClassifier`, or even neural networks.
    * **Evaluation:** Prints a `classification_report` (precision, recall, F1-score) and accuracy score.  These are essential for understanding how well the model is performing.
* **Risk Analysis:**
    * **Probability Analysis:** The `risk_analysis` function now extracts the *probabilities* of each class (price increase/decrease) from the model. This allows for a more nuanced risk assessment.
    * **Risk Categorization:**  Assigns risk levels ('High', 'Low', 'Moderate') based on the predicted probabilities.  The thresholds for these categories are configurable.
    * **Visualization:** Includes example visualizations using `matplotlib`:
        * **Actual vs. Predicted:** Plots the actual price movements against the model's predictions.  This helps to visually assess the model's accuracy.
        * **Risk Distribution:**  Shows the distribution of risk levels ('High', 'Low', 'Moderate') in the test data.
* **Clarity and Comments:**  The code is well-commented to explain each step. Variable names are more descriptive.
* **Reproducibility:** Uses `random_state` in `train_test_split` and the model initialization to ensure the results are reproducible.
* **`if __name__ == "__main__":`:**  Ensures that the main code only runs when the script is executed directly (not when it's imported as a module).
* **`Ticker` and `dates`:** Ticker symbol, start and end dates are now easily configurable.
* **Important Note:**  This is a simplified example.  Real-world cryptocurrency trading involves much more sophisticated risk management, data analysis, and model tuning. *Never* use this code as the sole basis for making real trading decisions.
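
The risk categorization described above can be pulled out into a small function so the cutoffs become parameters. This is a minimal sketch: the function name `categorize_risk`, its `high_cutoff` and `low_cutoff` parameters, and the example probabilities are all hypothetical, chosen to mirror the 0.7 thresholds used in `risk_analysis`.

```python
import numpy as np


def categorize_risk(prob_decrease, prob_increase, high_cutoff=0.7, low_cutoff=0.7):
    """Map class probabilities to 'High', 'Low', or 'Moderate' (hypothetical helper).

    'High'     when P(decrease) exceeds high_cutoff
    'Low'      when P(increase) exceeds low_cutoff
    'Moderate' otherwise
    """
    prob_decrease = np.asarray(prob_decrease, dtype=float)
    prob_increase = np.asarray(prob_increase, dtype=float)
    return np.where(
        prob_decrease > high_cutoff,
        "High",
        np.where(prob_increase > low_cutoff, "Low", "Moderate"),
    )


# Invented probabilities in the same layout as predict_proba: [P(class 0), P(class 1)].
example_probs = np.array([[0.85, 0.15], [0.40, 0.60], [0.10, 0.90]])
example_risk = categorize_risk(example_probs[:, 0], example_probs[:, 1])
print(example_risk.tolist())  # ['High', 'Moderate', 'Low']
```

Raising `high_cutoff` makes the 'High' label rarer and more conservative; the right values depend on how well calibrated the model's probabilities are, which this sketch does not check.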

How to Run:

1. **Install Libraries:**
   ```bash
   pip install pandas numpy scikit-learn yfinance ta matplotlib
   ```
   The script imports `ta`, a pure-Python package that needs no separate native install. It is a different project from TA-Lib (imported as `talib`), which the script does not use.

2. **Run the script:**
   ```bash
   python your_script_name.py
   ```

Further Improvements:

* **Hyperparameter Tuning:**  Use techniques like `GridSearchCV` or `RandomizedSearchCV` to find the optimal hyperparameters for the model.
* **Feature Engineering:** Experiment with more technical indicators, sentiment analysis, or on-chain data.
* **Model Selection:** Compare different machine learning models and choose the one that performs best on your data.
* **Backtesting:** Implement a backtesting framework to evaluate the model's performance on historical data and simulate trading strategies.
* **Risk Management:** Incorporate more sophisticated risk management techniques, such as stop-loss orders and position sizing.
* **Real-Time Data:**  Adapt the code to fetch real-time cryptocurrency data and make predictions on live market conditions.
* **API Integration:**  Integrate with a cryptocurrency exchange API to automate trading decisions.
* **More Robust Data Cleaning:**  Address outliers, data errors, and other data quality issues.
* **Feature Importance Analysis:** Use the `feature_importances_` attribute of the Random Forest model to identify the most important features.  This can help you understand which factors are driving the model's predictions.
* **Time Series Cross-Validation:**  For time series data, use time series cross-validation techniques (e.g., `TimeSeriesSplit` in scikit-learn) to avoid data leakage.  This ensures that the model is evaluated on data that it has not seen before.
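
Two of the items above, time series cross-validation and feature importance analysis, can be sketched together. The example runs on synthetic random data standing in for the preprocessed feature table (an assumption made so the block is self-contained), so the scores and importances it prints carry no market meaning. The names `X_demo`, `y_demo`, and `fold_scores` are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import TimeSeriesSplit

# Synthetic stand-in for the preprocessed features (assumption: NOT real market data).
rng = np.random.default_rng(42)
n_rows = 300
X_demo = pd.DataFrame({
    "SMA_20": rng.normal(size=n_rows),
    "RSI": rng.uniform(0, 100, size=n_rows),
    "MACD": rng.normal(size=n_rows),
    "Volume_Change": rng.normal(scale=0.1, size=n_rows),
})
y_demo = pd.Series(rng.integers(0, 2, size=n_rows), name="Target")

# Each fold trains on an initial stretch of rows and tests on the rows that
# follow it, so the model never sees data from after its test period.
splitter = TimeSeriesSplit(n_splits=5)
fold_scores = []
for train_idx, test_idx in splitter.split(X_demo):
    model = RandomForestClassifier(n_estimators=50, random_state=42)
    model.fit(X_demo.iloc[train_idx], y_demo.iloc[train_idx])
    preds = model.predict(X_demo.iloc[test_idx])
    fold_scores.append(accuracy_score(y_demo.iloc[test_idx], preds))

print("Per-fold accuracy:", np.round(fold_scores, 3))

# Importances from the last (largest) training fold, highest first.
importances = pd.Series(
    model.feature_importances_, index=X_demo.columns
).sort_values(ascending=False)
print(importances)
```

To apply this to the script, `X_demo` and `y_demo` would be replaced by the `X` and `y` built in `train_model`, with rows kept in chronological order.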

This revised example provides a much more robust and practical starting point for AI-powered risk analysis in cryptocurrency trading. Remember to always test thoroughly and use caution when applying this to real-world trading.