AI-Powered Risk Analysis for Crypto Python, AI
```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier  # Example model
from sklearn.metrics import classification_report, accuracy_score
import yfinance as yf  # To fetch crypto data (install with: pip install yfinance)
import ta  # Technical analysis library (install with: pip install ta)
import matplotlib.pyplot as plt

# --- 1. Data Acquisition and Preprocessing ---
def fetch_crypto_data(ticker, start_date, end_date):
    """
    Fetches historical cryptocurrency data from Yahoo Finance.

    Args:
        ticker (str): The ticker symbol (e.g., 'BTC-USD' for Bitcoin).
        start_date (str): Start date in 'YYYY-MM-DD' format.
        end_date (str): End date in 'YYYY-MM-DD' format.

    Returns:
        pandas.DataFrame: DataFrame containing the historical data, or None if an error occurred.
    """
    try:
        data = yf.download(ticker, start=start_date, end=end_date)
    except Exception as e:
        print(f"Error fetching data for {ticker}: {e}")
        return None
    # Some yfinance versions return MultiIndex columns even for a single ticker;
    # flatten them so that data['Close'] is a Series rather than a DataFrame.
    if data is not None and isinstance(data.columns, pd.MultiIndex):
        data.columns = data.columns.get_level_values(0)
    return data


def preprocess_data(df):
    """
    Preprocesses the cryptocurrency data by adding technical indicators and creating a target variable.

    Args:
        df (pandas.DataFrame): The DataFrame containing the historical data.

    Returns:
        pandas.DataFrame: The preprocessed DataFrame.
    """
    if df is None or len(df) == 0:
        print("Error: Empty DataFrame. Cannot preprocess.")
        return None

    # 1. Handle missing values (important!)
    # Drop rows with any missing values. A more sophisticated approach might involve imputation.
    df = df.dropna().copy()

    # 2. Add technical indicators (using the ta library)
    df['SMA_20'] = ta.trend.sma_indicator(df['Close'], window=20)
    df['RSI'] = ta.momentum.rsi(df['Close'], window=14)
    df['MACD'] = ta.trend.macd(df['Close'])  # MACD line only; the function returns a Series
    df['Volume_Change'] = df['Volume'].pct_change()  # Volume change percentage

    # 3. Create target variable ('Price Up' or 'Price Down' tomorrow)
    # shift(-1) aligns each row with the NEXT day's percentage change in closing price.
    df['Price_Change'] = df['Close'].pct_change().shift(-1)

    # Define a threshold for price movement (e.g., 1% increase/decrease)
    threshold = 0.01  # 1%
    df['Target'] = np.where(df['Price_Change'] > threshold, 1,
                            np.where(df['Price_Change'] < -threshold, 0, np.nan))

    # Drop indicator warm-up rows, infinite volume changes, and rows where the
    # target is NaN (no significant price change).
    df = df.replace([np.inf, -np.inf], np.nan).dropna()

    # Remove the Price_Change column so it cannot leak into the features.
    df = df.drop('Price_Change', axis=1)
    return df


# --- 2. Model Training and Evaluation ---
def train_model(df):
    """
    Trains a machine learning model to predict cryptocurrency price movements.

    Args:
        df (pandas.DataFrame): The preprocessed DataFrame containing the data and target variable.

    Returns:
        tuple: The trained model, the test features, and the test target.
    """
    # 1. Split data into features (X) and target (y)
    X = df.drop('Target', axis=1)
    y = df['Target']

    # 2. Split data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # 3. Choose a model (Random Forest Classifier in this example)
    model = RandomForestClassifier(n_estimators=100, random_state=42)  # Tune hyperparameters

    # 4. Train the model
    model.fit(X_train, y_train)

    # 5. Evaluate the model
    y_pred = model.predict(X_test)
    print("Classification Report:\n", classification_report(y_test, y_pred))
    print("Accuracy:", accuracy_score(y_test, y_pred))
    return model, X_test, y_test


# --- 3. Risk Analysis and Visualization ---
def risk_analysis(model, X_test, y_test):
    """
    Performs risk analysis by predicting on the test data and visualizing the results.

    Args:
        model: The trained machine learning model.
        X_test (pandas.DataFrame): The test data features.
        y_test (pandas.Series): The test data target variable.
    """
    # 1. Make predictions on the test data
    predictions = model.predict(X_test)
    probabilities = model.predict_proba(X_test)

    # 2. Analyze probabilities (risk assessment)
    #    - High probability of price decrease (class 0): High Risk
    #    - High probability of price increase (class 1): Lower Risk
    results_df = pd.DataFrame({'Actual': y_test, 'Predicted': predictions,
                               'Probability_Decrease': probabilities[:, 0],
                               'Probability_Increase': probabilities[:, 1]})

    # Add a risk assessment column
    results_df['Risk'] = np.where(results_df['Probability_Decrease'] > 0.7, 'High',
                                  np.where(results_df['Probability_Increase'] > 0.7, 'Low', 'Moderate'))
    print("\nRisk Analysis Results:\n", results_df.head())

    # Visualization: actual vs predicted price movements
    plt.figure(figsize=(12, 6))
    plt.plot(y_test.values, label='Actual')  # Use .values to avoid index alignment issues
    plt.plot(predictions, label='Predicted')
    plt.xlabel('Test sample')  # The split is shuffled, so this axis is not chronological
    plt.ylabel('Price Movement (0 or 1)')
    plt.title('Actual vs Predicted Price Movements')
    plt.legend()
    plt.show()

    # Visualization: risk distribution
    # Fix the category order so each bar gets its intended color.
    risk_counts = results_df['Risk'].value_counts().reindex(['High', 'Moderate', 'Low'], fill_value=0)
    plt.figure(figsize=(6, 4))
    risk_counts.plot(kind='bar', color=['red', 'yellow', 'green'])
    plt.title('Risk Distribution')
    plt.xlabel('Risk Level')
    plt.ylabel('Number of Occurrences')
    plt.xticks(rotation=0)  # Keep x-axis labels horizontal for readability
    plt.show()


# --- 4. Main Execution ---
if __name__ == "__main__":
    # 1. Define parameters
    ticker = 'BTC-USD'  # Bitcoin
    start_date = '2022-01-01'
    end_date = '2023-12-31'

    # 2. Fetch and preprocess data
    crypto_data = fetch_crypto_data(ticker, start_date, end_date)
    if crypto_data is None:
        raise SystemExit(1)

    # Use .copy() to avoid modifying the original DataFrame
    preprocessed_data = preprocess_data(crypto_data.copy())
    if preprocessed_data is None:
        raise SystemExit(1)

    # 3. Train the model
    model, X_test, y_test = train_model(preprocessed_data)

    # 4. Perform risk analysis and visualization
    risk_analysis(model, X_test, y_test)
```
Key improvements and explanations:
* **Clear Structure:** The code is divided into logical functions for data acquisition, preprocessing, model training, and risk analysis. This makes it much easier to understand, maintain, and extend.
* **Error Handling:** Includes `try...except` blocks in `fetch_crypto_data` to handle potential errors during data fetching. Also, checks for empty DataFrames after fetching and preprocessing to prevent crashes. Prints informative error messages.
* **Data Preprocessing:**
    * **Missing Value Handling:** Calls `df.dropna()` twice: once on the raw data, and again after the technical indicators and target have been added, because rolling indicators such as the 20-day SMA are undefined for their first rows. Many scikit-learn estimators reject `NaN` inputs, so these rows must be removed or filled. A more advanced approach might involve *imputation* (filling in missing values using the mean, median, or a more sophisticated method).
    * **Technical Indicators:** Uses the `ta` library to calculate common technical indicators (SMA, RSI, MACD), plus a volume-change feature computed with pandas. Install it using `pip install ta`. *IMPORTANT: `TA-Lib` is a different package (imported as `talib`) that depends on a separately installed C library; this script does not use it.*
    * **Target Variable Creation:** The target variable (`Target`) is created from the *percentage change* in the closing price, which makes it more robust to different price scales. The threshold is set in one place and is easy to change. Cases where the price change is *within* the threshold (no significant movement) are assigned `NaN` and then dropped, so the model only learns from clear moves. The original `Price_Change` column is then removed, leaving just the binary `Target`.
* **Model Training:**
    * **`train_test_split`:** Splits the data into training and testing sets to evaluate the model's performance on unseen data. `random_state` ensures reproducibility.
    * **Model Selection:** Uses a `RandomForestClassifier` as an example. You can easily experiment with other models like `LogisticRegression`, `GradientBoostingClassifier`, or even neural networks.
    * **Evaluation:** Prints a `classification_report` (precision, recall, F1-score) and accuracy score. These are essential for understanding how well the model is performing.
* **Risk Analysis:**
    * **Probability Analysis:** The `risk_analysis` function extracts the *probabilities* of each class (price increase/decrease) from the model. This allows for a more nuanced risk assessment.
    * **Risk Categorization:** Assigns risk levels ('High', 'Low', 'Moderate') based on the predicted probabilities. The thresholds for these categories are configurable.
    * **Visualization:** Includes example visualizations using `matplotlib`:
        * **Actual vs. Predicted:** Plots the actual price movements against the model's predictions. This helps to visually assess the model's accuracy.
        * **Risk Distribution:** Shows the distribution of risk levels ('High', 'Low', 'Moderate') in the test data.
* **Clarity and Comments:** The code is well-commented to explain each step. Variable names are more descriptive.
* **Reproducibility:** Uses `random_state` in `train_test_split` and the model initialization to ensure the results are reproducible.
* **`if __name__ == "__main__":`:** Ensures that the main code only runs when the script is executed directly (not when it's imported as a module).
* **`Ticker` and `dates`:** Ticker symbol, start and end dates are now easily configurable.
* **Important Note:** This is a simplified example. Real-world cryptocurrency trading involves much more sophisticated risk management, data analysis, and model tuning. *Never* use this code as the sole basis for making real trading decisions.
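
The labeling and risk-categorization rules described above can be sketched as two small functions with the thresholds exposed as parameters. This is a minimal illustration on made-up prices and probabilities, not the script's own code: the function names `label_next_day_move` and `categorize_risk` are hypothetical, and the sketch assumes the label refers to the next day's move, as the "tomorrow" comment in the script suggests.

```python
import numpy as np
import pandas as pd


def label_next_day_move(close: pd.Series, threshold: float = 0.01) -> pd.Series:
    """Label each day by the NEXT day's close-to-close change (hypothetical helper).

    1.0 = rise above +threshold, 0.0 = fall below -threshold, NaN = no clear move.
    """
    next_change = close.pct_change().shift(-1)
    labels = pd.Series(np.nan, index=close.index)
    labels[next_change > threshold] = 1.0
    labels[next_change < -threshold] = 0.0
    return labels


def categorize_risk(prob_decrease, prob_increase, cutoff: float = 0.7) -> np.ndarray:
    """Map class probabilities to 'High' / 'Low' / 'Moderate' (hypothetical helper)."""
    prob_decrease = np.asarray(prob_decrease)
    prob_increase = np.asarray(prob_increase)
    return np.where(prob_decrease > cutoff, "High",
                    np.where(prob_increase > cutoff, "Low", "Moderate"))


# Made-up closing prices, for illustration only.
close = pd.Series([100.0, 102.0, 101.5, 99.0, 99.5])
labels = label_next_day_move(close, threshold=0.01)

# Made-up probabilities in the same layout as predict_proba's two columns.
risk = categorize_risk([0.8, 0.2, 0.5], [0.2, 0.8, 0.5], cutoff=0.7)

print(labels.tolist())
print(risk.tolist())
```

Passing a different `threshold` or `cutoff` changes how many days count as a clear move and how many predictions land in the 'Moderate' bucket, which is worth checking before trusting the risk distribution plot.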
How to Run:
1. **Install Libraries:**
```bash
pip install pandas numpy scikit-learn yfinance ta matplotlib
```
The `ta` package is pure Python and needs no extra system libraries. It is not the same as `TA-Lib`, which this script does not require.
2. **Run the script:**
```bash
python your_script_name.py
```
Further Improvements:
* **Hyperparameter Tuning:** Use techniques like `GridSearchCV` or `RandomizedSearchCV` to find the optimal hyperparameters for the model.
* **Feature Engineering:** Experiment with more technical indicators, sentiment analysis, or on-chain data.
* **Model Selection:** Compare different machine learning models and choose the one that performs best on your data.
* **Backtesting:** Implement a backtesting framework to evaluate the model's performance on historical data and simulate trading strategies.
* **Risk Management:** Incorporate more sophisticated risk management techniques, such as stop-loss orders and position sizing.
* **Real-Time Data:** Adapt the code to fetch real-time cryptocurrency data and make predictions on live market conditions.
* **API Integration:** Integrate with a cryptocurrency exchange API to automate trading decisions.
* **More Robust Data Cleaning:** Address outliers, data errors, and other data quality issues.
* **Feature Importance Analysis:** Use the `feature_importances_` attribute of the Random Forest model to identify the most important features. This can help you understand which factors are driving the model's predictions.
* **Time Series Cross-Validation:** For time series data, use time series cross-validation techniques (e.g., `TimeSeriesSplit` in scikit-learn) to avoid data leakage. This ensures that the model is evaluated on data that it has not seen before.
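
As a rough sketch of the last two suggestions, the snippet below combines `TimeSeriesSplit` with `feature_importances_`. It runs on synthetic data rather than real prices, and the feature names are placeholders, so the scores say nothing about crypto markets; the point is only to show that each fold trains on earlier rows and tests on later ones.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import TimeSeriesSplit

# Synthetic, time-ordered data (assumption: rows are already sorted by date).
rng = np.random.default_rng(42)
n_samples = 300
feature_names = ["feat_a", "feat_b", "feat_c", "feat_d"]  # placeholder names
X = rng.normal(size=(n_samples, len(feature_names)))
# The target depends mostly on the first feature, plus noise.
y = (X[:, 0] + 0.5 * rng.normal(size=n_samples) > 0).astype(int)

tscv = TimeSeriesSplit(n_splits=5)
fold_scores = []
for train_idx, test_idx in tscv.split(X):
    # Every training row precedes every test row, so no future data leaks in.
    assert train_idx.max() < test_idx.min()
    model = RandomForestClassifier(n_estimators=50, random_state=42)
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))

print("Accuracy per fold:", [round(s, 3) for s in fold_scores])

# Importances from the model fitted on the last (largest) training window.
importances = model.feature_importances_
for name, value in sorted(zip(feature_names, importances), key=lambda p: -p[1]):
    print(f"{name}: {value:.3f}")
```

With real data, the same loop can replace the single shuffled `train_test_split`, and the spread of the per-fold scores gives a first impression of how stable the model is over time.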
This example is a starting point for AI-powered risk analysis in cryptocurrency trading, not a finished system. Test it thoroughly and use caution before applying it to real-world trading.