AI-Powered Crypto Market Trends Python, AI

```python
import yfinance as yf
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
import numpy as np

# 1. Data Acquisition: Using yfinance to get historical crypto data

def get_crypto_data(ticker, start_date, end_date):
    """
    Fetches historical crypto data from Yahoo Finance using yfinance.

    Args:
        ticker (str): The ticker symbol for the cryptocurrency (e.g., 'BTC-USD' for Bitcoin).
        start_date (str): The start date for data retrieval (YYYY-MM-DD).
        end_date (str): The end date for data retrieval (YYYY-MM-DD).

    Returns:
        pandas.DataFrame: A DataFrame containing historical crypto data.  Returns None if there's an error.
    """
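    try:
        data = yf.download(ticker, start=start_date, end=end_date)
    except Exception as e:
        print(f"Error fetching data for {ticker}: {e}")
        return None

    # yfinance usually signals a failed download with an empty DataFrame
    # rather than an exception, so treat an empty result as a failure too.
    if data is None or data.empty:
        print(f"No data returned for {ticker}.")
        return None

    # Some yfinance versions return MultiIndex columns (field, ticker) even for
    # a single ticker; keep only the field level so df['Close'] is a Series.
    if isinstance(data.columns, pd.MultiIndex):
        data.columns = data.columns.get_level_values(0)

    return data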

# Example usage:
ticker = 'BTC-USD'  # Bitcoin
start_date = '2022-01-01'
end_date = '2023-12-31'

crypto_data = get_crypto_data(ticker, start_date, end_date)

if crypto_data is None:
    # Raise SystemExit directly; the bare exit() helper is intended for interactive sessions.
    raise SystemExit("Failed to retrieve data.  Exiting.")


print(crypto_data.head())  # Display the first few rows of the data


# 2. Data Preprocessing and Feature Engineering

def preprocess_data(df):
    """
    Preprocesses the crypto data for model training.  This includes:
        - Adding lag features (previous day's prices).
        - Handling missing values (if any).

    Args:
        df (pandas.DataFrame): The DataFrame containing historical crypto data.

    Returns:
        pandas.DataFrame: The preprocessed DataFrame.
    """

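    df = df.copy()  # Work on a copy so the caller's DataFrame is not modified

    df['Lag1'] = df['Close'].shift(1)  # Previous day's closing price
    df['Lag2'] = df['Close'].shift(2)  # Closing price two days back

    # Moving averages of the closes *before* the current row. Without shift(1),
    # each average would include the same day's Close, which is the prediction target.
    df['SMA_5'] = df['Close'].rolling(window=5).mean().shift(1)    # 5-day Simple Moving Average
    df['SMA_20'] = df['Close'].rolling(window=20).mean().shift(1)  # 20-day Simple Moving Average

    df = df.dropna()  # Drop the leading rows left incomplete by shift() and rolling()

    return df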

crypto_data = preprocess_data(crypto_data)
print(crypto_data.head()) # Show the preprocessed data


# 3. Model Training: Using Linear Regression for prediction

def train_model(df):
    """
    Trains a Linear Regression model to predict crypto prices.

    Args:
        df (pandas.DataFrame): The preprocessed DataFrame.

    Returns:
        tuple: A tuple containing the trained model and the test data.
    """

    X = df[['Lag1', 'Lag2', 'SMA_5', 'SMA_20']]  # Features
    y = df['Close']  # Target variable (what we want to predict)

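    # Split the data into training and testing sets.
    # Note: train_test_split shuffles rows by default, so test days are interleaved
    # with training days. For time series, a chronological split (shuffle=False)
    # gives a stricter out-of-sample estimate.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # 80% training, 20% testing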

    # Create a Linear Regression model
    model = LinearRegression()

    # Train the model
    model.fit(X_train, y_train)

    return model, X_test, y_test

model, X_test, y_test = train_model(crypto_data)


# 4. Model Evaluation

def evaluate_model(model, X_test, y_test):
    """
    Evaluates the trained model using Mean Squared Error (MSE).

    Args:
        model: The trained model.
        X_test: The test features.
        y_test: The test target values.

    Returns:
        float: The Mean Squared Error.
    """
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print(f"Mean Squared Error: {mse}")

    # Plotting predictions vs. actual values (optional)
    plt.figure(figsize=(10, 6))
    plt.scatter(y_test, y_pred)
    plt.xlabel("Actual Prices")
    plt.ylabel("Predicted Prices")
    plt.title("Actual vs. Predicted Crypto Prices")
    plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red')  # Add a diagonal line for reference
    plt.show()


    return mse

mse = evaluate_model(model, X_test, y_test)



# 5. Prediction (Optional - but important to show how to use the model)

def predict_next_day(model, df):
    """
    Predicts the closing price for the next day.  Uses the last row of the DataFrame
    to create the features for the prediction.

    Args:
        model: The trained model.
        df: The DataFrame containing the historical data (already preprocessed).

    Returns:
        float: The predicted closing price for the next day, or None if there's an issue.
    """

    # Get the last row of the DataFrame
    last_row = df.iloc[-1]

    # Create the features for the prediction
    lag1 = last_row['Close']  # Use the most recent closing price as Lag1
    lag2 = df['Close'].iloc[-2] # use the second to last row as lag2 (one day before lag1)
    sma_5 = df['Close'].tail(5).mean() #calculate SMA_5
    sma_20 = df['Close'].tail(20).mean() # calculate SMA_20

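    # Create a single-row feature table for the model. reshape(1, -1) makes it 2D;
    # the column names match those used in training, which avoids scikit-learn's
    # feature-name warning.
    features = pd.DataFrame(
        np.array([lag1, lag2, sma_5, sma_20], dtype=float).reshape(1, -1),
        columns=['Lag1', 'Lag2', 'SMA_5', 'SMA_20'],
    )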

    try:
        # Make the prediction
        prediction = model.predict(features)[0] # returns a numpy array of one element

        return prediction

    except Exception as e:
        print(f"Error making prediction: {e}")
        return None


# Example usage:
next_day_prediction = predict_next_day(model, crypto_data)

if next_day_prediction is not None:
    print(f"Predicted closing price for the next day: {next_day_prediction}")


# 6. Important Considerations and Improvements:

#    - Feature Engineering: Experiment with more sophisticated features, such as:
#        - Technical indicators (RSI, MACD, Bollinger Bands)
#        - Sentiment analysis from news articles or social media
#        - Volume data
#    - Model Selection: Try other machine learning models, such as:
#        - Random Forest
#        - Support Vector Regression (SVR)
#        - Neural Networks (e.g., LSTMs) - these are often very good for time series.
#    - Hyperparameter Tuning: Optimize the hyperparameters of the chosen model using techniques like GridSearchCV or RandomizedSearchCV.
#    - Rolling Window Training:  Re-train the model periodically (e.g., daily or weekly) using a rolling window of historical data to adapt to changing market conditions. This is crucial for time series data.
#    - Risk Management:  Always incorporate risk management strategies into your trading system.
#    - Data Cleaning:  Robust data cleaning is essential. Handle missing data carefully and consider outliers.
#    - Backtesting: Thoroughly backtest your strategy on historical data before deploying it with real money.
#    - Real-time Data:  Connect to a real-time data feed to get the latest crypto prices.
#    - Alerting:  Set up alerts to notify you when the model generates a buy or sell signal.
#    - Regularization: Use techniques like L1 or L2 regularization in your linear regression to prevent overfitting, especially if you have many features.

```
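
The technical indicators named in the closing notes of the listing (RSI, MACD) can be computed with plain pandas. The sketch below is one possible version, not part of the original script: the helper names `rsi`, `macd`, and `add_indicator_features` are hypothetical, the RSI uses simple rolling means rather than Wilder's smoothing, and the prices are a synthetic random walk standing in for downloaded data. Each indicator is shifted by one day so that a row's features only use closes from earlier days.

```python
import numpy as np
import pandas as pd


def rsi(close, period=14):
    """Relative Strength Index from simple rolling means of gains and losses.

    This is the simple-average variant; Wilder's original uses a smoothed average.
    """
    delta = close.diff()
    gain = delta.clip(lower=0)        # positive moves, zeros elsewhere
    loss = (-delta).clip(lower=0)     # magnitudes of negative moves
    avg_gain = gain.rolling(period).mean()
    avg_loss = loss.rolling(period).mean()
    rs = avg_gain / avg_loss          # inf when there were no losses in the window
    return 100 - 100 / (1 + rs)


def macd(close, fast=12, slow=26, signal=9):
    """MACD line (fast EMA minus slow EMA) and its signal line (EMA of the MACD line)."""
    fast_ema = close.ewm(span=fast, adjust=False).mean()
    slow_ema = close.ewm(span=slow, adjust=False).mean()
    macd_line = fast_ema - slow_ema
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    return macd_line, signal_line


def add_indicator_features(df):
    """Return a copy of df with RSI and MACD columns based on earlier days only."""
    out = df.copy()
    out["RSI_14"] = rsi(out["Close"]).shift(1)
    macd_line, signal_line = macd(out["Close"])
    out["MACD"] = macd_line.shift(1)
    out["MACD_signal"] = signal_line.shift(1)
    return out.dropna()


# Synthetic random-walk closes (assumption, for illustration only).
rng = np.random.default_rng(0)
prices = pd.DataFrame({"Close": 20000 + rng.normal(0, 300, 300).cumsum()})
features = add_indicator_features(prices)
print(features.tail())
```

The new columns could be appended to the feature list passed to the model; whether they improve the predictions is something to check on held-out data rather than assume.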

Key improvements and explanations:

* **Clearer Comments and Docstrings:**  The code is now thoroughly commented with docstrings that explain the purpose of each function, the arguments it takes, and the values it returns.  This makes the code much easier to understand.
* **Error Handling:** `try...except` blocks wrap the yfinance download and the prediction call, so an exception in either one is printed and the function returns `None` instead of crashing the program.
* **Modular Design:** The code is broken down into functions, making it more organized and reusable.
* **Feature Engineering:** Adds lagged closing prices and simple moving averages (SMA) as features. Lagged prices give the model the recent history it needs for time series prediction, and the moving averages summarize the short-term and medium-term trend.
* **Data Preprocessing:** Explicitly handles missing values (NaNs) that can be introduced by the shifting and rolling average operations.  `dropna()` removes rows with NaNs, which is necessary before training the model.
* **Model Evaluation:** Evaluates the model using Mean Squared Error (MSE), a common metric for regression problems.  Also plots predicted vs. actual prices to visually assess the model's performance.
* **Prediction Function:** A dedicated `predict_next_day` function shows how to use the trained model on new data. It takes the model and the preprocessed DataFrame and returns the predicted closing price for the next day. The features are rebuilt from the most recent rows: the last close becomes `Lag1`, the close before it becomes `Lag2`, and the two moving averages are computed over the last 5 and 20 closes.
* **Reshaping for Prediction:** The `predict_next_day` function includes the important step of reshaping the feature array using `reshape(1, -1)`.  This is necessary because the `LinearRegression` model expects a 2D array as input, even when predicting for a single instance.
* **Important Considerations:**  A section at the end discusses important considerations for building a real-world crypto trading system, such as feature engineering, model selection, hyperparameter tuning, rolling window training, risk management, and backtesting.
* **Real-World Focus:** The script works on real historical BTC-USD data from Yahoo Finance and covers the full path from download through training and evaluation to a next-day prediction.
* **Clarity and Readability:**  Improved variable names and code formatting to enhance readability.
* **Exit on Data Failure:** The program now exits gracefully if it fails to retrieve data from yfinance.
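* **`random_state` for Reproducibility:** Sets `random_state=42` in `train_test_split` so the shuffled split, and therefore the reported error, is the same on every run. Because the split is shuffled, test rows fall between training rows in time, which tends to make the error look better than it would on genuinely unseen future data.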
* **Clearer SMA Calculation:** The `predict_next_day` function now correctly calculates the SMA values using `tail(5)` and `tail(20)` to get the most recent data for the moving averages.
* **Bug Fix (Lag2):** `predict_next_day` takes `Lag2` from `df['Close'].iloc[-2]`, the close one row before the latest one, which is two days before the day being predicted.
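
To see how much the evaluation depends on the split, one option is to hold out the most recent rows instead of a random sample and to compare the model against a naive forecast that predicts tomorrow's close as today's close. The sketch below assumes that setup: `chronological_split` and `compare_with_naive` are hypothetical helpers, the prices are a synthetic random walk rather than downloaded data, and RMSE (the square root of MSE) is used so the error is in price units.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


def chronological_split(X, y, test_size=0.2):
    """Hold out the most recent rows as the test set (no shuffling)."""
    split = int(len(X) * (1 - test_size))
    return X.iloc[:split], X.iloc[split:], y.iloc[:split], y.iloc[split:]


def compare_with_naive(X, y, test_size=0.2):
    """Return (model_rmse, naive_rmse) on a chronological hold-out.

    The naive forecast is the previous day's close, i.e. the Lag1 column.
    """
    X_train, X_test, y_train, y_test = chronological_split(X, y, test_size)
    model = LinearRegression().fit(X_train, y_train)
    model_rmse = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))
    naive_rmse = float(np.sqrt(mean_squared_error(y_test, X_test["Lag1"])))
    return model_rmse, naive_rmse


# Synthetic random-walk closes (assumption, for illustration only).
rng = np.random.default_rng(0)
close = pd.Series(20000 + rng.normal(0, 300, 500).cumsum(), name="Close")
frame = pd.DataFrame({
    "Close": close,
    "Lag1": close.shift(1),
    "Lag2": close.shift(2),
    "SMA_5": close.rolling(5).mean().shift(1),    # averages of earlier closes only
    "SMA_20": close.rolling(20).mean().shift(1),
}).dropna()

X = frame[["Lag1", "Lag2", "SMA_5", "SMA_20"]]
y = frame["Close"]
model_rmse, naive_rmse = compare_with_naive(X, y)
print(f"Model RMSE: {model_rmse:.2f}   Naive (Lag1) RMSE: {naive_rmse:.2f}")
```

If the model's RMSE is not clearly below the naive one, the lag and moving-average features are adding little beyond "the price stays where it was".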

This code gives a more complete example of using Python and machine learning to analyze crypto market trends. It is still a simplified example: building a profitable trading system requires far more research, testing, and risk management. Do not use this code with real money without extensive backtesting and a clear understanding of the risks involved.
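
As a starting point for that backtesting, here is a minimal walk-forward sketch that also covers the rolling-window retraining idea. It is an illustration under assumptions, not a full backtest: `walk_forward_predictions`, the 250-row window, and the 7-row refit interval are hypothetical choices, the prices are a synthetic random walk, and trading costs and position sizing are ignored. Every prediction comes from a model fitted only on rows that precede it.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ["Lag1", "Lag2", "SMA_5", "SMA_20"]


def walk_forward_predictions(frame, window=250, step=7):
    """Refit on the latest `window` rows every `step` rows, then predict the next `step` rows.

    Predictions are one day ahead: each row's features come from earlier closes,
    and the model used for a row was fitted only on rows before it.
    """
    preds = pd.Series(index=frame.index, dtype=float)
    for start in range(window, len(frame), step):
        train = frame.iloc[start - window:start]
        test = frame.iloc[start:start + step]
        model = LinearRegression().fit(train[FEATURES], train["Close"])
        preds.iloc[start:start + step] = model.predict(test[FEATURES])
    return preds.dropna()  # rows before the first full window have no prediction


# Synthetic random-walk closes (assumption, for illustration only).
rng = np.random.default_rng(1)
close = pd.Series(20000 + rng.normal(0, 300, 600).cumsum())
frame = pd.DataFrame({
    "Close": close,
    "Lag1": close.shift(1),
    "Lag2": close.shift(2),
    "SMA_5": close.rolling(5).mean().shift(1),
    "SMA_20": close.rolling(20).mean().shift(1),
}).dropna()

preds = walk_forward_predictions(frame)
actual = frame.loc[preds.index, "Close"]
rmse = float(np.sqrt(((actual - preds) ** 2).mean()))
print(f"Walk-forward RMSE over {len(preds)} days: {rmse:.2f}")
```

Swapping the synthetic frame for the preprocessed DataFrame from the script would give an error estimate that reflects how the model would have performed if it had been retrained weekly on past data only.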