AI-Enhanced NFT Value Prediction
```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
# --- 1. Data Preparation & Feature Engineering ---
def prepare_nft_data(csv_file):
    """
    Loads NFT data from a CSV, performs basic cleaning, and engineers relevant features.

    Args:
        csv_file (str): Path to the CSV file containing NFT data.
            Expected columns: 'TransactionPrice', 'RarityScore', 'CollectionSize', 'PreviousOwnerCount'

    Returns:
        pandas.DataFrame: A DataFrame ready for model training, or None if the file is missing.
    """
    try:
        df = pd.read_csv(csv_file)
    except FileNotFoundError:
        print(f"Error: File not found at {csv_file}")
        return None  # Signal failure to the caller if the file doesn't exist

    # Basic data cleaning: replace missing numeric values with the column mean (for simplicity).
    # numeric_only=True avoids an error if the CSV also contains text columns.
    df = df.fillna(df.mean(numeric_only=True))

    # Feature engineering (more features could be added based on available data and domain knowledge)
    df['RarityPerCollectionSize'] = df['RarityScore'] / (df['CollectionSize'] + 1e-6)  # Avoid division by zero
    df['RarityTimesOwners'] = df['RarityScore'] * df['PreviousOwnerCount']
    return df
# --- 2. Model Training ---
def train_nft_model(df):
    """
    Trains a Random Forest Regressor model to predict NFT transaction prices.

    Args:
        df (pandas.DataFrame): DataFrame containing NFT data (output of prepare_nft_data).

    Returns:
        tuple: (trained model, fitted scaler, scaled test features, test targets).
    """
    # Define features (independent variables) and target (dependent variable)
    features = ['RarityScore', 'CollectionSize', 'PreviousOwnerCount', 'RarityPerCollectionSize', 'RarityTimesOwners']
    target = 'TransactionPrice'
    X = df[features]
    y = df[target]

    # Split data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Feature scaling (important for many machine learning algorithms; a random forest
    # does not strictly need it, but it keeps the code usable with other models)
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)  # Fit and transform on the *training* data
    X_test = scaler.transform(X_test)  # Only transform the *test* data to avoid data leakage

    # Initialize and train the model
    model = RandomForestRegressor(n_estimators=100, random_state=42)  # You can tune hyperparameters
    model.fit(X_train, y_train)

    # Return the test data for evaluation and the scaler for later predictions
    return model, scaler, X_test, y_test
# --- 3. Model Evaluation ---
def evaluate_model(model, X_test, y_test, scaler):
    """
    Evaluates the trained model on the test data.

    Args:
        model: The trained model.
        X_test: The test features (already scaled).
        y_test: The true test targets.
        scaler: The scaler used to transform the input data (not used here,
            since X_test is already scaled).

    Returns:
        None. Prints the evaluation metrics and shows a plot.
    """
    # Make predictions on the test set
    y_pred = model.predict(X_test)

    # Evaluate the model
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    print(f"Mean Squared Error: {mse}")
    print(f"R-squared: {r2}")

    # Plot predictions vs. actual values
    plt.figure(figsize=(10, 6))
    plt.scatter(y_test, y_pred, alpha=0.5)
    plt.xlabel("Actual Transaction Price")
    plt.ylabel("Predicted Transaction Price")
    plt.title("NFT Price Prediction: Actual vs. Predicted")
    # Diagonal reference line: points on it are perfect predictions
    plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red')
    plt.show()
# --- 4. Prediction Function ---
def predict_nft_price(model, scaler, rarity_score, collection_size, previous_owner_count):
    """
    Predicts the transaction price of an NFT given its features.

    Args:
        model: The trained model.
        scaler: The scaler used to transform the input data.
        rarity_score (float): The rarity score of the NFT.
        collection_size (int): The size of the NFT collection.
        previous_owner_count (int): The number of previous owners.

    Returns:
        float: The predicted transaction price.
    """
    # Create a DataFrame with the input features
    input_data = pd.DataFrame({
        'RarityScore': [rarity_score],
        'CollectionSize': [collection_size],
        'PreviousOwnerCount': [previous_owner_count]
    })

    # Feature engineering (same as in the training data preparation)
    input_data['RarityPerCollectionSize'] = input_data['RarityScore'] / (input_data['CollectionSize'] + 1e-6)
    input_data['RarityTimesOwners'] = input_data['RarityScore'] * input_data['PreviousOwnerCount']

    # Select only the features used for training. Important to match the training data format.
    features = ['RarityScore', 'CollectionSize', 'PreviousOwnerCount', 'RarityPerCollectionSize', 'RarityTimesOwners']
    input_data = input_data[features]  # Ensure correct column order/selection

    # Scale the input features
    scaled_input = scaler.transform(input_data)

    # Make a prediction (model.predict returns an array, even for a single row)
    predicted_price = model.predict(scaled_input)[0]
    return predicted_price
# --- 5. Main Execution Block ---
if __name__ == "__main__":
    # **IMPORTANT**: Replace 'nft_data.csv' with the actual path to your CSV file!
    csv_file = 'nft_data.csv'

    # Create a dummy dataset if the file doesn't exist, for demonstration purposes.
    try:
        pd.read_csv(csv_file)
    except FileNotFoundError:
        print("Creating dummy 'nft_data.csv' for demonstration...")
        dummy_data = {
            'TransactionPrice': [100, 150, 200, 120, 180, 250, 110, 160, 220, 130],
            'RarityScore': [75, 80, 90, 70, 85, 95, 65, 78, 88, 72],
            'CollectionSize': [1000, 1200, 800, 1100, 900, 700, 1300, 1050, 850, 1150],
            'PreviousOwnerCount': [2, 3, 1, 2, 4, 0, 3, 1, 2, 4]
        }
        df = pd.DataFrame(dummy_data)
        df.to_csv(csv_file, index=False)
        print(f"Dummy data saved to {csv_file}")

    # 1. Prepare the data
    df = prepare_nft_data(csv_file)
    if df is None:  # Handle the case where prepare_nft_data failed
        raise SystemExit(1)  # Exit the program if data preparation fails

    # 2. Train the model
    model, scaler, X_test, y_test = train_nft_model(df)

    # 3. Evaluate the model (with the 10-row dummy data the test set has only 2 rows,
    #    so the metrics are illustrative only)
    evaluate_model(model, X_test, y_test, scaler)

    # 4. Make a prediction for a new NFT
    rarity_score = 82
    collection_size = 950
    previous_owner_count = 2
    predicted_price = predict_nft_price(model, scaler, rarity_score, collection_size, previous_owner_count)
    print(f"\nPredicted transaction price for NFT with rarity score {rarity_score}, collection size {collection_size}, and {previous_owner_count} previous owners: {predicted_price}")
```
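The script above computes the two engineered features twice, once in `prepare_nft_data` and once in `predict_nft_price`. One way to keep the two copies from drifting apart is a shared helper. The following is a minimal sketch, not part of the original script; the name `add_engineered_features` is hypothetical.

```python
import pandas as pd


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the two engineered columns added.

    Hypothetical helper: the original script inlines these two lines
    in both the training and the prediction code paths.
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    out['RarityPerCollectionSize'] = out['RarityScore'] / (out['CollectionSize'] + 1e-6)
    out['RarityTimesOwners'] = out['RarityScore'] * out['PreviousOwnerCount']
    return out


# Illustrative single-row input (made-up values)
sample = pd.DataFrame({
    'RarityScore': [80.0],
    'CollectionSize': [1000],
    'PreviousOwnerCount': [3],
})
engineered = add_engineered_features(sample)
print(engineered.columns.tolist())
```

Both `prepare_nft_data` and `predict_nft_price` could then call this helper, so a change to a feature definition only has to be made in one place.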
Key improvements and explanations:
* **Clearer Structure:** The code is broken down into well-defined functions (`prepare_nft_data`, `train_nft_model`, `evaluate_model`, `predict_nft_price`) to improve readability and maintainability.
* **Error Handling:** A `try...except` block in `prepare_nft_data` catches `FileNotFoundError` and returns `None` if the specified CSV file does not exist. The main block creates a dummy dataset for demonstration when the file is missing, so the program does not crash, and it exits gracefully if `prepare_nft_data` still returns `None`.
* **Feature Engineering:** Includes `RarityPerCollectionSize` and `RarityTimesOwners` as potential features. These combine existing features in potentially meaningful ways. Added a very small value to the denominator to prevent division by zero.
* **Data Scaling:** Uses `StandardScaler` to scale the features before training the model. Scaling matters for many machine learning algorithms (especially those that use distance calculations), where features with larger ranges would otherwise dominate the learning process. A random forest is itself insensitive to feature scale, so here the step mainly keeps the code ready for swapping in another model. `fit_transform` is called on the *training* data only, and `transform` is called on the *test* data, to prevent data leakage.
* **Model Evaluation:** Calculates and prints the Mean Squared Error (MSE) and R-squared (R2) score to evaluate the model's performance. Includes a plot of actual vs. predicted values to visualize the model's predictions. A diagonal line is added to the plot for easier interpretation.
* **Prediction Function:** The `predict_nft_price` function builds a one-row DataFrame from the input values, applies the same feature engineering steps as the training process, selects the training columns in the same order, and scales them using the *same* scaler that was fitted on the training data. It returns the *single* predicted value from the array returned by `model.predict`.
* **Random State:** Uses `random_state=42` in `train_test_split` and `RandomForestRegressor` for reproducibility. This ensures that the same split and model are created each time the code is run.
* **Comments and Docstrings:** Added comprehensive comments and docstrings to explain the purpose of each function and step.
* **Clearer Variable Names:** Uses more descriptive variable names (e.g., `X_train`, `y_test`).
* **`if __name__ == "__main__":` block:** This ensures that the main code is only executed when the script is run directly (not when it's imported as a module).
* **CSV Handling:** Specifies the expected columns in the CSV file and provides a more informative error message if the file is not found. The code provides a dummy dataset if the csv file is not available.
* **Test/Train split:** Adds `X_test` and `y_test` to `train_nft_model` returns so that the `evaluate_model` function has access to this information.
* **Feature Selection/Order:** In `predict_nft_price`, the `features` list ensures that only the columns used during training are selected from the input data and that they're in the correct order. Because the scaler was fitted on a DataFrame, recent scikit-learn versions check feature names at transform time and raise a `ValueError` if the names or their order do not match.
* **Scaler in Prediction:** Ensures the `predict_nft_price` uses the `scaler` fitted during training. This is critical for correct prediction.
* **Data Leakage Prevention:** `scaler.fit_transform()` is called only on the *training* data. `scaler.transform()` is called on the *test* data and on the new input data in `predict_nft_price`. This prevents data leakage, where information from the test set inadvertently influences the training of the model, which would lead to over-optimistic performance estimates on the test set and poor generalization to new data.
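The scaling and leakage points above can also be handled by bundling the scaler and the model into a single scikit-learn pipeline, so the scaler is refitted on the training portion of every split automatically. The following is a minimal sketch of that alternative, not the approach the script itself takes. The data is synthetic and generated purely for illustration (price is made to depend on rarity plus noise); it says nothing about real NFT prices.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption for illustration only)
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    'RarityScore': rng.uniform(50, 100, n),
    'CollectionSize': rng.integers(500, 1500, n),
    'PreviousOwnerCount': rng.integers(0, 5, n),
})
y = 2.0 * X['RarityScore'] + rng.normal(0, 5, n)

# The pipeline fits the scaler only on the training folds inside each split,
# so no statistics from the held-out fold leak into training.
pipeline = make_pipeline(
    StandardScaler(),
    RandomForestRegressor(n_estimators=50, random_state=42),
)

# 5-fold cross-validated R-squared
scores = cross_val_score(pipeline, X, y, cv=5, scoring='r2')
print(f"Mean cross-validated R-squared: {scores.mean():.3f}")
```

Cross-validation also gives a steadier performance estimate than a single 80/20 split, which matters when the dataset is small.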
To use this code:
1. **Install Libraries:**
```bash
pip install pandas scikit-learn matplotlib
```
2. **Prepare Your Data:** Create a CSV file named `nft_data.csv` (or change the `csv_file` variable) with the following columns:
- `TransactionPrice`: The transaction price of the NFT (the target variable).
- `RarityScore`: A numerical score representing the NFT's rarity.
- `CollectionSize`: The total number of NFTs in the collection.
- `PreviousOwnerCount`: The number of previous owners of the NFT.
3. **Run the Script:** Execute the Python script. It will:
- Load and prepare the data.
- Train a Random Forest Regressor model.
- Evaluate the model's performance on a test set.
- Print the evaluation metrics (MSE and R-squared).
- Display a scatter plot of actual vs. predicted transaction prices.
- Predict the transaction price for a new NFT with the specified features and print the result.
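Before running the script on your own file, it can help to confirm that the CSV has the four columns listed in step 2. The following is a small sketch of such a check; the names `REQUIRED_COLUMNS` and `find_missing_columns` are hypothetical and not part of the original script.

```python
import pandas as pd

# Columns the script expects, as listed in step 2
REQUIRED_COLUMNS = ['TransactionPrice', 'RarityScore', 'CollectionSize', 'PreviousOwnerCount']


def find_missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from df (hypothetical helper)."""
    return [col for col in required if col not in df.columns]


# Two empty frames stand in for loaded CSV files
complete = pd.DataFrame(columns=REQUIRED_COLUMNS)
incomplete = pd.DataFrame(columns=['TransactionPrice', 'RarityScore'])

print(find_missing_columns(complete))    # []
print(find_missing_columns(incomplete))  # ['CollectionSize', 'PreviousOwnerCount']
```

In practice the frame would come from `pd.read_csv(csv_file)`, and a non-empty result would be a reason to stop before training.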
This example provides a compact, end-to-end illustration of AI-enhanced NFT value prediction. It addresses common pitfalls in machine learning projects, such as data leakage and improper feature scaling, and shows how to run the code. Remember to adapt the code and features to your specific dataset for optimal results.
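When adapting the features to a new dataset, one possible starting point is to check which features a fitted random forest actually relies on, using its `feature_importances_` attribute. The sketch below uses synthetic data in which price depends only on rarity by construction, so it demonstrates the mechanics and is not a finding about real NFTs.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (assumption for illustration only): price is driven by rarity alone
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    'RarityScore': rng.uniform(50, 100, n),
    'CollectionSize': rng.integers(500, 1500, n),
    'PreviousOwnerCount': rng.integers(0, 5, n),
})
y = 3.0 * X['RarityScore'] + rng.normal(0, 5, n)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X, y)

# Impurity-based importances, one per feature, summing to 1
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Features with importance near zero are candidates for removal, although impurity-based importances can be misleading when features are strongly correlated.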