AI-enhanced Product Recommender Python, AI

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import joblib  # For saving and loading models
import random


class AIProductRecommender:
    """
    A product recommendation system using TF-IDF and cosine similarity.
    Enhanced with a simple AI component that learns from user interactions
    (e.g., clicks, purchases) to improve recommendations.
    """

    def __init__(self, data_path="products.csv"):
        """
        Initializes the recommender system.

        Args:
            data_path (str, optional): Path to the product data CSV. Defaults to "products.csv".
        """
        self.data_path = data_path
        self.products = None  # DataFrame to store product data
        self.tfidf_matrix = None # TF-IDF matrix representing product descriptions
        self.tfidf_vectorizer = None # TF-IDF vectorizer instance
        # User interaction data for the AI component:
        # key = user_id, value = list of product_ids the user interacted with
        self.user_interaction_data = {}
        self.load_data()
        self.preprocess_data()
        self.train_model()



    def load_data(self):
        """
        Loads product data from a CSV file.
        Assumes the CSV has at least 'product_id', 'name', and 'description' columns.
        """
        try:
            self.products = pd.read_csv(self.data_path)
        except FileNotFoundError:
            print(f"Error: File not found at {self.data_path}")
            self.products = pd.DataFrame()  # Initialize empty DataFrame
            return # exit the load_data method early
        except Exception as e:
            print(f"Error loading data: {e}")
            self.products = pd.DataFrame()
            return


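        # Basic data cleaning: drop rows with missing text, then reset the index so
        # that row labels match row positions in the TF-IDF matrix built later
        self.products = self.products.dropna(subset=['name', 'description']).reset_index(drop=True)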

        # Ensure product_id is a string for consistency
        self.products['product_id'] = self.products['product_id'].astype(str)

        print(f"Loaded {len(self.products)} products.")

    def preprocess_data(self):
        """
        Combines product name and description into a single text feature for TF-IDF.
        """
        if self.products is None or self.products.empty:
            print("No product data to preprocess. Ensure data is loaded correctly.")
            return

        # Create a combined feature for text analysis
        self.products['combined_text'] = self.products['name'] + ' ' + self.products['description']

    def train_model(self):
        """
        Trains the TF-IDF vectorizer and generates the TF-IDF matrix.
        """
        if self.products is None or self.products.empty:
            print("No product data to train the model. Ensure data is loaded correctly.")
            return

        # Create TF-IDF vectorizer
        self.tfidf_vectorizer = TfidfVectorizer(stop_words='english')

        # Fit and transform the combined text data
        self.tfidf_matrix = self.tfidf_vectorizer.fit_transform(self.products['combined_text'])
        print("TF-IDF model trained.")



    def get_recommendations(self, product_id, top_n=5):
        """
        Recommends products similar to the given product based on TF-IDF and cosine similarity.

        Args:
            product_id (str): The ID of the product to find similar products for.
            top_n (int, optional): The number of recommendations to return. Defaults to 5.

        Returns:
            list: A list of product IDs representing the top_n recommendations.
        """
        if self.products is None or self.products.empty or self.tfidf_matrix is None:
            print("Model not trained or data not loaded. Cannot provide recommendations.")
            return []


        product_id = str(product_id)  # Product IDs are stored as strings

        # Look up the row *position* (not the index label) so it lines up with the TF-IDF matrix rows
        matches = (self.products['product_id'] == product_id).to_numpy().nonzero()[0]
        if len(matches) == 0:
            print(f"Product with ID {product_id} not found.")
            return []
        product_index = int(matches[0])

        # Calculate cosine similarity between the target product and all products
        cosine_similarities = cosine_similarity(self.tfidf_matrix[product_index], self.tfidf_matrix).flatten()

        # Rank by similarity, then drop the input product explicitly: with duplicate
        # descriptions it is not guaranteed to sort first
        ranked_indices = cosine_similarities.argsort()[::-1]
        similar_product_indices = [i for i in ranked_indices if i != product_index][:top_n]

        # Retrieve product IDs for the recommended products
        recommended_product_ids = self.products.iloc[similar_product_indices]['product_id'].tolist()

        return recommended_product_ids


    def record_user_interaction(self, user_id, product_id):
        """
        Records user interactions (e.g., clicks, purchases) to improve recommendations.

        Args:
            user_id (str or int): The ID of the user.
            product_id (str or int): The ID of the product the user interacted with.
        """
        user_id = str(user_id)  # Ensure user_id is a string
        product_id = str(product_id)  # Match the string product IDs used in the product data

        if user_id not in self.user_interaction_data:
            self.user_interaction_data[user_id] = []

        if product_id not in self.user_interaction_data[user_id]:  # avoid duplicates
            self.user_interaction_data[user_id].append(product_id)
            print(f"User {user_id} interacted with product {product_id}. Recorded.")
        else:
            print(f"User {user_id} already recorded interacting with product {product_id}.")



    def get_personalized_recommendations(self, user_id, top_n=5):
        """
        Provides personalized recommendations based on user interaction history.

        Args:
            user_id (str or int): The ID of the user.
            top_n (int, optional): The number of recommendations to return. Defaults to 5.

        Returns:
            list: A list of product IDs representing the top_n personalized recommendations.
        """
        user_id = str(user_id)  # Ensure user_id is a string

        if user_id not in self.user_interaction_data:
            print(f"No interaction data found for user {user_id}. Returning general recommendations.")
            #Return a list of random products if no user data exists.
            if self.products is not None and not self.products.empty:
                return random.sample(self.products['product_id'].tolist(), min(top_n, len(self.products)))
            else:
                return []

        # Get the products the user has interacted with
        interacted_products = self.user_interaction_data[user_id]

        # For each interacted product, get similar products
        recommended_products = []
        for product_id in interacted_products:
            # Use a small top_n here because results from several products are combined
            recommended_products.extend(self.get_recommendations(product_id, top_n=2))

        # Remove duplicates and already-seen products while preserving order
        # (converting to a plain set would discard the similarity ranking)
        seen = set(interacted_products)
        unique_recommendations = []
        for product_id in recommended_products:
            if product_id not in seen:
                seen.add(product_id)
                unique_recommendations.append(product_id)

        # Return the top N recommendations
        return unique_recommendations[:top_n]

    def save_model(self, filename="recommender_model.joblib"):
        """
        Saves the trained model (TF-IDF vectorizer and product data) to a file.
        """
        model_data = {
            'tfidf_vectorizer': self.tfidf_vectorizer,
            'tfidf_matrix': self.tfidf_matrix,
            'products': self.products,
            'user_interaction_data': self.user_interaction_data
        }
        joblib.dump(model_data, filename)
        print(f"Model saved to {filename}")

    def load_model(self, filename="recommender_model.joblib"):
        """
        Loads a saved model from a file.
        """
        try:
            model_data = joblib.load(filename)
            self.tfidf_vectorizer = model_data['tfidf_vectorizer']
            self.tfidf_matrix = model_data['tfidf_matrix']
            self.products = model_data['products']
            self.user_interaction_data = model_data.get('user_interaction_data', {}) # Handle missing key for backward compatibility
            print(f"Model loaded from {filename}")
        except FileNotFoundError:
            print(f"Error: Model file not found at {filename}. Please train the model first or provide the correct path.")
        except Exception as e:
            print(f"Error loading model: {e}")


# Example Usage
if __name__ == '__main__':
    # Create a dummy products.csv file (or use your own)
    dummy_data = {
        'product_id': ['1', '2', '3', '4', '5', '6'],
        'name': ['Laptop A', 'Laptop B', 'Mouse X', 'Keyboard Y', 'Monitor Z', 'Tablet P'],
        'description': [
            'A powerful laptop for gaming',
            'A lightweight laptop for work',
            'A high-precision gaming mouse',
            'An ergonomic keyboard for comfortable typing',
            'A high-resolution monitor for crisp visuals',
            'A versatile tablet for on-the-go productivity'
        ]
    }
    df = pd.DataFrame(dummy_data)
    df.to_csv('products.csv', index=False)

    # Initialize the recommender system
    recommender = AIProductRecommender()

    # Get recommendations for a product
    product_id = '1'  # Laptop A
    recommendations = recommender.get_recommendations(product_id)
    print(f"Recommendations for product {product_id}: {recommendations}")

    # Record user interaction
    user_id = 'user123'
    recommender.record_user_interaction(user_id, product_id)
    recommender.record_user_interaction(user_id, '3') # Mouse X

    # Get personalized recommendations
    personalized_recommendations = recommender.get_personalized_recommendations(user_id)
    print(f"Personalized recommendations for user {user_id}: {personalized_recommendations}")

    # Save the model
    recommender.save_model("my_recommender.joblib")

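    # Load the model. Note that the constructor first retrains from products.csv;
    # load_model() then overwrites that state with the saved one.
    loaded_recommender = AIProductRecommender()
    loaded_recommender.load_model("my_recommender.joblib")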

    # Get recommendations after loading
    product_id = '2'
    recommendations_after_load = loaded_recommender.get_recommendations(product_id)
    print(f"Recommendations for product {product_id} after loading: {recommendations_after_load}")
```
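
The personalized step in `get_personalized_recommendations()` can also be framed as a scoring problem. The sketch below is one possible variation, not part of the class above: it sums each candidate's cosine similarity to every product in the user's history and ranks by that total, so the ordering reflects similarity instead of list position. The helper name `rank_for_user` and the four sample products are assumptions made for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def rank_for_user(tfidf_matrix, product_ids, interacted_ids, top_n=5):
    """Rank unseen products by summed similarity to the user's history (hypothetical helper)."""
    id_to_row = {pid: row for row, pid in enumerate(product_ids)}
    seen_rows = [id_to_row[pid] for pid in interacted_ids if pid in id_to_row]
    if not seen_rows:
        return []
    # One similarity row per interacted product, summed into one score per candidate
    scores = cosine_similarity(tfidf_matrix[seen_rows], tfidf_matrix).sum(axis=0)
    scores = np.asarray(scores, dtype=float).ravel()
    scores[seen_rows] = -np.inf  # never recommend what the user already interacted with
    order = np.argsort(-scores, kind="stable")
    return [product_ids[i] for i in order[:top_n] if np.isfinite(scores[i])]


# Illustrative sample data (assumed, not taken from products.csv)
ids = ["1", "2", "3", "4"]
texts = [
    "powerful gaming laptop",
    "lightweight work laptop",
    "high-precision gaming mouse",
    "ergonomic typing keyboard",
]
matrix = TfidfVectorizer(stop_words="english").fit_transform(texts)
ranked = rank_for_user(matrix, ids, ["1"], top_n=3)
print(ranked)
```

With this sample data, the keyboard shares no terms with the laptop in the user's history, so it scores zero and ranks last.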

Key improvements and explanations:

* **Clear Class Structure:** Encapsulates the recommender system's logic within the `AIProductRecommender` class. This makes the code more organized, reusable, and easier to understand.

* **Data Loading and Preprocessing:**
    * `load_data()`: Loads the data from a CSV. On `FileNotFoundError` or any other loading error it prints a message and falls back to an empty DataFrame, which the later steps detect and skip. It also drops rows with missing `name` or `description` values (important for real-world data) and explicitly converts `product_id` to a string.
    * `preprocess_data()`: Creates a combined text feature from the product name and description. This is what the TF-IDF vectorizer will analyze.  It also checks if data has loaded before processing.

* **TF-IDF Vectorization and Model Training:**
    * `train_model()`: Uses `TfidfVectorizer` to convert the text data into a numerical representation (TF-IDF matrix).  This is essential for calculating similarity between products. The `stop_words='english'` argument removes common English words that don't contribute much to the analysis. Includes a check if the data has loaded.

* **Recommendation Generation:**
    * `get_recommendations()`: Calculates the cosine similarity between a given product and all other products. Returns the top N most similar product IDs. Includes comprehensive error handling: checks for data and model availability and handles the case where the product ID is not found.  Critically, it excludes the input product itself from the recommendations.

* **AI Component (User Interaction):**
    * `record_user_interaction()`: Simulates user interaction by storing which products a user has interacted with (e.g., clicked, purchased). Uses a dictionary, `user_interaction_data`, to store this information. Converts `user_id` to a string and avoids duplicate entries for a given user and product.
    * `get_personalized_recommendations()`: Uses the user interaction data to provide personalized recommendations. It finds products similar to those the user has interacted with, then removes duplicates and products the user has already seen. If no data exists for a user, it falls back to returning random products (or an empty list when no products are loaded), preventing errors. Recommendations are based on *all* products a user has interacted with, not just the most recent one.
* **Model Persistence (Saving and Loading):**
    * `save_model()`: Saves the trained TF-IDF vectorizer, TF-IDF matrix, product data, and user interaction data to a file using `joblib`, so the fitted state can be restored without refitting. Note that the constructor as written still calls `train_model()` before `load_model()` can be invoked.
    * `load_model()`: Loads the saved model from a file.  Includes error handling for `FileNotFoundError` (if the model file doesn't exist) and other potential errors during loading. Includes a check that handles backward compatibility if the user interaction data doesn't exist in the model file.

* **Clear Example Usage:** The `if __name__ == '__main__':` block provides a complete example of how to use the recommender system:
    * Creates a dummy `products.csv` file.
    * Initializes the recommender.
    * Gets general recommendations.
    * Simulates user interaction.
    * Gets personalized recommendations.
    * Saves and loads the model.

* **Error Handling:** The code includes comprehensive error handling to make it more robust.  It checks for:
    * Missing data files
    * Invalid product IDs
    * Data loading errors
    * Model loading errors
    * Cases where the model hasn't been trained yet

* **Data Type Consistency:** The `product_id` and `user_id` are consistently treated as strings. This is important for avoiding potential type-related errors.

* **Comments and Docstrings:**  The code is well-commented, and each method has a clear docstring explaining its purpose, arguments, and return value.

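* **Efficiency:** The TF-IDF matrix is sparse, and each query computes cosine similarity between one row and the full matrix, so the cost grows roughly linearly with the number of products. This is adequate for small and medium catalogs.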

* **Modularity:** The code is modular, making it easy to extend or modify. For example, you could easily add new features to the product data or use a different similarity metric.
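
As an illustration of the efficiency and modularity points, here is a minimal sketch of swapping the similarity function. It assumes the default `TfidfVectorizer` settings, where each row is L2-normalized (`norm="l2"`); under that assumption a plain dot product via `linear_kernel` gives the same scores as `cosine_similarity` while skipping the normalization step. The sample texts are made up for the example.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Illustrative sample descriptions (assumed)
texts = [
    "powerful gaming laptop",
    "lightweight work laptop",
    "high-precision gaming mouse",
]

# norm="l2" is the default; it is spelled out because the equivalence depends on it
vectorizer = TfidfVectorizer(stop_words="english", norm="l2")
matrix = vectorizer.fit_transform(texts)

# Similarity of the first product to every product, computed two ways
cos = cosine_similarity(matrix[0], matrix).ravel()
dot = linear_kernel(matrix[0], matrix).ravel()

same = bool(np.allclose(cos, dot))
print(same)
```

If the vectorizer were configured with `norm=None`, the two results would generally differ, so this substitution only makes sense while the rows stay normalized.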

How to run the code:

1. **Save:** Save the code as a `.py` file (e.g., `product_recommender.py`).
2. **Install Libraries:** Make sure you have the necessary libraries installed:
   ```bash
   pip install pandas scikit-learn joblib
   ```
3. **Run:** Execute the script from your terminal:
   ```bash
   python product_recommender.py
   ```

The code will:

1. Create a dummy `products.csv` file.
2. Train the recommendation model.
3. Print recommendations for a product.
4. Record user interaction.
5. Print personalized recommendations.
6. Save the trained model to `my_recommender.joblib`.
7. Load the model from the file.
8. Print recommendations again after loading the model.
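
Steps 6 to 8 rely on the saved file reproducing the fitted state. The following sketch shows one way to check that round trip in isolation, under the assumption that the same dictionary layout as `save_model()` is used; the file name `roundtrip_check.joblib` and the sample texts are hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer

# Illustrative sample descriptions (assumed)
texts = ["powerful gaming laptop", "ergonomic typing keyboard"]
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(texts)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "roundtrip_check.joblib")  # hypothetical file name
    joblib.dump({"tfidf_vectorizer": vectorizer, "tfidf_matrix": matrix}, path)
    restored = joblib.load(path)

# The restored vectorizer should keep its fitted vocabulary, and the matrix its values
vocab_matches = restored["tfidf_vectorizer"].vocabulary_ == vectorizer.vocabulary_
matrix_matches = (restored["tfidf_matrix"] != matrix).nnz == 0
print(vocab_matches, matrix_matches)
```

Because `joblib.load` is built on pickle, it should only be used on files from a trusted source.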

This version is a good starting point for building a more sophisticated AI-enhanced product recommendation system.