AI-driven E-commerce Recommendation (Python, AI, NLP)
```python
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
# --- Sample Data (Replace with your actual product catalog data) ---
data = {
    'product_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'product_name': ['Wireless Headphones', 'Bluetooth Speaker', 'Smartwatch', 'Fitness Tracker',
                     'Coffee Maker', 'Toaster Oven', 'Blender', 'Vacuum Cleaner', 'Electric Kettle', 'Air Fryer'],
    'description': [
        'High-quality wireless headphones with noise cancellation. Comfortable for long listening sessions.',
        'Portable bluetooth speaker with excellent sound quality and waterproof design. Ideal for outdoor use.',
        'Smartwatch with heart rate monitoring, GPS, and smartphone notifications. Tracks your activity and sleep.',
        'Fitness tracker with step counter, calorie tracker, and sleep analysis. Helps you achieve your fitness goals.',
        'Automatic coffee maker with programmable timer and keep-warm function. Brews delicious coffee at home.',
        'Versatile toaster oven with baking, broiling, and toasting capabilities. A kitchen essential.',
        'Powerful blender for smoothies, soups, and sauces. Easy to clean and use.',
        'Cordless vacuum cleaner with strong suction and long battery life. Makes cleaning a breeze.',
        'Electric kettle with rapid boiling and automatic shut-off. Perfect for tea, coffee, and more.',
        'Air fryer for healthy cooking with little to no oil. Crispy and delicious results every time.'
    ],
    'category': ['Electronics', 'Electronics', 'Electronics', 'Electronics',
                 'Home Appliances', 'Home Appliances', 'Home Appliances', 'Home Appliances', 'Home Appliances', 'Home Appliances']
}
df = pd.DataFrame(data)
# --- NLP Preprocessing & Feature Extraction ---
# 1. Combine relevant text fields (description and category)
df['combined_features'] = df['description'] + ' ' + df['category']
# 2. TF-IDF Vectorization
# TF-IDF (Term Frequency-Inverse Document Frequency) converts text into numerical vectors,
# capturing the importance of words in each product description relative to the entire product catalog.
tfidf_vectorizer = TfidfVectorizer(stop_words='english') # Remove common English words (the, a, is, etc.)
tfidf_matrix = tfidf_vectorizer.fit_transform(df['combined_features'])
# --- Calculate Similarity ---
# Cosine Similarity measures the similarity between two vectors.
# In this case, it measures the similarity between the TF-IDF vectors of different products.
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
# --- Recommendation Function ---
def get_recommendations(product_id, cosine_sim=cosine_sim, dataframe=df, top_n=5):
    """
    Recommends products similar to the given product based on cosine similarity.

    Args:
        product_id (int): The ID of the product for which recommendations are desired.
        cosine_sim (numpy.ndarray): The cosine similarity matrix.
        dataframe (pd.DataFrame): The DataFrame containing product information.
        top_n (int): The number of top recommendations to return.

    Returns:
        pd.DataFrame: A DataFrame containing the top_n recommended products.
        Returns an empty DataFrame if the product_id is invalid.
    """
    try:
        # Get the index of the product in the DataFrame
        idx = dataframe[dataframe['product_id'] == product_id].index[0]
        # Get the pairwise similarity scores of all products with that product
        sim_scores = list(enumerate(cosine_sim[idx]))
        # Sort the products based on the similarity scores
        sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
        # Get the scores of the top N most similar products (excluding the product itself)
        sim_scores = sim_scores[1:top_n + 1]  # Exclude the product itself
        # Get the product indices
        product_indices = [i[0] for i in sim_scores]
        # Return the top N similar products as a DataFrame
        recommended_products = dataframe.iloc[product_indices]
        return recommended_products
    except IndexError:
        print(f"Product ID {product_id} not found.")
        return pd.DataFrame()  # Return an empty DataFrame if the product is not found
    except Exception as e:
        print(f"An error occurred: {e}")
        return pd.DataFrame()
# --- Example Usage ---
# Get recommendations for product with ID 1 (Wireless Headphones)
product_id = 1
recommendations = get_recommendations(product_id)
if not recommendations.empty:
    print(f"Recommendations for Product ID {product_id} ({df[df['product_id'] == product_id]['product_name'].iloc[0]}):")
    print(recommendations[['product_id', 'product_name', 'description']])  # Display only relevant columns
else:
    print("No recommendations found.")
# Get recommendations for product with ID 5 (Coffee Maker)
product_id = 5
recommendations = get_recommendations(product_id)
if not recommendations.empty:
    print(f"\nRecommendations for Product ID {product_id} ({df[df['product_id'] == product_id]['product_name'].iloc[0]}):")
    print(recommendations[['product_id', 'product_name', 'description']])  # Display only relevant columns
else:
    print("No recommendations found.")
```
Key design points and explanations:
* **Clear Structure:** The code is organized into sections: Data, NLP Preprocessing, Similarity Calculation, Recommendation Function, and Example Usage, which keeps it easy to follow.
* **Data Loading/Creation:** The example uses a Pandas DataFrame to represent the product catalog, the most common and efficient way to handle tabular data in Python. The `data` dictionary includes `product_id`, `product_name`, `description`, and `category` fields. **Important:** the sample data is for demonstration only; replace it with your actual product data loaded from a CSV file, database, or API.
* **Combined Features:** The `description` and `category` fields are concatenated into a single `combined_features` column, so the engine considers both the product description and its category when calculating similarity, which leads to more relevant recommendations.
* **TF-IDF Vectorization:** `TfidfVectorizer` converts the text into numerical vectors. `stop_words='english'` removes common words that carry little meaning (e.g., "the", "a", "is"), which generally improves the quality of the similarity scores. TF-IDF is explained in the code comments, and a short inspection sketch follows this list.
* **Cosine Similarity:** Calculates the cosine similarity between all pairs of products. Cosine similarity is a standard metric for measuring the similarity between text vectors.
* **`get_recommendations` Function:**
    * **Error Handling:** `try...except` blocks handle cases such as a product ID that is not in the DataFrame. If an error occurs, the function prints a message and returns an empty DataFrame instead of crashing; both `IndexError` and a general `Exception` are caught.
    * **Clear Args and Return:** The docstring explicitly documents the arguments and return value, improving readability and maintainability.
    * **Exclusion of Self:** The function excludes the product itself from the recommendations, which is essential for a recommendation engine: `sim_scores = sim_scores[1:top_n + 1]` drops the first element (the product itself, with a similarity of 1.0).
    * **DataFrame Return:** The function returns a Pandas DataFrame of recommended products, which is convenient for further processing and display.
* **Example Usage:**
    * **Clearer Output:** The example usage prints the product name along with the recommendations, making the results easier to interpret.
    * **Specific Columns:** The `print` statements select only the `product_id`, `product_name`, and `description` columns, keeping the output clean and focused.
    * **Handles Empty Recommendations:** The code checks whether the `recommendations` DataFrame is empty before printing and displays a message if no recommendations are found.
* **Comments:** The code is thoroughly commented to explain each step.
* **Pandas Efficiency:** The code uses Pandas DataFrame operations for efficient data handling.
* **Modularity:** The `get_recommendations` function makes the code modular and reusable.
* **Scalability:** TF-IDF and cosine similarity are generally scalable to larger datasets, although you might need to consider more advanced techniques (e.g., approximate nearest neighbor search) for very large catalogs.
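To make the intermediate objects above less abstract, here is a minimal inspection sketch that can be appended to the script. It assumes the `tfidf_vectorizer`, `tfidf_matrix`, `cosine_sim`, and `df` variables defined earlier are in scope; `get_feature_names_out` requires scikit-learn 1.0 or newer.
```python
import numpy as np

# One row per product, one column per TF-IDF term
print(tfidf_matrix.shape)

# Highest-weighted terms for product index 0 (Wireless Headphones)
terms = tfidf_vectorizer.get_feature_names_out()
weights = tfidf_matrix[0].toarray().ravel()
print(sorted(zip(terms, weights), key=lambda t: t[1], reverse=True)[:5])

# Indices of the products most similar to index 0 (the first entry is the product itself)
print(np.argsort(cosine_sim[0])[::-1][:4])
```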
How to Use:
1. **Install Libraries:**
```bash
pip install pandas scikit-learn
```
2. **Replace Sample Data:** Modify the `data` dictionary to include your actual product data. Ideally, load your data from a CSV file or database using Pandas (a loading sketch follows this list).
3. **Run the Code:** Execute the Python script.
4. **Experiment:** Change the `product_id` in the example usage to see recommendations for different products. Adjust the `top_n` parameter to control the number of recommendations.
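For step 2, here is a hedged sketch of loading the catalog from a CSV file instead of the inline dictionary. The file name and column names are assumptions; adjust them to match your schema.
```python
import pandas as pd

# Hypothetical export; expected columns: product_id, product_name, description, category
df = pd.read_csv('products.csv')

# Guard against missing text so the TF-IDF step does not fail on NaN values
df['description'] = df['description'].fillna('')
df['category'] = df['category'].fillna('')
df['combined_features'] = df['description'] + ' ' + df['category']
```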
Next Steps for a Real E-commerce System:
* **Data Loading:** Load product data from a database (e.g., PostgreSQL, MySQL) or a CSV file.
* **User Data:** Incorporate user data (purchase history, browsing history, ratings) to personalize recommendations. You could use collaborative filtering, or content-based filtering on user profiles; a small profile-based sketch follows this list.
* **Real-time Updates:** Update the TF-IDF matrix and similarity scores regularly as new products are added or product descriptions change.
* **A/B Testing:** Experiment with different recommendation algorithms and parameters to optimize performance.
* **Integration:** Integrate the recommendation engine into your e-commerce platform.
* **Scalability:** For very large datasets, consider approximate nearest neighbor search (e.g., the `faiss` library) to speed up similarity calculations; a nearest-neighbor sketch follows this list.
* **Cold Start Problem:** Address the "cold start problem" (recommending products to new users with no history) using popularity-based recommendations or content-based filtering on product attributes; a popularity fallback sketch follows this list.
* **Diversity:** Ensure that the recommendations are diverse and not too similar to one another, for example with MMR (Maximal Marginal Relevance); an MMR re-ranking sketch follows this list.
* **Explainability:** Provide explanations for why a product is being recommended. This can increase user trust and engagement.
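As a starting point for personalization, one content-based approach is to average the TF-IDF vectors of the products a user has interacted with and recommend the catalog items closest to that profile. This is only a sketch under stated assumptions: it reuses `tfidf_matrix` and `df` from the main script, and the viewed-product list is hypothetical.
```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

def recommend_for_user(viewed_product_ids, tfidf_matrix, dataframe, top_n=5):
    """Average the TF-IDF vectors of viewed products and rank the catalog by similarity."""
    mask = dataframe['product_id'].isin(viewed_product_ids).to_numpy()
    if not mask.any():
        return pd.DataFrame()
    # Build a 1 x n_terms user profile from the viewed products
    profile = np.asarray(tfidf_matrix[np.flatnonzero(mask)].mean(axis=0))
    scores = cosine_similarity(profile, tfidf_matrix).ravel()
    # Rank all products, skipping the ones the user already viewed
    ranked = [i for i in np.argsort(scores)[::-1] if not mask[i]]
    return dataframe.iloc[ranked[:top_n]]

# Hypothetical browsing history: the user looked at the smartwatch (3) and fitness tracker (4)
print(recommend_for_user([3, 4], tfidf_matrix, df)[['product_id', 'product_name']])
```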
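For the scalability point: the full `cosine_similarity` matrix grows quadratically with the catalog, so large deployments usually query a nearest-neighbor index instead of materializing it. A minimal sketch using scikit-learn's `NearestNeighbors` (an ANN library such as `faiss` would follow the same fit-then-query pattern), again assuming `tfidf_matrix` and `df` from the main script:
```python
from sklearn.neighbors import NearestNeighbors

# Cosine distance over the TF-IDF rows; brute force here, but no N x N matrix is needed
nn = NearestNeighbors(n_neighbors=6, metric='cosine')
nn.fit(tfidf_matrix)

# Query the neighbors of product index 0; the first hit is the product itself (distance 0)
distances, indices = nn.kneighbors(tfidf_matrix[0])
print(df.iloc[indices.ravel()[1:]][['product_id', 'product_name']])
```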
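For the cold-start point, a popularity fallback can serve new users until they have history. The `purchase_count` column below is hypothetical and its values are made up purely so the sketch runs against the 10-row sample catalog; substitute whatever engagement signal you actually track.
```python
# Hypothetical popularity signal attached to the sample catalog (illustrative values only)
df['purchase_count'] = [120, 85, 200, 150, 90, 40, 60, 110, 70, 180]

def popular_products(dataframe, top_n=5):
    """Popularity-based fallback for users with no interaction history."""
    return dataframe.sort_values('purchase_count', ascending=False).head(top_n)

print(popular_products(df)[['product_id', 'product_name', 'purchase_count']])
```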
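For the diversity point, here is a small Maximal Marginal Relevance (MMR) re-ranker over the existing `cosine_sim` matrix. It is only a sketch: the candidate pool size and the `lambda_` relevance/diversity trade-off are chosen arbitrarily here.
```python
import numpy as np

def mmr_rerank(query_idx, cosine_sim, top_n=5, pool_size=10, lambda_=0.7):
    """Pick items that balance relevance to the query against redundancy with items already picked."""
    # Candidate pool: most similar products to the query, excluding the query itself
    candidates = list(np.argsort(cosine_sim[query_idx])[::-1][1:pool_size + 1])
    selected = []
    while candidates and len(selected) < top_n:
        def mmr_score(c):
            redundancy = max((cosine_sim[c][s] for s in selected), default=0.0)
            return lambda_ * cosine_sim[query_idx][c] - (1 - lambda_) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Diverse recommendations for product index 0 (Wireless Headphones)
print(df.iloc[mmr_rerank(0, cosine_sim)][['product_id', 'product_name']])
```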
This comprehensive example provides a solid foundation for building an AI-driven e-commerce recommendation engine in Python. Remember to adapt the code to your specific needs and data.