Automated Email Classification and Response Generator for Customer Support (Python)

```python
import re
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# --- 1.  Data Preparation (Example) ---
# This section simulates fetching and preparing customer support email data.
# In a real application, this data would come from a database or email server.

# Sample email data (subject and body) with corresponding categories.
email_data = [
    ("Order Inquiry - Where is my package?", "Hi, I placed an order a week ago and haven't received it.  Tracking number ABC123.  Can you help?", "Shipping"),
    ("Account Access Issue", "I can't log into my account. I forgot my password and the reset link isn't working.", "Account"),
    ("Returns Question", "What is your return policy?  I need to return an item.", "Returns"),
    ("Payment Issue", "My credit card was charged twice for the same order. Please fix this ASAP.", "Billing"),
    ("Technical Problem", "The website is giving me an error message when I try to checkout.  Error code 500.", "Technical"),
    ("Shipping Delay Notification", "My order is delayed. Is there a reason why?", "Shipping"),
    ("Forgot Password", "I have forgotten my password and can not log in.", "Account"),
    ("Return Request", "I want to return an item.", "Returns"),
    ("Double Charge", "I got charged twice for the same order.", "Billing"),
    ("Website Error", "The website is down!", "Technical"),
    ("Late Delivery", "My product is late", "Shipping"),
    ("Can't Login", "Password isn't working", "Account"),
    ("Returning Item", "Can I return this?", "Returns"),
    ("Charged Incorrectly", "Payment error", "Billing"),
    ("Website Bug", "Website malfunctioning", "Technical"),
    ("Order Status", "Where is my shipment?", "Shipping"),
    ("Password Reset", "Password issues", "Account"),
    ("Refund Request", "How to return?", "Returns"),
    ("Billing Error", "Payment declined", "Billing"),
    ("Tech Issues", "Website broken", "Technical")
]


# Separate email texts and categories
email_texts = [email[0] + " " + email[1] for email in email_data]  # Combine subject and body for better classification
email_categories = [email[2] for email in email_data]


# --- 2.  Text Feature Extraction using TF-IDF ---
# Convert text data into numerical features using TF-IDF (Term Frequency-Inverse Document Frequency).
# TF-IDF reflects how important a word is to a document in a collection of documents (corpus).

# Split the raw texts first so the vectorizer never sees the test emails (avoids data leakage).
texts_train, texts_test, y_train, y_test = train_test_split(
    email_texts, email_categories, test_size=0.2, random_state=42  # 20% for testing
)

vectorizer = TfidfVectorizer(stop_words='english')  # Remove common English stop words (e.g., "the", "a", "is").
X_train = vectorizer.fit_transform(texts_train)  # Learn vocabulary and IDF from the training emails only
X_test = vectorizer.transform(texts_test)  # Encode the test emails with the already fitted vocabulary


# --- 3. Train the Email Classification Model ---
# Use Multinomial Naive Bayes, a suitable algorithm for text classification.

# Train the model
model = MultinomialNB()
model.fit(X_train, y_train)


# --- 4. Evaluate the Model ---
# Assess the model's performance on unseen data.

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
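print(f"Accuracy: {accuracy:.2f}")
# zero_division=0 avoids undefined-metric warnings for classes missing from the tiny test set.
print(classification_report(y_test, y_pred, zero_division=0))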


# --- 5.  Email Classification and Response Generation Function ---

def classify_and_respond(email_subject, email_body, sender_email):
    """
    Classifies an email and generates an appropriate response.

    Args:
        email_subject (str): The subject of the email.
        email_body (str): The body of the email.
        sender_email (str): The sender's email address.

    Returns:
        str: The generated response message. A generic acknowledgement is returned for any category without a dedicated template.
    """

    # Combine subject and body for classification
    email_text = email_subject + " " + email_body
    # Transform the new email text using the *already fitted* vectorizer
    email_vector = vectorizer.transform([email_text])
    predicted_category = model.predict(email_vector)[0]  # Predict the category

    # Generate a response based on the predicted category
    if predicted_category == "Shipping":
        response_message = f"Thank you for your inquiry about your order, {sender_email}.  Please provide your order number so we can look into the shipping status for you."
    elif predicted_category == "Account":
        response_message = "We are sorry you are having trouble accessing your account.  Please reset your password using the 'Forgot Password' link, or contact us for assistance if the reset link isn't working."
    elif predicted_category == "Returns":
        response_message = "Our return policy can be found on our website at [Return Policy Link].  Please follow the instructions there to initiate a return."
    elif predicted_category == "Billing":
        response_message = "We apologize for the billing error.  Please provide your order number and a description of the issue, and we will investigate and resolve it promptly."
    elif predicted_category == "Technical":
        response_message = "We are sorry you are experiencing technical issues with our website. Our team is investigating the problem.  Please try again later, and if the issue persists, please provide details about the error message and your browser version."
    else:
        response_message = "Thank you for contacting us.  We have received your email and will get back to you as soon as possible."

    return response_message


# --- 6.  Email Sending Function ---

def send_email(sender_email, sender_password, recipient_email, subject, body):
    """
    Sends an email using SMTP.

    Args:
        sender_email (str):  The email address sending the email (Gmail, Outlook, etc.).
        sender_password (str): The password for the sender's email account.  Consider using environment variables or secure storage instead of hardcoding.
        recipient_email (str): The email address receiving the email.
        subject (str): The subject of the email.
        body (str):  The body of the email.
    """
    msg = MIMEMultipart()
    msg['From'] = sender_email
    msg['To'] = recipient_email
    msg['Subject'] = subject

    msg.attach(MIMEText(body, 'plain'))

    try:
        # The context manager closes the connection even if login or sending fails.
        with smtplib.SMTP('smtp.gmail.com', 587) as server:  # Use Gmail SMTP server, adjust if using a different provider
            server.starttls()  # Encrypt the connection
            server.login(sender_email, sender_password)  # Login to the email account
            server.sendmail(sender_email, recipient_email, msg.as_string())
        print("Email sent successfully!")
    except (smtplib.SMTPException, OSError) as e:  # SMTP protocol errors and network failures
        print(f"Error sending email: {e}")


# --- 7.  Simulating Email Reception and Response ---

# Example incoming email details
incoming_email_subject = "Urgent - Website Down!"
incoming_email_body = "I can't access the website. It seems to be completely down. Is anyone aware of this problem?"
incoming_sender_email = "customer123@example.com"  # Simulate the sender's email


# Classify the email and generate a response
response = classify_and_respond(incoming_email_subject, incoming_email_body, incoming_sender_email)

if response:
    print("Generated Response:", response)  # Display the generated response

    # --- 8. Automate the process, get sender email ---
    #   Use libraries like imaplib to retrieve emails from an inbox.
    #   Then, you could extract the sender's email address.
    #   (Example using regex):
    #   match = re.search(r'[\w\.-]+@[\w\.-]+', email_message)
    #   sender_email = match.group(0)  if match else None

    # --- 9. Send the response email  (Replace with your actual email credentials) ---
    sender_email = "your_email@gmail.com"  # Replace with your email address
    sender_password = "your_password"  # Replace with your email password!  *NEVER* commit your password to a repository. Use environment variables or secure storage.
    recipient_email = incoming_sender_email  # Send the response back to the original sender
    subject = "Re: " + incoming_email_subject  # Add "Re:" to the subject
    body = response

    send_email(sender_email, sender_password, recipient_email, subject, body) # Send the generated response

else:
    print("Could not classify the email and generate a response.")

```
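
One gap in the script above: `model.predict` always returns one of the five trained categories, so the generic `else` reply in `classify_and_respond` is never reached. The following is a minimal sketch of one way to route low-confidence emails to that fallback, assuming a simple probability cutoff is acceptable for triage. The helper name `classify_with_threshold`, the 0.5 cutoff, and the tiny training list are illustrative assumptions, not part of the original script; `make_pipeline` is swapped in here only as a convenient way to keep the vectorizer and classifier fitted together.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set (hypothetical; in the script you would reuse
# email_texts and email_categories instead).
train_texts = [
    "Where is my package? Tracking number ABC123",
    "My order is delayed",
    "I forgot my password and cannot log in",
    "Password reset link is not working",
    "I was charged twice for the same order",
    "Payment declined on my credit card",
]
train_labels = ["Shipping", "Shipping", "Account", "Account", "Billing", "Billing"]

# The pipeline fits TF-IDF and Naive Bayes together on the same texts.
classifier = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
classifier.fit(train_texts, train_labels)


def classify_with_threshold(text, threshold=0.5):
    """Return the predicted category, or None when the model is not confident enough."""
    probabilities = classifier.predict_proba([text])[0]
    best_index = probabilities.argmax()
    if probabilities[best_index] < threshold:
        return None  # Caller should send the generic acknowledgement or escalate to a human
    return str(classifier.classes_[best_index])


print(classify_with_threshold("I cannot log in, my password is not working"))
print(classify_with_threshold("zzz qqq"))  # No known vocabulary, so the model falls back to class priors
```

A reasonable cutoff depends on the data; it is worth tuning on held-out emails and not fixing it at an arbitrary value.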

Key points and explanations:

* **Clear Structure:** The code is organized into logical sections (Data Preparation, Feature Extraction, Model Training, Evaluation, Email Functions, Simulation), with numbered comments to guide you through each one.
* **Sample Data:**  The `email_data` list holds 20 hand-written examples (four per category), each with a subject line *and* a body.  The subject is critically important for fast triage.  The list illustrates the format only; it is far too small to train a reliable model.
* **Combining Subject and Body:** The `email_texts` list concatenates the subject and body of each email, which gives the classifier more context than either field alone.
* **TF-IDF Vectorization:**  Uses TF-IDF (Term Frequency-Inverse Document Frequency) to convert text into numerical features that the model can use.  The `stop_words='english'` argument removes common words that carry little meaning.  New incoming emails go through `.transform()` on the already fitted `vectorizer`, so they are encoded with the same vocabulary and IDF weights as the training data.  To keep the evaluation free of data leakage, the vectorizer should be fitted on the training split only, never on the test emails.
* **Multinomial Naive Bayes:**  Uses Multinomial Naive Bayes, which is well-suited for text classification tasks.
* **Train/Test Split:** Splits the data into training and testing sets to evaluate the model's performance on unseen data.  `random_state=42` ensures reproducibility.
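* **Evaluation Metrics:**  Prints accuracy and a more detailed classification report (precision, recall, F1-score).  With a 20% split of 20 emails, the test set contains only four messages, so treat these numbers as illustrative and not as a reliable estimate.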
* **Email Classification and Response Function (`classify_and_respond`):** Takes the email subject, body, and `sender_email`, classifies the combined text with the trained model, and returns a canned response for the predicted category.  Only the Shipping template currently uses `sender_email` to personalize the reply.
* **Email Sending Function (`send_email`):** Encapsulates the sending logic using `smtplib` and builds the message with the standard `email.mime` classes so that headers and body are correctly encoded.  **Important:**  Never hardcode your email password!  Use environment variables or secure storage mechanisms instead.
* **Email Simulation:** The code simulates receiving an email, classifying it, generating a response, and sending the response back to the sender.  This demonstrates the complete workflow.
* **Response Logic:** An `if`/`elif` chain maps each predicted category to a canned reply, with a generic acknowledgement as the fallback.
* **Error Handling:** The `send_email` function includes basic error handling to catch potential exceptions during the email sending process.
* **Security Warning:**  The credentials in the script are placeholders.  *Never* commit a real password to a repository.
* **Uses `MIMEMultipart`:**  The `send_email` function builds the message with `MIMEMultipart`. A multipart container is what you need once you add attachments or a plain-text/HTML alternative; for the plain-text-only reply sent here, a single `MIMEText` message would also work.
* **`re` for Sender Email Extraction (Commented):**  Shows, in comments, how to pull the sender's address out of a raw email message with the `re` module.  It is commented out because it belongs to the *email reception* step, which needs a mail-retrieval library such as the standard-library `imaplib`.  The simulation instead passes a hardcoded `sender_email` to `classify_and_respond`.
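
As an alternative to the commented-out regex, the standard library can parse the headers directly. This is a minimal sketch, assuming the incoming message is a single-part, plain-text email available as a raw string; the `extract_fields` helper and the `RAW_MESSAGE` sample are hypothetical names introduced for illustration.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical raw message, shaped like what an IMAP fetch would return after decoding.
RAW_MESSAGE = """\
From: Jane Customer <customer123@example.com>
To: support@example.com
Subject: Urgent - Website Down!

I can't access the website. It seems to be completely down.
"""


def extract_fields(raw_message):
    """Return (sender_address, subject, body) for a single-part plain-text message."""
    msg = message_from_string(raw_message)
    # parseaddr splits "Jane Customer <customer123@example.com>" into name and address.
    _display_name, address = parseaddr(msg.get("From", ""))
    # Assumption: single-part message. Multipart messages need msg.walk(), omitted here.
    body = "" if msg.is_multipart() else msg.get_payload()
    return address or None, msg.get("Subject", ""), body.strip()


sender, subject, body = extract_fields(RAW_MESSAGE)
print(sender, "|", subject)
```

The three returned values map directly onto the `sender_email`, `email_subject`, and `email_body` arguments of `classify_and_respond`.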

**How to Run:**

1. **Install Libraries:**
   ```bash
   pip install scikit-learn
   ```

2. **Replace Placeholders:**
   - Replace `"your_email@gmail.com"` with your actual Gmail address (or another email provider).
   - **IMPORTANT:**  Do *NOT* put your actual Gmail password directly in the code!  Use environment variables or a secure configuration file.  For example:

     ```python
     import os
     sender_email = os.environ.get("EMAIL_USER")
     sender_password = os.environ.get("EMAIL_PASSWORD")
     ```

     Then, set the `EMAIL_USER` and `EMAIL_PASSWORD` environment variables on your system.  This prevents your password from being accidentally committed to a repository.

   - If you're using Gmail, sign in with an "App Password" (this requires 2-Step Verification on the account) in place of your regular password.  The older "Less secure app access" setting is *not recommended* for security reasons, and Google has been phasing it out.

3. **Run the Script:**
   ```bash
   python your_script_name.py
   ```
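
The environment-variable advice in step 2 can be made a little stricter. The sketch below fails early with a clear message when a variable is missing, so the script does not pass `None` to `server.login`. The `load_smtp_credentials` helper is a hypothetical name; only the `EMAIL_USER` and `EMAIL_PASSWORD` variable names come from the instructions above.

```python
import os


def load_smtp_credentials(environ=os.environ):
    """Read SMTP credentials from the environment, raising if either one is missing."""
    user = environ.get("EMAIL_USER")
    password = environ.get("EMAIL_PASSWORD")
    missing = [
        name
        for name, value in (("EMAIL_USER", user), ("EMAIL_PASSWORD", password))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return user, password


# Usage in the script (commented out so this snippet runs without the variables set):
# sender_email, sender_password = load_smtp_credentials()
```

Passing the mapping as a parameter is a small design choice that makes the helper easy to exercise with a plain dictionary in tests.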

**Next Steps and Considerations:**

* **IMAP Integration:**  Use the `imaplib` library to connect to an email inbox, retrieve new emails, extract the sender, subject, and body, and then run the classification and response process automatically.
* **Improve Data:**  Collect a *much* larger and more diverse dataset of customer support emails to train a more accurate model.
* **More Sophisticated Responses:**  Use a more advanced response generation technique (e.g., a sequence-to-sequence model) to create more personalized and helpful responses.  Consider using a chatbot framework.
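* **Regular Expression Improvement:** Fine-tune the regular expression for sender email extraction to handle different email formats, or replace it with `email.utils.parseaddr`, which already handles display names and angle brackets in `From` headers.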
* **Attachment Handling:**  Implement logic to handle email attachments (e.g., save them to disk or analyze them).
* **Database Integration:**  Store email data, classification results, and responses in a database for tracking and analysis.
* **Error Handling and Logging:** Add more robust error handling and logging to track errors and debug issues.
* **User Interface:** Create a user interface (e.g., using a web framework like Flask or Django) to make the system more user-friendly.
* **Model Persistence:**  Save the trained model *and* the fitted vectorizer to disk so that neither needs to be rebuilt every time the script runs; the model is only usable with the exact vocabulary it was trained on.  Use `pickle` or `joblib`.
* **Deployment:**  Deploy the system to a server so that it can run automatically and process emails in real-time.
* **API Integration:**  Integrate with customer support platforms (e.g., Zendesk, Salesforce Service Cloud) to automate email processing within existing workflows.
* **Sentiment Analysis:** Add sentiment analysis to the email classification pipeline to detect the customer's sentiment (positive, negative, neutral) and tailor the response accordingly.
* **Intent Recognition:**  Use more advanced Natural Language Understanding (NLU) techniques for intent recognition to understand the user's intent more accurately.
* **Knowledge Base:**  Integrate a knowledge base or FAQ system to provide more comprehensive and accurate answers to common questions.
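
For the model persistence item, here is a minimal sketch using `joblib`, assuming the vectorizer and classifier are wrapped in a single scikit-learn pipeline so that one file holds both. The file name `email_classifier.joblib` and the four training examples are illustrative assumptions.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical miniature training set; the real script would use email_texts / email_categories.
texts = [
    "Where is my shipment?",
    "My order is delayed",
    "I was charged twice for the same order",
    "Payment declined",
]
labels = ["Shipping", "Shipping", "Billing", "Billing"]

pipeline = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
pipeline.fit(texts, labels)

sample = ["I got charged twice"]

# A temporary directory keeps the example self-contained; use a stable path in practice.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "email_classifier.joblib")
    joblib.dump(pipeline, model_path)    # Saves vectorizer and classifier together
    restored = joblib.load(model_path)   # Only load files you created or trust

same_prediction = list(pipeline.predict(sample)) == list(restored.predict(sample))
print("Restored model agrees with original:", same_prediction)
```

Because `joblib` files are pickle-based, loading one can execute arbitrary code, so a saved model should be treated like any other executable artifact.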

This script is a starting point for building an automated email classification and response system, not a production service. Prioritize security and handle your email credentials with care.  Also, test thoroughly and improve the system gradually based on user feedback.