AI-Driven Chatbot for Technical Support with Natural Language Understanding Python

👤 Sharing: AI
```python
import random
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


class TechnicalSupportChatbot:
    """
    A simple AI-driven technical support chatbot. It matches user questions to a
    knowledge base using TF-IDF vectors and cosine similarity (scikit-learn).
    """

    def __init__(self, knowledge_base_file="knowledge_base.txt", threshold=0.5):
        """
        Initializes the chatbot with a knowledge base.

        Args:
            knowledge_base_file (str): Path to the file containing question-answer pairs.
            threshold (float): Minimum cosine similarity required to return an answer.
        """
        self.knowledge_base = self.load_knowledge_base(knowledge_base_file)
        self.questions = list(self.knowledge_base.keys())
        self.threshold = threshold  # Adjust this value as needed
        self.vectorizer = TfidfVectorizer()  # Converts text to TF-IDF vectors
        self.question_vectors = None
        if self.questions:
            try:
                # Fit once on the known questions; each user query is only transformed later.
                self.question_vectors = self.vectorizer.fit_transform(self.questions)
            except ValueError as e:  # e.g. no usable words in any question
                print(f"Error building the TF-IDF index: {e}")

    def load_knowledge_base(self, filename):
        """
        Loads the knowledge base from a text file. Question-answer pairs are
        separated by blank lines; within a pair, the first line is the question
        and the remaining lines are the answer.

        Args:
            filename (str): Path to the knowledge base file.

        Returns:
            dict: Maps questions to answers. Empty if the file can't be read.
        """
        knowledge_base = {}
        try:
            with open(filename, 'r', encoding='utf-8') as f:
                content = f.read()
        except FileNotFoundError:
            print(f"Error: Knowledge base file '{filename}' not found.")
            return {}
        except (OSError, UnicodeDecodeError) as e:
            print(f"Error reading knowledge base: {e}")
            return {}

        # Split on blank lines (empty or whitespace-only) to get question-answer pairs
        for qa_pair in re.split(r'\n\s*\n', content.strip()):
            lines = qa_pair.strip().split('\n')
            if len(lines) >= 2:  # Need at least a question line and an answer line
                question = lines[0].strip()  # The first line is the question
                answer = "\n".join(lines[1:]).strip()  # The rest is the (possibly multi-line) answer
                knowledge_base[question] = answer
        return knowledge_base

    def get_best_response(self, user_input):
        """
        Finds the best response from the knowledge base based on cosine similarity.

        Args:
            user_input (str): The user's input.

        Returns:
            str: The most appropriate response from the knowledge base, or a default response if no good match is found.
        """
        # Nothing to match against (empty knowledge base) or nothing typed
        if self.question_vectors is None or not user_input.strip():
            return self.get_default_response()

        # Vectorize the user input with the vocabulary and IDF weights learned from the questions
        user_vector = self.vectorizer.transform([user_input])

        # Cosine similarity between the user input and every known question
        cosine_similarities = cosine_similarity(user_vector, self.question_vectors)

        # Find the most similar question and its score
        most_similar_index = cosine_similarities.argmax()
        similarity_score = cosine_similarities[0, most_similar_index]

        # If the best match is not above the threshold, return a default reply instead of guessing
        if similarity_score > self.threshold:
            return self.knowledge_base[self.questions[most_similar_index]]
        return self.get_default_response()

    def get_default_response(self):
        """
        Returns a default response when no suitable answer is found in the knowledge base.
        """
        default_responses = [
            "I'm sorry, I don't have an answer to that question in my current knowledge base.",
            "Could you please rephrase your question?",
            "I am still under development and learning new things. I'll try my best to answer.",
            "I am unable to provide an answer to that question at this time. Please contact support.",
        ]
        return random.choice(default_responses)

    def chat(self):
        """
        Starts the chatbot interaction loop. Type 'exit' to quit.
        """
        print("Technical Support Chatbot: Hello! How can I help you today?")
        while True:
            try:
                user_input = input("You: ")
            except (EOFError, KeyboardInterrupt):  # Ctrl-D / Ctrl-C end the session cleanly
                print("\nTechnical Support Chatbot: Goodbye!")
                break
            if user_input.strip().lower() == 'exit':
                print("Technical Support Chatbot: Goodbye!")
                break
            response = self.get_best_response(user_input)
            print("Technical Support Chatbot:", response)

# Example Usage
if __name__ == "__main__":
    # Create a simple knowledge base file (knowledge_base.txt)
    # with question-answer pairs separated by blank lines.  For example:
    #
    # How do I reset my password?
    # Go to the login page and click on "Forgot Password".  Enter your email address
    # and follow the instructions in the email we send you.
    #
    # My internet is not working.
    # Please check your modem and router.  Ensure all cables are properly connected
    # and that the devices are powered on.
    #
    # What is the meaning of life?
    # I am a technical support chatbot and not equipped to answer philosophical questions.

    chatbot = TechnicalSupportChatbot(knowledge_base_file="knowledge_base.txt")  # Specify your knowledge base file
    chatbot.chat()
```
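The matching step can be pulled out on its own for experimentation. The sketch below fits the vectorizer once on the questions and only `transform`s each query, which avoids refitting on every message and keeps the IDF weights independent of what the user typed. It also returns the raw similarity score so you can see why a query was accepted or rejected. `FAQ`, `build_matcher`, and `match` are hypothetical names used only for this illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical sample knowledge base (question -> answer), for illustration only.
FAQ = {
    "How do I reset my password?": "Use the 'Forgot Password' link on the login page.",
    "My internet is not working.": "Check that your modem and router are powered on.",
}


def build_matcher(questions):
    """Fit a TF-IDF vectorizer once on the known questions."""
    vectorizer = TfidfVectorizer()
    question_vectors = vectorizer.fit_transform(questions)
    return vectorizer, question_vectors


def match(query, questions, vectorizer, question_vectors, threshold=0.5):
    """Return (best question or None, score) for a single user query."""
    # Reuse the fitted vocabulary and IDF weights; unknown words are ignored.
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, question_vectors)[0]
    best = int(scores.argmax())
    score = float(scores[best])
    return (questions[best] if score > threshold else None), score


if __name__ == "__main__":
    questions = list(FAQ)
    vectorizer, question_vectors = build_matcher(questions)
    for query in ["how can I reset my password", "printer is jammed"]:
        question, score = match(query, questions, vectorizer, question_vectors)
        print(f"{query!r}: score={score:.2f} -> {question}")
```

Printing scores like these for a handful of real user queries is a quick way to judge whether a threshold of 0.5 is too strict or too loose for your own knowledge base.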

Key features and explanations:

* **Clear Structure and Classes:** The code is organized into a class `TechnicalSupportChatbot`, making it more modular and reusable.
* **Knowledge Base Loading:** The `load_knowledge_base` method parses the knowledge base file and handles `FileNotFoundError` and other read errors. It splits the file into question-answer pairs at blank lines; within each pair, the first line is the question and the remaining lines form the answer, so multi-line answers are supported.
* **TF-IDF Vectorization:** Uses scikit-learn's `TfidfVectorizer` to convert the user input and the knowledge-base questions into numerical vectors, so their similarity can be measured. The vectorizer is created in the class constructor.
* **Cosine Similarity:** Calculates the cosine similarity between the user input and each question in the knowledge base. Because TF-IDF weights are never negative, the scores range from 0 (no terms in common) to 1.
* **Similarity Threshold:** If the highest similarity score is not above the `threshold` (0.5 in this script), the chatbot returns a default response instead of an answer. This keeps it from returning irrelevant answers when no known question is a close match.
* **Default Responses:** Several fallback replies are chosen from at random, which makes the chatbot friendlier when it can't find a relevant answer. `get_default_response` is a separate method for clarity.
* **`chat` Method:** Runs the interactive loop until the user types `exit`.
* **Error Handling:** `try...except` blocks catch file-loading errors, so a missing or unreadable knowledge base prints an error message and falls back to an empty knowledge base.
* **Comments and Explanations:**  Comprehensive comments explain each part of the code, making it easier to understand and modify.
* **NLTK:** The matching logic never calls NLTK: `TfidfVectorizer` lowercases and tokenizes the text itself, so the `punkt` and `wordnet` downloads are not needed for this version. NLTK only becomes necessary if you add NLTK-based preprocessing such as lemmatization.
* **`if __name__ == "__main__":` block:**  This ensures that the chatbot is only run when the script is executed directly (not when it's imported as a module).
* **Example Knowledge Base:**  The code includes an example of how to format the `knowledge_base.txt` file. This is crucial for users to understand how to create their own knowledge bases.
* **`encoding='utf-8'`:** `load_knowledge_base` opens the file with `encoding='utf-8'`. Without it, Python falls back to the platform's default encoding, which is not UTF-8 on every system (notably many Windows setups), so non-ASCII text such as accented characters or emojis can raise decoding errors or come out garbled.
* **Simple File Format:** Because each entry is just a question line followed by its answer lines, with entries separated by blank lines, the knowledge base is easy to maintain and update in any text editor.
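To make the TF-IDF and cosine-similarity bullets concrete, here is a dependency-free sketch of the same computation. It assumes scikit-learn's default `TfidfVectorizer` settings (lowercasing, tokens of two or more word characters, raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, and L2 normalization), so its scores should line up closely with what the chatbot computes; the helper names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep tokens of 2+ word characters (like TfidfVectorizer's default).
    return re.findall(r"\b\w\w+\b", text.lower())


def fit_idf(documents):
    """Smoothed IDF, ln((1 + n) / (1 + df)) + 1, for every term seen in `documents`."""
    n = len(documents)
    df = Counter(term for doc in documents for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text, idf):
    """Raw term counts times IDF, L2-normalized; terms outside the vocabulary are dropped."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def cosine(u, v):
    # Both vectors are already unit length, so their dot product is the cosine.
    return sum(w * v.get(t, 0.0) for t, w in u.items())


questions = ["How do I reset my password?", "My internet is not working."]
idf = fit_idf(questions)
question_vectors = [tfidf_vector(q, idf) for q in questions]
query = tfidf_vector("how can I reset my password", idf)
scores = [cosine(query, qv) for qv in question_vectors]
print([round(s, 2) for s in scores])  # [0.88, 0.13]
```

The word "my" appears in both questions, so it gets a lower IDF weight than "reset" or "password"; that is why the unrelated second question still scores slightly above zero but stays well below the 0.5 threshold.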

How to run the code:

1. **Install Libraries:**
   ```bash
   pip install nltk scikit-learn
   ```

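2. **Create `knowledge_base.txt`:** Create a text file named `knowledge_base.txt` in the directory you run the script from (the relative path is resolved against the current working directory). Put each question on its own line, followed by its answer on the next line(s), and separate question-answer pairs with a blank line. See the example in the code's comments.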

3. **Run the Script:**
   ```bash
   python your_script_name.py
   ```
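Because `chat()` reads from `input()`, it is awkward to exercise without typing. A minimal sketch of the same loop with the input and output functions passed in (a hypothetical refactor, not part of the script above) lets you drive it from a scripted list of messages, and it ends cleanly when the input runs out. `chat_loop` and `scripted_reader` are illustrative names; the canned `respond` function stands in for `get_best_response`.

```python
def chat_loop(respond, read=input, write=print, bot_name="Technical Support Chatbot"):
    """Run the exit-terminated chat loop with injectable I/O for testing."""
    write(f"{bot_name}: Hello! How can I help you today?")
    while True:
        try:
            user_input = read("You: ")
        except EOFError:  # Ctrl-D, or a scripted reader with no lines left
            write(f"{bot_name}: Goodbye!")
            break
        if user_input.strip().lower() == "exit":
            write(f"{bot_name}: Goodbye!")
            break
        write(f"{bot_name}: {respond(user_input)}")


def scripted_reader(lines):
    """Return an input()-like function that replays `lines`, then raises EOFError."""
    remaining = iter(lines)

    def read(prompt=""):
        try:
            return next(remaining)
        except StopIteration:
            raise EOFError from None

    return read


if __name__ == "__main__":
    transcript = []
    chat_loop(
        respond=lambda text: f"(answer for: {text})",
        read=scripted_reader(["My internet is not working.", "exit"]),
        write=transcript.append,
    )
    print("\n".join(transcript))
```

In the real script you would pass `chatbot.get_best_response` as `respond` and keep the default `input`/`print`.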

The result is a functional chatbot with a clearly defined knowledge base format and graceful handling of file errors. The cosine similarity threshold is what keeps it from returning irrelevant responses.
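Since every answer depends on the knowledge-base format, it can help to check a file before starting the chat. The sketch below assumes the blank-line-separated format described above (first line is the question, remaining lines are the answer) and treats whitespace-only lines as blank, which is an assumption of this sketch. `parse_knowledge_base` is a hypothetical helper that returns the parsed pairs plus a list of warnings instead of silently skipping bad entries.

```python
import re


def parse_knowledge_base(text):
    """Return (question -> answer dict, list of warnings) for blank-line-separated pairs."""
    knowledge_base, problems = {}, []
    # Whitespace-only lines count as blank lines here.
    for number, block in enumerate(re.split(r"\n\s*\n", text.strip()), start=1):
        if not block.strip():
            continue  # empty file
        lines = block.strip().split("\n")
        question = lines[0].strip()
        if len(lines) < 2:
            problems.append(f"entry {number}: no answer after {question!r}")
            continue
        if question in knowledge_base:
            problems.append(f"entry {number}: duplicate question {question!r} (later one wins)")
        knowledge_base[question] = "\n".join(lines[1:]).strip()
    return knowledge_base, problems


if __name__ == "__main__":
    # A stray blank line splits the second question from its answer.
    sample = (
        "How do I reset my password?\n"
        'Go to the login page and click on "Forgot Password".\n'
        "\n"
        "My internet is not working.\n"
        "\n"
        "Please check your modem and router.\n"
    )
    kb, problems = parse_knowledge_base(sample)
    print(len(kb), "pair(s) loaded")
    for problem in problems:
        print("warning:", problem)
```

To check your own file, pass it the file's contents, for example `parse_knowledge_base(open("knowledge_base.txt", encoding="utf-8").read())`.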