Generates synthetic, emotion-labeled dialogues for chatbot training (Python)

```python
import random

# Define a dictionary of emotions and corresponding dialogue snippets
emotion_lexicon = {
    "joy": [
        "That's fantastic! I'm so happy for you.",
        "Wonderful news! This makes my day.",
        "I'm thrilled to hear that!",
        "Amazing! This is great to hear.",
        "I'm absolutely delighted!"
    ],
    "sadness": [
        "I'm so sorry to hear that. It must be difficult.",
        "That's really unfortunate. I feel for you.",
        "I understand how upsetting that must be.",
        "I'm here for you if you need anything.",
        "That's heartbreaking to hear."
    ],
    "anger": [
        "I understand your frustration. That's unacceptable.",
        "That's really unfair. I can see why you're angry.",
        "That's infuriating! I'd be upset too.",
        "It's okay to be angry in this situation.",
        "That's a really bad situation, I'm sorry."
    ],
    "fear": [
        "That sounds really scary! I understand your fear.",
        "It's okay to be afraid. Take deep breaths.",
        "I can only imagine how frightening that must be.",
        "I understand you're feeling scared.",
        "That situation would scare anyone!"
    ],
    "surprise": [
        "Wow! I didn't expect that!",
        "That's quite a surprise!",
        "Really? That's unexpected!",
        "I'm totally surprised to hear that.",
        "That's amazing, I'm in shock."
    ],
    "neutral": [
        "Okay, I understand.",
        "That's interesting.",
        "Thanks for letting me know.",
        "I see.",
        "Understood."
    ]
}


def generate_dialogue(emotion, topic="general"):
    """
    Generates a synthetic dialogue turn with a given emotion label.

    Args:
        emotion (str): The emotion label for the dialogue turn (e.g., "joy", "sadness").
        topic (str):  A (currently unused) string representing the topic of the conversation. Can be expanded in the future.

    Returns:
        tuple: A tuple containing the generated dialogue turn (str) and the emotion label (str).
    """

    if emotion not in emotion_lexicon:
        return "I am not sure how to respond to that emotion.", "unknown"  # Fallback for unknown emotions.

    response = random.choice(emotion_lexicon[emotion])
    return response, emotion


def create_synthetic_dialogue_dataset(num_examples=100):
    """
    Generates a synthetic dialogue dataset with emotion labels.

    Args:
        num_examples (int): The number of dialogue examples to generate.

    Returns:
        list: A list of tuples, where each tuple contains a dialogue turn (str) and its emotion label (str).
    """

    emotions = list(emotion_lexicon.keys())  # Get a list of valid emotions.
    dataset = []

    for _ in range(num_examples):
        emotion = random.choice(emotions)  # Randomly choose an emotion.
        dialogue, label = generate_dialogue(emotion)
        dataset.append((dialogue, label))

    return dataset


# Example usage:
if __name__ == "__main__":
    # Generate a sample dataset
    synthetic_data = create_synthetic_dialogue_dataset(num_examples=10)

    # Print the generated data
    for dialogue, emotion in synthetic_data:
        print(f"Dialogue: {dialogue}")
        print(f"Emotion: {emotion}")
        print("-" * 20)

    # Example of generating a dialogue with a specific emotion:
    sad_response, sad_emotion = generate_dialogue("sadness")
    print(f"Sad Response: {sad_response} (Emotion: {sad_emotion})")

    angry_response, angry_emotion = generate_dialogue("anger")
    print(f"Angry Response: {angry_response} (Emotion: {angry_emotion})")
```

Key points and explanations:

* **Structure:** The code is split into two functions, `generate_dialogue()` and `create_synthetic_dialogue_dataset()`, so each can be reused and tested on its own.
* **Emotion Lexicon:** The `emotion_lexicon` dictionary is the heart of the system. It maps each of six emotion labels to a list of five canned responses. Because every response is drawn from the list for a known label, the label attached to each generated turn is correct by construction, which is what makes the output usable as a labeled dataset. The responses are generic: nothing conditions them on a preceding user message. The dictionary can be expanded with more emotions and more varied responses for each emotion.
* **`generate_dialogue()` function:** This function takes an emotion and returns a `(response, emotion)` tuple, with the response picked from `emotion_lexicon`. If the emotion is not in the lexicon, it returns a fixed fallback sentence with the label `"unknown"`.
* **`create_synthetic_dialogue_dataset()` function:** This function picks a random emotion `num_examples` times, calls `generate_dialogue()` for each, and returns the results as a list of tuples. This is what produces the training data.
* **Randomization:** The code uses `random.choice()` to select both the emotion and the response. No seed is set, so each run produces a different dataset. The lexicon holds only 6 × 5 = 30 distinct responses, so any dataset with more than 30 examples necessarily contains repeated turns.
* **Example Usage ( `if __name__ == "__main__":` ):** The `if __name__ == "__main__":` block demonstrates how to generate a dataset and individual dialogue turns. It runs only when the file is executed directly, not when it is imported.
* **Comments and Docstrings:** Each function has a docstring describing its arguments and return value, and inline comments explain the main steps.
* **Error Handling:** The fallback in `generate_dialogue()` is a plain membership check, not an exception handler. It avoids the `KeyError` that an unknown emotion would otherwise raise, but it also means a misspelled label silently produces an `"unknown"` row instead of an error.
* **Extensible:** You can add more emotions to the `emotion_lexicon`, modify the dialogue generation logic, or swap in more sophisticated techniques for generating realistic and diverse dialogues.
* **`topic` Parameter:** `generate_dialogue()` accepts a `topic` argument that is currently unused. It is a placeholder showing where topic-specific generation could be added, for example with a separate lexicon for "sports" or "politics."
* **String Formatting:** Output is printed with f-strings for readable formatting.
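The `topic` placeholder could be filled in along these lines. This is only a sketch under stated assumptions: `topic_lexicon`, `general_lexicon`, and `generate_topic_dialogue()` are hypothetical names that do not appear in the script above, and the sample sentences are made up for illustration. The idea is to look for responses under the requested topic first and fall back to the general lexicon when the topic or emotion is missing.

```python
import random

# Hypothetical topic-specific lexicon: topic -> emotion -> responses.
topic_lexicon = {
    "sports": {
        "joy": ["What a win! Congratulations to your team."],
        "sadness": ["Tough loss. I'm sorry the match went that way."],
    },
}

# Small general fallback, standing in for the script's emotion_lexicon.
general_lexicon = {
    "joy": ["That's fantastic! I'm so happy for you."],
    "sadness": ["I'm so sorry to hear that. It must be difficult."],
}


def generate_topic_dialogue(emotion, topic="general", rng=random):
    """Return (response, emotion), preferring topic-specific responses."""
    # Try the topic-specific list first.
    candidates = topic_lexicon.get(topic, {}).get(emotion)
    # Fall back to the general list when the topic has nothing for this emotion.
    if not candidates:
        candidates = general_lexicon.get(emotion)
    # Same fallback behaviour as generate_dialogue() for unknown emotions.
    if not candidates:
        return "I am not sure how to respond to that emotion.", "unknown"
    return rng.choice(candidates), emotion


print(generate_topic_dialogue("joy", topic="sports"))   # topic-specific response
print(generate_topic_dialogue("joy", topic="cooking"))  # falls back to general
print(generate_topic_dialogue("boredom"))               # unknown emotion
```

Keeping the general lexicon as a fallback means a new topic only needs entries for the emotions where topic-specific wording actually differs.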

How to use it:

1.  **Run the script:**  Save the code as a Python file (e.g., `dialogue_generator.py`) and run it from your terminal: `python dialogue_generator.py`

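2.  **Review the output:** The script prints ten generated examples to the console. Each example takes two lines, `Dialogue: ...` and `Emotion: ...`, followed by a separator line of dashes. It then prints one "sadness" and one "anger" response generated directly with `generate_dialogue()`.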

3.  **Customize the dataset:**  Change the `num_examples` argument in the call to `create_synthetic_dialogue_dataset()` to generate a dataset of the desired size. You can also add more emotions and responses to the `emotion_lexicon` to improve the diversity of the generated data.  Consider making the responses depend on the `topic` argument as well.

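4.  **Save the dataset:** You can save the generated dataset to a file (e.g., a CSV or JSON file) for use in your chatbot training process.  Add code like this inside the `if __name__ == "__main__":` block, indented to match the block, after `synthetic_data` has been created (the `import csv` line can also be moved to the top of the file):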

```python
import csv

# Save the dataset to a CSV file
with open("synthetic_dialogue_data.csv", "w", newline="", encoding="utf-8") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["dialogue", "emotion"])  # Write header row
    writer.writerows(synthetic_data)  # Write the data
print("Dataset saved to synthetic_dialogue_data.csv")
```
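Step 4 mentions JSON as an alternative to CSV. One possible approach, sketched here under assumptions, is the JSON Lines layout (one JSON object per line). The helper names `write_jsonl()` and `read_jsonl()` and the `sample_data` list are hypothetical and not part of the original script. The demo writes to an in-memory buffer so it does not touch the disk.

```python
import io
import json


def write_jsonl(dataset, fileobj):
    """Write (dialogue, emotion) pairs as one JSON object per line."""
    for dialogue, emotion in dataset:
        record = {"dialogue": dialogue, "emotion": emotion}
        # ensure_ascii=False keeps any non-ASCII characters readable in the file.
        fileobj.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(fileobj):
    """Read the records back into a list of (dialogue, emotion) tuples."""
    pairs = []
    for line in fileobj:
        line = line.strip()
        if line:  # skip blank lines
            record = json.loads(line)
            pairs.append((record["dialogue"], record["emotion"]))
    return pairs


# Stand-in for the output of create_synthetic_dialogue_dataset().
sample_data = [
    ("I'm thrilled to hear that!", "joy"),
    ("Okay, I understand.", "neutral"),
]

buffer = io.StringIO()  # in-memory file used only for this demonstration
write_jsonl(sample_data, buffer)
buffer.seek(0)
round_tripped = read_jsonl(buffer)
print(round_tripped == sample_data)
```

To write a real file, pass a handle such as `open("synthetic_dialogue_data.jsonl", "w", encoding="utf-8")` in place of the buffer (the filename is just an example).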

The script is a functional, documented starting point for generating synthetic, emotion-labeled dialogue turns, and the steps above cover how to run it, customize it, and save the generated data.
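Because the script draws everything with `random.choice()` and never sets a seed, two runs produce different datasets, and the label distribution is only roughly uniform. If reproducibility matters, one option is a private seeded generator, as in the sketch below. It assumes a reduced `lexicon` in place of the full `emotion_lexicon`, and `create_seeded_dataset()` is a hypothetical variant of `create_synthetic_dialogue_dataset()`, not part of the original script.

```python
import random
from collections import Counter

# Reduced stand-in for the script's emotion_lexicon.
lexicon = {
    "joy": ["I'm thrilled to hear that!", "Wonderful news! This makes my day."],
    "sadness": ["That's heartbreaking to hear.", "I understand how upsetting that must be."],
    "neutral": ["I see.", "Understood."],
}


def create_seeded_dataset(num_examples, seed):
    """Generate (dialogue, emotion) pairs from a private, seeded RNG."""
    rng = random.Random(seed)  # separate generator; global random state is untouched
    emotions = list(lexicon)
    dataset = []
    for _ in range(num_examples):
        emotion = rng.choice(emotions)
        dataset.append((rng.choice(lexicon[emotion]), emotion))
    return dataset


first = create_seeded_dataset(num_examples=30, seed=42)
second = create_seeded_dataset(num_examples=30, seed=42)
print(first == second)  # the same seed reproduces the same dataset

# Count how many examples each label received to check class balance.
label_counts = Counter(emotion for _, emotion in first)
print(label_counts)
```

Checking `label_counts` before training is a cheap way to spot a skewed sample, especially for small values of `num_examples`.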