Converts multi-speaker audio to color-coded transcripts with summaries (Python)
```python
import speech_recognition as sr
import os
import re

# --- Configuration ---
AUDIO_FILE = "multi_speaker_audio.wav"  # Replace with your audio file
NUM_SPEAKERS = 2  # Adjust to the number of speakers in your audio
TRANSCRIPT_FILE = "transcript.txt"
SUMMARY_FILE = "summary.txt"
CHUNK_DURATION = 30  # Seconds per chunk for the speaker-assignment step. Adjust as needed.
SPEAKER_COLORS = ['\033[91m', '\033[92m', '\033[93m', '\033[94m', '\033[95m', '\033[96m']  # ANSI color codes for speakers
RESET_CODE = '\033[0m'  # ANSI code that resets the terminal color

# --- Helper Functions ---
def split_audio(audio_file, chunk_duration):
    """Splits the audio file into chunk_duration-second WAV chunks and returns their filenames."""
    try:
        from pydub import AudioSegment
    except ImportError:
        print("Please install pydub: pip install pydub")
        raise SystemExit(1)

    audio = AudioSegment.from_file(audio_file)
    chunk_ms = chunk_duration * 1000  # pydub works in milliseconds
    chunks = []
    for i, start in enumerate(range(0, len(audio), chunk_ms)):
        end = min(start + chunk_ms, len(audio))  # don't run past the end of the audio
        chunk_filename = f"chunk_{i}.wav"
        audio[start:end].export(chunk_filename, format="wav")
        chunks.append(chunk_filename)
    return chunks

def transcribe_audio(audio_file):
    """Transcribes a WAV file using the free Google Web Speech API."""
    r = sr.Recognizer()
    with sr.AudioFile(audio_file) as source:
        audio = r.record(source)  # read the entire chunk
    try:
        return r.recognize_google(audio)
    except sr.UnknownValueError:
        return "Could not understand audio"
    except sr.RequestError as e:
        return f"Could not request results from Google Web Speech API; {e}"

def assign_speakers_randomly(num_chunks, num_speakers):
    """Alternative: assigns a random speaker to each chunk (demonstration only)."""
    import random
    return [random.randint(0, num_speakers - 1) for _ in range(num_chunks)]

def assign_speakers(chunks, num_speakers):
    """Assigns a speaker to each chunk in round-robin order.

    This is a placeholder, not real diarization. In a real application you would
    use a diarization model (e.g. via pyAudioAnalysis or pyannote.audio) to
    determine which speaker is actually talking in each chunk.
    """
    return [i % num_speakers for i in range(len(chunks))]

def summarize_transcript(transcript):
    """Summarizes the transcript by keeping its first few sentences.

    A real application would use a proper summarization technique (e.g. sumy,
    gensim, or a transformer-based model).
    """
    sentences = [s.strip() for s in re.split(r'[.?!]', transcript) if s.strip()]
    num_sentences_to_keep = min(3, len(sentences))  # keep at most 3 sentences
    return " ".join(sentences[:num_sentences_to_keep])

def cleanup_chunks(chunks):
    """Deletes the temporary audio chunk files."""
    for chunk in chunks:
        try:
            os.remove(chunk)
        except OSError:
            print(f"Warning: Could not delete temporary file: {chunk}")

# --- Main Execution ---
def main():
    """Orchestrates splitting, transcription, speaker assignment, and summarization."""
    # 1. Split the audio into manageable chunks
    print("Splitting audio...")
    chunks = split_audio(AUDIO_FILE, CHUNK_DURATION)
    print(f"Audio split into {len(chunks)} chunks.")

    # 2. Transcribe each audio chunk
    print("Transcribing audio chunks...")
    transcriptions = []
    for chunk in chunks:
        transcription = transcribe_audio(chunk)
        transcriptions.append(transcription)
        print(f"Transcribed {chunk}: {transcription[:50]}...")  # show a snippet

    # 3. Assign a speaker to each chunk (placeholder for real diarization)
    print("Assigning speakers to chunks...")
    speaker_assignments = assign_speakers(chunks, NUM_SPEAKERS)
    print(f"Speaker assignments: {speaker_assignments}")

    # 4. Generate the color-coded transcript, plus a plain copy for summarization
    print("Generating color-coded transcript...")
    colored_transcript = ""
    plain_transcript = ""
    for i, transcription in enumerate(transcriptions):
        speaker = speaker_assignments[i]  # displayed 1-based below
        color = SPEAKER_COLORS[speaker % len(SPEAKER_COLORS)]  # wrap around if speakers > colors
        colored_transcript += f"{color}Speaker {speaker + 1}: {transcription}{RESET_CODE}\n"
        plain_transcript += f"{transcription} "

    # 5. Summarize the plain transcript (ANSI codes would pollute the summary)
    print("Summarizing transcript...")
    summary = summarize_transcript(plain_transcript)
    print(f"Summary: {summary}")

    # 6. Save transcript and summary to files
    print("Saving transcript and summary...")
    with open(TRANSCRIPT_FILE, "w") as f:
        f.write(colored_transcript)
    with open(SUMMARY_FILE, "w") as f:
        f.write(summary)

    # 7. Clean up the temporary chunk files
    print("Cleaning up temporary files...")
    cleanup_chunks(chunks)
    print("Done!")

if __name__ == "__main__":
    main()
```
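One note on `transcribe_audio`: the free Google Web Speech API needs an internet connection and is rate-limited. SpeechRecognition also ships a binding for an offline engine; a minimal sketch, assuming `pip install pocketsphinx` (English-only by default and noticeably less accurate):

```python
import speech_recognition as sr

def transcribe_audio_offline(audio_file):
    """Offline variant of transcribe_audio using CMU Sphinx (no network needed)."""
    r = sr.Recognizer()
    with sr.AudioFile(audio_file) as source:
        audio = r.record(source)
    try:
        return r.recognize_sphinx(audio)  # requires the pocketsphinx package
    except sr.UnknownValueError:
        return "Could not understand audio"
```

Swapping it in is a one-line change to the `transcribe_audio(chunk)` call in `main`.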
Key improvements and explanations:
* **Dependency check:** `split_audio` imports `pydub` inside the function and exits with an explicit `pip install pydub` hint if it is missing, instead of crashing with a bare `ImportError`.
* **Safer chunk splitting:** each chunk's end is clamped to the audio length and the loop stops at the end of the file, so audio that isn't evenly divisible by `CHUNK_DURATION` no longer produces an empty trailing chunk.
* **Speaker assignment (placeholder):** `assign_speakers` uses round-robin assignment rather than random labels. This is still not real diarization; the docstring says so explicitly and points to libraries such as pyAudioAnalysis or pyannote.audio for real-world use (a hedged pyannote.audio sketch follows this list).
* **Color coding:** speaker colors wrap around via `speaker % len(SPEAKER_COLORS)`, so having more speakers than colors cannot raise an `IndexError`, and every line ends with the ANSI reset code so colors don't bleed into the next line.
* **Summary:** `summarize_transcript` splits on sentence-ending punctuation with `re.split(r'[.?!]', ...)`, drops empty fragments, and keeps at most the first three sentences. It operates on a plain copy of the transcript, so ANSI escape codes never leak into the summary (see the sumy sketch after the usage steps for a stronger approach).
* **Chunk cleanup:** `cleanup_chunks` removes the temporary chunk files after processing and warns, rather than crashing, if a file can't be deleted.
* **Error handling:** `transcribe_audio` catches `sr.UnknownValueError` and `sr.RequestError`, so a single failed chunk yields an informative message instead of aborting the run.
* **Configuration variables:** `AUDIO_FILE`, `NUM_SPEAKERS`, `TRANSCRIPT_FILE`, `SUMMARY_FILE`, and `CHUNK_DURATION` sit at the top of the script, making it easy to adapt to other recordings.
* **Progress output:** print statements report each stage and show a snippet of every transcription.
* **1-based speaker numbering:** lines read "Speaker 1", not "Speaker 0".
* **Focused comments and docstrings:** every function has a docstring, and comments explain the *why*, including the limitations of the placeholder diarization and summarization.
* **`if __name__ == "__main__":` guard:** the main logic lives in an `if __name__ == "__main__":` block, so the script can also be imported as a module.
* **File handling:** `with open(...)` ensures files are properly closed.
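For reference, here is roughly what real diarization looks like with pyannote.audio. This is a sketch, not a drop-in replacement: the model name and token handling are assumptions that may vary by version, and the pretrained pipeline requires accepting its terms on Hugging Face (`pip install pyannote.audio`).

```python
from pyannote.audio import Pipeline

# Model name and token are assumptions; check the pyannote.audio docs for current values.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="YOUR_HF_TOKEN",  # placeholder Hugging Face token
)
diarization = pipeline("multi_speaker_audio.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:5.1f}s - {turn.end:5.1f}s: {speaker}")
```

The yielded speaker labels and time ranges could replace the round-robin `speaker_assignments` list.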
How to use:
1. **Install Dependencies:**
```bash
pip install SpeechRecognition pydub
```
2. **Provide your audio:** Put your multi-speaker WAV file in the same directory as the script and update the `AUDIO_FILE` variable. `sr.AudioFile` reads WAV (as well as AIFF and FLAC) natively; convert anything else to WAV first (see the conversion sketch at the end).
3. **Adjust `NUM_SPEAKERS`:** Set the `NUM_SPEAKERS` variable to the correct number of speakers in your audio.
4. **Run the script:**
```bash
python your_script_name.py
```
5. **View results:** The color-coded transcript will be saved in `transcript.txt`, and the summary in `summary.txt`.
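If the first-three-sentences heuristic in `summarize_transcript` is too crude, extractive summarization is a small step up. A minimal sketch using sumy's LexRank summarizer, assuming `pip install sumy` plus NLTK's punkt tokenizer data:

```python
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lex_rank import LexRankSummarizer

def summarize_with_lexrank(text, sentence_count=3):
    """Returns the `sentence_count` most central sentences of `text`."""
    parser = PlaintextParser.from_string(text, Tokenizer("english"))
    summarizer = LexRankSummarizer()
    return " ".join(str(s) for s in summarizer(parser.document, sentence_count))
```

It plugs in as a replacement for `summarize_transcript` with no other changes.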
The result is a complete, working baseline: it transcribes chunk by chunk, color-codes speakers, summarizes, handles errors gracefully, and is explicit about where a simple implementation falls short. The sketches above (and the conversion one-liner below) show the kinds of libraries that would take the diarization and summarization steps further.
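Finally, as noted in step 2, `sr.AudioFile` only reads WAV, AIFF, and FLAC. For anything else (MP3, M4A, ...), pydub can convert first, assuming ffmpeg is installed and on your PATH; `meeting.mp3` below is a placeholder name:

```python
from pydub import AudioSegment

# One-off conversion to WAV so the main script can read the file.
AudioSegment.from_file("meeting.mp3").export("multi_speaker_audio.wav", format="wav")
```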