AI-based Voice-to-Text Converter (Python)
```python
import speech_recognition as sr  # Library for speech recognition
import os                        # Library for interacting with the operating system
import datetime                  # Library for handling dates and times
import openai                    # Library for interacting with the OpenAI API

# --- OpenAI API Configuration ---
# Replace with your actual OpenAI API key.
# Important: manage your API key securely! Consider loading it from an
# environment variable instead of hardcoding it here.
openai.api_key = "YOUR_OPENAI_API_KEY"

# --- Function to Record Audio ---
def record_audio():
    """
    Records audio from the microphone and returns the audio data.
    """
    r = sr.Recognizer()  # Create a recognizer instance
    with sr.Microphone() as source:  # Use the default microphone as the audio source
        print("Say something!")
        r.adjust_for_ambient_noise(source)  # Calibrate for background noise
        # Wait at most 5 seconds for a phrase to begin, then record it
        audio = r.listen(source, timeout=5)
        print("Finished recording.")
    return audio

# --- Function to Convert Audio to Text using SpeechRecognition ---
def transcribe_audio(audio):
    """
    Converts audio data to text using the Google Web Speech API via SpeechRecognition.
    Handles potential errors during transcription.
    """
    r = sr.Recognizer()  # Create a recognizer instance
    try:
        text = r.recognize_google(audio)  # Use Google's speech recognition
        print("Google Speech Recognition thinks you said: " + text)
        return text
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
        return None  # Indicate failure
    except sr.RequestError as e:
        print(f"Could not request results from Google Speech Recognition service; {e}")
        return None  # Indicate failure

# --- Function to Convert Audio to Text using Whisper API (OpenAI) ---
def transcribe_audio_whisper(audio, filename="temp_audio.wav"):
    """
    Converts audio data to text using OpenAI's Whisper API.
    Saves the audio to a temporary file before transcribing.
    """
    try:
        # Save audio to a temporary WAV file. SpeechRecognition audio objects are
        # not directly compatible with the Whisper API.
        with open(filename, "wb") as f:
            f.write(audio.get_wav_data())
        # Transcribe the audio using the Whisper API
        with open(filename, "rb") as audio_file:
            transcript = openai.Audio.transcribe(
                model="whisper-1",  # Use the 'whisper-1' model
                file=audio_file
            )
        return transcript["text"]  # Extract text from the response
    except Exception as e:
        print(f"Error transcribing with Whisper API: {e}")
        return None
    finally:
        # Remove the temporary file whether or not transcription succeeded
        if os.path.exists(filename):
            os.remove(filename)

# --- Function to Save the Transcript to a File ---
def save_transcript(text):
    """
    Saves the transcribed text to a file with a timestamp in the filename.
    """
    if not text:
        print("No text to save.")
        return
    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")  # Format: YYYYMMDD_HHMMSS
    filename = f"transcript_{timestamp}.txt"  # Timestamp avoids overwriting earlier transcripts
    try:
        with open(filename, "w") as f:  # Open in write mode
            f.write(text)
        print(f"Transcript saved to {filename}")
    except IOError as e:
        print(f"Error saving transcript to file: {e}")

# --- Main Function ---
def main():
    """
    Main function to orchestrate the audio recording, transcription, and saving.
    """
    print("Starting Voice-to-Text Conversion...")
    audio = record_audio()  # Record audio from the microphone
    if audio:
        # Option 1: Use SpeechRecognition (Google Web Speech API)
        # transcribed_text = transcribe_audio(audio)
        # Option 2: Use OpenAI Whisper API
        transcribed_text = transcribe_audio_whisper(audio)
        if transcribed_text:
            save_transcript(transcribed_text)
        else:
            print("Transcription failed.")
    else:
        print("No audio recorded.")
    print("Voice-to-Text Conversion Complete.")

# --- Execute the Main Function ---
if __name__ == "__main__":
    main()
```
Key improvements and explanations:
* **Clearer Structure and Comments:** The code is now well-structured with comments explaining each part of the process. This makes it easier to understand and modify. Docstrings are used to explain what each function does.
* **Error Handling:** Includes `try...except` blocks to handle potential errors in both the `transcribe_audio` and `transcribe_audio_whisper` functions, particularly `sr.RequestError` (network issues) and general exceptions during Whisper API calls. This prevents the program from crashing and provides informative error messages. `IOError` is handled in `save_transcript`.
* **OpenAI API Integration (Whisper):** Now includes a complete example of using the OpenAI Whisper API for transcription. This is a *much* more accurate and robust solution than the default Google Web Speech API provided by `SpeechRecognition`.
* **API Key:** **CRITICAL:** Includes a placeholder for your OpenAI API key. *You MUST replace `"YOUR_OPENAI_API_KEY"` with your actual API key.* More importantly, the code now strongly advises you to manage your API keys securely (using environment variables or other secure methods).
* **`transcribe_audio_whisper` function:** This function handles the process of saving the `SpeechRecognition` audio data to a temporary `.wav` file, then sending that file to the Whisper API. It handles the file I/O, the API call, and extracts the transcribed text from the API response.
* **Model Selection:** Explicitly specifies the `"whisper-1"` model. You can potentially experiment with other models if OpenAI makes them available. Note that `openai.Audio.transcribe` is the interface of the pre-1.0 `openai` package; newer versions of the library expose transcription through a client object instead, so pin the package version if you use this code as-is.
* **Temporary File:** The audio is saved to a temporary file before transcription. This is *necessary* because the `SpeechRecognition` `AudioData` object is not directly compatible with the Whisper API. The code includes deleting the temporary file after use.
* **Flexibility in Transcription:** The `main()` function includes both transcription options (Google and Whisper), allowing you to switch between them by commenting/uncommenting the appropriate line. This makes the program more versatile.
* **Filename with Timestamp:** The `save_transcript` function now creates filenames with timestamps to avoid overwriting previous transcripts. This is a crucial usability improvement.
* **Clearer Output:** The program prints more informative messages to the console, indicating the steps being performed and any errors encountered.
* **`if __name__ == "__main__":` block:** This ensures that the `main()` function is only executed when the script is run directly, not when it's imported as a module. This is standard Python practice.
* **Timeout:** `r.listen` is now called with `timeout=5`, so if no speech begins within five seconds the call raises `sr.WaitTimeoutError` instead of blocking indefinitely.
* **Dependency Management:** Make sure the required packages are installed with `pip install SpeechRecognition openai` (note the PyPI name is `SpeechRecognition`, not `speech_recognition`). Microphone input additionally requires PyAudio (`pip install pyaudio`).
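As an illustration of the environment-variable approach recommended above, here is a minimal sketch. The variable name `OPENAI_API_KEY` is a common convention, not a requirement, and the helper name is hypothetical:

```python
import os

def load_api_key(var_name="OPENAI_API_KEY"):
    """Read the OpenAI API key from an environment variable instead of
    hardcoding it in source. Fails loudly if the variable is unset."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key

# Usage (replaces the hardcoded assignment in the script):
# openai.api_key = load_api_key()
```

This keeps the secret out of version control; set the variable in your shell (e.g. `export OPENAI_API_KEY=...`) before running the script.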
How to Run:
1. **Install Dependencies:**
```bash
pip install SpeechRecognition "openai<1.0" pyaudio
```
2. **Get an OpenAI API Key:** Sign up for an account at OpenAI and obtain an API key. (This will likely involve providing billing information, as the Whisper API is a paid service.)
3. **Replace Placeholder:** In the code, replace `"YOUR_OPENAI_API_KEY"` with your actual OpenAI API key.
4. **Run the Script:**
```bash
python your_script_name.py
```
Important Considerations:
* **API Key Security:** *Never* hardcode your OpenAI API key directly into your code, especially if you're sharing it or committing it to a public repository. Use environment variables or a secure configuration file.
* **Whisper API Costs:** Be aware that the OpenAI Whisper API is a paid service, and you will be charged based on usage. Monitor your OpenAI API usage to avoid unexpected costs.
* **Accuracy:** The Google Web Speech API (through `SpeechRecognition`) is generally less accurate than the OpenAI Whisper API, especially in noisy environments or with accented speech.
* **Permissions:** Ensure that your Python script has the necessary permissions to access the microphone.
* **Rate Limits:** Be aware of rate limits with the Whisper API. If you are processing a large number of audio files, you may need to implement delays or batch processing to avoid exceeding the limits.
* **Error Handling:** The provided error handling is a good start, but you may want to add more specific error handling based on the potential issues you anticipate in your application. For example, you could retry API calls with exponential backoff in case of temporary network errors.
* **File Formats:** The `SpeechRecognition` library works best with WAV files. If you have audio in other formats, you may need to convert them to WAV before transcription. The example above saves the recorded audio as WAV (via `get_wav_data()`) before sending it to Whisper.
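The retry-with-exponential-backoff idea mentioned under error handling can be sketched generically. The helper name and parameters below are illustrative, not part of any library:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Invoke `call`, retrying on any exception. The delay doubles each
    attempt, with random jitter so concurrent clients don't retry in lockstep."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # Out of retries: surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)

# Usage sketch (transcribe_once is a hypothetical wrapper around the API call):
# text = with_backoff(lambda: transcribe_once(audio_file), base_delay=2.0)
```

In production you would typically catch only transient error types (rate-limit and network errors) rather than bare `Exception`, so that authentication failures fail fast.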
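On the file-format point: if you start from raw PCM samples rather than a WAV file, the standard-library `wave` module can wrap them in the container that both SpeechRecognition and Whisper accept. The defaults below (16 kHz, mono, 16-bit) are illustrative:

```python
import wave

def pcm_to_wav(pcm_bytes, out_path, sample_rate=16000, channels=1, sample_width=2):
    """Wrap raw little-endian PCM samples in a WAV container."""
    with wave.open(out_path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)   # 2 bytes = 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm_bytes)

# Example: one second of silence at 16 kHz, 16-bit mono
# pcm_to_wav(b"\x00\x00" * 16000, "silence.wav")
```

Decoding compressed formats such as MP3 still requires an external tool (e.g. ffmpeg, often via a wrapper like pydub); the `wave` module only handles the uncompressed container.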
This example provides a complete, runnable script with essential error handling, a more accurate transcription option, and practical guidance for real-world usage. Remember to replace the API key placeholder with your actual key.