AI-driven Social Media Analysis with Python and the Twitter API

```python
# Import necessary libraries
import tweepy
import textblob
import re  # For regular expressions
import pandas as pd
import matplotlib.pyplot as plt


# --- 1.  API Authentication (replace with your own keys) ---

# Replace these with your actual API keys and tokens
consumer_key = "YOUR_CONSUMER_KEY"
consumer_secret = "YOUR_SECRET_KEY"
access_token = "YOUR_ACCESS_TOKEN"
access_token_secret = "YOUR_SECRET_TOKEN"

try:
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True) # wait_on_rate_limit=True prevents rate limit errors
    print("API Authentication successful!")
except tweepy.TweepyException as e:
    print(f"Error during API authentication: {e}")
    exit()  # Exit the program if authentication fails.


# --- 2.  Define a Function to Clean Tweets ---

def clean_tweet(tweet):
    """
    Utility function to clean tweet text by removing links, special characters
    using simple regex statements.
    """
    # Raw string avoids invalid-escape warnings for \w and \S in the pattern
    return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())



# --- 3.  Define a Function to Analyze Sentiment ---

def analyze_sentiment(tweet):
    """
    Utility function to classify sentiment of passed tweet
    using textblob's sentiment analysis.
    Returns: "positive", "negative", or "neutral"
    """
    analysis = textblob.TextBlob(tweet)
    polarity = analysis.sentiment.polarity

    if polarity > 0:
        return "positive"
    elif polarity < 0:
        return "negative"
    else:
        return "neutral"


# --- 4. Define the main function to search for tweets and analyze them ---

def analyze_tweets(search_term, num_tweets=100):
    """
    Searches for tweets containing the search term, cleans them, analyzes their sentiment,
    and returns the results in a Pandas DataFrame.

    Args:
        search_term (str): The keyword or hashtag to search for.
        num_tweets (int): The number of tweets to retrieve (default: 100).

    Returns:
        pandas.DataFrame: A DataFrame containing the tweet text, cleaned tweet text, and sentiment analysis.
        Returns None if there is an error during the tweet search.
    """

    try:
        tweets = tweepy.Cursor(api.search_tweets, q=search_term, lang="en").items(num_tweets)
        # Collect tweets
        tweet_list = []
        for tweet in tweets:
            tweet_list.append(tweet)

    except tweepy.TweepyException as e:
        print(f"Error during tweet retrieval: {e}")
        return None  # Return None to indicate failure


    # Create a list to store the results
    results = []
    for tweet in tweet_list:
        text = tweet.text
        cleaned_text = clean_tweet(text)
        sentiment = analyze_sentiment(cleaned_text)
        results.append([text, cleaned_text, sentiment])  # Append as a list

    # Create a Pandas DataFrame
    df = pd.DataFrame(results, columns=["Tweet", "Cleaned Tweet", "Sentiment"])
    return df


# --- 5.  Example Usage ---

if __name__ == "__main__":
    search_term = "artificial intelligence"  #  Search term for tweets
    num_tweets = 200 #Number of tweets to analyze

    df = analyze_tweets(search_term, num_tweets)

    if df is not None:  # Check if the DataFrame was successfully created

        # Print some sample tweets
        print("\nSample Tweets and their Sentiment:")
        print(df.head())

        # Calculate sentiment distribution
        sentiment_counts = df['Sentiment'].value_counts()
        print("\nSentiment Distribution:")
        print(sentiment_counts)


        # --- 6. Visualization (Optional) ---

        # Create a bar chart of sentiment distribution
        plt.figure(figsize=(8, 6))
        sentiment_counts.plot(kind='bar')
        plt.title(f"Sentiment Analysis of Tweets about '{search_term}'")
        plt.xlabel("Sentiment")
        plt.ylabel("Number of Tweets")
        plt.xticks(rotation=0)  # Keep x-axis labels horizontal for readability
        plt.show()
    else:
        print("Analysis failed.  Check API Key and Search Term.")
```

Key improvements and explanations:

* **Error Handling:**  Includes a `try...except` block during API authentication and tweet retrieval.  This is *critical* for real-world applications because the Twitter API can be unreliable, and you don't want your program to crash due to transient network issues, rate limits, or other problems.  The program now prints an informative error message and gracefully exits if authentication or tweet retrieval fails. It returns `None` if tweet retrieval fails, which is handled in the main block.

* **Rate Limiting:**  The `wait_on_rate_limit=True` parameter in `tweepy.API` handles Twitter's rate limits.  This prevents your program from being temporarily blocked if you exceed the API usage limits.  (In Tweepy 4.x the separate `wait_on_rate_limit_notify` parameter no longer exists; `wait_on_rate_limit=True` is all you need.)

* **Clearer Authentication:** The API Authentication section has more explanation.

* **Cleaning Function (`clean_tweet`)**:  This function now uses regular expressions (`re` library) to effectively remove:
    * Mentions (@username)
    * URLs
    * Special characters (anything that's not a letter, number, or space)
    This significantly improves the accuracy of sentiment analysis.  The `.split()` and `' '.join()` parts remove extra whitespace created by the removals.
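To see what the cleaning step actually does, here is the same function applied to a made-up tweet (the sample text is illustrative, not from the Twitter API):

```python
import re

def clean_tweet(tweet):
    # Same regex as in the script above, as a raw string
    return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())

sample = "@alice Check this out! https://t.co/abc123 #AI is amazing :)"
print(clean_tweet(sample))  # → Check this out AI is amazing
```

The mention, URL, and punctuation are replaced with spaces, and the final `split()`/`join()` collapses the resulting runs of whitespace.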

* **Sentiment Analysis Function (`analyze_sentiment`)**: Uses `textblob` to determine the polarity (positive/negative) of the tweet.  Returns "positive", "negative", or "neutral" based on the polarity score.
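The thresholding rule itself is independent of TextBlob. A small sketch of just that rule, using a hypothetical helper name and hand-picked polarity values rather than real TextBlob output:

```python
def label_from_polarity(polarity):
    # Same classification rule as analyze_sentiment above
    if polarity > 0:
        return "positive"
    elif polarity < 0:
        return "negative"
    else:
        return "neutral"

# Illustrative polarity scores (hypothetical, not computed by TextBlob)
for score in (0.8, -0.5, 0.0):
    print(score, "->", label_from_polarity(score))
```

Note that TextBlob's `sentiment.polarity` is a float in [-1.0, 1.0], so anything that is not exactly zero gets a positive or negative label.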

* **Main Analysis Function (`analyze_tweets`):**
    * Encapsulates the tweet searching, cleaning, and sentiment analysis logic.
    * Uses `tweepy.Cursor` for efficient retrieval of a larger number of tweets. This is important for getting more representative data.
    * Takes `search_term` and `num_tweets` as arguments, making it reusable.
    * Creates a Pandas DataFrame to store the tweet text, cleaned text, and sentiment analysis results. DataFrames are excellent for data analysis and manipulation. Returns the DataFrame.
    * Includes error handling within the function for tweet retrieval.

* **DataFrame Creation:** The analyzed data is stored in a Pandas DataFrame (`df`).  This is a standard practice for data analysis in Python, as DataFrames provide powerful tools for filtering, sorting, and analyzing data.
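The sentiment-distribution step in the main block is just `value_counts()` on the `Sentiment` column. A minimal standalone sketch with fabricated example rows (no API access needed):

```python
import pandas as pd

# Hypothetical analysis results in the same column layout as the script
rows = [
    ["Great talk on AI", "Great talk on AI", "positive"],
    ["AI will ruin jobs", "AI will ruin jobs", "negative"],
    ["New AI paper out", "New AI paper out", "neutral"],
    ["Loving this AI demo", "Loving this AI demo", "positive"],
]
df = pd.DataFrame(rows, columns=["Tweet", "Cleaned Tweet", "Sentiment"])

counts = df["Sentiment"].value_counts()
print(counts)  # positive: 2, negative: 1, neutral: 1
```

The resulting `Series` is what gets passed to `.plot(kind='bar')` in the visualization step.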

* **Example Usage (`if __name__ == "__main__":`)**:
    * Demonstrates how to use the `analyze_tweets` function.
    * Prints the sentiment distribution (number of positive, negative, and neutral tweets).
    * **Visualization (Optional):** Creates a bar chart using `matplotlib` to visualize the sentiment distribution.  This makes it easier to understand the overall sentiment towards the search term.  Includes clear labels and a title.
    * Includes a check to see if the DataFrame creation was successful before proceeding.

* **Conciseness:** The code has been cleaned for better readability.

* **Comments:**  Added extensive comments to explain each part of the code.

* **Docstrings:** Each function has a docstring explaining what it does, its arguments, and its return value.  This is good practice for documenting your code.

* **`lang="en"` in `tweepy.Cursor`:**  Specifies that only English-language tweets should be retrieved. This helps to improve the accuracy of sentiment analysis.

How to run the code:

1.  **Install Libraries:**
    ```bash
    pip install tweepy textblob pandas matplotlib
    ```

2.  **Get Twitter API Keys:**
    *   Create a Twitter developer account at [https://developer.twitter.com/](https://developer.twitter.com/).
    *   Create an app within your developer account.
    *   Generate API keys (Consumer Key, Consumer Secret, Access Token, Access Token Secret).  *Treat these keys like passwords and keep them secure!*

3.  **Replace Placeholders:**  Replace the placeholder values in the `consumer_key`, `consumer_secret`, `access_token`, and `access_token_secret` variables with your actual API keys.
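Rather than hard-coding keys into the script, a common practice is to read them from environment variables so they never end up in version control. A minimal sketch (the environment variable names here are hypothetical; use whatever names you export in your shell, with the script's placeholders as fallbacks):

```python
import os

# Hypothetical variable names; the fallbacks match the placeholders above
consumer_key = os.environ.get("TWITTER_CONSUMER_KEY", "YOUR_CONSUMER_KEY")
consumer_secret = os.environ.get("TWITTER_CONSUMER_SECRET", "YOUR_SECRET_KEY")
access_token = os.environ.get("TWITTER_ACCESS_TOKEN", "YOUR_ACCESS_TOKEN")
access_token_secret = os.environ.get("TWITTER_ACCESS_TOKEN_SECRET", "YOUR_SECRET_TOKEN")

print("Keys loaded from environment:", consumer_key != "YOUR_CONSUMER_KEY")
```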

4.  **Run the Script:**  Execute the Python script.  The script will:
    *   Authenticate with the Twitter API.
    *   Search for tweets containing the specified `search_term`.
    *   Clean the tweet text.
    *   Analyze the sentiment of each tweet.
    *   Print a sample of the tweets and their sentiment.
    *   Print the overall sentiment distribution.
    *   Display a bar chart visualizing the sentiment distribution.

This version provides a complete, robust, and usable example that incorporates best practices for Python coding, error handling, and API usage.