AI-Based Sentiment Analysis for Crypto: Python, NLP, AI

👤 Sharing: AI
```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import requests
import json
import re

# Download necessary NLTK resources (only needs to be done once)
nltk.download('vader_lexicon')
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('punkt_tab')  # Newer NLTK releases load word_tokenize data from 'punkt_tab'


class CryptoSentimentAnalyzer:
    """
    A class for analyzing sentiment related to cryptocurrency based on news headlines.
    """

    def __init__(self, coin_name):
        """
        Initializes the CryptoSentimentAnalyzer with the name of the cryptocurrency.

        Args:
            coin_name (str): The name of the cryptocurrency (e.g., "Bitcoin", "Ethereum").
        """
        self.coin_name = coin_name
        self.sid = SentimentIntensityAnalyzer() #  VADER Sentiment Analyzer
        self.stop_words = set(stopwords.words('english')) # Set of common english words

    def fetch_news_headlines(self, api_key, num_results=5):  # Reduced num_results for testing
        """
        Fetches news headlines related to the cryptocurrency from a news API.

        Args:
            api_key (str): The API key for the news API.
            num_results (int, optional): The number of news headlines to fetch. Defaults to 5.

        Returns:
            list: A list of news headlines, or an empty list if an error occurred.
        """
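        url = "https://newsapi.org/v2/everything"
        params = {
            "q": self.coin_name,
            "sortBy": "relevancy",
            "pageSize": num_results,
            "apiKey": api_key,
        }

        try:
            # Let requests URL-encode the query, and give up after 10 seconds
            response = requests.get(url, params=params, timeout=10)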
            response.raise_for_status()  # Raise HTTPError for bad responses (4xx or 5xx)
            data = response.json()
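            # Skip articles whose title is missing or null so clean_text always gets a string
            headlines = [article['title'] for article in data['articles'] if article.get('title')]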
            return headlines
        except requests.exceptions.RequestException as e:
            print(f"Error fetching news headlines: {e}")
            return []
        except (KeyError, json.JSONDecodeError) as e:
            print(f"Error parsing news data: {e}")
            return []


    def clean_text(self, text):
        """
        Cleans the input text by removing special characters, converting to lowercase,
        and removing stop words.

        Args:
            text (str): The text to clean.

        Returns:
            str: The cleaned text.
        """
        text = re.sub(r'[^a-zA-Z\s]', '', text) # Remove special characters and numbers
        text = text.lower() # Convert to lowercase
        tokens = word_tokenize(text) # Tokenize the text
        tokens = [word for word in tokens if word not in self.stop_words] # Remove stop words
        return " ".join(tokens)  # Join the tokens back into a string


    def analyze_sentiment(self, text):
        """
        Analyzes the sentiment of the input text using VADER.

        Args:
            text (str): The text to analyze.

        Returns:
            dict: A dictionary containing the sentiment scores (negative, neutral, positive, compound).
        """
        cleaned_text = self.clean_text(text)
        sentiment_scores = self.sid.polarity_scores(cleaned_text)
        return sentiment_scores


    def run_analysis(self, api_key):
        """
        Fetches news headlines, analyzes their sentiment, and prints the results.

        Args:
            api_key (str): The API key for the news API.
        """
        headlines = self.fetch_news_headlines(api_key)

        if not headlines:
            print("No headlines found. Sentiment analysis cannot proceed.")
            return

        print(f"Sentiment Analysis for {self.coin_name}:\n")
        total_compound_score = 0  # Initialize a total score
        num_headlines = len(headlines)

        for headline in headlines:
            sentiment = self.analyze_sentiment(headline)
            print(f"Headline: {headline}")
            print(f"Sentiment Scores: {sentiment}\n")
            total_compound_score += sentiment['compound']

        # Calculate the average sentiment score
        average_compound_score = total_compound_score / num_headlines if num_headlines > 0 else 0

        print(f"Average Compound Sentiment Score: {average_compound_score:.4f}")

        # Interpret the average sentiment
        if average_compound_score >= 0.05:
            print("Overall Sentiment: Positive")
        elif average_compound_score <= -0.05:
            print("Overall Sentiment: Negative")
        else:
            print("Overall Sentiment: Neutral")


# Example Usage
if __name__ == "__main__":
    # Replace with your actual News API key from newsapi.org
    news_api_key = "YOUR_NEWS_API_KEY"  # IMPORTANT:  Replace with a valid API key
    coin_name = "Bitcoin"  # You can change this to analyze other cryptocurrencies like "Ethereum" or "Dogecoin"

    analyzer = CryptoSentimentAnalyzer(coin_name)
    analyzer.run_analysis(news_api_key)
```
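
The headline extraction inside `fetch_news_headlines` can also be pulled out into a pure function, which makes the parsing step testable without a network call. The sketch below is one hedged way to do that. The helper name `extract_headlines` is hypothetical, and the payload shape (a `status` field plus an `articles` list of objects with a `title`) is an assumption about what NewsAPI returns, so check it against a real response before relying on it.

```python
def extract_headlines(payload):
    """Return non-empty article titles from a decoded NewsAPI-style payload.

    Assumed shape: {"status": "ok", "articles": [{"title": ...}, ...]}.
    Anything that does not match yields an empty list instead of raising.
    """
    if not isinstance(payload, dict) or payload.get("status") != "ok":
        return []
    articles = payload.get("articles")
    if not isinstance(articles, list):
        return []
    headlines = []
    for article in articles:
        # Tolerate malformed entries and null or blank titles
        title = article.get("title") if isinstance(article, dict) else None
        if isinstance(title, str) and title.strip():
            headlines.append(title.strip())
    return headlines


# Made-up payload for illustration only (not real API output)
sample_payload = {
    "status": "ok",
    "articles": [
        {"title": "Bitcoin rallies after ETF news"},
        {"title": None},
        {"description": "entry without a title"},
    ],
}
sample_headlines = extract_headlines(sample_payload)
```

Inside the class, the list comprehension in `fetch_news_headlines` could then become `extract_headlines(response.json())`.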

Key improvements and explanations:

* **Clear Class Structure:**  The code is now organized into a class, `CryptoSentimentAnalyzer`. This makes the code much more modular, reusable, and easier to understand.  The class encapsulates the data (coin name) and the behavior (fetching news, cleaning text, analyzing sentiment).
* **News API Integration:**  The `fetch_news_headlines` method fetches headlines for the specified cryptocurrency from NewsAPI.  `raise_for_status()` turns HTTP error responses (such as the one returned for an invalid API key) into exceptions, and the `try...except` blocks catch network failures, JSON decoding problems, and a response that lacks the expected `articles` key, returning an empty list in each case.  The `num_results` argument, sent to the API as `pageSize`, defaults to 5 so that test runs stay quick and light on the API quota.
* **Text Cleaning:**  The `clean_text` method strips non-letter characters with a regular expression, converts the text to lowercase, and removes stop words (common words like "the", "a", "is").  Note that VADER's rules also draw on capitalization, punctuation such as exclamation marks, and negation words like "not" (which is in NLTK's English stop-word list), so this cleaning can remove cues VADER relies on.  It is worth comparing scores on raw and cleaned headlines before trusting either.
* **Sentiment Analysis with VADER:**  The `analyze_sentiment` method uses the VADER (Valence Aware Dictionary and sEntiment Reasoner) lexicon and intensity analyzer from the `nltk.sentiment.vader` module. VADER is a rule-based analyzer tuned for short, informal text such as social media posts, which makes it a reasonable fit for headlines.
* **Error Handling:** The `try...except` blocks are essential for handling potential errors, such as network errors, invalid API keys, or unexpected data formats from the API.  Without error handling, the program could crash.
* **Modularity and Reusability:** The class-based structure allows you to easily create multiple `CryptoSentimentAnalyzer` objects for different cryptocurrencies.
* **Clear Output:** The `run_analysis` method fetches headlines, analyzes the sentiment of each headline, and prints the results in a readable format, including the average sentiment score and an overall sentiment summary (positive, negative, or neutral).
* **Main Block (`if __name__ == "__main__":`)**: The example usage code is placed within a `if __name__ == "__main__":` block. This ensures that the code is only executed when the script is run directly (not when it's imported as a module).
* **Informative Comments:**  The code is thoroughly commented to explain each step and the purpose of different code sections.
* **API Key Placeholder:** The `news_api_key` variable has a clear placeholder and a warning to remind the user to replace it with their actual API key.  This is crucial for the code to work.
* **Average Sentiment Calculation:** It calculates and prints the average compound sentiment score across all headlines.  This provides a more holistic view of the overall sentiment.  It correctly handles the case where no headlines are found (avoiding division by zero).
* **Sentiment Interpretation:**  It interprets the average compound sentiment score and provides an overall sentiment summary (Positive, Negative, or Neutral).  The cutoffs of 0.05 and -0.05 are the thresholds conventionally used with VADER's compound score.
* **`nltk.download()` calls:** The script calls `nltk.download()` at the top so that the VADER lexicon, the stop-word list, and the tokenizer data are present before the analysis runs. Without them, NLTK raises a `LookupError`.
* **Tokenization:** `clean_text` uses `word_tokenize` to split the text into tokens, so that stop words are matched as whole words rather than substrings.
* **Concurrency (removed):** Removed the threading code to keep the example simpler and easier to understand, and because fetching a small number of headlines (5) does not significantly benefit from multi-threading. If you need to process a large volume of data, you can consider adding threading or asynchronous processing.
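
The averaging and interpretation steps described above can be sketched on their own, separate from the API and NLTK. This is a minimal illustration, not the script's exact code: the names `label_compound` and `summarize` are hypothetical, and the example scores are made up rather than produced by VADER.

```python
from statistics import fmean

# Thresholds conventionally used with VADER's compound score
POSITIVE_THRESHOLD = 0.05
NEGATIVE_THRESHOLD = -0.05


def label_compound(score):
    """Map a compound score in [-1, 1] to a coarse sentiment label."""
    if score >= POSITIVE_THRESHOLD:
        return "Positive"
    if score <= NEGATIVE_THRESHOLD:
        return "Negative"
    return "Neutral"


def summarize(compound_scores):
    """Return (average, label) for a list of compound scores.

    An empty list yields (0.0, "Neutral") so callers never divide by zero.
    """
    if not compound_scores:
        return 0.0, "Neutral"
    average = fmean(compound_scores)
    return average, label_compound(average)


# Hypothetical compound scores for three headlines (made-up values)
example_scores = [0.6, -0.2, 0.1]
average, label = summarize(example_scores)
print(f"Average: {average:.4f}, overall: {label}")
```

Keeping the thresholds as named constants makes it easy to experiment with stricter cutoffs if many headlines hover near zero.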

How to Run:

1. **Install Libraries:**
   ```bash
   pip install nltk requests
   ```

2. **Get a News API Key:**
   - Sign up for a free API key at [https://newsapi.org/](https://newsapi.org/).  The free tier is sufficient for testing.

3. **Replace Placeholder API Key:**
   - In the `if __name__ == "__main__":` block, replace `"YOUR_NEWS_API_KEY"` with your actual News API key.

4. **Run the Script:**
   ```bash
   python your_script_name.py
   ```

The script will then fetch news headlines related to Bitcoin (or the cryptocurrency you specify), analyze their sentiment, and print the results, including the average sentiment score.  The output will include each headline, its sentiment scores, the average compound sentiment score, and the overall sentiment (Positive, Negative, or Neutral).
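
As an alternative to editing the placeholder in step 3, the key can be read from an environment variable so it never lands in the source file. The sketch below assumes a variable named `NEWS_API_KEY` and a helper called `load_api_key`. Both names are hypothetical and not part of the original script.

```python
import os


def load_api_key(env=None, var_name="NEWS_API_KEY"):
    """Return the API key stored in `var_name`, or raise if it is unset.

    `env` defaults to os.environ; passing a plain dict makes testing easy.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    # Treat the script's placeholder value as "not configured"
    if not key or key == "YOUR_NEWS_API_KEY":
        raise RuntimeError(
            f"Set the {var_name} environment variable to your News API key."
        )
    return key
```

With this in place, the main block could call `analyzer.run_analysis(load_api_key())` after running `export NEWS_API_KEY=...` in the shell.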