AI-driven Career Advisor (Python, NLP)

👤 Sharing: AI
```python
import nltk
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Download necessary NLTK data (if not already downloaded).
# Newer NLTK releases may also need 'punkt_tab' for word_tokenize.
# nltk.download('punkt')
# nltk.download('punkt_tab')
# nltk.download('stopwords')

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

class CareerAdvisor:
    """
    An AI-driven Career Advisor that uses NLP to analyze job descriptions
    and resumes, and provides career recommendations.
    """

    def __init__(self):
        self.job_descriptions = {}  # Store job descriptions with titles as keys
        self.resume = ""            # Store the user's resume content
        self.stop_words = set(stopwords.words('english')) # Get set of common English stop words

    def add_job_description(self, job_title, job_description):
        """
        Adds a job description to the advisor's knowledge base.

        Args:
            job_title (str): The title of the job.
            job_description (str): The text of the job description.
        """
        self.job_descriptions[job_title] = job_description

    def set_resume(self, resume_text):
        """
        Sets the user's resume content.

        Args:
            resume_text (str): The text of the user's resume.
        """
        self.resume = resume_text

    def preprocess_text(self, text):
        """
        Preprocesses the text by removing special characters, converting to lowercase,
        and removing stop words.

        Args:
            text (str): The text to preprocess.

        Returns:
            str: The preprocessed text.
        """
        text = re.sub(r'[^\w\s]', '', text)  # Remove everything except word characters and whitespace
        text = text.lower() # Convert text to lowercase
        words = word_tokenize(text) # Tokenize text into words

        # Remove stop words and return as a string
        filtered_words = [w for w in words if w not in self.stop_words]
        return " ".join(filtered_words)

    def recommend_jobs(self, top_n=3):
        """
        Recommends jobs based on the similarity between the user's resume
        and the job descriptions in the knowledge base.

        Args:
            top_n (int): The number of top job recommendations to return.

        Returns:
            list: A list of tuples, where each tuple contains the job title
                  and the similarity score, sorted in descending order of similarity.
                  Returns an empty list if no resume or job descriptions are loaded.
        """

        if not self.resume or not self.job_descriptions:
            return []

        # Preprocess the resume and job descriptions
        preprocessed_resume = self.preprocess_text(self.resume)
        preprocessed_job_descriptions = {
            title: self.preprocess_text(desc) for title, desc in self.job_descriptions.items()
        }

        # Create a TF-IDF vectorizer to convert text to numerical vectors
        vectorizer = TfidfVectorizer()

        # Fit and transform the job descriptions
        job_vectors = vectorizer.fit_transform(preprocessed_job_descriptions.values())

        # Transform the resume
        resume_vector = vectorizer.transform([preprocessed_resume])

        # Calculate cosine similarity between the resume and each job description
        similarity_scores = {}
        job_titles = list(preprocessed_job_descriptions.keys()) # Get Job Titles
        for i in range(job_vectors.shape[0]):
            similarity_scores[job_titles[i]] = cosine_similarity(resume_vector, job_vectors[i])[0][0]

        # Sort the jobs by similarity score in descending order
        sorted_jobs = sorted(similarity_scores.items(), key=lambda item: item[1], reverse=True)

        return sorted_jobs[:top_n]


# Example Usage:
if __name__ == "__main__":
    advisor = CareerAdvisor()

    # Add some job descriptions
    advisor.add_job_description(
        "Software Engineer",
        "We are looking for a skilled software engineer with experience in Python, Java, and cloud technologies.  Must have strong problem-solving skills and be able to work in a team.  Experience with machine learning is a plus."
    )
    advisor.add_job_description(
        "Data Scientist",
        "Seeking a data scientist with a strong background in statistics, machine learning, and data visualization.  Experience with Python and R is required.  Excellent communication skills are essential."
    )
    advisor.add_job_description(
        "Project Manager",
        "We need a project manager to lead software development projects.  Must have excellent organizational and communication skills.  Experience with Agile methodologies is required."
    )

    # Set the user's resume
    resume_text = """
    Experienced software engineer with a strong background in Python and machine learning.  
    Proficient in Java and cloud technologies.  Excellent problem-solving and communication skills.  
    Looking for a challenging role in a dynamic environment.
    """
    advisor.set_resume(resume_text)

    # Get job recommendations
    recommendations = advisor.recommend_jobs(top_n=2)

    # Print the recommendations
    print("Job Recommendations:")
    if recommendations:
        for job_title, score in recommendations:
            print(f"- {job_title}: Similarity = {score:.4f}")
    else:
        print("No job descriptions or resume loaded.")
```
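
The character class used for special-character removal is easy to get wrong, so it can help to check it in isolation. The sketch below uses only the standard library; the helper names `strip_special` and `strip_special_no_backslashes` are hypothetical and are not part of the `CareerAdvisor` class. It contrasts `[^\w\s]` (anything that is not a word character or whitespace) with `[^ws]`, which without the backslashes matches everything except the literal letters `w` and `s`.

```python
import re


def strip_special(text):
    """Remove characters that are neither word characters nor whitespace."""
    return re.sub(r"[^\w\s]", "", text)


def strip_special_no_backslashes(text):
    """Same call with the backslashes lost: keeps only the letters 'w' and 's'."""
    return re.sub(r"[^ws]", "", text)


sample = "Works with Python, Java & SQL!"

# Punctuation is dropped, words and spaces are kept.
print(repr(strip_special(sample)))

# Almost everything is dropped, which would leave TF-IDF with nothing useful.
print(repr(strip_special_no_backslashes(sample)))
```

If the second form slipped into `preprocess_text`, the vectorizer would see almost no usable text, so the backslashes matter.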

Key improvements and explanations:

* **Clear Class Structure:** Encapsulates the career advisor logic within a `CareerAdvisor` class, making the code more organized and reusable.
* **Initialization:** The `__init__` method sets up the initial state of the advisor, including the job descriptions, resume, and importantly, the stop words. This is crucial for NLP preprocessing.
* **`add_job_description` and `set_resume`:**  These methods allow for the addition of job descriptions and the user's resume, populating the advisor's knowledge.
* **`preprocess_text` Method:** This is the **most critical** part. It performs the following operations in order:
    * **Special Character Removal:**  Removes punctuation and any other character that is neither a word character nor whitespace, using a regular expression.
    * **Lowercasing:** Converts the text to lowercase to ensure consistent processing.
    * **Tokenization:** Splits the text into individual words (tokens) using `word_tokenize`.
    * **Stop Word Removal:** Removes common English stop words (e.g., "the", "a", "is") using the `stopwords` corpus from NLTK. This significantly improves the quality of the analysis.  The use of `set` for stop words is more efficient than a list.
    * **Returns String:** Returns the preprocessed words as a single string, which is necessary for the TF-IDF vectorizer.
* **`recommend_jobs` Function:**
    * **TF-IDF Vectorization:** Uses `TfidfVectorizer` to convert the preprocessed text into numerical vectors, representing the importance of each word in the documents.
    * **Cosine Similarity:** Calculates the cosine similarity between the resume vector and each job description vector. Cosine similarity measures the angle between two vectors, with a higher value indicating greater similarity.
    * **Sorting:** Sorts the jobs by similarity score in descending order to return the top recommendations.
    * **Handles Empty Input:** Checks if the resume or job descriptions are empty and returns an empty list in that case, preventing errors.
* **Example Usage:** The `if __name__ == "__main__":` block provides a clear example of how to use the `CareerAdvisor` class.  It adds job descriptions, sets the resume text, and then prints the job recommendations.  The sample resume shares several terms with the Software Engineer description, so the matching is easy to see.
* **NLTK Data Download:**  Includes comments showing how to download the necessary NLTK data if it hasn't been downloaded yet.  This is required for both the `stopwords` corpus and the tokenizer behind `word_tokenize`.
* **Clear Comments and Docstrings:**  Provides detailed comments and docstrings to explain the purpose of each function and its arguments.
* **Error Handling (basic):**  The empty-input check in `recommend_jobs` is the only guard in the code. Other failures are not handled; for example, missing NLTK data still raises a `LookupError`, so more robust error handling could be added.
* **Efficiency:** Using a `set` for stop words and vectorizing all job descriptions together with `fit_transform` makes the code more efficient.
* **Readability:** Code is formatted with consistent indentation and spacing for better readability.
* **Clarity:** The code prioritizes clarity over extreme optimization, making it easier to understand and modify.
* **Regex Improvements:** The regular expression `r'[^\w\s]'` is used for special character removal, which is more robust than listing individual characters.
* **Complete and runnable:** This code is a complete, runnable example that demonstrates the core functionality of an AI-driven career advisor.
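
To make the cosine similarity step concrete, here is a minimal standard-library sketch of the ranking idea. It is a simplification, not what `recommend_jobs` does internally: it uses raw term counts rather than TF-IDF weights, and the `cosine` helper and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty document is treated as similar to nothing.
    return dot / (norm_a * norm_b)


# Toy, already-preprocessed documents (hypothetical data).
resume = Counter("python machine learning java".split())
jobs = {
    "Software Engineer": Counter("python java cloud".split()),
    "Project Manager": Counter("agile projects communication".split()),
}

# Score every job against the resume and sort from most to least similar.
ranked = sorted(
    ((title, cosine(resume, vec)) for title, vec in jobs.items()),
    key=lambda item: item[1],
    reverse=True,
)

for title, score in ranked:
    print(f"{title}: {score:.4f}")
```

The Software Engineer entry shares two of its three terms with the resume and ranks first, while the Project Manager entry shares none and scores 0. TF-IDF changes the weights on each term but leaves this ranking logic the same.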

In summary, the class preprocesses the resume and job descriptions, converts them to TF-IDF vectors, and ranks the jobs by cosine similarity to the resume.  The code is structured and documented for readability and maintainability, and the example usage shows how to use the class end to end.