AI-Based Automated Grading System for Educational Assessments,Python

👤 Sharing: AI
```python
import nltk
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

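# Ensure NLTK resources are downloaded (run this once if needed).
# Newer NLTK releases load the tokenizer data from 'punkt_tab' rather than 'punkt',
# so both are requested here alongside the stopwords corpus.
for resource_path, package in [
    ('tokenizers/punkt', 'punkt'),
    ('tokenizers/punkt_tab', 'punkt_tab'),
    ('corpora/stopwords', 'stopwords'),
]:
    try:
        nltk.data.find(resource_path)
    except LookupError:
        nltk.download(package)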


from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize


class AIGradingSystem:
    """
    An AI-based automated grading system for educational assessments.
    Uses TF-IDF and cosine similarity to compare student answers with a model answer.

    Attributes:
        model_answer (str): The model answer to compare against.
        tfidf_vectorizer (TfidfVectorizer):  TF-IDF vectorizer to convert text to numerical representations.
        stop_words (set): A set of common English stop words to ignore.

    Methods:
        preprocess_text(text): Preprocesses the input text by removing stop words and converting to lowercase.
        set_model_answer(answer): Sets the model answer for comparison.
        grade_answer(student_answer): Grades a student's answer based on cosine similarity with the model answer.
        calculate_similarity(student_answer): Calculates the cosine similarity between the student and model answers.
    """

    def __init__(self, model_answer=None):
        """
        Initializes the AIGradingSystem with an optional model answer.
        """
        self.model_answer = model_answer
        self.tfidf_vectorizer = TfidfVectorizer()
        self.stop_words = set(stopwords.words('english'))

    def preprocess_text(self, text):
        """
        Preprocesses the input text by:
        1. Tokenizing the text into words.
        2. Converting all words to lowercase.
        3. Removing stop words (common words like "the", "a", "is").

        Args:
            text (str): The text to preprocess.

        Returns:
            str: The preprocessed text.
        """
        word_tokens = word_tokenize(text)
        filtered_sentence = [w.lower() for w in word_tokens if w.lower() not in self.stop_words]
        return " ".join(filtered_sentence)


    def set_model_answer(self, answer):
        """
        Sets the model answer that will be used for comparison.

        Args:
            answer (str): The model answer text.
        """
        self.model_answer = answer
        print("Model answer set successfully.")


    def calculate_similarity(self, student_answer):
        """
        Calculates the cosine similarity between a student's answer and the model answer.

        Args:
            student_answer (str): The student's answer text.

        Returns:
            float: The cosine similarity score (between 0 and 1).  Returns None if model answer is not set.
        """

        if self.model_answer is None:
            print("Error: Model answer not set.  Please use set_model_answer() first.")
            return None

        # Preprocess both the model answer and the student answer
        preprocessed_model_answer = self.preprocess_text(self.model_answer)
        preprocessed_student_answer = self.preprocess_text(student_answer)


        # Create TF-IDF vectors for both texts
        tfidf_matrix = self.tfidf_vectorizer.fit_transform([preprocessed_model_answer, preprocessed_student_answer])

        # Calculate cosine similarity
        cosine_sim = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])[0][0]  # compare first doc with second doc

        return cosine_sim


    def grade_answer(self, student_answer):
        """
        Grades a student's answer based on its similarity to the model answer.

        Args:
            student_answer (str): The student's answer text.

        Returns:
            float: A grade (percentage) based on the cosine similarity.  Returns None if model answer is not set.
        """
        similarity_score = self.calculate_similarity(student_answer)

        if similarity_score is None:
            return None

        # Convert similarity score to a percentage grade
        grade = similarity_score * 100

        return grade



# Example Usage
if __name__ == '__main__':
    # Create an instance of the grading system
    grading_system = AIGradingSystem()

    # Set the model answer
    model_answer = "Photosynthesis is the process by which plants convert light energy into chemical energy to fuel their activities."
    grading_system.set_model_answer(model_answer)

    # Example student answers
    student_answer1 = "Plants use sunlight to make food in photosynthesis."
    student_answer2 = "Photosynthesis is when animals eat plants."  # Poor answer
    student_answer3 = "Photosynthesis is the process plants utilize to transform light energy into chemical energy for their life functions."  # Near-perfect answer

    # Grade the student answers
    grade1 = grading_system.grade_answer(student_answer1)
    grade2 = grading_system.grade_answer(student_answer2)
    grade3 = grading_system.grade_answer(student_answer3)

    # Print the grades
    if grade1 is not None:
        print(f"Student 1 Grade: {grade1:.2f}%")
    if grade2 is not None:
        print(f"Student 2 Grade: {grade2:.2f}%")
    if grade3 is not None:
        print(f"Student 3 Grade: {grade3:.2f}%")
```
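To see what the preprocessing step contributes without installing NLTK, here is a minimal standard-library sketch. It is an approximation, not the class's actual method: `simple_preprocess` and `STOP_WORDS` are hypothetical names, the stop-word list is a tiny hand-picked subset, and a regular expression stands in for `word_tokenize`.

```python
import re

# Hypothetical, deliberately tiny stop-word list; NLTK's English list is much longer.
STOP_WORDS = {"the", "a", "is", "by", "which", "to", "into", "their", "in"}


def simple_preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return " ".join(t for t in tokens if t not in STOP_WORDS)


print(simple_preprocess("Plants use sunlight to make food in photosynthesis."))
# prints: plants use sunlight make food photosynthesis
```

Only the content-bearing words remain, and these are what the TF-IDF vectorizer then compares.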

Key improvements and explanations:

* **Clear Class Structure:** Encapsulates the grading logic within the `AIGradingSystem` class. This makes the code more organized, reusable, and easier to understand.
* **TF-IDF Vectorization:** Uses `TfidfVectorizer` from scikit-learn, a standard technique for converting text into numerical vectors that can be compared. TF-IDF (Term Frequency-Inverse Document Frequency) weights a term by how often it appears in a document and down-weights terms that appear in many documents. Here the vectorizer is fitted on only two texts (the model answer and the student answer), so the IDF component has little to distinguish.
* **Cosine Similarity:** Calculates the cosine similarity between the two TF-IDF vectors. Cosine similarity measures the angle between two vectors; a smaller angle (closer to 0 degrees) indicates higher similarity. Because TF-IDF weights are non-negative, the score ranges from 0 (no shared terms) to 1 (same direction).
* **Preprocessing:** Includes a `preprocess_text` function to:
    * **Tokenize:** Splits the text into individual words using `nltk.word_tokenize`.
    * **Lowercase:** Converts all words to lowercase to ensure case-insensitive comparison.
    * **Stop Word Removal:** Removes common English "stop words" (like "the", "a", "is") that carry little meaning, so that shared filler words do not inflate the similarity score.
* **Error Handling:** Checks that a model answer has been set before attempting to grade. If it is missing, the method prints a descriptive error message and returns `None`.
* **`set_model_answer` function:**  Provides a way to set or update the model answer after the `AIGradingSystem` object is created.  This is more flexible than requiring the model answer at initialization only.
* **`calculate_similarity` function:** This function is kept separate from `grade_answer` and focuses only on calculating the similarity score, which improves readability and modularity. It preprocesses both texts *before* vectorization.
* **Clear Comments and Docstrings:**  Includes detailed comments and docstrings to explain the purpose of each function and class, making the code easier to understand and maintain.
* **Example Usage (`if __name__ == '__main__':`):** Provides a runnable example that grades three student answers of varying quality against one model answer and prints each grade to two decimal places.
* **NLTK Dependency Handling:** The code checks for the required NLTK resources (tokenizer data and the stopwords corpus) and downloads them if they are missing. This avoids a `LookupError` at tokenization time on a fresh install.
* **Uses well-established libraries:** Relies on the third-party packages `nltk` and `sklearn` for text processing and vectorization, both of which are widely used and well maintained.
* **Returns percentage grade:** The `grade_answer` method returns the grade as a percentage (0-100), which is a more intuitive and common way to represent grades.
* **Modularity:** The code is broken down into smaller, well-defined functions, which makes it easier to test, debug, and maintain.
* **Readability:** The code is formatted consistently and uses descriptive variable names, which makes it easier to read and understand.
* **Conciseness:** The class contains only what the grading flow needs: preprocessing, similarity calculation, and conversion to a percentage.
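The TF-IDF and cosine-similarity points above can be made concrete with a small standard-library sketch. It is illustrative only: `tfidf_vectors` and `cosine` are hypothetical helpers, tokenization is a plain whitespace split, and the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1` with L2 normalization is written to mirror what I understand scikit-learn's default settings to be, so exact numbers may differ from `TfidfVectorizer`.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalized {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Raw term count times smoothed IDF.
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Dot product of two unit-length sparse vectors, i.e. their cosine similarity."""
    return sum(w * v.get(term, 0.0) for term, w in u.items())


model, student = tfidf_vectors([
    "plants convert light energy",
    "plants use light",
])
print(f"similarity: {cosine(model, student):.3f}")
```

Identical texts give a similarity of 1, texts with no words in common give 0, and partial overlap falls in between, which is the value `grade_answer` multiplies by 100.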

How to run:

1. **Install Libraries:**
   ```bash
   pip install nltk scikit-learn numpy
   ```

2. **Run the Python Script:**  Save the code as a `.py` file (e.g., `grading_system.py`) and run it from your terminal:
   ```bash
   python grading_system.py
   ```
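Before running the script, it can help to confirm that the installed packages are importable. The following is a minimal sketch using only the standard library; `missing_modules` and `REQUIRED_MODULES` are hypothetical names introduced for illustration.

```python
import importlib.util

# Import names used by the script. Note that the pip package "scikit-learn"
# is imported under the name "sklearn".
REQUIRED_MODULES = ["nltk", "sklearn"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

If anything is reported missing, rerun the `pip install` command from step 1 in the same Python environment used to run the script.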

The script is complete and runnable. It scores each student answer by its TF-IDF cosine similarity to the model answer and returns `None`, with a printed message, if no model answer has been set. Because the score reflects shared vocabulary rather than meaning, a correct answer phrased in different words can receive a low grade.