AI-Powered Essay Grading Tool (Python, AI)
```python
import re  # Regular expression library

import nltk
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Download required NLTK resources (if not already downloaded).
# Note: on newer NLTK releases the tokenizer data is named 'punkt_tab';
# if you see a LookupError, run nltk.download('punkt_tab') as well.
try:
    nltk.data.find('tokenizers/punkt')
    nltk.data.find('corpora/stopwords')
except LookupError:
    nltk.download('punkt')
    nltk.download('stopwords')

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize


class EssayGrader:
    def __init__(self, rubric, model_answer):
        """
        Initializes the EssayGrader with a rubric and a model answer.

        Args:
            rubric (dict): A dictionary mapping grading criteria (e.g., 'content',
                'grammar', 'style') to their weights (e.g., 0.5, 0.3, 0.2).
                The weights should sum to 1.0.
            model_answer (str): The ideal answer to compare against.
        """
        self.rubric = rubric
        # Stop words must exist before preprocess_text() is called below.
        self.stop_words = set(stopwords.words('english'))
        self.vectorizer = TfidfVectorizer()
        self.model_answer = self.preprocess_text(model_answer)  # Preprocess for consistency

    def preprocess_text(self, text):
        """
        Preprocesses the input text by:
        1. Lowercasing the text.
        2. Removing punctuation and special characters.
        3. Tokenizing the text.
        4. Removing stop words.
        5. Joining the tokens back into a string.

        Args:
            text (str): The text to preprocess.

        Returns:
            str: The preprocessed text.
        """
        text = text.lower()
        text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation
        tokens = word_tokenize(text)
        tokens = [w for w in tokens if w not in self.stop_words]
        return " ".join(tokens)

    def calculate_content_score(self, essay):
        """
        Calculates the content score as the TF-IDF cosine similarity between
        the essay and the model answer.

        Args:
            essay (str): The essay to be graded.

        Returns:
            float: The content score, ranging from 0 to 1.
        """
        essay = self.preprocess_text(essay)
        corpus = [self.model_answer, essay]
        tfidf_matrix = self.vectorizer.fit_transform(corpus)
        similarity_score = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])[0][0]
        return similarity_score

    def calculate_grammar_score(self, essay):
        """
        Calculates a grammar score based on simple heuristics (sentence length,
        word usage). This is a simplified example and should be replaced with a
        dedicated grammar checker.

        Args:
            essay (str): The essay to be graded.

        Returns:
            float: The grammar score, ranging from 0 to 1.
        """
        sentences = nltk.sent_tokenize(essay)
        num_sentences = len(sentences)
        if num_sentences == 0:
            return 0.5  # Avoid division by zero; give a neutral score

        words = word_tokenize(essay)
        num_words = len(words)
        avg_sentence_length = num_words / num_sentences

        # Linear penalty for an average sentence length above 25 words.
        sentence_length_penalty = max(0, (avg_sentence_length - 25) / 25)

        # Encourage diverse vocabulary (simple measure: unique words / total words).
        unique_words = set(words)
        vocabulary_score = len(unique_words) / num_words if num_words > 0 else 0

        # Clamp the combined score to [0, 1].
        return max(0, min(1, vocabulary_score - sentence_length_penalty))

    def calculate_style_score(self, essay):
        """
        Calculates a style score based on sentence variety and essay length.
        This is a simplified example.

        Args:
            essay (str): The essay to be graded.

        Returns:
            float: The style score, ranging from 0 to 1.
        """
        sentences = nltk.sent_tokenize(essay)
        num_sentences = len(sentences)
        words = word_tokenize(essay)
        num_words = len(words)

        # Essay length (ideal length is 200 words): the score decreases as the
        # word count deviates from 200, clamped so it cannot go negative.
        length_score = max(0, 1 - abs(num_words - 200) / 200)

        # Sentence variety (rough estimate based on sentence-length variance).
        sentence_lengths = [len(word_tokenize(s)) for s in sentences]
        if num_sentences > 1:
            sentence_length_variance = np.var(sentence_lengths)
            # The larger the variance, the lower the score; 50 moderates the effect.
            variety_score = 1 / (1 + sentence_length_variance / 50)
        else:
            variety_score = 0.5  # Neutral score if there is only one sentence.

        style_score = (length_score + variety_score) / 2
        return max(0, min(1, style_score))  # Clamp to [0, 1]

    def grade_essay(self, essay):
        """
        Grades the essay based on the rubric.

        Args:
            essay (str): The essay to be graded.

        Returns:
            float: The overall essay grade.
        """
        content_score = self.calculate_content_score(essay)
        grammar_score = self.calculate_grammar_score(essay)
        style_score = self.calculate_style_score(essay)
        overall_grade = (self.rubric['content'] * content_score +
                         self.rubric['grammar'] * grammar_score +
                         self.rubric['style'] * style_score)
        return overall_grade


# Example usage:
if __name__ == '__main__':
    rubric = {
        'content': 0.6,
        'grammar': 0.2,
        'style': 0.2
    }
    model_answer = """
    Artificial intelligence is revolutionizing various aspects of our lives. Its impact on healthcare, transportation, and communication is undeniable. AI algorithms can analyze vast amounts of data to identify patterns and make predictions, leading to more accurate diagnoses and personalized treatments. Self-driving cars have the potential to reduce accidents and improve traffic flow. AI-powered virtual assistants are transforming the way we interact with technology. However, it is crucial to address the ethical concerns surrounding AI, such as bias and job displacement, to ensure its responsible development and deployment.
    """
    grader = EssayGrader(rubric, model_answer)

    essay1 = """
    AI is changing the world. It is used in many different fields. It can help doctors find diseases. It can also drive cars. But we need to be careful about how we use it.
    """
    essay2 = """
    Artificial intelligence is transforming numerous facets of contemporary existence. Its influence on sectors such as healthcare, transportation, and communication cannot be overstated. AI algorithms possess the capability to meticulously examine copious quantities of data to discern trends and formulate anticipations, thereby enabling enhanced diagnostic precision and individualized therapeutic methodologies. Autonomous vehicles exhibit the capacity to mitigate vehicular incidents and ameliorate vehicular congestion. Furthermore, AI-driven virtual assistants are fundamentally altering the manner in which we engage with technological apparatuses. Nevertheless, it is imperative to confront the ethical dilemmas encircling AI, encompassing predisposition and occupational dislocation, to safeguard its conscientious evolution and implementation.
    """
    essay3 = """
    Computers are smart now because of AI. AI helps with medical things and cars. AI makes life better. Ethics are important.
    """

    grade1 = grader.grade_essay(essay1)
    grade2 = grader.grade_essay(essay2)
    grade3 = grader.grade_essay(essay3)

    print(f"Essay 1 Grade: {grade1:.2f}")
    print(f"Essay 2 Grade: {grade2:.2f}")
    print(f"Essay 3 Grade: {grade3:.2f}")
```
Key improvements and explanations:
* **Clear Structure:** The code is now organized into a class `EssayGrader` for better modularity and reusability.
* **Rubric-Based Grading:** The `rubric` dictionary allows you to define the weights for different aspects of the essay (content, grammar, style). The weights *must* sum to 1.
* **Preprocessing:** A `preprocess_text` method is included. This is *crucial* for improving the accuracy of the content similarity calculation. It lowercases, removes punctuation, tokenizes, and removes stop words. The removal of stop words is important; otherwise, common words like "the" and "a" will dominate the TF-IDF calculation.
* **Content Scoring (TF-IDF Cosine Similarity):** Uses `TfidfVectorizer` and `cosine_similarity` from scikit-learn to calculate the similarity between the essay and the model answer based on term frequencies. TF-IDF (Term Frequency-Inverse Document Frequency) is a standard technique for this; a small standalone demo appears after this list.
* **Grammar Scoring (Simplified Heuristics):** The `calculate_grammar_score` function provides a *very* basic example of grammar assessment. **Important:** This is a placeholder and should be replaced with a robust grammar checker library (see below). The current implementation combines a basic vocabulary-diversity score with a penalty for long average sentence lengths.
* **Style Scoring (Sentence Variety and Length):** The `calculate_style_score` function includes an estimation of style based on sentence variety (variance in sentence length) and essay length relative to an ideal length. This is also a simplification.
* **Modularity:** The code is broken down into functions, making it easier to understand and maintain.
* **Error Handling:** Includes basic error handling to prevent division by zero when the essay is empty.
* **NLTK Dependency:** The code now uses NLTK for tokenization and stop word removal. It also checks if the required NLTK resources are downloaded and downloads them if necessary. This makes the code more robust.
* **Regular Expressions:** The `re` module is used for more robust punctuation removal.
* **Docstrings:** Docstrings have been added to explain the purpose of each function and class.
* **Example Usage:** The `if __name__ == '__main__':` block provides a clear example of how to use the `EssayGrader` class.
* **Clearer Output:** The example usage prints the grades with formatting for readability.
* **`max(0, min(1, score))`:** This pattern is used throughout the code to ensure that all scores are clamped between 0 and 1. This is important for maintaining consistency.
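To see the content-scoring mechanics in isolation, here is a tiny standalone demo of `TfidfVectorizer` plus `cosine_similarity` on two made-up strings (the texts are invented purely for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Two toy documents; the second shares only part of the first's vocabulary.
model_text = "ai improves healthcare diagnosis and transportation safety"
essay_text = "ai helps doctors with diagnosis and makes cars safer"

# fit_transform builds a shared vocabulary and returns one TF-IDF row per document.
matrix = TfidfVectorizer().fit_transform([model_text, essay_text])

# cosine_similarity compares the two rows: 1.0 means identical term
# distributions, 0.0 means no shared terms.
score = cosine_similarity(matrix[0:1], matrix[1:2])[0][0]
print(f"content similarity: {score:.2f}")
```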
**To run this code:**
1. **Install required libraries:**
```bash
pip install nltk scikit-learn numpy
```
2. **Run the Python script:**
```bash
python your_script_name.py
```
**Important Considerations and Further Improvements:**
* **Grammar Checking:** The current grammar scoring is extremely basic. For a real-world application, you *must* integrate a dedicated grammar checker library like:
* **LanguageTool:** A powerful open-source grammar checker. You can integrate it using the `language_tool_python` library (a hedged sketch appears after this list).
* **Ginger Grammar Checker:** A commercial grammar checker with an API.
* **Grammarly API:** A commercial grammar checker with an API. Using one of these libraries will give you a much more accurate assessment of grammar.
* **Semantic Analysis:** The content scoring only looks at keyword overlap. Ideally, you would also perform semantic analysis to understand the meaning of the essay and compare it to the model answer. Techniques like word embeddings (Word2Vec, GloVe, BERT) and semantic similarity measures can help here; a sketch using sentence embeddings appears after this list.
* **Topic Modeling:** Use topic modeling (e.g., Latent Dirichlet Allocation, LDA) to identify the key topics discussed in the essay and assess whether the essay covers the required topics (sketched after this list).
* **Bias Detection:** Be aware of potential biases in your model answer and grading criteria. Ensure that the grader is fair to essays written in different styles or from different perspectives.
* **Feedback Generation:** Instead of just providing a grade, the tool could be extended to provide feedback to the student on specific areas for improvement. The grammar checker libraries mentioned above often provide detailed error messages that can be incorporated into feedback.
* **User Interface:** For a real application, you'd want a user interface (e.g., using Flask or Django) that lets students submit their essays and receive grades and feedback; a minimal Flask sketch appears after this list.
* **Scalability:** For large-scale deployments, consider using a cloud-based platform and optimizing the code for performance.
* **Training Data:** You could potentially *train* a machine learning model to predict essay grades from a dataset of essays and their human-assigned grades. This requires a significant amount of labeled data; a toy sketch appears after this list.
* **Prompt Engineering:** If you're using a large language model (LLM) like GPT-3 for any part of the grading process (e.g., semantic analysis, feedback generation), careful prompt engineering is essential to get the best results.
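To make the LanguageTool suggestion concrete, here is a minimal sketch of a replacement for `calculate_grammar_score` using `language_tool_python`. The error-density normalization (the factor of 10) is an arbitrary illustrative choice, not a LanguageTool recommendation:

```python
# Requires: pip install language-tool-python
# (downloads the LanguageTool engine on first use)
import language_tool_python

def grammar_score_languagetool(essay: str) -> float:
    """Return 1.0 for no detected issues, decreasing with issue density."""
    tool = language_tool_python.LanguageTool('en-US')
    matches = tool.check(essay)  # list of detected grammar/style issues
    num_words = max(1, len(essay.split()))
    errors_per_word = len(matches) / num_words
    # Map error density onto [0, 1]; the sensitivity factor 10 is an
    # arbitrary illustrative choice, to be tuned on real essays.
    return max(0.0, 1.0 - 10 * errors_per_word)
```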
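For the semantic-analysis idea, one hedged sketch swaps TF-IDF for sentence embeddings. The `sentence-transformers` package and the `all-MiniLM-L6-v2` model are one common choice, not the only option; unlike TF-IDF, embeddings can score paraphrases as similar even when they share few exact words:

```python
# Requires: pip install sentence-transformers
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

def semantic_content_score(essay: str, model_answer: str) -> float:
    """Compare meaning rather than shared keywords via sentence embeddings."""
    # 'all-MiniLM-L6-v2' is one small general-purpose embedding model;
    # any sentence-embedding model could be substituted.
    model = SentenceTransformer('all-MiniLM-L6-v2')
    embeddings = model.encode([model_answer, essay])
    return float(cosine_similarity([embeddings[0]], [embeddings[1]])[0][0])
```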
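A hedged topic-modeling sketch with scikit-learn's LDA follows. Note that LDA needs a reasonably large corpus of essays to produce meaningful topics, so the function below is illustrative only:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def top_topic_words(documents, n_topics=3, n_words=5):
    """Fit LDA on a list of documents and return the top words per topic."""
    vectorizer = CountVectorizer(stop_words='english')
    counts = vectorizer.fit_transform(documents)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(counts)
    vocab = vectorizer.get_feature_names_out()
    # components_ holds per-topic word weights; argsort picks the heaviest.
    return [[vocab[i] for i in topic.argsort()[-n_words:][::-1]]
            for topic in lda.components_]
```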
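As a user-interface starting point, here is a minimal Flask endpoint. It assumes the script above is saved as `essay_grader.py` (a hypothetical module name) and uses a placeholder model answer:

```python
# Requires: pip install flask
from flask import Flask, jsonify, request

from essay_grader import EssayGrader  # hypothetical module name for the script above

app = Flask(__name__)
rubric = {'content': 0.6, 'grammar': 0.2, 'style': 0.2}
grader = EssayGrader(rubric, "Placeholder model answer text.")  # placeholder

@app.route('/grade', methods=['POST'])
def grade():
    # Expects JSON like {"essay": "..."}; returns the overall grade.
    essay = (request.get_json(silent=True) or {}).get('essay', '')
    return jsonify({'grade': grader.grade_essay(essay)})

if __name__ == '__main__':
    app.run(debug=True)
```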
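Finally, if labeled data were available, a toy supervised sketch could look like the following. The essays and grades are placeholders; a real system would need hundreds or thousands of human-graded examples:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Placeholder data: real training would need many human-graded essays.
train_essays = ["First graded essay text...", "Second graded essay text..."]
train_grades = [0.85, 0.40]  # hypothetical human-assigned grades in [0, 1]

# TF-IDF features feeding a ridge regressor; any text-to-grade model would do.
model = make_pipeline(TfidfVectorizer(), Ridge())
model.fit(train_essays, train_grades)
print(model.predict(["A new, ungraded essay..."]))
```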
This improved version provides a more complete and functional starting point for building an AI-powered essay grading tool. Remember to replace the placeholder grammar scoring with a real grammar checker library for more accurate results.