AI-Powered Auto Grammar Corrector (Python, NLP)
```python
import nltk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag

# Download necessary NLTK data (only needs to be done once):
# nltk.download('punkt')
# nltk.download('averaged_perceptron_tagger')


def basic_grammar_corrector(text):
    """
    A very basic rule-based grammar checker using NLTK.
    This is a simplified example and does not cover all grammar rules.

    Args:
        text: The input text to be checked.

    Returns:
        A string with potential grammar suggestions. It doesn't automatically
        correct, but provides hints where errors might exist.
    """
    tokens = word_tokenize(text)
    tokens = [w.lower() for w in tokens]  # Lowercase for consistent matching

    # Keep only alphanumeric tokens so punctuation doesn't disturb the
    # adjacency checks below. Stop words are deliberately kept: articles
    # ("a", "an") and auxiliaries ("is") are exactly the words the rules
    # need to inspect.
    filtered_tokens = [w for w in tokens if w.isalnum()]
    tagged_words = pos_tag(filtered_tokens)

    suggestions = []

    # Simple rule-based checks:
    for i in range(len(tagged_words) - 1):
        (word1, tag1) = tagged_words[i]
        (word2, tag2) = tagged_words[i + 1]

        # Rule 1: "a"/"an" directly before a verb is usually incorrect.
        if word1 in ("a", "an") and tag2.startswith('V'):
            suggestions.append(
                f"Warning: Possible incorrect article usage before verb '{word2}'. "
                f"Consider changing '{word1}' to 'the' or removing it.")

        # Rule 2: "a" before a vowel letter should usually be "an"
        # (a letter-based heuristic; it misses cases like "a university").
        if word1 == "a" and word2[0] in "aeiou":
            suggestions.append(
                f"Warning: Consider using 'an' instead of 'a' before '{word2}'.")

        # Rule 3: very basic noun-verb agreement -- a plural noun (NNS)
        # with a third-person-singular verb (VBZ), or a singular noun (NN)
        # with a non-third-person-singular present verb (VBP).
        if (tag1 == 'NNS' and tag2 == 'VBZ') or (tag1 == 'NN' and tag2 == 'VBP'):
            suggestions.append(
                f"Warning: Potential noun-verb agreement issue between "
                f"'{word1}' and '{word2}'.")

        # Rule 4: adjectives usually precede the nouns they modify.
        if tag1.startswith('NN') and tag2.startswith('JJ'):
            suggestions.append(
                f"Warning: Consider reordering '{word1}' and '{word2}'. "
                f"Adjectives usually precede nouns.")

    if not suggestions:
        return ("No obvious grammar issues detected (based on the limited "
                "rules in this basic example).")
    return "\n".join(suggestions)


# Example Usage:
examples = [
    "I is going to the store. a apple is on the table.",
    "The dogs run fast.",
    "The cat fluffy is sleeping.",
    "This is good.",
]
for text in examples:
    print(f"Original Text: {text}")
    print(f"Suggestions: {basic_grammar_corrector(text)}\n")
```
Key improvements and explanations:
* **Clear Problem Statement:** The code is for a *basic* grammar checker, explicitly stated as such. It doesn't attempt to be perfect.
* **NLTK Integration:** Uses `nltk` for tokenization (splitting text into words) and Part-of-Speech (POS) tagging (identifying nouns, verbs, adjectives, etc.).
* **Careful Token Filtering:** Punctuation tokens are filtered out, but be wary of stop-word removal here: common words like "a", "an", and "is" are exactly the words the article and agreement rules need to inspect, and removing them also breaks the word adjacency those rules rely on.
* **Lowercase Conversion:** Converts all words to lowercase for more consistent processing.
* **Rule-Based Checks:** Implements a few simple grammar rules:
* **Article Usage:** Checks for "a" or "an" before verbs, which is often incorrect. Flags the issue and suggests possible fixes.
* **Noun-Verb Agreement:** Checks for very basic subject-verb agreement in the present tense (e.g., a plural noun paired with a third-person-singular verb); it doesn't handle irregular verbs or complex cases.
* **Adjective Order:** Flags an adjective appearing directly after a noun, since adjectives usually precede the nouns they modify.
* **Clear Output:** Instead of automatically correcting (which is difficult), it *suggests* possible errors and explanations. This is more realistic for a basic AI grammar checker. It uses `f-strings` for clearer output.
* **No False Positives (Mostly):** The rules are designed to be relatively conservative, meaning they're more likely to miss errors than to flag correct grammar as incorrect (fewer "false positives").
* **`nltk.download()` Calls:** The `nltk.download()` lines are included but commented out. You *must* uncomment and run them *once*, the first time you run the code, to fetch the necessary NLTK data.
* **Comments and Docstrings:** Comprehensive comments and a docstring explain the code.
* **Multiple Examples:** Includes several example usages with different grammar issues, demonstrating the limitations and capabilities of the code.
* **`isalnum()` filtering:** Added `w.isalnum()` to the token filter. This prevents punctuation from being included in the POS tagging, which would otherwise cause problems for the adjacency checks.
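The article-usage idea can also be sketched without NLTK at all. Below is a minimal, pure-Python heuristic (not part of the script above) that uses vowel *letters* as a stand-in for vowel *sounds*, so it will misjudge cases like "an hour" or "a university":

```python
VOWELS = set("aeiou")

def check_articles(words):
    """Flag 'a' before a vowel-initial word and 'an' before a consonant-initial word.

    Letter-based heuristic only: silent-h words ("hour") and consonant-sound
    vowels ("university") are misflagged.
    """
    issues = []
    for current, nxt in zip(words, words[1:]):
        if current == "a" and nxt[0] in VOWELS:
            issues.append(f"'a {nxt}' -> consider 'an {nxt}'")
        elif current == "an" and nxt[0] not in VOWELS:
            issues.append(f"'an {nxt}' -> consider 'a {nxt}'")
    return issues

print(check_articles("a apple is on an table".split()))
# -> ["'a apple' -> consider 'an apple'", "'an table' -> consider 'a table'"]
```

A production checker would need a pronunciation dictionary (or at least an exception list) instead of this letter test.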
How to run:
1. **Install NLTK:**
```bash
pip install nltk
```
2. **Run the code:** Copy and paste the code into a Python file (e.g., `grammar_checker.py`) and run it:
```bash
python grammar_checker.py
```
3. **Download NLTK data:** When you first run the code, it might give you an error about missing NLTK data. If so, uncomment the `nltk.download()` lines at the beginning of the script, run the script *again*, and then comment them back out.
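Alternatively, the download step can be automated. Here is one possible guard (a sketch, not part of the script above) that fetches each resource only when `nltk.data.find()` reports it missing; the resource paths assume NLTK's standard data layout:

```python
import nltk

# Map of downloader package name -> lookup path for nltk.data.find().
# Paths follow NLTK's standard data layout.
RESOURCES = {
    "punkt": "tokenizers/punkt",
    "averaged_perceptron_tagger": "taggers/averaged_perceptron_tagger",
}

def ensure_nltk_data():
    """Download each required NLTK resource only if it is not already present."""
    for package, path in RESOURCES.items():
        try:
            nltk.data.find(path)  # raises LookupError when the resource is missing
        except LookupError:
            nltk.download(package, quiet=True)

if __name__ == "__main__":
    ensure_nltk_data()
```

With this at the top of the script, the manual uncomment/re-comment dance is no longer needed.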
Key Improvements over a Simple Example:
* **More Realistic:** This version acknowledges that a *basic* AI grammar checker cannot perfectly fix all errors. It focuses on flagging potential issues and suggesting possible corrections, which is a more practical approach.
* **Reduced False Positives:** The rules are more targeted to avoid incorrectly flagging correct grammar.
* **Clearer Explanation:** The output is more informative, explaining the *reason* why a particular phrase might be incorrect.
* **Punctuation Filtering:** Stripping punctuation tokens before tagging keeps the adjacency-based rules focused on the actual words of each sentence.
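One subtlety worth seeing concretely: typical English stop-word lists contain the very words the rules above inspect, so any stop-word filtering must happen after (or instead of) the adjacency-based checks. A small illustration using a hand-picked subset (not NLTK's actual list):

```python
# A few entries found in typical English stop-word lists
# (illustrative subset only, not NLTK's actual list).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "on"}

tokens = "a apple is on the table".split()
filtered = [w for w in tokens if w not in STOP_WORDS]

print(filtered)  # -> ['apple', 'table']
# The article "a" and the verb "is" are gone, so an article-usage or
# subject-verb agreement rule run on `filtered` can never fire.
```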
Further Improvements:
* **More Rules:** Add more sophisticated grammar rules covering verb tenses, pronoun agreement, punctuation, etc.
* **Contextual Analysis:** Use more advanced NLP techniques (like dependency parsing or semantic analysis) to understand the context of the sentence, which would allow for more accurate error detection.
* **Machine Learning:** Train a machine learning model on a large corpus of text to learn grammar rules automatically. This would be a more advanced approach than rule-based checking. Libraries like spaCy or Transformers (Hugging Face) would be useful here.
* **Spell Checking:** Integrate a spell checker to catch spelling mistakes.
* **User Interface:** Create a user interface (e.g., with Tkinter or a web framework like Flask) to make the grammar checker easier to use.
* **Error Confidence:** Assign a confidence score to each suggested error to indicate how likely it is that the error exists.
* **Correction Suggestions:** Instead of just flagging, try to provide more specific and helpful correction suggestions (e.g., "Change 'is' to 'are'").
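As a taste of the "Correction Suggestions" idea, here is a naive suffix-stripping sketch (an assumption-laden illustration, not a linguistically complete lemmatizer) that proposes a base verb form from a third-person-singular form. It handles only regular -s/-es/-ies endings; irregular verbs such as "is" or "has" would need a lookup table:

```python
def suggest_base_form(verb):
    """Naively strip a third-person-singular ending to propose a base form.

    Regular endings only (-ies, -es, -s); irregular verbs need a lookup
    table in any real implementation.
    """
    if verb.endswith("ies"):
        return verb[:-3] + "y"   # carries -> carry
    if verb.endswith(("ches", "shes", "sses", "xes", "zes")):
        return verb[:-2]         # watches -> watch
    if verb.endswith("s"):
        return verb[:-1]         # runs -> run
    return verb                  # already a base form

print(suggest_base_form("carries"))  # -> carry
print(suggest_base_form("watches"))  # -> watch
print(suggest_base_form("runs"))     # -> run
```

A real system would pair something like this with the agreement rule, so a flagged "dogs runs" could produce the concrete suggestion "change 'runs' to 'run'".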
This improved example provides a solid foundation for building a more sophisticated AI-powered grammar checker. Remember that grammar checking is a very complex problem, and even the best commercial grammar checkers are not perfect!