AI-Powered Code Review and Linting in Python

👤 Sharing: AI
```python
# -*- coding: utf-8 -*-
"""
AI-Powered Code Review and Linting Example in Python

This script demonstrates a simplified example of how AI (specifically, using basic pattern recognition)
can be used to identify potential code quality issues in Python.  It's a conceptual illustration,
not a full-fledged AI-powered solution, which would typically involve more sophisticated techniques
like machine learning models trained on large codebases.

This example focuses on:

1.  **Basic Style Issues:**  Checking for PEP 8 violations (e.g., line length, indentation).
2.  **Simple Bug Detection:** Looking for potentially problematic code patterns (e.g., unused variables).
3.  **Readability Issues:** Suggesting improvements to variable names.

Note:  A real-world code reviewer would build on established linters such as `flake8` or `pylint`,
possibly combined with custom ML models.
"""

import re

def analyze_code(code):
    """
    Analyzes a given Python code snippet for potential issues.

    Args:
        code: A string containing the Python code.

    Returns:
        A list of dictionaries, where each dictionary represents a detected issue
        and contains its line number, message, and severity.
    """

    issues = []
    lines = code.splitlines()

    for i, line in enumerate(lines):
        line_number = i + 1  # Line numbers are 1-based

        # Check for line length (PEP 8)
        if len(line) > 79:
            issues.append({
                "line": line_number,
                "message": "Line exceeds 79 characters (PEP 8)",
                "severity": "warning"
            })

        # Check for potential indentation errors (very basic)
        if line.startswith("    ") and not is_inside_block(lines, i):  # Simplified check
            issues.append({
                "line": line_number,
                "message": "Possible indentation error (check alignment)",
                "severity": "warning"
            })

        # Check for unused variables (very simple pattern matching).
        # "=(?!=)" keeps comparisons such as "x == 1" from counting as assignments.
        match = re.search(r"^\s*(\w+)\s*=(?!=)", line)
        if match:
            variable_name = match.group(1)
            if not is_variable_used(variable_name, lines[i+1:]):
                issues.append({
                    "line": line_number,
                    "message": f"Variable '{variable_name}' is assigned but potentially unused",
                    "severity": "info"
                })

        # Suggest more descriptive variable names (simplistic: flags single-letter names)
        match = re.search(r"^\s*([a-zA-Z])\s*=(?!=)", line)
        if match:
            variable_name = match.group(1)
            issues.append({
                "line": line_number,
                "message": f"Consider using a more descriptive variable name than '{variable_name}'",
                "severity": "info"
            })

    return issues

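def is_inside_block(lines, line_index):
    """
    A very basic heuristic to determine if an indented line is plausibly part of
    a code block.  It compares the line's indentation with the nearest preceding
    non-blank line.  This is a simplified example.

    Args:
        lines: A list of strings representing the code lines.
        line_index: The index of the line to check.

    Returns:
        True if likely inside a block, False otherwise.
    """
    current = lines[line_index]
    current_indent = len(current) - len(current.lstrip())

    for previous_line in reversed(lines[:line_index]):
        if not previous_line.strip():
            continue  # Skip blank lines
        previous_indent = len(previous_line) - len(previous_line.lstrip())
        # Drop a trailing comment so "if x:  # note" still counts as a block opener
        code_part = previous_line.split("#", 1)[0].rstrip()
        if code_part.endswith(":"):
            return current_indent > previous_indent  # First line of a new block
        return current_indent <= previous_indent  # Same block, or a dedent
    return False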

def is_variable_used(variable_name, remaining_lines):
    """
    A simple check to see if a variable is used in the remaining lines of code.
    It matches whole words only, so 'i' is not found inside 'print', but it
    ignores scope, strings, and comments and may not be accurate in all cases.

    Args:
        variable_name: The name of the variable to check.
        remaining_lines: A list of strings representing the lines of code after the
                         variable assignment.

    Returns:
        True if the variable is found in the remaining lines, False otherwise.
    """
    pattern = re.compile(rf"\b{re.escape(variable_name)}\b")
    for line in remaining_lines:
        if pattern.search(line):
            return True
    return False


# Example Usage
code_snippet = """
def my_function(x, y):
    z = x + y  # Very long comment that runs well past the 79-character line length limit of PEP 8
    if z > 10:
        a = z
        return a
    else:
        return 0

i = 5  # single-character name at module level

def another_function():
    i=5
    print("hello")
    return 0

        print("This line is indented deeper than the block it follows")

"""

issues = analyze_code(code_snippet)

if issues:
    print("Code Analysis Issues:")
    for issue in issues:
        print(f"  Line {issue['line']}: [{issue['severity'].upper()}] {issue['message']}")
else:
    print("No issues found (using basic analysis).")


```

Key features and explanations:

* **Clear Docstrings:**  Each function has a docstring describing its purpose, arguments, and return value, which helps maintainability and understanding.
* **PEP 8 Compliance (Line Length):** The code flags lines longer than 79 characters, the limit PEP 8 recommends.
* **Indentation Check (Simplified):** The `is_inside_block` function applies a basic heuristic to detect potential indentation errors.  It is a rough stand-in for the structural analysis a real tool would perform.
* **Unused Variable Detection (Pattern Matching):** `analyze_code` uses `re.search` to find simple assignments, and `is_variable_used` then checks whether the assigned name appears in any later line.  The check is text-based and ignores scope, but it demonstrates the concept.
* **Variable Naming Suggestions:** The code flags single-character variable names and suggests choosing more descriptive ones.
* **Issue Severity:** Each reported issue carries a `severity` (`warning` or `info`).
* **Clear Output:** Results are printed one issue per line, with line number, severity, and message.
* **Example Usage:** The code includes a sample `code_snippet` with potential issues and runs the analysis on it.
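
The unused-variable check described above works on raw text. As a point of comparison, here is a minimal sketch of the same idea built on the standard-library `ast` module, which separates names that are written from names that are read. The function name `find_unused_assignments` is hypothetical, and the sketch assumes a single shared namespace (it does not track scopes), so it illustrates the approach rather than replacing a linter.

```python
import ast


def find_unused_assignments(source):
    """Return sorted (name, line) pairs for names assigned but never read.

    Simplification: all names share one namespace, so scopes are ignored.
    """
    assigned = {}   # name -> earliest line where it is assigned
    loaded = set()  # names that are read somewhere in the source

    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                first = assigned.get(node.id, node.lineno)
                assigned[node.id] = min(first, node.lineno)
            elif isinstance(node.ctx, ast.Load):
                loaded.add(node.id)

    return sorted(
        (name, line) for name, line in assigned.items() if name not in loaded
    )


sample = '''
def greet():
    i = 5
    print("hi")
    total = 1
    return total
'''

for name, line in find_unused_assignments(sample):
    print(f"Line {line}: '{name}' is assigned but never read")
```

Because the tree distinguishes identifiers from string contents, the `i` inside `"hi"` and inside `print` does not count as a use of the variable `i`.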

Important Considerations and Limitations:

* **Simplicity:** This is a highly simplified example.  Real-world AI code review systems use much more sophisticated techniques.
* **False Positives/Negatives:** The simple checks will likely produce false positives (flagging code that is actually correct) and false negatives (missing actual errors).
* **Contextual Understanding:** The example lacks true contextual understanding of the code. It cannot understand the *meaning* of the code, only patterns.
* **Real-World Tools:**  Linters such as `flake8` and `pylint` are rule-based static analyzers rather than AI, but they are far more capable than this example and are usually integrated into IDEs and CI/CD pipelines. ML-based systems typically go further by training models on large codebases to learn coding patterns and predict potential issues.
* **No Automated Fixes:**  This code only *detects* issues; it doesn't automatically fix them. A more advanced AI system might suggest or even automatically apply code fixes.
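
The limitations above mention models trained on large codebases. Such a model needs numeric inputs first, so here is a minimal sketch of feature extraction using only the standard library (`ast` and `tokenize`). The function name `extract_features` and the particular feature set are assumptions chosen for illustration, not an established recipe.

```python
import ast
import io
import tokenize

# Token types that carry layout or comments rather than code content.
LAYOUT_TOKENS = {
    tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
    tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT,
}
# Node types counted as branching constructs (a rough complexity proxy).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp)


def extract_features(source):
    """Return a dict of simple numeric features (hypothetical feature set)."""
    lines = source.splitlines()
    nodes = list(ast.walk(ast.parse(source)))
    tokens = [
        tok for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type not in LAYOUT_TOKENS
    ]
    return {
        "num_lines": len(lines),
        "max_line_length": max((len(line) for line in lines), default=0),
        "num_tokens": len(tokens),
        "num_ast_nodes": len(nodes),
        "num_functions": sum(isinstance(n, ast.FunctionDef) for n in nodes),
        "num_branches": sum(isinstance(n, BRANCH_NODES) for n in nodes),
    }


sample = "def f(x):\n    if x > 0:\n        return x\n    return 0\n"
features = extract_features(sample)
for name, value in features.items():
    print(f"{name}: {value}")
```

A dictionary like this could be fed to any tabular classifier. The labels (for example, whether a reviewer flagged the file) would have to come from real review data, which this sketch does not include.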

To improve this further, you could:

* **Integrate with `flake8` or `pylint`:**  Instead of writing your own checks, you could run these tools programmatically and parse their output.
* **Train a Machine Learning Model:** Train a model to predict code quality issues based on features extracted from the code (e.g., token counts, syntax tree structure, code complexity metrics).
* **Add More Sophisticated Checks:**  Implement checks for common coding errors, security vulnerabilities, and performance bottlenecks.
* **Implement Automated Fixes:**  Use libraries like `autopep8` to automatically fix style issues.
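
As a sketch of the first suggestion, the snippet below parses `flake8` output into the same kind of issue dictionaries used in the example. It assumes `flake8`'s default output format (`path:row:col: CODE message`); the helper names `parse_flake8_output` and `run_flake8` are hypothetical, and `run_flake8` only works if `flake8` is installed, so the demonstration parses a hard-coded sample instead of calling it.

```python
import re
import subprocess
import sys

# Assumed default flake8 line format: "path:row:col: CODE message"
FLAKE8_LINE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<code>[A-Z]+\d+) (?P<message>.*)$"
)


def parse_flake8_output(output):
    """Convert flake8 text output into a list of issue dictionaries."""
    issues = []
    for raw in output.splitlines():
        match = FLAKE8_LINE.match(raw)
        if match:
            issues.append({
                "path": match["path"],
                "line": int(match["line"]),
                "column": int(match["col"]),
                "code": match["code"],
                "message": match["message"],
            })
    return issues


def run_flake8(path):
    """Run flake8 on a file and parse the result (requires flake8)."""
    result = subprocess.run(
        [sys.executable, "-m", "flake8", path],
        capture_output=True, text=True, check=False,
    )
    return parse_flake8_output(result.stdout)


sample_output = (
    "example.py:3:80: E501 line too long (98 > 79 characters)\n"
    "example.py:7:5: F841 local variable 'i' is assigned to but never used\n"
)

for issue in parse_flake8_output(sample_output):
    print(f"  Line {issue['line']}: [{issue['code']}] {issue['message']}")
```

Parsing the text output keeps the sketch independent of `flake8`'s internal Python API, at the cost of relying on the output format staying the same.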

This example provides a starting point for understanding how AI techniques can be applied to code review and linting.  Keep its limitations in mind, and expect real-world code analysis tools to be far more capable.