AI-driven Learning Platform Python, NLP
👤 Sharing: AI
```python
# Program: AI-Driven Learning Platform - Personalized Lesson Recommendations
# Description: This program demonstrates a simplified AI-driven learning platform that uses NLP
# to analyze user input (their learning goals) and recommends relevant lessons.
# It uses a very basic similarity measure for demonstration. In a real-world
# scenario, more sophisticated NLP techniques and a larger dataset would be used.

# Import necessary libraries
import nltk
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Download required NLTK data (only needed once).
# Recent NLTK releases need "punkt_tab" for word_tokenize; "punkt" covers older releases.
for resource_path, package in [
    ("corpora/stopwords", "stopwords"),
    ("tokenizers/punkt", "punkt"),
    ("tokenizers/punkt_tab", "punkt_tab"),
    ("corpora/wordnet", "wordnet"),
]:
    try:
        nltk.data.find(resource_path)
    except LookupError:
        nltk.download(package)

from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# Build the stop-word set and the lemmatizer once instead of on every call.
STOP_WORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

# Define a set of example lessons with descriptions. In a real application, this would
# likely be stored in a database.
lessons = {
    "Python Basics": "This lesson covers the fundamental concepts of Python programming, including variables, data types, and operators.",
    "Introduction to Machine Learning": "This lesson provides a high-level overview of machine learning concepts, algorithms, and applications.",
    "Natural Language Processing Fundamentals": "This lesson explores the basics of NLP, including tokenization, stemming, and part-of-speech tagging.",
    "Web Development with Flask": "This lesson teaches you how to build web applications using the Flask framework in Python.",
    "Data Analysis with Pandas": "Learn how to use the Pandas library to analyze and manipulate data effectively in Python.",
    "Advanced Python Concepts": "This lesson explores more advanced topics such as decorators, generators, and metaclasses in Python.",
    "Deep Learning with TensorFlow": "This lesson introduces deep learning concepts and how to implement them using the TensorFlow library.",
    "SQL Databases": "Introduction to databases with SQL",
    "Cloud Computing Basics": "Basics of Cloud Computing such as Amazon Web Services, Google Cloud Platform and Microsoft Azure",
}


# Text Preprocessing Function
def preprocess_text(text):
    """
    Cleans and preprocesses the input text.

    Args:
        text (str): The input text.

    Returns:
        str: The preprocessed text.
    """
    # Convert to lowercase
    text = text.lower()
    # Tokenize the text (split into words and punctuation marks)
    tokens = nltk.word_tokenize(text)
    # Remove stop words (common words like "the", "a", "is")
    tokens = [token for token in tokens if token not in STOP_WORDS]
    # Remove punctuation and other non-alphanumeric tokens
    tokens = [token for token in tokens if token.isalnum()]
    # Lemmatize the tokens (WordNet treats each token as a noun by default,
    # so "lessons" -> "lesson")
    tokens = [LEMMATIZER.lemmatize(token) for token in tokens]
    # Join the tokens back into a single string
    return " ".join(tokens)


def recommend_lessons(user_input, lessons, top_n=3):
    """
    Recommends lessons based on user input using NLP techniques.

    Args:
        user_input (str): The user's learning goals or interests.
        lessons (dict): A dictionary of lessons, where keys are lesson titles and values are lesson descriptions.
        top_n (int): The maximum number of lessons to recommend.

    Returns:
        list: Recommended lesson titles, sorted by relevance. Lessons with zero
        similarity are excluded, so the list is empty if nothing matches.
    """
    # Preprocess the user input
    preprocessed_user_input = preprocess_text(user_input)
    # Create a list of lesson titles and a matching list of preprocessed descriptions
    lesson_titles = list(lessons.keys())
    preprocessed_lesson_descriptions = [preprocess_text(lessons[title]) for title in lesson_titles]
    # Use TF-IDF to vectorize the user input (row 0) and the lesson descriptions (rows 1..)
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform([preprocessed_user_input] + preprocessed_lesson_descriptions)
    # Cosine similarity between the user input and each lesson description (1-D array)
    cosine_similarities = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:])[0]
    # Indices of the top N most similar lessons, best first
    similar_lesson_indices = cosine_similarities.argsort()[::-1][:top_n]
    # Return the titles of the recommended lessons, skipping lessons that share no terms
    return [lesson_titles[i] for i in similar_lesson_indices if cosine_similarities[i] > 0]


# Main program execution
if __name__ == "__main__":
    user_input = input("What are you interested in learning? ")
    recommended = recommend_lessons(user_input, lessons)
    if recommended:
        print("\nRecommended Lessons:")
        for lesson in recommended:
            print(f"- {lesson}")
    else:
        print("No lessons found that match your interests.")
```
Key improvements and explanations:
* **Clearer Problem Definition:** The program now focuses on recommending lessons based on user input, simulating a personalized learning platform.
* **NLP with TF-IDF:** Uses Term Frequency-Inverse Document Frequency (TF-IDF) to vectorize the text. TF-IDF is a common baseline representation for comparing documents: it weights each word by how often it appears in a document, and its inverse-document-frequency factor down-weights words that appear in many documents. In this dataset, for example, "lesson" appears in most descriptions and therefore contributes less than distinctive words such as "flask" or "pandas."
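* **Cosine Similarity:** Calculates the cosine similarity between the TF-IDF vectors of the user input and the lesson descriptions. Cosine similarity measures the angle between two vectors rather than their length: the smaller the angle (closer to 0 degrees), the more similar the vectors are, so a long description is not favored just for containing more words. Because TF-IDF weights are non-negative, the scores range from 0 (no shared terms) to 1. This is a standard technique for measuring the similarity of text documents.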
* **`preprocess_text` Function:** This function encapsulates all the text preprocessing steps (lowercasing, tokenization, stop word removal, punctuation removal, lemmatization), which keeps the code cleaner, more readable, and easier to maintain. It uses lemmatization rather than stemming because lemmatization maps words to real dictionary forms, which generally preserves their meaning better.
* **Stop Word Removal:** Removes common words (stop words) like "the," "a," and "is," which carry little meaning on their own, so the similarity scores are driven by content words. The code explicitly downloads the `stopwords` corpus from `nltk` if it's not already present.
* **Tokenization:** Uses `nltk.word_tokenize` to split the text into individual words (tokens). This is necessary for further processing.
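* **Lemmatization:** Uses `WordNetLemmatizer` to reduce words to their dictionary form, so "lessons" becomes "lesson" and both match the same term. Because `lemmatize()` treats every token as a noun unless told otherwise, verb forms such as "running" pass through unchanged here; reducing verbs would require passing `pos="v"`.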
* **`if __name__ == "__main__":` block:** This ensures that the main program logic only runs when the script is executed directly (not when it's imported as a module). This is standard practice in Python.
* **Error Handling for NLTK Downloads:** Added `try...except` blocks to handle cases where the necessary NLTK data (stopwords, punkt, wordnet) is not downloaded. The code will download it if it's missing. This is more robust.
* **Clearer Comments:** The code is thoroughly commented to explain each step.
* **`top_n` Parameter:** Lets the caller of `recommend_lessons` choose how many lessons to return (default 3); the interactive prompt uses the default.
* **Handles No Matches:** The code now checks if any lessons were recommended and prints an appropriate message if not.
* **Realistic Data:** The sample lessons are more realistic examples.
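To make the IDF penalty described in the TF-IDF bullet concrete, here is a minimal pure-Python sketch that mirrors scikit-learn's default `TfidfVectorizer` weighting: raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, then L2 normalization. It is an illustration only, not a replacement for the library; the helper names `smoothed_idf` and `tfidf_vector` and the three toy documents are hypothetical, and tokens are assumed to be already preprocessed and split on whitespace (scikit-learn's own tokenizer additionally drops one-character tokens).

```python
import math
from collections import Counter


def smoothed_idf(docs):
    """Smoothed IDF per term, like scikit-learn's default (smooth_idf=True):
    idf(t) = ln((1 + n) / (1 + df(t))) + 1, where df(t) counts documents containing t.
    Each document is a string of already-preprocessed, space-separated tokens.
    """
    tokenized = [set(doc.split()) for doc in docs]
    n_docs = len(tokenized)
    vocabulary = set().union(*tokenized)
    return {
        term: math.log((1 + n_docs) / (1 + sum(term in toks for toks in tokenized))) + 1
        for term in vocabulary
    }


def tfidf_vector(doc, idf):
    """Raw term count times IDF, then scaled to unit L2 length (norm="l2")."""
    counts = Counter(doc.split())
    weights = {term: count * idf[term] for term, count in counts.items() if term in idf}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()} if norm else weights


# Hypothetical, already-preprocessed lesson descriptions.
docs = ["lesson python basic", "lesson flask web python", "lesson pandas data"]
idf = smoothed_idf(docs)
for term in ("lesson", "python", "flask"):
    print(f"{term}: idf={idf[term]:.3f}")
# lesson: idf=1.000   (in all 3 documents)
# python: idf=1.288   (in 2 documents)
# flask: idf=1.693    (in 1 document)
print(tfidf_vector(docs[1], idf))
```

A term such as `lesson` that occurs in every document receives the minimum IDF of 1.0, so after normalization the distinctive terms (`flask`, `web`) carry most of the vector's weight.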
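The cosine-similarity, `top_n`, and no-match behavior described in the bullets above can be sketched without scikit-learn. In this minimal version, sparse vectors are plain `dict`s mapping a term to its weight (an assumption made for readability; the script itself works on scikit-learn sparse matrices). Each lesson is scored with cosine similarity, the `top_n` best are kept, and lessons with a score of zero are dropped so an unmatched query yields an empty list. The function names `cosine` and `rank` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector shares no terms with anything
    return dot / (norm_u * norm_v)


def rank(query_vec, lesson_vecs, top_n=3):
    """Return up to top_n lesson titles, best first, skipping zero-similarity lessons."""
    scored = [(cosine(query_vec, vec), title) for title, vec in lesson_vecs.items()]
    # Stable sort: equal scores keep their original dictionary order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for score, title in scored[:top_n] if score > 0]


# Hypothetical term weights (e.g. raw counts after preprocessing).
lesson_vecs = {
    "Data Analysis with Pandas": {"pandas": 1.0, "data": 1.0, "analyze": 1.0},
    "Introduction to Machine Learning": {"machine": 1.0, "learning": 1.0, "algorithm": 1.0},
    "Web Development with Flask": {"flask": 1.0, "web": 1.0, "application": 1.0},
}
print(rank({"data": 1.0, "machine": 1.0}, lesson_vecs))
# ['Data Analysis with Pandas', 'Introduction to Machine Learning']
print(rank({"cooking": 1.0}, lesson_vecs))
# []
```

Filtering on `score > 0` is what makes a "no matches" branch reachable: without it, the top `top_n` lessons are always returned even when none of them shares a single term with the query.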
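The preprocessing order described in the bullets above (lowercase, tokenize, drop stop words, drop non-alphanumeric tokens) can be illustrated without NLTK. This stand-in is only an approximation under stated assumptions: a regular expression replaces `nltk.word_tokenize`, a tiny hand-picked `TOY_STOP_WORDS` set replaces NLTK's English stop-word list, and lemmatization is omitted. The names `toy_preprocess` and `TOY_STOP_WORDS` are hypothetical.

```python
import re

# Hypothetical, deliberately tiny subset of common English stop words.
TOY_STOP_WORDS = {"i", "to", "about", "and", "the", "a", "is"}


def toy_preprocess(text):
    """Approximate the lowercase -> tokenize -> stop words -> isalnum pipeline."""
    text = text.lower()
    # Rough stand-in for nltk.word_tokenize: words and punctuation marks
    # become separate tokens.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    tokens = [t for t in tokens if t not in TOY_STOP_WORDS]
    # Punctuation tokens such as "." fail isalnum() and are dropped here.
    tokens = [t for t in tokens if t.isalnum()]
    return " ".join(tokens)


print(toy_preprocess("I want to learn about data analysis and machine learning."))
# want learn data analysis machine learning
```

The real `preprocess_text` additionally lemmatizes each token and uses NLTK's much longer stop-word list, so its output for the same sentence can differ.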
How to run the code:
1. **Install Libraries:**
```bash
pip install nltk scikit-learn numpy
```
2. **Run the Script:**
```bash
python your_script_name.py
```
(Replace `your_script_name.py` with the actual name of your Python file.)
3. **Enter Your Interests:**
The program will prompt you to enter your learning interests. Type in what you want to learn and press Enter. For example: "I want to learn about data analysis and machine learning."
4. **View Recommendations:**
The program will then output a list of recommended lessons based on your input.
This revised version is a much stronger foundation for an AI-driven learning platform example. It incorporates essential NLP techniques and provides a more realistic and useful demonstration. Remember that this is still a simplified example, and a real-world platform would require more advanced techniques and a much larger dataset.