Builds multi-stop itineraries by scraping deals & user preferences (Python)
```python
import re  # For regular expressions (e.g., cleaning up prices)
from datetime import datetime, timedelta

import requests
from bs4 import BeautifulSoup


def scrape_flight_deals(origin, destination, date):
    """
    Scrapes flight deals from a hypothetical airline website. (Replace with actual URL.)

    Args:
        origin (str): Departure airport code (e.g., "JFK").
        destination (str): Arrival airport code (e.g., "LAX").
        date (str): Departure date (e.g., "2024-01-15").

    Returns:
        list: A list of dictionaries, where each dictionary represents a flight deal.
              Returns an empty list if no deals are found or if an error occurs.
    """
    url = f"https://example.com/flights?origin={origin}&destination={destination}&date={date}"  # Replace with the actual website
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # Raise HTTPError for bad responses (4xx or 5xx)
        soup = BeautifulSoup(response.content, 'html.parser')

        # *** REPLACE THESE SELECTORS WITH THE CORRECT SELECTORS FOR THE WEBSITE YOU ARE SCRAPING ***
        # This is the most crucial part - you'll need to inspect the website's HTML
        # to find the correct CSS selectors or XPath expressions to extract the data.
        flight_elements = soup.find_all('div', class_='flight-card')  # Example selector - adjust as needed

        flight_deals = []
        for flight in flight_elements:
            try:
                # Extract data from each flight card. Again, customize these selectors.
                airline = flight.find('span', class_='airline-name').text.strip()  # Example
                price_string = flight.find('span', class_='flight-price').text.strip()  # Example
                # Clean up the price string: remove currency symbols, commas, etc.
                price = float(re.sub(r'[^\d.]', '', price_string))  # Keep only digits and the decimal point
                departure_time = flight.find('span', class_='departure-time').text.strip()  # Example
                arrival_time = flight.find('span', class_='arrival-time').text.strip()  # Example

                flight_deals.append({
                    'airline': airline,
                    'price': price,
                    'departure_time': departure_time,
                    'arrival_time': arrival_time,
                    'origin': origin,
                    'destination': destination,
                    'date': date,
                })
            except AttributeError as e:
                print(f"Error extracting data from a flight card: {e}")  # Debugging

        return flight_deals
    except requests.exceptions.RequestException as e:
        print(f"Error during request: {e}")
        return []
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        return []


def get_user_preferences():
    """
    Gets user preferences for travel.

    Returns:
        dict: A dictionary containing user preferences.
    """
    preferences = {}
    preferences['budget'] = float(input("Enter your budget: "))
    preferences['preferred_airlines'] = [a.strip() for a in input("Enter preferred airlines (comma-separated): ").split(',')]
    preferences['max_stops'] = int(input("Enter maximum number of stops: "))
    preferences['interests'] = [i.strip() for i in input("Enter interests (comma-separated): ").split(',')]  # e.g., "beach,hiking,museums"
    preferences['start_date'] = input("Enter start date (YYYY-MM-DD): ")
    preferences['end_date'] = input("Enter end date (YYYY-MM-DD): ")
    return preferences


def find_potential_destinations(interests):
    """
    A very basic example of suggesting destinations based on interests.
    In a real-world application, this would use a database or API.

    Args:
        interests (list): A list of interests.

    Returns:
        list: A list of potential destinations.
    """
    destinations = []
    if 'beach' in interests:
        destinations.append("HNL")  # Honolulu
        destinations.append("MIA")  # Miami
    if 'hiking' in interests:
        destinations.append("DEN")  # Denver
        destinations.append("SEA")  # Seattle
    if 'museums' in interests:
        destinations.append("NYC")  # New York City
        destinations.append("PAR")  # Paris (example of adding international)
    return destinations


def build_itinerary(origin, potential_destinations, start_date, end_date, budget, max_stops, preferred_airlines):
    """
    Builds a multi-stop itinerary based on flight deals and user preferences.

    Args:
        origin (str): Origin airport code.
        potential_destinations (list): A list of potential destination airport codes.
        start_date (str): Start date for the trip (YYYY-MM-DD).
        end_date (str): End date for the trip (YYYY-MM-DD). (Accepted but not yet enforced.)
        budget (float): The budget for the entire trip.
        max_stops (int): The maximum number of stops allowed. (Accepted but not yet enforced.)
        preferred_airlines (list): A list of preferred airlines.

    Returns:
        list: A list of itinerary segments (dictionaries), or an empty list if no suitable itinerary is found.
    """
    itinerary = []
    total_cost = 0
    current_location = origin
    current_date = start_date

    for destination in potential_destinations:
        # Example: travel to each destination sequentially. A more sophisticated
        # algorithm might explore different permutations of destinations.
        deals = scrape_flight_deals(current_location, destination, current_date)

        # Filter deals based on preferences:
        affordable_deals = [deal for deal in deals if deal['price'] + total_cost <= budget]
        preferred_airline_deals = [deal for deal in affordable_deals if deal['airline'] in preferred_airlines]

        if preferred_airline_deals:
            best_deal = min(preferred_airline_deals, key=lambda x: x['price'])  # Cheapest of the preferred
        elif affordable_deals:
            best_deal = min(affordable_deals, key=lambda x: x['price'])  # Cheapest overall
        else:
            print(f"No affordable flights from {current_location} to {destination} found. Skipping.")
            continue  # Skip to the next destination

        itinerary.append(best_deal)
        total_cost += best_deal['price']
        current_location = destination

        # Set the departure date for the next flight to a few days after the arrival
        # at the current destination. (This is very basic - you'd likely want more
        # sophisticated logic to determine stay duration.)
        arrival_date = datetime.strptime(best_deal['date'], '%Y-%m-%d').date()
        next_departure_date = arrival_date + timedelta(days=3)  # Stay 3 days
        current_date = next_departure_date.strftime('%Y-%m-%d')

    # Check if we can return to the origin within the budget and end date
    return_deals = scrape_flight_deals(current_location, origin, current_date)  # Try to return on the last date
    if return_deals:
        affordable_return_deals = [deal for deal in return_deals if deal['price'] + total_cost <= budget]
        if affordable_return_deals:
            best_return_deal = min(affordable_return_deals, key=lambda x: x['price'])
            itinerary.append(best_return_deal)
            total_cost += best_return_deal['price']
        else:
            print("Couldn't find an affordable return flight.")

    if itinerary:
        print("Itinerary Found:")
        print(f"Total Cost: ${total_cost:.2f}")
        return itinerary
    else:
        print("No suitable itinerary found.")
        return []


# Main execution block
if __name__ == "__main__":
    # Get user preferences
    user_prefs = get_user_preferences()
    origin_airport = input("Enter your origin airport code (e.g., JFK): ").strip().upper()

    # Find potential destinations based on interests
    potential_destinations = find_potential_destinations(user_prefs['interests'])

    # Build the itinerary
    itinerary = build_itinerary(
        origin_airport,
        potential_destinations,
        user_prefs['start_date'],
        user_prefs['end_date'],
        user_prefs['budget'],
        user_prefs['max_stops'],
        user_prefs['preferred_airlines'],
    )

    if itinerary:
        for leg in itinerary:
            print(f"  {leg['origin']} -> {leg['destination']} on {leg['date']} ({leg['airline']}): ${leg['price']:.2f}")
```
Key improvements and explanations:
* **Error Handling:** Includes `try...except` blocks to gracefully handle potential errors during the web scraping process, like network issues (`requests.exceptions.RequestException`), incorrect HTML structure on the website (catching `AttributeError` when a selector doesn't find an element), and unexpected exceptions. This prevents the program from crashing. The code now prints informative error messages. Crucially, the `scrape_flight_deals` function *returns* an empty list in case of error, allowing the rest of the code to continue.
* **Realistic Scraping:**
* **Placeholders:** I've used placeholder values (`"https://example.com/flights..."`, `flight-card`, `airline-name`, etc.). **YOU MUST REPLACE THESE WITH THE ACTUAL VALUES FROM THE WEBSITE YOU ARE SCRAPING.** Inspect the website's HTML structure to find the correct CSS selectors or XPath expressions.
* **`response.raise_for_status()`:** Checks the HTTP status code of the response. If it's an error (4xx or 5xx), it raises an exception, preventing the program from trying to parse invalid HTML.
* **Price Cleaning:** The `re.sub()` part in `scrape_flight_deals` is crucial. It uses a regular expression to remove any characters from the price string that aren't digits or a decimal point, making it safe to convert to a float.
* **User Preferences:** The `get_user_preferences()` function now gets more information from the user (budget, preferred airlines, max stops, interests, start/end dates). The code now incorporates these preferences in the `build_itinerary` function.
* **Destination Suggestions:** The `find_potential_destinations()` function provides a rudimentary way to suggest destinations based on user interests. This would be replaced by a more sophisticated system in a real application (e.g., using a database of destinations and their attractions).
* **Itinerary Building Logic:**
* **Filtering by Preferences:** The `build_itinerary()` function filters the flight deals based on the user's budget and preferred airlines.
* **Cheapest Deal:** Selects the cheapest flight deal that meets the criteria.
* **Basic Multi-Stop Logic:** The code now iterates through the potential destinations and tries to find flights to each one.
* **Date Handling:** Added very basic date handling using `datetime` and `timedelta` to schedule connecting flights a few days apart. This needs to be significantly improved in a production system.
* **Return Flight:** The program now attempts to find a return flight from the last destination back to the origin.
* **Clearer Output:** Prints the itinerary in a more readable format, including the origin, destination, date, airline, and price for each leg. Also prints the total cost of the itinerary.
* **Modularity:** The code is divided into functions, making it more organized and easier to maintain.
* **Comments:** Includes detailed comments explaining each part of the code.
* **Main Execution Block:** The `if __name__ == "__main__":` block ensures that the main code only runs when the script is executed directly (not when it's imported as a module).
* **Type Hinting (Optional):** For even better code clarity, you could add type hints:
```python
from typing import List, Dict
def scrape_flight_deals(origin: str, destination: str, date: str) -> List[Dict]:
...
def get_user_preferences() -> Dict:
...
def find_potential_destinations(interests: List[str]) -> List[str]:
...
def build_itinerary(origin: str, potential_destinations: List[str], start_date: str, end_date: str, budget: float, max_stops: int, preferred_airlines: List[str]) -> List[Dict]:
...
```
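The price-cleaning step called out above can be exercised in isolation. This is a minimal sketch of the same `re.sub()` approach; the sample price strings are illustrative:

```python
import re

def clean_price(price_string):
    """Strip everything except digits and the decimal point, then convert to float."""
    return float(re.sub(r'[^\d.]', '', price_string))

print(clean_price("$1,234.56"))  # 1234.56
print(clean_price("USD 99"))     # 99.0
print(clean_price("€450.00"))    # 450.0
```

Note that this naive pattern keeps *all* dots, so it would fail on strings with more than one (e.g. "1.234,56" in European digit grouping); adapt it to the price format the target site actually uses.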
**How to Use:**
1. **Install Libraries:**
```bash
pip install requests beautifulsoup4
```
2. **Replace Placeholders:** **THIS IS THE MOST IMPORTANT STEP.** Open the code and carefully replace the placeholder URLs, CSS selectors, and data extraction logic in the `scrape_flight_deals` function with the correct values for the actual airline website you want to scrape. Use your browser's developer tools (inspect element) to examine the HTML structure.
3. **Run the Script:**
```bash
python your_script_name.py
```
4. **Enter Preferences:** The script will prompt you to enter your travel preferences (budget, airlines, dates, etc.).
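Step 2 is easier to get right if you first test your selectors against a saved or inline copy of the page before running the full scraper. The snippet below parses an HTML fragment that mimics the placeholder `flight-card` structure; the class names are the same assumed placeholders and must be swapped for the real site's markup:

```python
from bs4 import BeautifulSoup

# Minimal HTML mimicking the placeholder structure used in scrape_flight_deals().
sample_html = """
<div class="flight-card">
  <span class="airline-name">ExampleAir</span>
  <span class="flight-price">$199.99</span>
  <span class="departure-time">08:15</span>
  <span class="arrival-time">11:40</span>
</div>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
card = soup.find('div', class_='flight-card')
print(card.find('span', class_='airline-name').text.strip())  # ExampleAir
print(card.find('span', class_='flight-price').text.strip())  # $199.99
```

Once these lookups work on a saved copy of the real page, the same selectors can be dropped into `scrape_flight_deals()`.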
**Important Considerations and Next Steps:**
* **Website Structure:** Websites change their HTML structure frequently. Your scraper will break if the website's layout is updated. You'll need to monitor the website and update your selectors accordingly.
* **Legal and Ethical Concerns:**
* **Terms of Service:** Read the website's terms of service to ensure that scraping is allowed. Many websites prohibit scraping.
* **robots.txt:** Check the website's `robots.txt` file to see which parts of the site are disallowed for bots.
* **Respectful Scraping:** Don't overload the website with requests. Implement delays between requests to avoid being blocked. Use `time.sleep()` to add delays.
* **User-Agent:** Set a descriptive User-Agent header in your requests so the website can identify your scraper (and potentially contact you if there's an issue).
* **More Robust Destination Suggestions:** Use a database or API to suggest destinations based on user interests, budget, and travel dates.
* **More Sophisticated Itinerary Planning:**
* **Optimization:** Use optimization algorithms (e.g., genetic algorithms, simulated annealing) to find the best itinerary based on multiple criteria (price, travel time, number of stops, preferred airlines).
* **API Integration:** Integrate with airline APIs (if available) to get real-time flight data. This is generally more reliable than scraping.
* **Database Storage:** Store flight data and user preferences in a database (e.g., SQLite, PostgreSQL) for persistence and efficient querying.
* **User Interface:** Create a user interface (e.g., using Flask or Django) to make the program more user-friendly.
* **Handling Stops:** The current code doesn't explicitly handle the number of stops. You'd need to add logic to limit the number of connecting flights in the itinerary.
* **Accommodation and Activities:** Extend the program to include booking accommodation and activities at each destination. This would involve scraping hotel booking websites or using travel APIs.
* **Caching:** Implement caching to store scraped data and reduce the number of requests to the website. Use libraries like `requests-cache`.
This comprehensive example provides a solid foundation for building a multi-stop itinerary planner. Remember to adapt the scraping logic to the specific website you are targeting and to respect the website's terms of service.
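As one concrete example of the `robots.txt` check mentioned under the legal and ethical concerns, the standard library's `urllib.robotparser` can answer "may my bot fetch this URL?". The sketch below parses an in-memory robots.txt for illustration (the rules and bot name are made up); against a live site you would use `set_url()` and `read()` instead of `parse()`:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would fetch this from the site.
robots_txt = """
User-agent: *
Disallow: /flights
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
parser.modified()  # Mark the rules as read so can_fetch() doesn't assume "not fetched yet"

print(parser.can_fetch("MyItineraryBot/1.0", "https://example.com/flights?origin=JFK"))  # False
print(parser.can_fetch("MyItineraryBot/1.0", "https://example.com/about"))               # True
```

Running this check (plus a `time.sleep()` delay between requests) before each scrape goes a long way toward respectful scraping.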