Business report generation from raw data Python
```python
import pandas as pd
from datetime import datetime


def generate_business_report(data_file, report_file):
    """
    Generates a simple business report from a CSV data file.

    Args:
        data_file (str): Path to the CSV data file. Must contain the columns
            'Date', 'Product', 'Quantity', and 'Price'.
        report_file (str): Path where the generated report will be saved (plain text).
    """
    try:
        # 1. Data Loading and Cleaning
        df = pd.read_csv(data_file)

        # Basic data validation: reject empty input and missing columns
        if df.empty:
            raise ValueError("Data file is empty.")

        required_columns = ['Date', 'Product', 'Quantity', 'Price']
        for col in required_columns:
            if col not in df.columns:
                raise ValueError(f"Missing required column: {col}")

        # Convert 'Date' to datetime objects; re-raise with a clearer message on failure
        try:
            df['Date'] = pd.to_datetime(df['Date'])
        except ValueError:
            raise ValueError("Invalid date format in the 'Date' column. Use YYYY-MM-DD or similar standard.")

        # Convert 'Quantity' and 'Price' to numeric types; non-numeric values raise an error
        try:
            df['Quantity'] = pd.to_numeric(df['Quantity'])
            df['Price'] = pd.to_numeric(df['Price'])
        except ValueError:
            raise ValueError("Invalid numeric data in 'Quantity' or 'Price' columns. Ensure these contain only numbers.")

        # Clamp negative quantities and prices to 0 (adjust this policy as needed)
        df['Quantity'] = df['Quantity'].clip(lower=0)
        df['Price'] = df['Price'].clip(lower=0)

        # 2. Data Aggregation and Analysis
        # Calculate total revenue for each product, highest first
        df['Revenue'] = df['Quantity'] * df['Price']
        product_revenue = df.groupby('Product')['Revenue'].sum().sort_values(ascending=False)

        # Calculate total revenue for the entire period
        total_revenue = df['Revenue'].sum()

        # Determine the best-selling product (ranked by revenue, not by units sold)
        best_selling_product = product_revenue.index[0] if not product_revenue.empty else "No sales recorded"

        # Calculate the average daily revenue
        daily_revenue = df.groupby('Date')['Revenue'].sum()
        average_daily_revenue = daily_revenue.mean()

        # 3. Report Generation
        # Build the report line by line so the code's indentation does not leak into the output
        timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        report_lines = [
            "Business Performance Report",
            "--------------------------",
            f"Date Generated: {timestamp}",
            f"Total Revenue: ${total_revenue:,.2f}",
            f"Best Selling Product: {best_selling_product}",
            f"Average Daily Revenue: ${average_daily_revenue:,.2f}",
            "Revenue by Product:",
            "--------------------",
            product_revenue.to_string(),
        ]
        report_content = "\n".join(report_lines) + "\n"

        # 4. Save the Report to a File
        with open(report_file, 'w', encoding='utf-8') as f:
            f.write(report_content)

        print(f"Report generated successfully and saved to: {report_file}")

    except FileNotFoundError:
        print(f"Error: Data file not found at {data_file}")
    except ValueError as e:
        print(f"Data Error: {e}")  # Validation and conversion failures
    except Exception as e:
        print(f"An unexpected error occurred: {e}")  # Catch any other potential errors


# Example usage:
if __name__ == "__main__":
    # Create a sample CSV file for testing
    sample_data = """Date,Product,Quantity,Price
2023-01-01,Laptop,10,1200
2023-01-01,Keyboard,50,75
2023-01-02,Laptop,5,1200
2023-01-02,Mouse,100,25
2023-01-03,Monitor,15,300
2023-01-03,Keyboard,25,75
2023-01-04,Laptop,7,1200
2023-01-04,Mouse,80,25"""

    with open("sales_data.csv", "w", encoding="utf-8") as f:
        f.write(sample_data)

    data_file = "sales_data.csv"
    report_file = "business_report.txt"
    generate_business_report(data_file, report_file)
```
Key improvements and explanations:
* **Error Handling:** Wraps the whole workflow in `try...except` blocks that catch `FileNotFoundError`, `ValueError` (for incorrect data types or missing columns), and a generic `Exception` for anything unexpected. Each `ValueError` carries a specific message to help with debugging. Note that the function prints these errors and returns instead of re-raising them, so callers do not see an exception.
* **Data Validation:** The code now *validates* the input data. It checks if the required columns exist and attempts to convert the 'Date', 'Quantity', and 'Price' columns to the correct data types. This prevents common errors that can crash the script. It also handles potentially negative `Quantity` and `Price` values by setting them to zero (you can adjust this behavior as needed).
* **Clearer Variable Names:** Uses more descriptive variable names (e.g., `product_revenue` instead of just `revenue`).
* **`if __name__ == "__main__":` block:** The example usage code is now inside an `if __name__ == "__main__":` block. This ensures that the example code only runs when the script is executed directly, and not when it's imported as a module. This is a best practice.
* **Sample Data Creation:** Creates a sample CSV file (`sales_data.csv`) directly in the script using a string. This makes the example self-contained and easily runnable. The data is also more realistic.
* **Date Handling:** Uses `pd.to_datetime` to convert the 'Date' column to datetime objects. The example includes error handling in case the date format is incorrect.
* **Numeric Conversion:** Uses `pd.to_numeric` with error handling to ensure 'Quantity' and 'Price' are numbers.
* **`to_string()` for Series Output:** Uses `.to_string()` to format the `product_revenue` Series (the result of the `groupby(...).sum()`) in the report. Without it, the default representation also prints the Series name and dtype, which is less readable.
* **Handles Empty Sales Data:** Checks if `product_revenue` is empty before trying to access its index. This prevents an `IndexError` if there are no sales in the data.
* **Formatting the Total Revenue:** The total revenue is formatted with a thousands separator and two decimal places to make it more readable: `${total_revenue:,.2f}`
* **Clearer Report Formatting:** The report formatting is improved for better readability.
* **Comments:** Added extensive comments to explain each step of the code.
* **`datetime` for Report Timestamp:** Uses `datetime.now()` to add a timestamp to the generated report.
* **Informative Error Messages:** Each error message names the file or column that caused the failure, which aids debugging.
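The validation steps described above can also be read in isolation. The following is a minimal sketch, not part of the original script: `validate_sales_frame` and `REQUIRED_COLUMNS` are hypothetical names introduced here, and the helper assumes the same four columns and the same "clamp negatives to zero" policy as the report function.

```python
import pandas as pd

# Hypothetical constant: the columns the report function expects.
REQUIRED_COLUMNS = ["Date", "Product", "Quantity", "Price"]


def validate_sales_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a cleaned copy of df or raise ValueError."""
    if df.empty:
        raise ValueError("Data file is empty.")

    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required column(s): {', '.join(missing)}")

    cleaned = df.copy()  # leave the caller's DataFrame untouched
    # Both conversions raise ValueError on values they cannot parse.
    cleaned["Date"] = pd.to_datetime(cleaned["Date"])
    for col in ("Quantity", "Price"):
        # Negative values are clamped to 0, mirroring the policy described above.
        cleaned[col] = pd.to_numeric(cleaned[col]).clip(lower=0)
    return cleaned


if __name__ == "__main__":
    frame = pd.DataFrame(
        {
            "Date": ["2023-01-01", "2023-01-02"],
            "Product": ["Laptop", "Mouse"],
            "Quantity": [10, -3],
            "Price": [1200, 25],
        }
    )
    print(validate_sales_frame(frame))
```

Returning a cleaned copy and raising on bad input, instead of printing, is one possible design choice; it lets the caller decide how to report the problem.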
How to run this code:
1. **Save:** Save the code as a Python file (e.g., `report_generator.py`).
2. **Run:** Execute the script from your terminal: `python report_generator.py`
3. **Check Output:** The script will create two files: `sales_data.csv` (the sample data) and `business_report.txt` (the generated report). Open `business_report.txt` to view the results.
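To sanity-check the numbers in `business_report.txt`, the same aggregation can be reproduced in memory. This is a small sketch under the assumption that the sample data is unchanged; `summarize` and `SAMPLE_CSV` are hypothetical names used only for this check.

```python
import io

import pandas as pd

# Same rows as the sample data written to sales_data.csv.
SAMPLE_CSV = """Date,Product,Quantity,Price
2023-01-01,Laptop,10,1200
2023-01-01,Keyboard,50,75
2023-01-02,Laptop,5,1200
2023-01-02,Mouse,100,25
2023-01-03,Monitor,15,300
2023-01-03,Keyboard,25,75
2023-01-04,Laptop,7,1200
2023-01-04,Mouse,80,25"""


def summarize(csv_text: str) -> dict:
    """Hypothetical helper: recompute the report's headline figures."""
    df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
    df["Revenue"] = df["Quantity"] * df["Price"]
    product_revenue = (
        df.groupby("Product")["Revenue"].sum().sort_values(ascending=False)
    )
    daily_revenue = df.groupby("Date")["Revenue"].sum()
    return {
        "total_revenue": float(df["Revenue"].sum()),
        "top_product": product_revenue.index[0],
        "average_daily_revenue": float(daily_revenue.mean()),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_CSV))
```

For the sample data, the total revenue is 41,025.00, the top product by revenue is Laptop (26,400.00), and the average daily revenue over the four days is 10,256.25; the generated report should show the same figures.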
This version validates its input, reports failures with specific messages, and runs end to end on the bundled sample data, which makes it a reasonable starting point for building a real-world business report generator.