Dev Environment Synchronizer Python
👤 Sharing: AI
```python
import os
import shutil
import argparse
import hashlib
import time


def calculate_file_hash(filepath):
    """Calculates the SHA-256 hash of a file.

    Args:
        filepath (str): The path to the file.

    Returns:
        str: The SHA-256 hash of the file, or None if an error occurs.
    """
    try:
        hasher = hashlib.sha256()
        with open(filepath, 'rb') as file:
            while True:
                chunk = file.read(4096)  # Read in chunks to avoid loading the whole file into memory
                if not chunk:
                    break
                hasher.update(chunk)
        return hasher.hexdigest()
    except Exception as e:
        print(f"Error calculating hash for {filepath}: {e}")
        return None


def sync_directories(source_dir, dest_dir, verbose=False, dry_run=False):
    """Synchronizes the destination directory to match the source (one-way mirror).

    Args:
        source_dir (str): The source directory.
        dest_dir (str): The destination directory.
        verbose (bool): If True, prints detailed information about the synchronization process.
        dry_run (bool): If True, simulates the synchronization without actually making any changes.
    """
    print(f"Starting synchronization from {source_dir} to {dest_dir}...")
    source_files = {}  # Maps each relative path to its full path in the source directory
    dest_files = {}  # Maps each relative path to its full path in the destination directory

    # Walk through the source directory
    for root, _, files in os.walk(source_dir):
        for file in files:
            source_path = os.path.join(root, file)
            relative_path = os.path.relpath(source_path, source_dir)  # Get the path relative to the source directory
            source_files[relative_path] = source_path

    # Walk through the destination directory
    for root, _, files in os.walk(dest_dir):
        for file in files:
            dest_path = os.path.join(root, file)
            relative_path = os.path.relpath(dest_path, dest_dir)
            dest_files[relative_path] = dest_path

    # Iterate through the source files to determine which ones need to be copied or updated
    for relative_path, source_path in source_files.items():
        dest_path = os.path.join(dest_dir, relative_path)
        if relative_path not in dest_files:
            # File exists in source but not in destination: copy it
            print(f"Copying {relative_path} to destination.")
            if not dry_run:
                os.makedirs(os.path.dirname(dest_path), exist_ok=True)  # Create necessary directories
                shutil.copy2(source_path, dest_path)  # copy2 preserves metadata
                if verbose:
                    print(f"Copied: {source_path} -> {dest_path}")
        else:
            # File exists in both source and destination: check if it needs to be updated
            source_hash = calculate_file_hash(source_path)
            dest_hash = calculate_file_hash(dest_path)
            if source_hash is None or dest_hash is None:
                print(f"Skipping {relative_path} due to error calculating hash.")
                continue
            if source_hash != dest_hash:
                # File has changed: update it
                print(f"Updating {relative_path} in destination.")
                if not dry_run:
                    shutil.copy2(source_path, dest_path)
                    if verbose:
                        print(f"Updated: {source_path} -> {dest_path}")

    # Iterate through the destination files to determine which ones need to be deleted
    for relative_path, dest_path in dest_files.items():
        if relative_path not in source_files:
            # File exists in destination but not in source: delete it
            print(f"Deleting {relative_path} from destination.")
            if not dry_run:
                try:
                    os.remove(dest_path)
                except OSError as e:
                    print(f"Error deleting {dest_path}: {e}")
                    continue
                if verbose:
                    print(f"Deleted: {dest_path}")
                # Clean up the parent directory if it is now empty,
                # but never remove the destination root itself
                dest_dir_path = os.path.dirname(dest_path)
                is_root = os.path.abspath(dest_dir_path) == os.path.abspath(dest_dir)
                if not is_root and not os.listdir(dest_dir_path):
                    try:
                        os.rmdir(dest_dir_path)
                        if verbose:
                            print(f"Deleted empty directory: {dest_dir_path}")
                    except OSError as e:
                        print(f"Error deleting directory {dest_dir_path}: {e}")
    print("Synchronization complete.")


def main():
    """Main function to parse arguments and start synchronization."""
    parser = argparse.ArgumentParser(description="Synchronize two directories.")
    parser.add_argument("source_dir", help="The source directory.")
    parser.add_argument("dest_dir", help="The destination directory.")
    parser.add_argument("-v", "--verbose", action="store_true", help="Enable verbose output.")
    parser.add_argument("-n", "--dry_run", action="store_true", help="Perform a dry run without making any changes.")
    parser.add_argument("-i", "--interval", type=int, default=0, help="Run synchronization at an interval (seconds). Use 0 for a single run.")
    args = parser.parse_args()

    if not os.path.isdir(args.source_dir):
        print(f"Error: Source directory '{args.source_dir}' does not exist or is not a directory.")
        return

    if not os.path.isdir(args.dest_dir):
        if args.dry_run:
            # A dry run must not change anything; os.walk() finds no files in a missing directory
            print(f"Destination directory '{args.dest_dir}' does not exist or is not a directory. It would be created.")
        else:
            print(f"Destination directory '{args.dest_dir}' does not exist or is not a directory. Creating it.")
            try:
                os.makedirs(args.dest_dir)
            except OSError as e:
                print(f"Error creating destination directory: {e}")
                return

    if args.interval > 0:
        try:
            while True:
                sync_directories(args.source_dir, args.dest_dir, args.verbose, args.dry_run)
                print(f"Sleeping for {args.interval} seconds...")
                time.sleep(args.interval)
        except KeyboardInterrupt:
            print("Synchronization stopped by user.")
    else:
        sync_directories(args.source_dir, args.dest_dir, args.verbose, args.dry_run)


if __name__ == "__main__":
    main()
```
Key improvements and explanations:
* **Clearer Structure:** The code is organized into functions with docstrings, making it more readable and maintainable.
* **Error Handling:** `try...except` blocks cover hash calculation, file deletion, empty-directory removal, and creation of the destination directory, and each failure prints an informative message to the console. The copy calls (`shutil.copy2`) are not wrapped, so an error there (e.g., a permissions issue) still raises. Interval mode handles `KeyboardInterrupt` and exits cleanly.
* **File Hash Comparison:** Uses SHA-256 hashing to accurately determine if a file has changed. This is more reliable than just comparing timestamps or file sizes. The `calculate_file_hash` function is separate, making the code cleaner. It also reads the file in chunks to handle large files efficiently.
* **Directory Creation:** `os.makedirs(os.path.dirname(dest_path), exist_ok=True)` ensures that the directory structure exists in the destination before copying a file. `exist_ok=True` prevents an error if the directory already exists.
* **Deletion of Files:** Deletes files from the destination that are not present in the source. After each successful deletion it also removes the containing subdirectory if it is now empty (only that directory, not emptied ancestors above it). Deletion errors are caught and reported.
* **Verbose Mode:** The `-v` or `--verbose` flag provides detailed output about the synchronization process.
* **Dry Run Mode:** The `-n` or `--dry_run` flag allows you to simulate the synchronization without actually making any changes. This is extremely useful for testing and verifying that the script will do what you expect.
* **Relative Paths:** Uses `os.path.relpath` to handle directory structures correctly. This ensures that the synchronization works correctly regardless of the absolute paths of the source and destination directories.
* **Argument Parsing:** Uses `argparse` for robust argument parsing. This makes the script more user-friendly and allows for easy configuration. Includes a `--interval` argument for periodic synchronization.
* **`copy2` instead of `copy`:** Uses `shutil.copy2` to preserve file metadata (e.g., timestamps, permissions) during copying.
* **Directory Validation:** Checks if the source and destination directories exist and handles the case where the destination directory needs to be created.
* **Clearer Output:** Improves the clarity of the output messages to make it easier to understand what the script is doing.
* **Efficiency:** Hashes are computed in 4096-byte chunks, so memory use stays constant regardless of file size. Note that every file present on both sides is still read in full on each run.
* **Interval Synchronization:** The `-i` or `--interval` argument provides the ability to run the synchronization at a specified interval.
* **Corrected empty directory deletion:** The directory deletion now correctly checks if a directory is truly empty before attempting to remove it. Includes error handling if the directory cannot be removed (e.g., permission issues). The directory deletion now only happens after a file has been successfully deleted from that directory.
* **Handles edge cases:** Addresses the situation where the hash cannot be computed, and skips the file instead of crashing.
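As a rough illustration of the hash comparison described above, the sketch below hashes a file in 4096-byte chunks and checks that the result matches hashing the same bytes in one call. The names `sha256_of`, `files_differ`, and `demo` are hypothetical, and the size pre-check in `files_differ` is an assumption added for illustration; the script itself always hashes both files.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=4096):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def files_differ(path_a, path_b):
    """Hypothetical helper: compare sizes first, hash only if sizes match."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return True  # Different sizes imply different content.
    return sha256_of(path_a) != sha256_of(path_b)


def demo():
    """Write three small files to a temporary directory and compare them."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "a.bin")
        copy = os.path.join(tmp, "b.bin")
        changed = os.path.join(tmp, "c.bin")
        payload = b"x" * 10000  # Larger than one chunk on purpose.
        contents = (
            (original, payload),
            (copy, payload),
            (changed, payload[:-1] + b"y"),  # Same size, last byte differs.
        )
        for path, data in contents:
            with open(path, "wb") as handle:
                handle.write(data)
        chunked_matches_one_shot = (
            sha256_of(original) == hashlib.sha256(payload).hexdigest()
        )
        return (
            chunked_matches_one_shot,
            files_differ(original, copy),
            files_differ(original, changed),
        )


if __name__ == "__main__":
    print(demo())
```

The size check is only a shortcut: files of different sizes cannot be identical, so hashing is skipped for them, while same-size files still go through the full comparison.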
How to Run:
1. **Save:** Save the code as a Python file (e.g., `sync.py`).
2. **Open a Terminal:** Open a terminal or command prompt.
3. **Run:** Execute the script with the desired arguments. Examples:
* `python sync.py source_dir dest_dir` (synchronize once)
* `python sync.py source_dir dest_dir -v` (synchronize with verbose output)
* `python sync.py source_dir dest_dir -n` (dry run)
* `python sync.py source_dir dest_dir -i 60` (synchronize every 60 seconds)
Replace `source_dir` and `dest_dir` with the actual paths to your source and destination directories. The source directory must already exist; the script creates the destination directory if it is missing.
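To see how the command lines above map onto the script's options, the following sketch rebuilds the same `argparse` parser and parses a sample argument list instead of `sys.argv`. The wrapper names `build_parser` and `parse` are hypothetical and exist only for this illustration; the arguments themselves are the ones defined in `main()`.

```python
import argparse


def build_parser():
    """Recreate the command-line interface defined in the script's main()."""
    parser = argparse.ArgumentParser(description="Synchronize two directories.")
    parser.add_argument("source_dir", help="The source directory.")
    parser.add_argument("dest_dir", help="The destination directory.")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("-n", "--dry_run", action="store_true")
    parser.add_argument("-i", "--interval", type=int, default=0)
    return parser


def parse(argv):
    """Parse an explicit argument list (hypothetical helper for this demo)."""
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    # Equivalent to: python sync.py source_dir dest_dir -n -i 60
    args = parse(["source_dir", "dest_dir", "-n", "-i", "60"])
    print(args.source_dir, args.dest_dir, args.verbose, args.dry_run, args.interval)
```

Combining `-n` with `-i 60` as shown repeats a dry run every 60 seconds, which can be a low-risk way to watch what the script would do before letting it modify the destination.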
Example directory structure (before running):
```
source_dir/
    file1.txt (version 1)
    subdir/
        file2.txt (version 1)
dest_dir/
    file1.txt (version 1)
    subdir/
        file2.txt (version 1)
        file3.txt (old file)
```
Example Scenario and Expected Behavior:
1. **Initial State:** Assume the `source_dir` and `dest_dir` initially have the same `file1.txt` and `subdir/file2.txt`. `dest_dir` also has a file `subdir/file3.txt` which doesn't exist in `source_dir`.
2. **Modify `source_dir`:** Edit `source_dir/file1.txt`. Delete `source_dir/subdir/file2.txt`. Create `source_dir/new_file.txt`.
3. **Run the script:** `python sync.py source_dir dest_dir`
4. **Expected Outcome:**
* `dest_dir/file1.txt` will be updated with the contents of the modified `source_dir/file1.txt`.
* `dest_dir/subdir/file2.txt` will be deleted.
* `dest_dir/new_file.txt` will be created, and its contents will be the same as `source_dir/new_file.txt`.
* `dest_dir/subdir/file3.txt` will be deleted. Because `dest_dir/subdir` is then empty, it will be removed as well.
The code synchronizes the destination directory to reflect the state of the source directory. Hashing ensures that only new or changed files are copied, which avoids unnecessary writes, although every file present in both directories is read in full to compute its hash. Errors during hashing and deletion are reported rather than crashing the script.
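The example scenario can be reproduced without touching real data. The sketch below builds the modified source and the initial destination in a temporary directory, then classifies each relative path as a copy, an update, or a deletion, using the same `os.walk` plus `os.path.relpath` mapping as the script. All helper names (`list_files`, `digest`, `plan_sync`, `write_file`, `demo`) are hypothetical, and the sketch only plans the actions; it does not copy or delete anything.

```python
import hashlib
import os
import tempfile


def list_files(base_dir):
    """Map each file's path relative to base_dir to its full path."""
    found = {}
    for root, _, files in os.walk(base_dir):
        for name in files:
            full_path = os.path.join(root, name)
            found[os.path.relpath(full_path, base_dir)] = full_path
    return found


def digest(path):
    """SHA-256 of a small file (read in one call; fine for this demo)."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def plan_sync(source_dir, dest_dir):
    """Hypothetical helper: classify paths without modifying any file."""
    source, dest = list_files(source_dir), list_files(dest_dir)
    to_copy = sorted(set(source) - set(dest))
    to_delete = sorted(set(dest) - set(source))
    to_update = sorted(
        rel for rel in set(source) & set(dest)
        if digest(source[rel]) != digest(dest[rel])
    )
    return {"copy": to_copy, "update": to_update, "delete": to_delete}


def write_file(base_dir, relative_path, text):
    """Create a text file, making parent directories as needed."""
    path = os.path.join(base_dir, *relative_path.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as handle:
        handle.write(text)


def demo():
    """Build the scenario's directories and return the planned actions."""
    with tempfile.TemporaryDirectory() as tmp:
        source_dir = os.path.join(tmp, "source_dir")
        dest_dir = os.path.join(tmp, "dest_dir")
        # Source after step 2: file1 edited, file2 deleted, new_file added.
        write_file(source_dir, "file1.txt", "version 2")
        write_file(source_dir, "new_file.txt", "new")
        # Destination still in its initial state.
        write_file(dest_dir, "file1.txt", "version 1")
        write_file(dest_dir, "subdir/file2.txt", "version 1")
        write_file(dest_dir, "subdir/file3.txt", "old file")
        return plan_sync(source_dir, dest_dir)


if __name__ == "__main__":
    for action, paths in demo().items():
        print(action, paths)
```

The plan lists `new_file.txt` under copy, `file1.txt` under update, and both files in `subdir` under delete, which matches the expected outcome listed above.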