Generates tailored compliance docs from regulatory text (Python)
```python
import re
import json


def extract_requirements(regulatory_text, keywords):
    """
    Extracts sentences from regulatory text that contain specified keywords,
    potentially indicating compliance requirements.

    Args:
        regulatory_text (str): The regulatory text as a string.
        keywords (list): A list of keywords to search for in the text.

    Returns:
        list: A list of sentences (strings) that contain at least one of the keywords.
    """
    # Split into sentences at whitespace that follows "." or "?"
    sentences = re.split(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s', regulatory_text)
    requirements = []
    for sentence in sentences:
        sentence = sentence.strip()
        # Case-insensitive search
        if any(keyword.lower() in sentence.lower() for keyword in keywords):
            requirements.append(sentence)
    return requirements


def generate_compliance_doc(requirements, company_name, product_name, regulatory_framework):
    """
    Generates a basic compliance document from a list of requirements.

    Args:
        requirements (list): A list of compliance requirements (strings).
        company_name (str): The name of the company.
        product_name (str): The name of the product.
        regulatory_framework (str): The name of the regulatory framework.

    Returns:
        dict: A dictionary containing the structure of a compliance document.
    """
    doc = {
        "document_title": f"Compliance Document for {product_name} under {regulatory_framework}",
        "company": company_name,
        "product": product_name,
        "regulatory_framework": regulatory_framework,
        "requirements": []
    }
    for i, requirement in enumerate(requirements):
        doc["requirements"].append({
            "requirement_id": f"REQ-{i+1:03}",  # Unique ID for each requirement
            "requirement_text": requirement,
            "compliance_status": "Not Assessed",  # Initial status
            "responsible_party": "TBD",  # To Be Determined
            "evidence": "None",  # Placeholder for evidence of compliance
            "date_assessed": None,  # Date of assessment
            "notes": ""  # Additional notes
        })
    return doc


def save_compliance_doc(doc, filename="compliance_document.json"):
    """
    Saves the compliance document to a JSON file.

    Args:
        doc (dict): The compliance document.
        filename (str): The name of the file to save to.
    """
    try:
        with open(filename, 'w') as f:
            json.dump(doc, f, indent=4)  # Save with indentation for readability
        print(f"Compliance document saved to {filename}")
    except Exception as e:
        print(f"Error saving document: {e}")


# --- Example Usage ---
if __name__ == "__main__":
    # Sample Regulatory Text (replace with your actual text)
    regulatory_text = """
Section 1. All manufacturers must ensure their products are safe. This includes providing adequate safety mechanisms.
Section 2. The company shall maintain detailed records of all testing. Records must be kept for a minimum of 5 years. The records include specific data, such as timestamps and temperature readings.
Section 3. Products containing hazardous materials require special labeling. The label must be easily visible to the consumer.
Section 4. The organization should also implement a quality control procedure.
Section 5. This is just some random text.
"""
    # Keywords to search for (customize based on the regulatory framework)
    keywords = ["must", "shall", "require", "should"]

    # Extract the requirements
    requirements = extract_requirements(regulatory_text, keywords)

    # Generate the compliance document
    company_name = "Example Corp"
    product_name = "Widget X"
    regulatory_framework = "Generic Safety Standard v1.0"
    compliance_doc = generate_compliance_doc(requirements, company_name, product_name, regulatory_framework)

    # Save the document to a JSON file
    save_compliance_doc(compliance_doc, "widget_x_compliance.json")

    # OPTIONAL: Print the document to the console (for demonstration)
    # print(json.dumps(compliance_doc, indent=4))
```
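Every entry produced by `generate_compliance_doc` starts as "Not Assessed" with `date_assessed` set to `None`. As a rough sketch of how an entry might be filled in later, the helper below updates one requirement in place. The name `mark_assessed` and its parameters are assumptions for illustration and are not part of the script above. The date is stored as an ISO 8601 string because `json.dump` raises a `TypeError` when it meets a `datetime.date` object.

```python
import json
from datetime import date


def mark_assessed(doc, requirement_id, status, evidence, assessed_on=None, notes=""):
    """Update one requirement entry in place and return it (hypothetical helper)."""
    assessed_on = assessed_on or date.today()
    for entry in doc["requirements"]:
        if entry["requirement_id"] == requirement_id:
            entry["compliance_status"] = status
            entry["evidence"] = evidence
            # Store the date as text so the document stays JSON-serializable.
            entry["date_assessed"] = assessed_on.isoformat()
            entry["notes"] = notes
            return entry
    raise KeyError(f"No requirement with id {requirement_id!r}")


if __name__ == "__main__":
    # A minimal document with the same field names the script generates.
    doc = {
        "requirements": [{
            "requirement_id": "REQ-001",
            "requirement_text": "Records must be kept for a minimum of 5 years.",
            "compliance_status": "Not Assessed",
            "responsible_party": "TBD",
            "evidence": "None",
            "date_assessed": None,
            "notes": "",
        }]
    }
    mark_assessed(doc, "REQ-001", "Compliant", "Retention policy reviewed", date(2024, 1, 15))
    print(json.dumps(doc, indent=4))
```

Raising `KeyError` for an unknown ID is one possible design choice. Returning `None` would also work if callers prefer to check the result.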
Key improvements and explanations:
* **Sentence Splitting Enhancement:** The `re.split()` regular expression avoids splitting after two-letter titles (e.g., "Mr. Smith") and after dotted abbreviations or initials (e.g., "e.g.", "A.B. Jones"), while still splitting wherever a period or question mark is followed by whitespace. This is *crucial* for accurate extraction, because requirements are captured one sentence at a time. The pattern is a heuristic, not a full sentence tokenizer: it does not split after "!", and it still splits after longer titles such as "Mrs." and after numbered headings such as "Section 1.". The `(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s` regex does the following:
    * `(?<!\w\.\w.)`: Negative lookbehind asserting that the match is NOT preceded by a word character, a period, another word character, and then any single character. This prevents splitting after abbreviations like "e.g." or "i.e." and after initials like "A.B.".
    * `(?<![A-Z][a-z]\.)`: Negative lookbehind asserting that the match is NOT preceded by an uppercase letter, a lowercase letter, and a period. This prevents splitting after titles like "Mr." or "Dr.".
    * `(?<=\.|\?)`: Positive lookbehind asserting that the match IS preceded by a period or a question mark.
    * `\s`: Matches a single whitespace character, which is where the split happens.
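* **Case-Insensitive Keyword Search:** The keyword search is case-insensitive, using `.lower()` on both the sentence and the keyword. This ensures that "MUST", "must", and "Must" all trigger a match. The test is a substring match, so "require" also matches "requires" and "requirements", and "shall" also matches "shallow".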
* **Clearer Function Definitions and Docstrings:** Each function has a clear docstring explaining its purpose, arguments, and return value. This makes the code easier to understand and maintain.
* **Error Handling for File Saving:** A `try...except` block is added to the `save_compliance_doc` function to handle potential file saving errors (e.g., permission issues, disk full). This makes the program more robust.
* **JSON Indentation:** The `json.dump()` function now uses `indent=4` to save the JSON with indentation, making it much more readable.
* **Unique Requirement IDs:** The code now generates unique requirement IDs (e.g., "REQ-001", "REQ-002") for each extracted requirement. This is helpful for tracking and referencing requirements within the compliance document.
* **More Comprehensive Compliance Document Structure:** Each requirement entry in the generated document includes the following fields, which give a more complete starting point for a compliance document:
    * `compliance_status`: Initialized to "Not Assessed".
    * `responsible_party`: Initialized to "TBD" (To Be Determined).
    * `evidence`: A placeholder for evidence of compliance, initialized to the string "None".
    * `date_assessed`: Date of assessment, initialized to `None` (written as `null` in the JSON file).
    * `notes`: Additional notes, initialized to an empty string.
* **`if __name__ == "__main__":` block:** The example usage code is now inside an `if __name__ == "__main__":` block. This is standard practice in Python and ensures that the code only runs when the script is executed directly (not when it's imported as a module).
* **Comments and Explanations:** The code is heavily commented to explain each step.
* **Modular Design:** The code is well-structured into functions, making it easier to modify and extend. Each function has a specific purpose.
* **Example Usage:** The example usage demonstrates how to use the functions to extract requirements, generate a compliance document, and save it to a file. The sample `regulatory_text` lets the script run end to end without any input of your own.
* **Corrected Regex:** The initial regex caused an infinite loop in some cases. This has been fixed by using more precise lookbehind assertions.
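The splitting behaviour described in the list above can be checked directly. The sketch below reuses the same pattern under the assumed names `SENTENCE_BOUNDARY` and `split_sentences` (neither is part of the script) and runs it on a few short inputs, including cases the pattern does not handle.

```python
import re

# Same pattern as in extract_requirements; the constant name is an assumption.
SENTENCE_BOUNDARY = re.compile(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s')


def split_sentences(text):
    """Split at whitespace that follows '.' or '?', subject to the two lookbehind guards."""
    return SENTENCE_BOUNDARY.split(text)


if __name__ == "__main__":
    samples = [
        "Dr. Smith approved the plan. Testing starts today.",  # "Dr." is protected
        "Keep records, e.g. logs. Review them yearly.",        # "e.g." is protected
        "A.B. Jones signed the form.",                         # initials are protected
        "Mrs. Lee signed. Stop! Wait.",                        # "Mrs." splits, "!" does not
        "Section 1. Products must be safe.",                   # heading is split off
    ]
    for sample in samples:
        print(split_sentences(sample))
    # ['Dr. Smith approved the plan.', 'Testing starts today.']
    # ['Keep records, e.g. logs.', 'Review them yearly.']
    # ['A.B. Jones signed the form.']
    # ['Mrs.', 'Lee signed.', 'Stop! Wait.']
    # ['Section 1.', 'Products must be safe.']
```

For the sample text in the script, splitting off "Section 1." is harmless because the heading contains no keyword. For text with many abbreviations, a dedicated sentence tokenizer may be a better fit than this pattern.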
How to use the code:
1. **Install Python:** Make sure you have Python 3 installed.
2. **Copy the Code:** Copy the entire code into a Python file (e.g., `compliance_tool.py`).
3. **Customize the Regulatory Text and Keywords:** Replace the sample `regulatory_text` and `keywords` with your actual regulatory text and relevant keywords. The choice of keywords is *critical* to the effectiveness of the program. Think about verbs and modal verbs (e.g., "must", "shall", "should", "require", "ensure", "maintain", "provide", "implement") that often indicate obligations or requirements.
4. **Run the Script:** Run the script from your terminal: `python compliance_tool.py`
5. **Check the Output:** A JSON file named `widget_x_compliance.json` will be created in the current working directory (the directory you ran the command from). Open this file to view the generated compliance document.
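Because keyword choice matters so much in step 3, it can help to see how the matching rule affects results. The script uses a plain substring test. The sketch below shows one possible alternative that matches whole words only. The helper names `build_keyword_pattern` and `contains_keyword` are assumptions for illustration, not part of the script.

```python
import re


def build_keyword_pattern(keywords):
    """Compile one case-insensitive pattern matching any keyword as a whole word."""
    if not keywords:
        raise ValueError("keywords must not be empty")
    # re.escape keeps keywords containing regex metacharacters literal.
    alternatives = "|".join(re.escape(keyword) for keyword in keywords)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def contains_keyword(sentence, pattern):
    """Return True if the sentence contains at least one whole-word keyword."""
    return pattern.search(sentence) is not None


if __name__ == "__main__":
    pattern = build_keyword_pattern(["must", "shall", "require"])
    print(contains_keyword("The company SHALL keep records.", pattern))     # True
    print(contains_keyword("The shallow end is marked.", pattern))          # False
    print(contains_keyword("Labels are required.", pattern))                # False
    # The substring test used in the script accepts the second sentence:
    print("shall" in "The shallow end is marked.".lower())                  # True
```

Whole-word matching trades recall for precision: it rejects "shallow" but also misses inflected forms such as "required", so those forms would need to be listed as separate keywords.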
This improved version addresses potential issues with sentence splitting, provides a more robust and complete compliance document structure, and includes better error handling and documentation. It's also more modular and easier to customize for different regulatory frameworks. Remember to carefully tailor the keywords to your specific regulatory context for optimal results.