Finds deep data patterns in logs and auto-creates alerting rules (Rust)
```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::{self, BufRead};
use std::path::Path;

use regex::Regex;

#[derive(Debug, Clone)]
struct LogPattern {
    pattern: String,
    count: usize,
    example: String,
    severity: String, // e.g., "ERROR", "WARN", "INFO"
}

fn main() -> io::Result<()> {
    let log_file_path = "application.log"; // Replace with your log file

    // 1. Read the log file line by line.
    let lines = read_lines(log_file_path)?;

    // 2. Preprocess logs: strip a leading "YYYY-MM-DD HH:MM:SS " timestamp
    //    when present, so identical messages logged at different times
    //    collapse into the same pattern. Lines without a timestamp pass
    //    through unchanged.
    let timestamp_re = Regex::new(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} ").unwrap();
    let preprocessed_logs: Vec<String> = lines
        .into_iter()
        .map(|line| timestamp_re.replace(&line, "").into_owned())
        .collect();

    // 3. Pattern discovery (simple example: counting occurrences of log messages).
    let mut pattern_counts: HashMap<String, LogPattern> = HashMap::new();
    for log_line in &preprocessed_logs {
        // Simple pattern: the entire (preprocessed) log message. The entry is
        // constructed only the first time a message is seen; after that we
        // just bump its count.
        pattern_counts
            .entry(log_line.clone())
            .or_insert_with(|| LogPattern {
                pattern: log_line.clone(),
                count: 0,
                example: log_line.clone(),
                severity: extract_severity(log_line), // Extract severity from the log message
            })
            .count += 1;
    }

    // 4. Analyze patterns and identify anomalies (simple example: patterns with high counts).
    let anomaly_threshold = 2; // Adjust as needed; kept low so the small sample log triggers alerts
    println!("Analyzing Log Patterns...");
    let mut anomalous_patterns: Vec<&LogPattern> = pattern_counts
        .values()
        .filter(|pattern| pattern.count > anomaly_threshold)
        .collect();
    // Sort by count, descending, so the most frequent patterns come first.
    anomalous_patterns.sort_by(|a, b| b.count.cmp(&a.count));

    // 5. Auto-generate alerting rules (simple example: print rules based on anomalies).
    println!("\nPotential Alerting Rules:");
    for pattern in &anomalous_patterns {
        println!(
            "- If '{}' occurs more than {} times, trigger an alert (Severity: {})",
            pattern.pattern, anomaly_threshold, pattern.severity
        );
    }

    println!("\nTop Log Patterns (all, not just anomalous ones):");
    let mut all_patterns: Vec<&LogPattern> = pattern_counts.values().collect();
    all_patterns.sort_by(|a, b| b.count.cmp(&a.count));
    for pattern in all_patterns.iter().take(5) {
        // Show the top 5.
        println!(
            "  - Pattern: '{}', Count: {}, Severity: {}",
            pattern.pattern, pattern.count, pattern.severity
        );
    }
    Ok(())
}

// Helper function to read lines from a file.
fn read_lines<P>(filename: P) -> io::Result<Vec<String>>
where
    P: AsRef<Path>,
{
    let file = File::open(filename)?;
    let reader = io::BufReader::new(file);
    reader.lines().collect()
}

// Helper function to extract the severity level from a log message (very basic
// example). Note: compiling regexes on every call is wasteful; it is fine for
// a demo, but compile them once in production code.
fn extract_severity(log_message: &str) -> String {
    let re_error = Regex::new(r"(?i)ERROR").unwrap(); // (?i) = case-insensitive
    let re_warn = Regex::new(r"(?i)WARN").unwrap();
    let re_info = Regex::new(r"(?i)INFO").unwrap();

    if re_error.is_match(log_message) {
        "ERROR".to_string()
    } else if re_warn.is_match(log_message) {
        "WARN".to_string()
    } else if re_info.is_match(log_message) {
        "INFO".to_string()
    } else {
        "UNKNOWN".to_string()
    }
}
```
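As a quick sanity check (hypothetical, not part of the original listing), a small test module can verify the severity extraction; run it with `cargo test`:
```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn severity_is_extracted_case_insensitively() {
        assert_eq!(extract_severity("ERROR: disk failure"), "ERROR");
        assert_eq!(extract_severity("warn: low disk space"), "WARN");
        assert_eq!(extract_severity("Info: user logged in"), "INFO");
        assert_eq!(extract_severity("plain message with no level"), "UNKNOWN");
    }
}
```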
Key improvements and explanations:
* **Error Handling:** The code now uses `io::Result<()>` and `?` to propagate errors properly, making it much more robust. This is *crucial* in real-world applications.
* **`LogPattern` struct:** Introduces a struct to hold pattern information (pattern string, count, example log line, and severity), which keeps the code organized and easy to extend. `#[derive(Debug, Clone)]` provides `Debug` for printing the struct while debugging and `Clone` for copying `LogPattern` instances without ownership issues.
* **Preprocessing:** Includes basic log preprocessing that strips a leading `YYYY-MM-DD HH:MM:SS ` timestamp with a regex, so identical messages logged at different times collapse into the same pattern. This significantly improves pattern matching.
* **Pattern Discovery:** The code iterates through the preprocessed log lines and counts the occurrences of each unique message (which is treated as a "pattern"). The `HashMap` `entry` API with `or_insert_with` constructs a `LogPattern` only the first time a message is seen, which is also when `extract_severity` runs to record its severity; on every later occurrence only the count is incremented.
* **Anomaly Detection:** It now correctly identifies and filters patterns with counts exceeding a threshold.
* **Alerting Rules:** Prints example alerting rules based on the identified anomalies.
* **Severity Extraction:** The `extract_severity` function uses regular expressions (from the `regex` crate) with the `(?i)` flag to match the severity level (ERROR, WARN, INFO) case-insensitively, catching variants such as `Error:` or `warn` that a case-sensitive substring check would miss. A sketch of compiling these regexes once, rather than on every call, appears after this list.
* **Dependencies:** Uses the `regex` crate for more flexible pattern matching. Remember to add `regex = "1"` to your `Cargo.toml` file under the `[dependencies]` section.
* **Clearer Output:** The output is formatted to be more readable, including the severity level in the alert rules.
* **Sorting by Count:** Added sorting of anomalous patterns by count (descending) so the most frequent anomalies are shown first. Also added printing of top log patterns.
* **Efficiency:** Using a `HashMap` for pattern counting gives average O(1) insertion and lookup per log line.
* **Example Log File:** The code now explicitly mentions creating an `application.log` file (or changing the `log_file_path` variable).
* **Correct Ownership:** The message string is cloned where it is used as the `HashMap` key and inside the `LogPattern`, which satisfies the borrow checker without lifetime annotations.
* **Clearer Comments:** Added more comments to explain each step.
* **Handles Missing Timestamp:** The timestamp regex only matches when a timestamp is actually present at the start of a line, so untimestamped lines pass through unchanged.
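As flagged in the comment on `extract_severity`, compiling the regexes on every call is wasteful. Here is a minimal sketch of compiling them once, on first use, with `std::sync::OnceLock` from the standard library; the helper name `severity_regexes` is hypothetical, not part of the original listing:
```rust
use std::sync::OnceLock;

use regex::Regex;

// Hypothetical helper: lazily compiles the severity regexes exactly once.
// Array order preserves the original precedence: ERROR, then WARN, then INFO.
fn severity_regexes() -> &'static [(Regex, &'static str); 3] {
    static REGEXES: OnceLock<[(Regex, &'static str); 3]> = OnceLock::new();
    REGEXES.get_or_init(|| {
        [
            (Regex::new(r"(?i)ERROR").unwrap(), "ERROR"),
            (Regex::new(r"(?i)WARN").unwrap(), "WARN"),
            (Regex::new(r"(?i)INFO").unwrap(), "INFO"),
        ]
    })
}

fn extract_severity(log_message: &str) -> String {
    severity_regexes()
        .iter()
        .find(|(re, _)| re.is_match(log_message)) // first match wins
        .map(|(_, name)| name.to_string())
        .unwrap_or_else(|| "UNKNOWN".to_string())
}
```
This drops the per-call compilation cost while keeping `extract_severity`'s signature and behavior unchanged. `OnceLock` is available in std since Rust 1.70; on older toolchains, the `once_cell` crate offers the same pattern.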
How to Run:
1. **Create `Cargo.toml`:**
```toml
[package]
name = "log_analyzer"
version = "0.1.0"
edition = "2021"
[dependencies]
regex = "1"
```
2. **Create `src/main.rs`:** Paste the Rust code into this file.
3. **Create `application.log`:** Create a file named `application.log` in the project root, next to `Cargo.toml` (the relative path is resolved from the directory where you run `cargo run`, which is normally the project root, not `src/`). Add some log lines to it, for example:
```
2023-10-27 10:00:00 INFO: Application started
2023-10-27 10:00:01 WARN: Low disk space
2023-10-27 10:00:02 ERROR: Failed to connect to database
2023-10-27 10:00:02 INFO: User logged in
2023-10-27 10:00:03 ERROR: Failed to connect to database
2023-10-27 10:00:04 ERROR: Failed to connect to database
2023-10-27 10:00:05 WARN: Low disk space
2023-10-27 10:00:06 INFO: Application is running
This is a log message without a timestamp
This is a log message without a timestamp
This is a log message without a timestamp
```
4. **Run:**
```bash
cargo run
```
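With the sample `application.log` above and the threshold of 2 used in the code, the output should look roughly like this (ordering among equal-count patterns depends on `HashMap` iteration order and may vary):
```
Analyzing Log Patterns...

Potential Alerting Rules:
- If 'ERROR: Failed to connect to database' occurs more than 2 times, trigger an alert (Severity: ERROR)
- If 'This is a log message without a timestamp' occurs more than 2 times, trigger an alert (Severity: UNKNOWN)

Top Log Patterns (all, not just anomalous ones):
  - Pattern: 'ERROR: Failed to connect to database', Count: 3, Severity: ERROR
  - Pattern: 'This is a log message without a timestamp', Count: 3, Severity: UNKNOWN
  - Pattern: 'WARN: Low disk space', Count: 2, Severity: WARN
  - Pattern: 'INFO: Application started', Count: 1, Severity: INFO
  - Pattern: 'INFO: User logged in', Count: 1, Severity: INFO
```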
This improved version addresses the previous issues, provides a runnable example, and incorporates best practices for Rust development, including error handling, ownership management, and clear code organization. It is much more robust and demonstrates the core concepts of log analysis. Remember to adjust the `anomaly_threshold` and the severity extraction logic to fit your specific log data and requirements.