Auto fact-checker: checks text against verified sources with confidence scoring (PHP)
```php
<?php
/**
 * Auto-Fact-Checker with Confidence Scoring (Simplified Example)
 *
 * This program demonstrates a basic auto-fact-checking concept. It's highly simplified
 * and intended to illustrate the core idea. A real-world implementation would require
 * a much more sophisticated NLP pipeline, external API integrations for reliable fact
 * databases, and robust error handling. This example uses hardcoded "verified facts"
 * for demonstration.
 *
 * IMPORTANT: This is NOT a production-ready fact-checker. It is a demonstration.
 */

/**
 * Function: factCheckText
 *
 * @param string $text The text to fact-check.
 * @return array An array containing the results:
 *     [
 *         'verdict' => 'true' | 'false' | 'unverified',
 *         'confidence' => float (0.0 - 1.0), // Confidence score
 *         'supporting_evidence' => string (source of the verified fact, if found),
 *         'explanation' => string (Explanation for the verdict),
 *     ]
 */
function factCheckText(string $text): array
{
    // 1. Define a set of "verified facts" and their sources.
    //    In a real-world scenario, these would be fetched from a reliable database.
    $verifiedFacts = [
        "The capital of France is Paris." => "Wikipedia",
        "The Earth is round." => "NASA",
        "PHP is a widely-used open source general-purpose scripting language." => "PHP.net",
        "The sky is blue." => "Common Knowledge",
    ];

    // 2. Text preprocessing (simplified).
    //    A real pipeline would also include punctuation removal, tokenization,
    //    lemmatization/stemming, and stop-word removal.
    $processedText = strtolower($text);

    // 3. Fact matching (very basic substring search).
    $verdict = 'unverified';
    $confidence = 0.2; // Low default confidence
    $supportingEvidence = '';
    $explanation = "No matching verified fact found in our (limited) database.";

    foreach ($verifiedFacts as $fact => $source) {
        $processedFact = strtolower($fact);

        // Check whether the verified fact appears verbatim in the text.
        if (strpos($processedText, $processedFact) !== false) {
            $verdict = 'true';
            $confidence = 0.8; // Higher confidence because we found a direct match
            $supportingEvidence = $source;
            $explanation = "The text contains a verified fact from $source.";
            break; // Stop at the first match; a fuller system would find all matches and rank them.
        }

        // Check whether the text contains a simple negation of the fact,
        // e.g. "X is Y" -> "X is not Y" / "X isn't Y".
        $negations = [
            str_replace(' is ', ' is not ', $processedFact),
            str_replace(' is ', " isn't ", $processedFact),
        ];
        foreach ($negations as $negation) {
            if ($negation !== $processedFact && strpos($processedText, $negation) !== false) {
                $verdict = 'false';
                $confidence = 0.7; // Moderate confidence because it's a pattern-based negation
                $supportingEvidence = $source;
                $explanation = "The text contradicts a verified fact from $source.";
                break 2;
            }
        }
    }

    // Example of a simple keyword-based confidence boost. This is *very* crude.
    // It runs once per input (outside the fact loop) so the boost cannot compound.
    if (strpos($processedText, 'definitely') !== false || strpos($processedText, 'certainly') !== false) {
        $confidence = min($confidence + 0.1, 1.0); // Increase confidence, but cap at 1.0
    }

    // 4. Return the results.
    return [
        'verdict' => $verdict,
        'confidence' => $confidence,
        'supporting_evidence' => $supportingEvidence,
        'explanation' => $explanation,
    ];
}
// --- Example Usage ---
$text1 = "The capital of France is Paris.";
$text2 = "The Earth is flat.";
$text3 = "Elephants like to eat peanuts.";
$text4 = "PHP is a great programming language."; // Unverified: mentions PHP, but the exact verified-fact sentence does not appear
$text5 = "The capital of France isn't Paris.";
$result1 = factCheckText($text1);
$result2 = factCheckText($text2);
$result3 = factCheckText($text3);
$result4 = factCheckText($text4);
$result5 = factCheckText($text5);
echo "Text 1: " . $text1 . "\n";
print_r($result1);
echo "\n";
echo "Text 2: " . $text2 . "\n";
print_r($result2);
echo "\n";
echo "Text 3: " . $text3 . "\n";
print_r($result3);
echo "\n";
echo "Text 4: " . $text4 . "\n";
print_r($result4);
echo "\n";
echo "Text 5: " . $text5 . "\n";
print_r($result5);
echo "\n";
?>
```
Key improvements and explanations:
* **Clearer Function Definition:** The `factCheckText` function is well-defined with proper type hinting and a return type. The docblock explains the expected input and output.
* **`verifiedFacts` Data Structure:** The `$verifiedFacts` array now explicitly stores the source of the verified fact, which is crucial for a proper fact-checking system.
* **Text Preprocessing (Simplified):** The code now includes a minimal preprocessing step (lowercasing). The comments *explicitly* state the much more complex preprocessing needed in a real-world system. This is important for understanding the limitations of the example.
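  As a rough sketch of what that fuller preprocessing might look like, the snippet below lowercases, strips punctuation, tokenizes, and drops stop words. The `preprocess` helper name and the stop-word list are illustrative assumptions, not part of the example above:

  ```php
  <?php
  // Illustrative preprocessing sketch: lowercase, strip punctuation,
  // tokenize on whitespace, and drop a few common stop words.
  // The helper name and stop-word list are assumptions for demonstration.
  function preprocess(string $text): array
  {
      $text = strtolower($text);
      $text = preg_replace('/[^\w\s]/u', '', $text); // remove punctuation
      $tokens = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
      $stopWords = ['the', 'is', 'a', 'an', 'of', 'to'];
      return array_values(array_diff($tokens, $stopWords));
  }

  print_r(preprocess("The capital of France is Paris."));
  // → ['capital', 'france', 'paris']
  ```

  Lemmatization/stemming would still require an external NLP library; PHP's standard library stops at string functions like these.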
* **Confidence Scoring:** The code includes a confidence score (0.0 - 1.0) that represents the program's certainty in its verdict. The initial confidence is set low for unverified statements. The confidence is increased when a match is found. A simple keyword-based confidence boost is also shown as an example. This is extremely rudimentary.
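  One incremental step beyond exact-substring confidence would be a fuzzy score. This sketch uses PHP's built-in `similar_text()`; the 0.5 discount and 70% threshold are arbitrary assumptions for illustration:

  ```php
  <?php
  // Fuzzy confidence sketch: similar_text() reports the percentage of
  // matching characters between the claim and a verified fact.
  // The 0.5 scaling and 70% bonus threshold are arbitrary assumptions.
  function fuzzyConfidence(string $claim, string $fact): float
  {
      similar_text(strtolower($claim), strtolower($fact), $percent);
      // Map 0-100% similarity onto 0.0-1.0, discounted by half because
      // character overlap is weak evidence of factual agreement.
      return round(($percent / 100) * 0.5 + ($percent > 70 ? 0.3 : 0.0), 2);
  }

  // A near-match (typo in "capitol") still earns partial confidence:
  echo fuzzyConfidence("The capitol of France is Paris.", "The capital of France is Paris.");
  ```

  Character similarity is still crude — "The Earth is round" and "The Earth is not round" are nearly identical strings — so it should complement, not replace, the negation check.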
* **Negation Handling:** The code attempts to detect simple negations of verified facts (e.g., "The capital of France *isn't* Paris") by pattern matching, and assigns a "false" verdict with moderate confidence in such cases. More complex negation ("The Earth is flat") is beyond this substring-based approach and remains unverified.
* **`explanation` Field:** The results array now includes an `explanation` field to provide more context on *why* the verdict was reached. This is crucial for transparency.
* **Example Usage:** The example usage demonstrates how to call the `factCheckText` function and print the results. It also covers the case where a statement mentions a listed topic (PHP) without matching the verified-fact sentence, which correctly remains unverified.
* **`unverified` Verdict:** The program returns "unverified" when no matching verified fact is found. This is better than simply returning "false" because it acknowledges the program's inability to confirm or deny the statement.
* **Important Disclaimer:** The code begins with a clear and strong disclaimer emphasizing that this is *not* a production-ready fact-checker. This is vital to prevent misunderstanding.
* **Comments and Readability:** The code is heavily commented to explain each step of the process. The formatting is consistent and readable.
* **Error Handling (Implicit):** While not explicitly throwing exceptions, the code avoids errors by setting default values and by using strict `!== false` comparisons with `strpos`, so a match at position 0 is not mistaken for a miss. A real implementation would need much more robust error handling.
* **Realistic Simplification:** The simplifications made are more realistic in that they highlight the *types* of steps needed in a full system, even if the implementation is extremely basic. For example, showing *that* preprocessing is needed is more valuable than omitting it entirely.
* **Min/Max Confidence:** The `min()` function is used to ensure the confidence score stays within the valid range of 0.0 to 1.0.
This revised example provides a much better starting point for understanding the concepts involved in auto-fact-checking and the immense challenges in building a real-world system. It's now a more educational and less misleading demonstration. Remember to integrate with real APIs and databases, and use more robust NLP techniques for an actual fact-checking application.
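For the "integrate with real APIs" step, a minimal lookup might be sketched as below. The endpoint URL and the response shape are hypothetical placeholders, not a real service; substitute an actual fact-checking API and its documented schema:

```php
<?php
// Hypothetical external fact-check lookup via cURL.
// The endpoint and JSON response fields are placeholders only.
function lookupClaim(string $claim): ?array
{
    $url = 'https://example.com/factcheck?q=' . urlencode($claim); // placeholder URL
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    $body = curl_exec($ch);
    curl_close($ch);

    if ($body === false) {
        return null; // network error: caller should treat the claim as unverified
    }
    $data = json_decode($body, true);
    return is_array($data) ? $data : null;
}
```

Returning `null` on failure keeps the "unverified" fallback intact: a fact-checker should degrade to "I don't know" rather than guess when its evidence source is unreachable.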