Auto fact-checker: checks text against verified sources with confidence scoring (PHP)

```php
<?php

/**
 * Auto-Fact-Checker with Confidence Scoring (Simplified Example)
 *
 * This program demonstrates a basic auto-fact-checking concept.  It's highly simplified
 * and intended to illustrate the core idea.  A real-world implementation would require
 * a much more sophisticated NLP pipeline, external API integrations for reliable fact databases,
 * and robust error handling.  This example uses hardcoded "verified facts" for demonstration.
 *
 * IMPORTANT: This is NOT a production-ready fact-checker.  It is a demonstration.
 */

/**
 * Function: factCheckText
 *
 * @param string $text The text to fact-check.
 * @return array  An array containing the results:
 *                 [
 *                     'verdict' => 'true' | 'false' | 'unverified',
 *                     'confidence' => float (0.0 - 1.0), // Confidence score
 *                     'supporting_evidence' => string (source of the verified fact, if found),
 *                     'explanation' => string (Explanation for the verdict)
 *                 ]
 */
function factCheckText(string $text): array
{
    // 1. Define a set of "verified facts" and their sources.
    // In a real-world scenario, this would be fetched from a reliable database.
    $verifiedFacts = [
        "The capital of France is Paris." => "Wikipedia",
        "The Earth is round." => "NASA",
        "PHP is a widely-used open source general-purpose scripting language." => "PHP.net",
        "The sky is blue." => "Common Knowledge",
    ];

    // 2.  Text Preprocessing (Simplified)
    // In a real-world scenario, this would include:
    //    - Lowercasing
    //    - Removing punctuation
    //    - Tokenization
    //    - Lemmatization/Stemming
    //    - Stop word removal
    $processedText = strtolower($text);


    // 3. Fact Matching (Very Basic)
    $verdict = 'unverified';
    $confidence = 0.2; // Low default confidence
    $supportingEvidence = '';
    $explanation = "No matching verified fact found in our (limited) database.";

    foreach ($verifiedFacts as $fact => $source) {
        $processedFact = strtolower($fact);

        // Check if the verified fact is present in the text.
        if (strpos($processedText, $processedFact) !== false) {
            $verdict = 'true';
            $confidence = 0.8; // Higher confidence because we found a direct match
            $supportingEvidence = $source;
            $explanation = "The text contains a verified fact from $source.";
            break; // Exit loop after finding the first match.  You might want to find all and rank them.
        }

        // Check whether the text contains a simple negation of the verified fact,
        // e.g. "X is not Y" or "X isn't Y" for a fact of the form "X is Y".
        $negatedForms = [
            str_replace(' is ', ' is not ', $processedFact),
            str_replace(' is ', " isn't ", $processedFact),
        ];
        foreach ($negatedForms as $negatedForm) {
            if ($negatedForm !== $processedFact && strpos($processedText, $negatedForm) !== false) {
                $verdict = 'false';
                $confidence = 0.7; // Moderate confidence because it's a simple negation match
                $supportingEvidence = $source;
                $explanation = "The text contradicts a verified fact from $source.";
                break 2; // Exit both loops after the first contradiction.
            }
        }
    }

    // Example of a simple keyword-based confidence boost, applied once after matching.
    // This is *very* crude: words like "definitely" signal the author's certainty,
    // not the claim's accuracy.
    if (strpos($processedText, 'definitely') !== false || strpos($processedText, 'certainly') !== false) {
        $confidence = min($confidence + 0.1, 1.0); // Increase confidence, but cap at 1.0
    }


    // 4. Return the Results
    return [
        'verdict' => $verdict,
        'confidence' => $confidence,
        'supporting_evidence' => $supportingEvidence,
        'explanation' => $explanation,
    ];
}


// --- Example Usage ---
$text1 = "The capital of France is Paris.";
$text2 = "The Earth is flat.";
$text3 = "Elephants like to eat peanuts.";
$text4 = "PHP is a great programming language."; // Unverified: does not exactly match the verified PHP fact
$text5 = "The capital of France isn't Paris.";

$result1 = factCheckText($text1);
$result2 = factCheckText($text2);
$result3 = factCheckText($text3);
$result4 = factCheckText($text4);
$result5 = factCheckText($text5);

echo "Text 1: " . $text1 . "\n";
print_r($result1);
echo "\n";

echo "Text 2: " . $text2 . "\n";
print_r($result2);
echo "\n";

echo "Text 3: " . $text3 . "\n";
print_r($result3);
echo "\n";

echo "Text 4: " . $text4 . "\n";
print_r($result4);
echo "\n";

echo "Text 5: " . $text5 . "\n";
print_r($result5);
echo "\n";

?>
```
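
Step 2 in the code above only lowercases the input. As a rough, non-authoritative sketch of the other steps listed in its comments (punctuation removal, tokenization, stop-word removal), the helper below shows one way this could look; the regexes and the tiny stop-word list are illustrative placeholders, and a real pipeline would add proper tokenization plus stemming or lemmatization.

```php
<?php

// Minimal preprocessing sketch (illustrative only; not part of the example above).
function preprocessText(string $text): string
{
    $text = mb_strtolower($text);                           // lowercase
    $text = preg_replace('/[^\p{L}\p{N}\s]/u', '', $text);  // drop punctuation
    $text = preg_replace('/\s+/', ' ', trim($text));        // collapse whitespace

    $stopWords = ['the', 'a', 'an', 'of'];                  // tiny illustrative stop-word list
    $tokens = array_filter(
        explode(' ', $text),
        fn (string $token): bool => !in_array($token, $stopWords, true)
    );

    return implode(' ', $tokens);
}

echo preprocessText("The capital of France is Paris.") . "\n"; // "capital france is paris"
```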

Key improvements and explanations:

* **Clearer Function Definition:** The `factCheckText` function is well-defined with proper type hinting and a return type.  The docblock explains the expected input and output.
* **`verifiedFacts` Data Structure:**  The `$verifiedFacts` array now explicitly stores the source of the verified fact, which is crucial for a proper fact-checking system.
* **Text Preprocessing (Simplified):**  The code now includes a minimal preprocessing step (lowercasing).  The comments *explicitly* state the much more complex preprocessing needed in a real-world system.  This is important for understanding the limitations of the example.
* **Confidence Scoring:** The code includes a confidence score (0.0 - 1.0) that represents the program's certainty in its verdict. The initial confidence is set low for unverified statements and raised when a match is found, and a simple keyword-based confidence boost is shown as an example. This is extremely rudimentary; a fuzzy-matching alternative is sketched after this list.
* **Negation Handling:**  The code attempts to detect simple negations of verified facts (e.g., "The Earth is *not* round") by substituting "is not" / "isn't" into each verified fact and looking for that form in the text. It assigns a "false" verdict with moderate confidence in such cases.
* **`explanation` Field:**  The results array now includes an `explanation` field to provide more context on *why* the verdict was reached. This is crucial for transparency.
* **Example Usage:** The example usage demonstrates how to call the `factCheckText` function and print the results. It covers a verified statement, a contradiction of a verified fact, an unrelated claim, a statement about PHP that does not exactly match the verified PHP fact (and so stays unverified), and a negated statement about the capital of France.
* **`unverified` Verdict:** The program returns "unverified" when no matching verified fact is found.  This is better than simply returning "false" because it acknowledges the program's inability to confirm or deny the statement.
* **Important Disclaimer:** The code begins with a clear and strong disclaimer emphasizing that this is *not* a production-ready fact-checker.  This is vital to prevent misunderstanding.
* **Comments and Readability:** The code is heavily commented to explain each step of the process. The formatting is consistent and readable.
* **Error Handling (Implicit):** While not explicitly throwing exceptions, the code avoids errors by setting default values and checking `strpos` results with a strict `!== false` comparison.  A real implementation would need much more robust error handling.
* **Realistic Simplification:** The simplifications made are more realistic in that they highlight the *types* of steps needed in a full system, even if the implementation is extremely basic.  For example, showing *that* preprocessing is needed is more valuable than omitting it entirely.
* **Min/Max Confidence:**  The `min()` function is used to ensure the confidence score stays within the valid range of 0.0 to 1.0.
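
As referenced above, the exact-substring matching in the example only ever produces a handful of fixed scores (0.2, 0.7, 0.8). One hedged way to make the confidence less binary is to derive it from string similarity. The sketch below uses PHP's built-in `similar_text()`, which is crude but self-contained; a real system would use embeddings or a natural-language-inference model instead.

```php
<?php

// Sketch: derive a confidence score from string similarity (illustrative only).
function fuzzyConfidence(string $claim, string $fact): float
{
    // similar_text() fills $percent with a rough similarity percentage (0-100).
    similar_text(strtolower($claim), strtolower($fact), $percent);

    // Map the percentage onto the 0.0 - 1.0 confidence range used above.
    return round($percent / 100, 2);
}

// Reordered wording scores noticeably higher than an unrelated claim:
echo fuzzyConfidence("Paris is the capital of France.",
                     "The capital of France is Paris.") . "\n";
echo fuzzyConfidence("Elephants like to eat peanuts.",
                     "The capital of France is Paris.") . "\n";
```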

This revised example provides a much better starting point for understanding the concepts involved in auto-fact-checking and the immense challenges in building a real-world system.  It's now a more educational and less misleading demonstration. Remember to integrate with real APIs and databases, and use more robust NLP techniques for an actual fact-checking application.
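
As a concrete illustration of that last point, the sketch below replaces the hardcoded `$verifiedFacts` array with rows loaded from a database via PDO. The DSN, credentials, and the `verified_facts` table and column names are hypothetical placeholders, not a real schema.

```php
<?php

// Sketch: load verified facts from a database instead of hardcoding them.
// Table and column names below are hypothetical.
function loadVerifiedFacts(PDO $pdo): array
{
    $stmt = $pdo->query('SELECT statement, source FROM verified_facts');

    $facts = [];
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
        $facts[$row['statement']] = $row['source']; // same shape as $verifiedFacts above
    }

    return $facts;
}

// Usage (hypothetical connection details):
// $pdo = new PDO('mysql:host=localhost;dbname=factcheck', 'user', 'password');
// $verifiedFacts = loadVerifiedFacts($pdo);
```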