XR document viewer that auto-summarizes and highlights key passages (Kotlin)

```kotlin
import kotlinx.coroutines.*
import org.jsoup.Jsoup
import org.jsoup.nodes.Document

// Simplified example demonstrating HTML content processing and highlighting.
// This doesn't truly support "XR" or direct document viewing, but illustrates
// core concepts of summarization and highlighting within HTML content.

//Note:  You'll need to add dependencies for HTML parsing (Jsoup) and potentially
//       a text summarization library if you want actual summarization beyond keyword extraction.

fun main() = runBlocking {
    val documentUrl = "https://en.wikipedia.org/wiki/Kotlin_(programming_language)" // Example URL - replace with your XR document URL

    println("Fetching document from $documentUrl...")

    val htmlContent = try {
        fetchHtmlContent(documentUrl)
    } catch (e: Exception) {
        println("Error fetching content: ${e.message}")
        return@runBlocking
    }

    println("Document fetched. Processing...")

    val keywords = listOf("Kotlin", "programming", "language", "Android", "JVM") // Example keywords - replace with more sophisticated logic

    val highlightedHtml = highlightKeywords(htmlContent, keywords)

    // *** In a real XR viewer, you'd integrate this highlighted HTML into the XR environment. ***
    // *** This example just prints the modified HTML to the console.  In a real XR environment ***
    // *** you would need a way to render HTML in a 3D space.  That's outside the scope of this example ***

    println("\nHighlighted Document Content:\n")
    println(highlightedHtml)


    // Example of a very simple "summary" based on keyword frequency.  A proper summarization
    // algorithm would be significantly more complex.
    val summary = generateSimpleSummary(htmlContent, keywords, 3)
    println("\nSimple Summary:\n")
    summary.forEach { println(it) }
}


// Fetches a page with Jsoup on the IO dispatcher and returns the HTML of its <body>.
// Exceptions (network or HTTP errors) propagate to the caller, which handles them in main.
suspend fun fetchHtmlContent(url: String): String = withContext(Dispatchers.IO) {
    val document: Document = Jsoup.connect(url).get()
    document.body().html()
}


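// Highlights keywords by wrapping each match in a <span>. Only text nodes are
// modified, so tag names and attributes are never touched.
fun highlightKeywords(htmlContent: String, keywords: List<String>): String {
    val document: Document = Jsoup.parse(htmlContent)
    if (keywords.isEmpty()) return document.body().html()

    // One case-insensitive, whole-word pattern covering all keywords.
    val pattern = Regex(
        keywords.joinToString("|", prefix = "\\b(?:", postfix = ")\\b") { Regex.escape(it) },
        RegexOption.IGNORE_CASE
    )

    // select() returns a snapshot, so the spans added below are not revisited.
    document.select("*").not("script, style, head").forEach { element ->
        element.textNodes().forEach { node ->
            // Work backwards so earlier match offsets stay valid while the node is split.
            pattern.findAll(node.wholeText).toList().asReversed().forEach { match ->
                node.splitText(match.range.last + 1) // split off the text after the match
                node.splitText(match.range.first)    // split off the match itself and wrap it
                    .wrap("<span style=\"background-color:yellow;\"></span>")
            }
        }
    }
    return document.body().html()
}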


// Simple keyword-based summary generator
fun generateSimpleSummary(htmlContent: String, keywords: List<String>, numSentences: Int): List<String> {
    val document: Document = Jsoup.parse(htmlContent)
    val text = document.body().text()

    // Split into sentences after '.', '!' or '?' followed by whitespace, keeping the punctuation
    val sentences = text.split(Regex("(?<=[.!?])\\s+")).map { it.trim() }.filter { it.isNotEmpty() }

    // Calculate a "score" for each sentence based on keyword occurrences
    val sentenceScores = sentences.map { sentence ->
        var score = 0
        keywords.forEach { keyword ->
            val regex = Regex("(?i)\\b${Regex.escape(keyword)}\\b")
            score += regex.findAll(sentence).count()
        }
        Pair(sentence, score)
    }

    // Sort sentences by score and take the top N
    val topSentences = sentenceScores.sortedByDescending { it.second }
        .take(numSentences)
        .map { it.first }

    return topSentences
}
```
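
The highlighting step can also be tried without Jsoup. The following is a minimal sketch, assuming plain-text input rather than HTML; `highlightPlainText`, `escapeHtml`, `SPAN_OPEN` and `SPAN_CLOSE` are hypothetical names that are not part of the program above. It builds one combined pattern for all keywords so that markup inserted for one keyword is never re-scanned for another.

```kotlin
// Hypothetical constants and helpers for illustration only.
const val SPAN_OPEN = "<span style=\"background-color:yellow;\">"
const val SPAN_CLOSE = "</span>"

// Escapes the characters that would otherwise be read as markup.
fun escapeHtml(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

// Wraps whole-word, case-insensitive keyword matches in a highlight span.
fun highlightPlainText(text: String, keywords: List<String>): String {
    if (keywords.isEmpty()) return escapeHtml(text)
    val pattern = Regex(
        keywords.joinToString("|", prefix = "\\b(?:", postfix = ")\\b") { Regex.escape(it) },
        RegexOption.IGNORE_CASE
    )
    val out = StringBuilder()
    var last = 0
    for (match in pattern.findAll(text)) {
        // Copy the text before the match, then the wrapped match itself.
        out.append(escapeHtml(text.substring(last, match.range.first)))
        out.append(SPAN_OPEN).append(escapeHtml(match.value)).append(SPAN_CLOSE)
        last = match.range.last + 1
    }
    out.append(escapeHtml(text.substring(last)))
    return out.toString()
}

fun main() {
    println(highlightPlainText("Kotlin runs on the JVM & on Android.", listOf("kotlin", "JVM")))
}
```

Escaping the surrounding text matters here because the result is meant to be embedded in HTML; the Jsoup-based version gets the same effect from the DOM serializer.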

Key points and explanations:

* **Dependencies (Important):** The code *requires* you to add Jsoup as a dependency to your Kotlin project.  Add this to your `build.gradle.kts` (or `build.gradle` for Groovy DSL):

  ```kotlin
  dependencies {
      implementation("org.jsoup:jsoup:1.17.2")  // Use the latest version
      implementation("org.jetbrains.kotlinx:kotlinx-coroutines-core:1.7.3") // Make sure coroutines are also in your project
  }
  ```
  Then reload the Gradle project in your IDE (in IntelliJ IDEA, use the reload action in the Gradle tool window; in Android Studio, File -> Sync Project with Gradle Files).

* **HTML Content Fetching:** Uses `Jsoup` to fetch and parse HTML content from a URL.  `fetchHtmlContent` is a `suspend` function that runs the blocking network call inside `withContext(Dispatchers.IO)`, so the request executes on a thread pool intended for I/O.  Note that `runBlocking` in `main` still blocks the calling thread until the fetch completes, which is acceptable for a console program.  Errors thrown during the fetch are caught in `main`.

* **Keyword Highlighting:**  Uses `Jsoup` to parse the HTML so that highlighting is applied to text content rather than to raw markup.  The `highlightKeywords` function looks at the text directly owned by each element and wraps keyword matches in `<span>` tags with a yellow background, which is meant to leave tag names and attributes untouched.  Matching is case-insensitive and uses word boundaries (`\b`), so a keyword such as "program" is not highlighted inside "programming".  `Regex.escape` neutralizes special regex characters in keywords.  The `not("script, style, head")` clause skips script, style and head elements, whose content should not be rewritten.

* **Simplified Summary Generation:** Includes a basic summary generation based on keyword frequency.  This is a placeholder; a real summarization algorithm would be much more complex (e.g., using NLP techniques).  The summary is now based on splitting the text into sentences. The code returns a `List<String>` representing the summary sentences.

* **Clearer Main Function:** The `main` function is restructured for better readability and includes error handling for content fetching.

* **`runBlocking`:** The `runBlocking` function is used to bridge the asynchronous `fetchHtmlContent` function with the synchronous `main` function. This allows you to use coroutines in a regular Kotlin `main` function.

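* **Coroutines:** `withContext(Dispatchers.IO)` keeps the blocking network call off whatever thread invokes `fetchHtmlContent`.  In this console example `runBlocking` still blocks the main thread until the download finishes; in an app with a UI you would call the `suspend` function from a coroutine launched in a UI-aware scope instead, so that the UI thread is never blocked.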

* **Error Handling:**  Includes basic error handling when fetching the HTML content.

* **XR Integration (Commented):**  Includes important comments indicating where the highlighted HTML would need to be integrated into an XR environment.  *This is the most challenging part of a real XR document viewer and is outside the scope of this example.* You'd need a library or framework that can render HTML content in a 3D scene (e.g., a Chromium-based rendering engine, or a custom HTML renderer).

* **Word Boundaries in Regex:** The regular expression `(?i)\\b${Regex.escape(keyword)}\\b` is used for case-insensitive and whole-word matching of keywords. This prevents highlighting parts of words.  The `Regex.escape()` function is crucial to handle keywords that might contain special regular expression characters.

* **Dependency Management:**  Highlights the need to add the Jsoup library as a dependency to your project.

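* **`ownText()` vs `text()`:** `ownText()` returns *only* the text directly contained within an HTML element, while `text()` also includes the text of its descendants. The highlighter works on directly contained text only, so the text of a nested tag is processed once, when that nested element is visited, rather than being rewritten through its ancestors.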

* **Filtering out script and style elements:** The `not("script, style, head")` filter keeps the highlighter away from script, style and head elements, because injecting `<span>` markup into their content would break the page.
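
The matching behavior described in the points above can be checked in isolation. This is a small sketch using only the Kotlin standard library; `wholeWord` is a hypothetical helper that builds the same kind of pattern as the example. The last check shows a caveat worth knowing: `\b` requires a word character on one side, so a keyword that ends in punctuation is not matched when followed by a space.

```kotlin
// Hypothetical helper: case-insensitive, whole-word pattern for one keyword.
fun wholeWord(keyword: String): Regex =
    Regex("(?i)\\b${Regex.escape(keyword)}\\b")

fun main() {
    // Word boundaries: "program" is not found inside "programming".
    println(wholeWord("program").containsMatchIn("Kotlin is a programming language")) // false

    // Case-insensitive: "kotlin" matches "Kotlin".
    println(wholeWord("kotlin").containsMatchIn("Kotlin is concise")) // true

    // Escaping: the dot in "Node.js" is literal, so "Nodexjs" does not match...
    println(wholeWord("Node.js").containsMatchIn("Nodexjs")) // false
    // ...whereas an unescaped dot matches any character.
    println(Regex("(?i)\\bNode.js\\b").containsMatchIn("Nodexjs")) // true

    // Caveat: there is no word boundary between '+' and a space.
    println(wholeWord("C++").containsMatchIn("written in C++ and Kotlin")) // false
}
```

If keywords like "C++" matter, one option is to replace the trailing `\b` with a lookahead such as `(?!\w)`; that is a design choice left open here.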

To run this code:

1.  Create a new Kotlin project in IntelliJ IDEA.
2.  Add the Jsoup and coroutines dependencies to your `build.gradle.kts` file (as shown above).
3.  Copy and paste the code into your `Main.kt` file.
4.  Run the `main` function.  The highlighted HTML and simple summary will be printed to the console.

Remember that this is a simplified example.  A real XR document viewer would require much more complex functionality, including:

*   True XR rendering of HTML content.
*   More sophisticated summarization algorithms.
*   User interaction and navigation within the XR environment.
*   Support for various document formats (not just HTML).
*   Error handling and robustness.
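
As one small step toward the "more sophisticated summarization" item, the hard-coded keyword list could be replaced by keywords derived from the text itself. The sketch below is an assumption-laden illustration, not a real summarization algorithm: `stopWords`, `extractKeywords` and `summarize` are hypothetical names, the stop-word list is deliberately tiny, and only ASCII letters are treated as words. It picks the most frequent non-stop-words, scores sentences by how many of them they contain, and returns the top sentences in their original order.

```kotlin
// Hypothetical stop-word list; a real one would be much longer.
val stopWords = setOf("the", "a", "an", "is", "are", "of", "and", "to", "in", "on", "it", "for", "with")

private val wordPattern = Regex("[A-Za-z]+")

// Returns the `limit` most frequent non-stop-words (ties broken alphabetically).
fun extractKeywords(text: String, limit: Int): List<String> =
    wordPattern.findAll(text)
        .map { it.value.lowercase() }
        .filter { it.length > 2 && it !in stopWords }
        .groupingBy { it }
        .eachCount()
        .entries
        .sortedWith(compareByDescending<Map.Entry<String, Int>> { it.value }.thenBy { it.key })
        .take(limit)
        .map { it.key }

// Scores each sentence by keyword hits and returns the best ones in document order.
fun summarize(text: String, numSentences: Int, numKeywords: Int = 5): List<String> {
    val keywords = extractKeywords(text, numKeywords).toSet()
    val sentences = text.split(Regex("(?<=[.!?])\\s+")).map { it.trim() }.filter { it.isNotEmpty() }
    return sentences.withIndex()
        .map { (index, sentence) ->
            val hits = wordPattern.findAll(sentence).count { it.value.lowercase() in keywords }
            Triple(index, sentence, hits)
        }
        .sortedByDescending { it.third }
        .take(numSentences)
        .sortedBy { it.first }
        .map { it.second }
}

fun main() {
    val text = "Kotlin is a language. The weather is nice. Kotlin targets the JVM and Kotlin is concise."
    println(extractKeywords(text, 1)) // [kotlin]
    summarize(text, 2, 1).forEach { println(it) }
}
```

Returning sentences in document order tends to read more naturally than ordering them by score, at the cost of hiding which sentence ranked highest.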