XR document viewer that auto-summarizes and highlights key passages (Kotlin)
👤 Sharing: AI
```kotlin
import kotlinx.coroutines.*
import org.jsoup.Jsoup
import org.jsoup.nodes.Document
// Simplified example demonstrating HTML content processing and highlighting.
// This doesn't truly support "XR" or direct document viewing, but illustrates
// core concepts of summarization and highlighting within HTML content.
//Note: You'll need to add dependencies for HTML parsing (Jsoup) and potentially
// a text summarization library if you want actual summarization beyond keyword extraction.
fun main(): Unit = runBlocking {
    // Example URL; replace with the URL of your own document.
    val documentUrl = "https://en.wikipedia.org/wiki/Kotlin_(programming_language)"
    println("Fetching document from $documentUrl...")
    val htmlContent = try {
        fetchHtmlContent(documentUrl)
    } catch (e: Exception) {
        println("Error fetching content: ${e.message}")
        return@runBlocking
    }
    println("Document fetched. Processing...")
    // Example keywords; replace with more sophisticated keyword selection.
    val keywords = listOf("Kotlin", "programming", "language", "Android", "JVM")
    val highlightedHtml = highlightKeywords(htmlContent, keywords)
    // In a real XR viewer, the highlighted HTML would be handed to a component that
    // renders HTML in 3D space. That is outside the scope of this example, which
    // only prints the modified HTML to the console.
    println("\nHighlighted Document Content:\n")
    println(highlightedHtml)
    // A very simple "summary" based on keyword frequency. A proper summarization
    // algorithm would be significantly more complex.
    val summary = generateSimpleSummary(htmlContent, keywords, 3)
    println("\nSimple Summary:\n")
    summary.forEach { println(it) }
}
// Fetch a page with Jsoup and return the HTML of its <body>.
// Jsoup's get() is a blocking call, so it runs on the IO dispatcher. Any exception
// propagates to the caller, which handles it in main().
suspend fun fetchHtmlContent(url: String): String = withContext(Dispatchers.IO) {
    val document: Document = Jsoup.connect(url).get()
    document.body().html()
}
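// Highlight whole-word, case-insensitive keyword matches by wrapping them in <span> tags.
fun highlightKeywords(htmlContent: String, keywords: List<String>): String {
    val document: Document = Jsoup.parse(htmlContent)
    if (keywords.isEmpty()) return document.body().html()
    // One combined pattern, so markup inserted for one keyword is never re-scanned for another.
    val alternatives = keywords.joinToString("|") { Regex.escape(it) }
    val pattern = Regex("\\b(?:$alternatives)\\b", RegexOption.IGNORE_CASE)
    val escape = { s: String -> org.jsoup.nodes.Entities.escape(s) }
    // Only text nodes are rewritten, so tags and attributes are never touched.
    document.body().select("*").not("script, style").forEach { element ->
        // textNodes() returns a fresh list, so replacing nodes while iterating is safe.
        element.textNodes().filter { pattern.containsMatchIn(it.wholeText) }.forEach { node ->
            val text = node.wholeText
            val html = StringBuilder()
            var last = 0
            for (match in pattern.findAll(text)) {
                html.append(escape(text.substring(last, match.range.first)))
                html.append("<span style=\"background-color:yellow;\">${escape(match.value)}</span>")
                last = match.range.last + 1
            }
            html.append(escape(text.substring(last)))
            node.before(html.toString())
            node.remove()
        }
    }
    return document.body().html()
}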
// Simple keyword-based summary: return the numSentences sentences that mention the keywords most often.
fun generateSimpleSummary(htmlContent: String, keywords: List<String>, numSentences: Int): List<String> {
    val text = Jsoup.parse(htmlContent).body().text()
    // Naive sentence split: the terminating punctuation is dropped and abbreviations are not handled.
    val sentences = text.split(".", "!", "?").map { it.trim() }.filter { it.isNotEmpty() }
    // Compile each case-insensitive, whole-word pattern once rather than once per sentence.
    val patterns = keywords.map { Regex("\\b${Regex.escape(it)}\\b", RegexOption.IGNORE_CASE) }
    // Score each sentence by its total number of keyword occurrences and keep the top N.
    return sentences
        .map { sentence -> sentence to patterns.sumOf { it.findAll(sentence).count() } }
        .sortedByDescending { it.second } // stable sort: ties keep document order
        .take(numSentences)
        .map { it.first }
}
```
Key points and explanations:
* **Dependencies (Important):** The code *requires* you to add Jsoup as a dependency to your Kotlin project. Add this to your `build.gradle.kts` (or `build.gradle` for Groovy DSL):
```kotlin
dependencies {
    implementation("org.jsoup:jsoup:1.17.2") // Use the latest version
    implementation("org.jetbrains.kotlinx:kotlinx-coroutines-core:1.7.3") // Make sure coroutines are also in your project
}
```
Then sync your project (File -> Sync Project with Gradle Files in IntelliJ IDEA).
* **HTML Content Fetching:** Uses `Jsoup` to fetch and parse HTML content from a URL. `fetchHtmlContent` is a `suspend` function that runs the blocking network call inside `withContext(Dispatchers.IO)`, so the I/O happens on a background thread; `runBlocking` in `main` then waits for the result. Exceptions thrown during the fetch are caught in `main`.
* **Keyword Highlighting:** `highlightKeywords` parses the HTML with `Jsoup` and wraps each keyword match in a `<span>` with a yellow background. Matching is case-insensitive and uses word boundaries (`\b`) to prevent highlighting parts of words (e.g., highlighting "program" in "programming"). `Regex.escape()` keeps special characters in a keyword from being read as regex syntax. Only text content is rewritten, never tag names or attributes, and `script` and `style` elements are skipped, so the HTML structure stays intact.
* **Simplified Summary Generation:** Includes a basic summary based on keyword frequency: the body text is split into sentences on `.`, `!`, and `?`, each sentence is scored by its number of keyword matches, and the top-scoring sentences are returned as a `List<String>`. This is a placeholder; a real summarization algorithm would be much more complex (e.g., using NLP techniques).
* **Main Function:** The `main` function fetches, highlights, and summarizes in sequence, and exits early with an error message if the fetch fails.
* **`runBlocking`:** The `runBlocking` function is used to bridge the asynchronous `fetchHtmlContent` function with the synchronous `main` function. This allows you to use coroutines in a regular Kotlin `main` function.
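* **Coroutines:** Network calls should not run on a thread that has to stay responsive. `withContext(Dispatchers.IO)` moves the download onto a background thread pool. Note that `runBlocking` itself blocks the calling thread until the work finishes, which is fine for a console program; in a UI or XR application you would launch the fetch from a coroutine scope instead, so the UI thread is not blocked while the content downloads.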
* **Error Handling:** Includes basic error handling when fetching the HTML content.
* **XR Integration (Commented):** Includes important comments indicating where the highlighted HTML would need to be integrated into an XR environment. *This is the most challenging part of a real XR document viewer and is outside the scope of this example.* You'd need a library or framework that can render HTML content in a 3D scene (e.g., a Chromium-based rendering engine, or a custom HTML renderer).
* **Word Boundaries in Regex:** The regular expression `(?i)\\b${Regex.escape(keyword)}\\b` is used for case-insensitive and whole-word matching of keywords. This prevents highlighting parts of words. The `Regex.escape()` function is crucial to handle keywords that might contain special regular expression characters.
* **Dependency Management:** Highlights the need to add the Jsoup library as a dependency to your project.
* **`ownText()` vs `text()`:** `ownText()` returns only the text directly contained in an HTML element, while `text()` also includes the text of all its descendants. Highlighting works on each element's own text, so text inside nested tags is handled once, by the element that directly contains it.
* **Filtering out script and style elements:** The element selection ends with a `.not(...)` filter that excludes `script` and `style` elements, so their contents are never rewritten. Changing them could break the page's scripts and styling.
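The word-boundary and escaping behavior described above can be checked in isolation with the Kotlin standard library alone. The following is a minimal sketch, not part of the original listing: `keywordRegex` is a hypothetical helper name and the sample strings are made up for illustration.

```kotlin
// Hypothetical helper (not in the original listing): builds the same kind of
// case-insensitive, whole-word pattern that the highlighter uses.
fun keywordRegex(keyword: String): Regex =
    Regex("\\b${Regex.escape(keyword)}\\b", RegexOption.IGNORE_CASE)

fun main() {
    val text = "Kotlin programs: programming in KOTLIN and more."

    // "program" is only ever part of a longer word here, so nothing matches.
    println(keywordRegex("program").containsMatchIn(text))

    // Case-insensitive matching finds both spellings.
    println(keywordRegex("kotlin").findAll(text).map { it.value }.toList())

    // Escaping makes the dot literal, so "1.5" does not match "135" ...
    println(keywordRegex("1.5").containsMatchIn("version 135"))
    // ... whereas the unescaped pattern treats the dot as "any character".
    println(Regex("\\b1.5\\b").containsMatchIn("version 135"))
}
```

One limitation worth knowing: `\b` only marks a boundary next to a word character, so a keyword that ends in punctuation, such as "C++", will not match with this pattern even when it appears verbatim in the text.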
To run this code:
1. Create a new Kotlin project in IntelliJ IDEA.
2. Add the Jsoup and coroutines dependencies to your `build.gradle.kts` file (as shown above).
3. Copy and paste the code into your `Main.kt` file.
4. Run the `main` function. The highlighted HTML and simple summary will be printed to the console.
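For step 2, a complete minimal `build.gradle.kts` might look like the sketch below. The Kotlin plugin version and the use of the `application` plugin are assumptions, not part of the original instructions; adjust them to your setup. `MainKt` is the class name Kotlin generates for a top-level `main` in `Main.kt` when the file has no package declaration.

```kotlin
plugins {
    kotlin("jvm") version "1.9.22" // assumed version; use the one your IDE or project already has
    application
}

repositories {
    mavenCentral()
}

dependencies {
    implementation("org.jsoup:jsoup:1.17.2")
    implementation("org.jetbrains.kotlinx:kotlinx-coroutines-core:1.7.3")
}

application {
    // Assumes Main.kt has no package declaration.
    mainClass.set("MainKt")
}
```

With this in place, `./gradlew run` should start the program from the command line as an alternative to running `main` from the IDE.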
Remember that this is a simplified example. A real XR document viewer would require much more complex functionality, including:
* True XR rendering of HTML content.
* More sophisticated summarization algorithms.
* User interaction and navigation within the XR environment.
* Support for various document formats (not just HTML).
* Error handling and robustness.
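As one small step toward the more sophisticated summarization mentioned above, sentence splitting can be made locale-aware with the JDK's `java.text.BreakIterator` instead of splitting on every `.`, `!`, or `?`. This is a minimal sketch under that assumption, not a summarization algorithm; `splitSentences` is a hypothetical helper name and the sample text is made up.

```kotlin
import java.text.BreakIterator
import java.util.Locale

// Hypothetical helper: split text into sentences using the JDK's
// locale-aware sentence BreakIterator. Terminating punctuation is kept.
fun splitSentences(text: String, locale: Locale = Locale.ENGLISH): List<String> {
    val iterator = BreakIterator.getSentenceInstance(locale)
    iterator.setText(text)
    val sentences = mutableListOf<String>()
    var start = iterator.first()
    var end = iterator.next()
    while (end != BreakIterator.DONE) {
        val sentence = text.substring(start, end).trim()
        if (sentence.isNotEmpty()) sentences.add(sentence)
        start = end
        end = iterator.next()
    }
    return sentences
}

fun main() {
    val text = "Kotlin 1.0 was released in 2016. It targets the JVM! Is it concise?"
    // Compare with text.split(".", "!", "?"), which would cut "1.0" in two.
    splitSentences(text).forEach { println(it) }
}
```

Unlike a plain split on punctuation, this approach should not break inside a decimal number such as "1.0", and each sentence keeps its final punctuation mark. Abbreviations may still be split incorrectly, so a real viewer would likely need a dedicated NLP library.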