{
"id": "",
"meta": {
"instanceId": "",
"templateCredsSetupCompleted": true
},
"name": "Generate AI-Ready llms.txt Files from Screaming Frog Website Crawls",
"tags": [],
"nodes": [
{
"id": "ca701618-b2d5-48ee-a503-d3513d018a65",
"name": "Sticky Note",
"type": "n8n-nodes-base.stickyNote",
"position": [
360,
-500
],
"parameters": {
"color": 7,
"width": 360,
"height": 860,
"content": "## Form - Screaming Frog internal_html.csv upload \n\nThis form node triggers the workflow. \n\nIt contains **three input fields**: \n- Name of the website \n- Short description of the website \n- **Screaming Frog** export containing the internal URLs \n\nIt is recommended to use the **internal_html.csv** export, but **internal_all.csv** will also work, as the workflow includes a filter that keeps only indexable HTML pages returning a 200 status code.\n"
},
"typeVersion": 1
},
{
"id": "bc040ca0-f38d-4458-a60c-17f71dbfd1ea",
"name": "Sticky Note1",
"type": "n8n-nodes-base.stickyNote",
"position": [
780,
-500
],
"parameters": {
"color": 7,
"width": 360,
"height": 860,
"content": "## Extract data from Screaming Frog file\n\nThis node extracts data from the **CSV file** uploaded by the user. \n\nIt outputs one item per row, using the column headers as field names, so the data is **easy to use** in the following nodes. \n\n\u26a0\ufe0f **Caution:** \nIf the uploaded file is **not** the expected Screaming Frog export, the workflow will still run but will most likely **fail or produce no usable output in the next steps**, because the required fields are missing. \n"
},
"typeVersion": 1
},
{
"id": "f71a7d10-847d-48e7-8820-ec0c1e7ea055",
"name": "Sticky Note2",
"type": "n8n-nodes-base.stickyNote",
"position": [
1200,
-500
],
"parameters": {
"color": 7,
"width": 360,
"height": 860,
"content": "## Set Useful Fields \n\nThis node sets **7 key fields** from the Screaming Frog export: \n\n- `url` \u2192 from the **\"Address\"** column \n- `title` \u2192 from the **\"Title 1\"** column \n- `description` \u2192 from the **\"Meta Description 1\"** column \n- `statut` (status code) \u2192 from the **\"Status Code\"** column \n- `indexability` \u2192 from the **\"Indexability\"** column \n- `content_type` \u2192 from the **\"Content Type\"** column \n- `word_count` \u2192 from the **\"Word Count\"** column \n\n**Multi-language compatibility** \nIf you use Screaming Frog in **French, Italian, German, or Spanish**, the column names are different. \nThe workflow is designed to handle this, so it will **still work correctly**! \ud83e\udd73\n"
},
"typeVersion": 1
},
{
"id": "6f6546b8-adeb-4998-ae19-d93525337eb7",
"name": "Set useful fields",
"type": "n8n-nodes-base.set",
"position": [
1340,
60
],
"parameters": {
"options": {},
"assignments": {
"assignments": [
{
"id": "0e7d4a06-83fc-4834-93fe-2e758cbe2307",
"name": "url",
"type": "string",
"value": "={{ $json.Address || $json.Adresse || $json.Direcci\u00f3n || $json.Indirizzo }}"
},
{
"id": "c82f4d4c-9d0b-4c7d-9647-5d0240b58643",
"name": "title",
"type": "string",
"value": "={{ $json['Title 1'] || $json['Titolo 1'] || $json['T\u00edtulo 1'] || $json['Titel 1'] }}"
},
{
"id": "abea81db-ce3b-4ac1-bd21-09ccfffb567a",
"name": "description",
"type": "string",
"value": "={{ $json['Meta Description 1'] || $json['Meta description 1'] }}"
},
{
"id": "2ca75d74-70f8-400b-b862-9da186135915",
"name": "statut",
"type": "string",
"value": "={{ $json['Status Code'] || $json['Code HTTP'] || $json['Status-Code'] || $json['C\u00f3digo de respuesta'] || $json['Codice di stato']}}"
},
{
"id": "754d3202-38b0-4d79-ba24-8078b3244307",
"name": "indexability",
"type": "string",
"value": "={{ $json.Indexability || $json.Indexabilit\u00e9 || $json.Indicizzabilit\u00e0 || $json.Indexabilidad || $json.Indexierbarkeit}}"
},
{
"id": "8bc6583d-bb34-4d22-b310-fe79bb8ac85d",
"name": "content_type",
"type": "string",
"value": "={{ $json['Content Type'] || $json['Type de contenu'] || $json['Tipo di contenuto'] || $json['Tipo de contenido'] || $json['Inhaltstyp']}}"
},
{
"id": "c874ba1a-769e-43d3-9555-8c9914ca9b76",
"name": "word_count",
"type": "string",
"value": "={{ $json['Word Count'] || $json['Nombre de mots'] || $json['Conteggio delle parole'] || $json['Recuento de palabras'] || $json['Wortanzahl'] }}"
}
]
}
},
"typeVersion": 3.4
},
{
"id": "1a9af7a0-d2d5-44cb-9770-2d5a1e5706f4",
"name": "Text Classifier",
"type": "@n8n\/n8n-nodes-langchain.textClassifier",
"disabled": true,
"position": [
2260,
60
],
"parameters": {
"options": {},
"inputText": "=url : {{ $json.url }}\ntitle : {{ $json.title }}\ndescription : {{ $json.description }}\nword count : {{ $json.word_count }}",
"categories": {
"categories": [
{
"category": "useful_content",
"description": "Pages that are likely to contain high-quality content, making them suitable for inclusion in a file that aids content discovery for an LLM. "
},
{
"category": "other_content",
"description": "Pages that should not be included (e.g., pagination, or low-value content)."
}
]
}
},
"typeVersion": 1
},
{
"id": "74a4e378-4228-4142-92ca-e541efde2b15",
"name": "OpenAI Chat Model",
"type": "@n8n\/n8n-nodes-langchain.lmChatOpenAi",
"position": [
2180,
240
],
"parameters": {
"model": {
"__rl": true,
"mode": "list",
"value": "gpt-4o-mini"
},
"options": {}
},
"credentials": {
"openAiApi": {
"id": "",
"name": "OpenAi Connection"
}
},
"typeVersion": 1.2
},
{
"id": "63dc6cfe-bc73-43b5-8c7d-4f5fd6501d3b",
"name": "No Operation, do nothing",
"type": "n8n-nodes-base.noOp",
"position": [
2580,
200
],
"parameters": {},
"typeVersion": 1
},
{
"id": "cb555b99-9e63-4b6b-a1fc-512b5467d666",
"name": "Sticky Note3",
"type": "n8n-nodes-base.stickyNote",
"position": [
1620,
-500
],
"parameters": {
"color": 7,
"width": 360,
"height": 860,
"content": "## Filter URLs \n\nThis **filter node** keeps only the URLs that meet all of the following conditions: \n- `statut` (status code) = **200** \n- `indexability` = **indexable** \n- `content_type` contains **text\/html** \n\nThese filters matter most when the uploaded file is **internal_all.csv** rather than **internal_html.csv**, because internal_all.csv also lists non-HTML resources such as images, CSS, and JavaScript files. \n\n### **Tips:** \nYou can **add more filters** to refine the URLs included in your `llms.txt` file. \n\n\ud83d\udca1 **Examples:** \n- **Filter by word count** \u2192 Ensure pages contain **enough text content**. \n- **Filter by URL path** \u2192 Keep only **specific folders or categories** in the `llms.txt` file. \n- **Filter by meta description** \u2192 Exclude URLs **without a meta description**, since this field is used in the `llms.txt` file to describe each page. \n"
},
"typeVersion": 1
},
{
"id": "e34e56e2-5cc8-4e50-bfb0-3aa2e5e04abf",
"name": "Filter URLs",
"type": "n8n-nodes-base.filter",
"position": [
1740,
60
],
"parameters": {
"options": {},
"conditions": {
"options": {
"version": 2,
"leftValue": "",
"caseSensitive": true,
"typeValidation": "strict"
},
"combinator": "and",
"conditions": [
{
"id": "cef4feaa-1c46-45b1-92b7-f5c2051b1dc5",
"operator": {
"type": "number",
"operation": "equals"
},
"leftValue": "={{ Number($json.statut) }}",
"rightValue": 200
},
{
"id": "bb821656-9740-4da4-8aa9-f65ad098c470",
"operator": {
"type": "boolean",
"operation": "true",
"singleValue": true
},
"leftValue": "={{ [\"Indexable\", \"Indicizzabile\", \"Indexierbar\"].includes($json.indexability) }}",
"rightValue": ""
},
{
"id": "5c93ddb8-8091-406a-bc04-fa14e8b73fb9",
"operator": {
"type": "string",
"operation": "contains"
},
"leftValue": "={{ $json.content_type }}",
"rightValue": "text\/html"
}
]
}
},
"typeVersion": 2.2
},
{
"id": "b98f19a8-afd3-4d26-8063-dee3ee75055f",
"name": "Sticky Note4",
"type": "n8n-nodes-base.stickyNote",
"position": [
2040,
-800
],
"parameters": {
"color": 2,
"width": 740,
"height": 1160,
"content": "## Text Classifier\n\n\ud83d\udeab **This node is deactivated by default** in the template. While it is deactivated, n8n passes every filtered URL straight through to the next node. \n\nYou can **enable it** if you want to add a more **\"intelligent\" \ud83e\udd13 filter** to refine the URLs included in the `llms.txt` file, helping LLMs discover and prioritize valuable content.\n\n### How It Works:\nThis node has **two outputs**: \n- **`useful_content`** \u2192 Pages that are **likely to contain high-quality content**, making them suitable for inclusion in a file that **aids content discovery for an LLM**. These continue to the next node. \n- **`other_content`** \u2192 Pages that should **not** be included (e.g., pagination or low-value content). These are sent to the **No Operation** node and discarded. \n\nYou can **modify the category descriptions** in the node to fine-tune the classification to your needs. \n\n### Input Fields:\n- **url** \u2192 `{{ $json.url }}` \n- **title** \u2192 `{{ $json.title }}` \n- **description** \u2192 `{{ $json.description }}` \n- **word_count** \u2192 `{{ $json.word_count }}` \n\n### Why use an LLM? \nA **language model (LLM)** can **analyze** the **URL, title, description, and word count** to identify pages that **most likely contain meaningful and relevant content**. \nThis lets it **prioritize valuable pages** so that the `llms.txt` file supports **better content discovery by LLMs**. \n\n### **For large websites** \nIf you have a **very large website**, consider using a **Loop Over Items** node to process the pages in batches, which makes the workflow **more robust** and ensures all pages are processed. \nA **Loop Over Items** node also makes it **easier** to handle: \n- **Timeouts** \n- **API quotas** \n- **Other scalability issues**\n\n### Token usage\nFinally, keep in mind that **more pages mean more tokens and more billed LLM API calls**.\n"
},
"typeVersion": 1
},
{
"id": "63e3ea7a-cec3-442c-9812-771def0a9949",
"name": "Sticky Note5",
"type": "n8n-nodes-base.stickyNote",
"position": [
2840,
-500
],
"parameters": {
"color": 7,
"width": 360,
"height": 860,
"content": "## Set Field - llms.txt Row\n\nThis node **sets** the row format for the `llms.txt` file. \n\n### Row Structure:\nEach row follows this format: \n\n- `- [title](link): description` \n\nIf the URL **has no description** (from the **Meta Description** in the Screaming Frog export), the row will be: \n\n- `- [title](link)` \n"
},
"typeVersion": 1
},
{
"id": "78f58220-feb5-4044-b994-39a0e4f1e9e4",
"name": "Sticky Note6",
"type": "n8n-nodes-base.stickyNote",
"position": [
3260,
-500
],
"parameters": {
"color": 7,
"width": 360,
"height": 860,
"content": "## Summarize - Concatenate\n\nThis node concatenates the `llmTxtRow` field of every item output by the previous node into a single `concatenated_llmTxtRow` field, with each row on a separate line."
},
"typeVersion": 1
},
{
"id": "7a119633-7cd3-4de5-a1cd-7f708e1abf4a",
"name": "Sticky Note7",
"type": "n8n-nodes-base.stickyNote",
"position": [
3680,
-500
],
"parameters": {
"color": 7,
"width": 360,
"height": 860,
"content": "## Set Fields - llms.txt Content\n\nThis node builds the content of the `llms.txt` file from:\n\n- The **website name** provided in the form (first node), used as the `#` title.\n- The **website description** provided in the form (first node), used as a `>` blockquote.\n- The output of the previous node, which contains all the URLs, their titles, and their descriptions that will appear in the `llms.txt` file.\n"
},
"typeVersion": 1
},
{
"id": "554f6858-68e8-4b35-a6c4-21bed6832323",
"name": "Sticky Note8",
"type": "n8n-nodes-base.stickyNote",
"position": [
4100,
-500
],
"parameters": {
"color": 7,
"width": 360,
"height": 860,
"content": "## Generate llms.txt file\n\nThis node **creates** the `llms.txt` file, which can be **downloaded directly** within n8n. \n"
},
"typeVersion": 1
},
{
"id": "24bdefba-e2f2-41f0-93e7-9f8d2fc11f43",
"name": "Sticky Note9",
"type": "n8n-nodes-base.stickyNote",
"position": [
4520,
-500
],
"parameters": {
"color": 7,
"width": 360,
"height": 860,
"content": "## upload file anywhere\n\nThis **No Operation** node is a placeholder and is not connected to the workflow. Instead of downloading the file directly from the n8n workflow, you can **replace this node** with a Drive node (e.g., **Google Drive** or **OneDrive**), connected after **Generate llms.txt file**, to upload the `llms.txt` file to a folder of your choice. \n \n**Name the file properly** (e.g., include the website name) to make it easier to find and distinguish between files when working on multiple websites. \n"
},
"typeVersion": 1
},
{
"id": "a3be51e3-810c-40a7-a996-98a3d383c2b9",
"name": "Summarize - Concatenate",
"type": "n8n-nodes-base.summarize",
"position": [
3380,
40
],
"parameters": {
"options": {},
"fieldsToSummarize": {
"values": [
{
"field": "llmTxtRow",
"separateBy": "\n",
"aggregation": "concatenate"
}
]
}
},
"typeVersion": 1.1
},
{
"id": "8d3a892a-3d11-4d8a-8ec6-84f8f3af1183",
"name": "Set Fields - llms.txt Content",
"type": "n8n-nodes-base.set",
"position": [
3820,
40
],
"parameters": {
"options": {},
"assignments": {
"assignments": [
{
"id": "97062a99-e944-4e1e-89b1-62cf9e3462dd",
"name": "llmsTxtFile",
"type": "string",
"value": "=# {{ $('Form - Screaming frog internal_html.csv upload').item.json['What is the name of your website?'] }}\n> {{ $('Form - Screaming frog internal_html.csv upload').item.json['Can you provide a short description of your website? (in the language of the website)'] }}\n\n{{ $json.concatenated_llmTxtRow }}"
}
]
}
},
"typeVersion": 3.4
},
{
"id": "bc2a692a-47ea-4bf1-a102-e607fd544158",
"name": "upload file anywhere",
"type": "n8n-nodes-base.noOp",
"position": [
4640,
40
],
"parameters": {},
"typeVersion": 1
},
{
"id": "404510a2-35b2-44cf-9d02-eb0abcf4e9b3",
"name": "Set Field - llms.txt Row",
"type": "n8n-nodes-base.set",
"position": [
2960,
40
],
"parameters": {
"options": {},
"assignments": {
"assignments": [
{
"id": "95e75caa-8110-476b-9cb1-73c15361fa56",
"name": "llmTxtRow",
"type": "string",
"value": "=- [{{ $json.title }}]({{ $json.url }}){{ $json.description ? ': ' + $json.description : '' }}"
}
]
}
},
"typeVersion": 3.4
},
{
"id": "f54d51f2-17bc-4c58-b177-0e77e16f7b72",
"name": "Sticky Note10",
"type": "n8n-nodes-base.stickyNote",
"position": [
-420,
-1020
],
"parameters": {
"color": 5,
"width": 700,
"height": 1380,
"content": "# Generate AI-Ready llms.txt Files from Screaming Frog Website Crawls \n\nThis workflow helps you generate an **llms.txt** file (if you're unfamiliar with it, check out [this article](https:\/\/towardsdatascience.com\/llms-txt-414d5121bcb3\/)) using a **Screaming Frog export**. \n\n[Screaming Frog](https:\/\/www.screamingfrog.co.uk\/seo-spider\/) is a well-known website crawler. \nCrawl the website with it, then export the **\"internal_html\"** section in CSV format. \n\n## How It Works: \n\nA **form** allows you to enter: \n- The **name of the website** \n- A **short description** \n- The **internal_html.csv** file from your Screaming Frog export \n\nOnce the form is submitted, the **workflow is triggered automatically**, and you can **download the llms.txt file directly from n8n**. \n\n## Downloading the File\nSince the last node in this workflow is a **Convert to File** node (**\"Generate llms.txt file\"**), you need to **download the file directly from the n8n UI**. \nHowever, you can easily **add a node** (e.g., Google Drive, OneDrive) to automatically upload the file **wherever you want**. \n\n## AI-Powered Filtering (Optional): \nThis workflow includes a **text classifier node**, which is **deactivated by default**. \n- You can **activate it** to apply a more **intelligent filter** when selecting URLs for the `llms.txt` file. \n- Consider modifying the **category descriptions** in the classifier node to specify the type of URLs you want to include. \n\n## How to Use This Workflow \n\n1. **Crawl the website** you want to generate an `llms.txt` file for using **Screaming Frog**. \n2. **Export the \"internal_html\"** section in CSV format. \n3. In **n8n**, click **\"Test Workflow\"**, fill in the form, and **upload** the `internal_html.csv` file. \n4. Once the workflow is complete, open the **\"Generate llms.txt file\"** node and **download the output**. \n\n**That's it! You now have your llms.txt file!** \n\n**Recommended Usage:** \nUse this workflow **directly in the n8n UI** by clicking **\"Test Workflow\"** and uploading the file in the form."
},
"typeVersion": 1
},
{
"id": "e33104af-802a-43f2-b26d-f368f7de2fd7",
"name": "Form - Screaming frog internal_html.csv upload",
"type": "n8n-nodes-base.formTrigger",
"position": [
460,
60
],
"webhookId": "8791f39a-3d81-405c-b177-0a733ebf74cb",
"parameters": {
"options": {
"buttonLabel": "Get the llms.txt file"
},
"formTitle": "llms.txt Generator - From Screaming Frog export",
"formFields": {
"values": [
{
"fieldLabel": "What is the name of your website?",
"placeholder": "Example: The best website ever",
"requiredField": true
},
{
"fieldLabel": "Can you provide a short description of your website? (in the language of the website)",
"placeholder": "Example: This is the best website ever because all the content is engaging and valuable.",
"requiredField": true
},
{
"fieldType": "file",
"fieldLabel": "screaming_frog_export",
"multipleFiles": false,
"requiredField": true,
"acceptFileTypes": ".csv"
}
]
},
"responseMode": "lastNode",
"formDescription": "Generate a simple llms.txt file from a Screaming Frog export.\nIt is recommended to use the internal_html.csv export, although internal_all.csv will also work.\n\nJust fill in the fields in this form \ud83d\ude04"
},
"typeVersion": 2.2
},
{
"id": "f6b17fdd-a098-411e-8d53-3f6e638cc3ba",
"name": "Extract data from Screaming Frog file",
"type": "n8n-nodes-base.extractFromFile",
"position": [
900,
60
],
"parameters": {
"options": {},
"operation": "xls",
"binaryPropertyName": "screaming_frog_export"
},
"typeVersion": 1
},
{
"id": "6bbd8d1f-3322-4c6d-af08-c842386239ce",
"name": "Generate llms.txt file",
"type": "n8n-nodes-base.convertToFile",
"position": [
4220,
40
],
"parameters": {
"options": {
"encoding": "utf8",
"fileName": "llms.txt"
},
"operation": "toText",
"sourceProperty": "llmsTxtFile"
},
"typeVersion": 1.1
}
],
"active": false,
"pinData": {},
"settings": {
"executionOrder": "v1"
},
"versionId": "",
"connections": {
"Filter URLs": {
"main": [
[
{
"node": "Text Classifier",
"type": "main",
"index": 0
}
]
]
},
"Text Classifier": {
"main": [
[
{
"node": "Set Field - llms.txt Row",
"type": "main",
"index": 0
}
],
[
{
"node": "No Operation, do nothing",
"type": "main",
"index": 0
}
]
]
},
"OpenAI Chat Model": {
"ai_languageModel": [
[
{
"node": "Text Classifier",
"type": "ai_languageModel",
"index": 0
}
]
]
},
"Set useful fields": {
"main": [
[
{
"node": "Filter URLs",
"type": "main",
"index": 0
}
]
]
},
"Generate llms.txt file": {
"main": [
[]
]
},
"Summarize - Concatenate": {
"main": [
[
{
"node": "Set Fields - llms.txt Content",
"type": "main",
"index": 0
}
]
]
},
"Set Field - llms.txt Row": {
"main": [
[
{
"node": "Summarize - Concatenate",
"type": "main",
"index": 0
}
]
]
},
"Set Fields - llms.txt Content": {
"main": [
[
{
"node": "Generate llms.txt file",
"type": "main",
"index": 0
}
]
]
},
"Extract data from Screaming Frog file": {
"main": [
[
{
"node": "Set useful fields",
"type": "main",
"index": 0
}
]
]
},
"Form - Screaming frog internal_html.csv upload": {
"main": [
[
{
"node": "Extract data from Screaming Frog file",
"type": "main",
"index": 0
}
]
]
}
}
}