Workflow: Filter Summarize Create

Workflow Details

{
    "id": "",
    "meta": {
        "instanceId": "",
        "templateCredsSetupCompleted": true
    },
    "name": "Generate AI-Ready llms.txt Files from Screaming Frog Website Crawls",
    "tags": [],
    "nodes": [
        {
            "id": "ca701618-b2d5-48ee-a503-d3513d018a65",
            "name": "Sticky Note",
            "type": "n8n-nodes-base.stickyNote",
            "position": [
                360,
                -500
            ],
            "parameters": {
                "color": 7,
                "width": 360,
                "height": 860,
                "content": "## Form - Screaming Frog internal_html.csv upload  \n\nThis form node is used to trigger the workflow.  \n\nIt contains **three input fields**:  \n- Name of the website  \n- Short description of the website  \n- **Screaming Frog** export containing the internal URLs  \n\n\n\nIt is recommended to use the **internal_html.csv** export, but **internal_all.csv** will also work, as the workflow includes a filter to process only indexable URLs.\n"
            },
            "typeVersion": 1
        },
        {
            "id": "bc040ca0-f38d-4458-a60c-17f71dbfd1ea",
            "name": "Sticky Note1",
            "type": "n8n-nodes-base.stickyNote",
            "position": [
                780,
                -500
            ],
            "parameters": {
                "color": 7,
                "width": 360,
                "height": 860,
                "content": "## Extract data from Screaming Frog file\n\nThis node extracts data from the **CSV file** provided by the user.  \n\nIt produces an output that is **easily usable** in the following nodes.  \n\n\u26a0\ufe0f **Caution:**  \nIf the uploaded file is **not** the expected Screaming Frog export, the workflow will still proceed but will likely **fail in the next steps** due to missing required fields.  \n\n"
            },
            "typeVersion": 1
        },
        {
            "id": "f71a7d10-847d-48e7-8820-ec0c1e7ea055",
            "name": "Sticky Note2",
            "type": "n8n-nodes-base.stickyNote",
            "position": [
                1200,
                -500
            ],
            "parameters": {
                "color": 7,
                "width": 360,
                "height": 860,
                "content": "## Set Useful Fields  \n\nThis node sets **7 key fields** from the Screaming Frog export:  \n\n- `url` \u2192 from the **\"Address\"** column  \n- `title` \u2192 from the **\"Title 1\"** column  \n- `description` \u2192 from the **\"Meta Description 1\"** column  \n- `status` \u2192 from the **\"Status Code\"** column  \n- `indexability` \u2192 from the **\"Indexability\"** column  \n- `content_type` \u2192 from the **\"Content Type\"** column  \n- `word_count` \u2192 from the **\"Word Count\"** column  \n\n\n**Multi-language compatibility**  \nIf you're using Screaming Frog in **French, Italian, German, or Spanish**, the column names will be different.  \nHowever, the workflow is designed to handle this, so it will **still work correctly**! \ud83e\udd73\n"
            },
            "typeVersion": 1
        },
        {
            "id": "6f6546b8-adeb-4998-ae19-d93525337eb7",
            "name": "Set useful fields",
            "type": "n8n-nodes-base.set",
            "position": [
                1340,
                60
            ],
            "parameters": {
                "options": [],
                "assignments": {
                    "assignments": [
                        {
                            "id": "0e7d4a06-83fc-4834-93fe-2e758cbe2307",
                            "name": "url",
                            "type": "string",
                            "value": "={{ $json.Address || $json.Adresse || $json.Direcci\u00f3n || $json.Indirizzo }}"
                        },
                        {
                            "id": "c82f4d4c-9d0b-4c7d-9647-5d0240b58643",
                            "name": "title",
                            "type": "string",
                            "value": "={{ $json['Title 1'] || $json['Titolo 1'] || $json['Titolo 1'] || $json['T\u00edtulo 1'] || $json['Titel 1'] }}"
                        },
                        {
                            "id": "abea81db-ce3b-4ac1-bd21-09ccfffb567a",
                            "name": "description",
                            "type": "string",
                            "value": "={{ $json['Meta Description 1'] || $json['Meta description 1'] }}"
                        },
                        {
                            "id": "2ca75d74-70f8-400b-b862-9da186135915",
                            "name": "statut",
                            "type": "string",
                            "value": "={{ $json['Status Code'] || $json['Code HTTP'] || $json['Status-Code'] || $json['C\u00f3digo de respuesta'] || $json['Codice di stato']}}"
                        },
                        {
                            "id": "754d3202-38b0-4d79-ba24-8078b3244307",
                            "name": "indexability",
                            "type": "string",
                            "value": "={{ $json.Indexability || $json.Indexabilit\u00e9 || $json.Indicizzabilit\u00e0 || $json.Indexabilidad || $json.Indexierbarkeit}}"
                        },
                        {
                            "id": "8bc6583d-bb34-4d22-b310-fe79bb8ac85d",
                            "name": "content_type",
                            "type": "string",
                            "value": "={{ $json['Content Type'] || $json['Type de contenu'] || $json['Tipo di contenuto'] || $json['Tipo de contenido'] || $json['Inhaltstyp']}}"
                        },
                        {
                            "id": "c874ba1a-769e-43d3-9555-8c9914ca9b76",
                            "name": "word_count",
                            "type": "string",
                            "value": "={{ $json['Word Count'] || $json['Nombre de mots'] || $json['Conteggio delle parole'] || $json['Conteggio delle parole'] || $json['Recuento de palabras'] || $json['Wortanzahl'] }}"
                        }
                    ]
                }
            },
            "typeVersion": 3.399999999999999911182158029987476766109466552734375
        },
        {
            "id": "1a9af7a0-d2d5-44cb-9770-2d5a1e5706f4",
            "name": "Text Classifier",
            "type": "@n8n\/n8n-nodes-langchain.textClassifier",
            "disabled": true,
            "position": [
                2260,
                60
            ],
            "parameters": {
                "options": [],
                "inputText": "=url : {{ $json.url }}\ntitle : {{ $json.title }}\ndescription : {{ $json.description }}\nwords count : {{ $json.word_count }}",
                "categories": {
                    "categories": [
                        {
                            "category": "useful_content",
                            "description": "Pages that are likely to contain high-quality content, making them suitable for inclusion in a file that aids content discovery for an LLM. "
                        },
                        {
                            "category": "other_content",
                            "description": "Pages that should not be included (e.g., pagination, or low-value content)."
                        }
                    ]
                }
            },
            "typeVersion": 1
        },
        {
            "id": "74a4e378-4228-4142-92ca-e541efde2b15",
            "name": "OpenAI Chat Model",
            "type": "@n8n\/n8n-nodes-langchain.lmChatOpenAi",
            "position": [
                2180,
                240
            ],
            "parameters": {
                "model": {
                    "__rl": true,
                    "mode": "list",
                    "value": "gpt-4o-mini"
                },
                "options": []
            },
            "credentials": {
                "openAiApi": {
                    "id": "",
                    "name": "OpenAi Connection"
                }
            },
            "typeVersion": 1.1999999999999999555910790149937383830547332763671875
        },
        {
            "id": "63dc6cfe-bc73-43b5-8c7d-4f5fd6501d3b",
            "name": "No Operation, do nothing",
            "type": "n8n-nodes-base.noOp",
            "position": [
                2580,
                200
            ],
            "parameters": [],
            "typeVersion": 1
        },
        {
            "id": "cb555b99-9e63-4b6b-a1fc-512b5467d666",
            "name": "Sticky Note3",
            "type": "n8n-nodes-base.stickyNote",
            "position": [
                1620,
                -500
            ],
            "parameters": {
                "color": 7,
                "width": 360,
                "height": 860,
                "content": "## Filter URLs \n\nThis **filter node** is used to keep only the URLs that meet the following conditions:  \n- `status` = **200**  \n- `indexability` = **indexable**  \n- `content_type` contains **text\/html**  \n\n\nThese filters are even **more useful** if the uploaded file is an **internal_all.csv** instead of an **internal_html.csv**.  \n\n### **Tips:**  \nYou can **add more filters** to refine the URLs included in your `llms.txt` file.  \n\n\ud83d\udca1 **Examples:**  \n- **Filter by word count** \u2192 Ensure pages contain **enough text content**.  \n- **Filter by URL path** \u2192 Keep only **specific folders or categories** in the `llms.txt` file.  \n- **Filter by meta description** \u2192 Exclude URLs **without a meta description**, as this field will be used in the `llms.txt` file to describe each piece of content.  \n"
            },
            "typeVersion": 1
        },
        {
            "id": "e34e56e2-5cc8-4e50-bfb0-3aa2e5e04abf",
            "name": "Filter URLs",
            "type": "n8n-nodes-base.filter",
            "position": [
                1740,
                60
            ],
            "parameters": {
                "options": [],
                "conditions": {
                    "options": {
                        "version": 2,
                        "leftValue": "",
                        "caseSensitive": true,
                        "typeValidation": "strict"
                    },
                    "combinator": "and",
                    "conditions": [
                        {
                            "id": "cef4feaa-1c46-45b1-92b7-f5c2051b1dc5",
                            "operator": {
                                "type": "number",
                                "operation": "equals"
                            },
                            "leftValue": "={{ Number($json.statut) }}",
                            "rightValue": 200
                        },
                        {
                            "id": "bb821656-9740-4da4-8aa9-f65ad098c470",
                            "operator": {
                                "type": "boolean",
                                "operation": "true",
                                "singleValue": true
                            },
                            "leftValue": "={{ [\"Indexable\", \"Indicizzabile\", \"Indexierbar\"].includes($json.indexability) }}",
                            "rightValue": "={{ \"Indexable\" || \"Indicizzabile\" }}"
                        },
                        {
                            "id": "5c93ddb8-8091-406a-bc04-fa14e8b73fb9",
                            "operator": {
                                "type": "string",
                                "operation": "contains"
                            },
                            "leftValue": "={{ $json.content_type }}",
                            "rightValue": "text\/html"
                        }
                    ]
                }
            },
            "typeVersion": 2.20000000000000017763568394002504646778106689453125
        },
        {
            "id": "b98f19a8-afd3-4d26-8063-dee3ee75055f",
            "name": "Sticky Note4",
            "type": "n8n-nodes-base.stickyNote",
            "position": [
                2040,
                -800
            ],
            "parameters": {
                "color": 2,
                "width": 740,
                "height": 1160,
                "content": "## Text Classifier\n\n\ud83d\udeab **This node is deactivated by default** in the template.  \n\nYou can **enable it** if you want to add a more **\"intelligent\" \ud83e\udd13 filter** to refine the URLs included in the `llms.txt` file, helping LLMs discover and prioritize valuable content.\n\n### How It Works:\nThis node has **two outputs**:  \n- **`useful_content`** \u2192 Pages that are **likely to contain high-quality content**, making them suitable for inclusion in a file that **aids content discovery for an LLM**.  \n- **`other_content`** \u2192 Pages that should **not** be included (e.g., pagination or low-value content).  \n\n\nYou can **modify the description** in the node to fine-tune the classification according to your needs.  \n\n### Input Fields:\n- **url** \u2192 `{{ $json.url }}`  \n- **title** \u2192 `{{ $json.title }}`  \n- **description** \u2192 `{{ $json.description }}`  \n- **word_count** \u2192 `{{ $json.word_count }}`  \n\n### Why use an LLM?  \nA **language model (LLM)** can **analyze** the **URL, title, and description** to identify pages that **most likely contain meaningful and relevant content**.  \nThis allows it to **prioritize valuable pages** and structure the data for **better content discovery and training purposes**. \n\n### **For large websites**  \nIf you have a **very large website**, consider using a **Loop Over Items** node to make the workflow **more robust** and ensure all pages are processed.  \nAlso, using a **Loop Over Items** node make it **easier** to handle:  \n- **Timeouts** \n- **API quotas** \n- **Other scalability issues**\n\n### Tokens usage\nFinally, keep in mind that **more pages mean more tokens and more billed LLM API calls**.\n\n\n\n\n\n\n\n"
            },
            "typeVersion": 1
        },
        {
            "id": "63e3ea7a-cec3-442c-9812-771def0a9949",
            "name": "Sticky Note5",
            "type": "n8n-nodes-base.stickyNote",
            "position": [
                2840,
                -500
            ],
            "parameters": {
                "color": 7,
                "width": 360,
                "height": 860,
                "content": "## Set Field - llms.txt Row\n\nThis node **sets** the row format for the `llms.txt` file.  \n\n### Row Structure:\nEach row follows this format:  \n\n- `- [title](link): description`  \n\nIf the URL **has no description** (from the **Meta Description** in the Screaming Frog export), the row will be:  \n\n- `- [title](link)`  \n"
            },
            "typeVersion": 1
        },
        {
            "id": "78f58220-feb5-4044-b994-39a0e4f1e9e4",
            "name": "Sticky Note6",
            "type": "n8n-nodes-base.stickyNote",
            "position": [
                3260,
                -500
            ],
            "parameters": {
                "color": 7,
                "width": 360,
                "height": 860,
                "content": "## Summarize - Concatenate\n\nThis node concatenates all the output from the previous node, ensuring each row is on a separate line."
            },
            "typeVersion": 1
        },
        {
            "id": "7a119633-7cd3-4de5-a1cd-7f708e1abf4a",
            "name": "Sticky Note7",
            "type": "n8n-nodes-base.stickyNote",
            "position": [
                3680,
                -500
            ],
            "parameters": {
                "color": 7,
                "width": 360,
                "height": 860,
                "content": "## Set Fields - llms.txt Content\n\nThis node sets the content of the `llms.txt` file using:\n\n- The **website title** provided in the form (first node).\n- The **website description** provided in the form (first node).\n- The output from the previous node, which includes all the URLs, their titles, and their descriptions that will appear in the `llms.txt` file.\n"
            },
            "typeVersion": 1
        },
        {
            "id": "554f6858-68e8-4b35-a6c4-21bed6832323",
            "name": "Sticky Note8",
            "type": "n8n-nodes-base.stickyNote",
            "position": [
                4100,
                -500
            ],
            "parameters": {
                "color": 7,
                "width": 360,
                "height": 860,
                "content": "## Generate llms.txt file\n\nThis node **creates** the `llms.txt` file, which can be **downloaded directly** within n8n. \n"
            },
            "typeVersion": 1
        },
        {
            "id": "24bdefba-e2f2-41f0-93e7-9f8d2fc11f43",
            "name": "Sticky Note9",
            "type": "n8n-nodes-base.stickyNote",
            "position": [
                4520,
                -500
            ],
            "parameters": {
                "color": 7,
                "width": 360,
                "height": 860,
                "content": "## upload file anywhere\n\nInstead of downloading the file directly from the n8n workflow, you can **replace this node node** with a Drive node (e.g., **Google Drive** or **OneDrive**) to upload the `llms.txt` file to a folder of your choice.  \n  \n**Name the file properly** (e.g., include the website name) to make it easier to find and distinguish between files when working on multiple websites.  \n"
            },
            "typeVersion": 1
        },
        {
            "id": "a3be51e3-810c-40a7-a996-98a3d383c2b9",
            "name": "Summarize - Concatenate",
            "type": "n8n-nodes-base.summarize",
            "position": [
                3380,
                40
            ],
            "parameters": {
                "options": [],
                "fieldsToSummarize": {
                    "values": [
                        {
                            "field": "llmTxtRow",
                            "separateBy": "\n",
                            "aggregation": "concatenate"
                        }
                    ]
                }
            },
            "typeVersion": 1.100000000000000088817841970012523233890533447265625
        },
        {
            "id": "8d3a892a-3d11-4d8a-8ec6-84f8f3af1183",
            "name": "Set Fields - llms.txt Content",
            "type": "n8n-nodes-base.set",
            "position": [
                3820,
                40
            ],
            "parameters": {
                "options": [],
                "assignments": {
                    "assignments": [
                        {
                            "id": "97062a99-e944-4e1e-89b1-62cf9e3462dd",
                            "name": "llmsTxtFile",
                            "type": "string",
                            "value": "=# {{ $('Form - Screaming frog internal_html.csv upload').item.json['What is the name of your website?'] }}\n> {{ $('Form - Screaming frog internal_html.csv upload').item.json['Can you provide a short description of your website? (in the language of the website)'] }}\n\n{{ $json.concatenated_llmTxtRow }}"
                        }
                    ]
                }
            },
            "typeVersion": 3.399999999999999911182158029987476766109466552734375
        },
        {
            "id": "bc2a692a-47ea-4bf1-a102-e607fd544158",
            "name": "upload file anywhere",
            "type": "n8n-nodes-base.noOp",
            "position": [
                4640,
                40
            ],
            "parameters": [],
            "typeVersion": 1
        },
        {
            "id": "404510a2-35b2-44cf-9d02-eb0abcf4e9b3",
            "name": "Set Field - llms.txt Row",
            "type": "n8n-nodes-base.set",
            "position": [
                2960,
                40
            ],
            "parameters": {
                "options": [],
                "assignments": {
                    "assignments": [
                        {
                            "id": "95e75caa-8110-476b-9cb1-73c15361fa56",
                            "name": "llmTxtRow",
                            "type": "string",
                            "value": "=- [{{ $json.title }}]({{ $json.url }}){{ $json.description ? ': ' + $json.description : '' }}"
                        }
                    ]
                }
            },
            "typeVersion": 3.399999999999999911182158029987476766109466552734375
        },
        {
            "id": "f54d51f2-17bc-4c58-b177-0e77e16f7b72",
            "name": "Sticky Note10",
            "type": "n8n-nodes-base.stickyNote",
            "position": [
                -420,
                -1020
            ],
            "parameters": {
                "color": 5,
                "width": 700,
                "height": 1380,
                "content": "# Generate AI-Ready llms.txt Files from Screaming Frog Website Crawls  \n\nThis workflow helps you generate an **llms.txt** file (if you're unfamiliar with it, check out [this article](https:\/\/towardsdatascience.com\/llms-txt-414d5121bcb3\/)) using a **Screaming Frog export**.  \n\n[Screaming Frog](https:\/\/www.screamingfrog.co.uk\/seo-spider\/) is a well-known website crawler.  \nYou can easily crawl a website. Then, export the **\"internal_html\"** section in CSV format.  \n\n## How It Works: \n\nA **form** allows you to enter:  \n- The **name of the website**  \n- A **short description**  \n- The **internal_html.csv** file from your Screaming Frog export  \n\n\nOnce the form is submitted, the **workflow is triggered automatically**, and you can **download the llms.txt file directly from n8n**. \n\n## Downloading the File\nSince the last node in this workflow is **\"Convert to File\"**, you will need to **download the file directly from the n8n UI**.  \nHowever, you can easily **add a node** (e.g., Google Drive, OneDrive) to automatically upload the file **wherever you want**.  \n\n## AI-Powered Filtering (Optional):  \nThis workflow includes a **text classifier node**, which is **deactivated by default**.  \n- You can **activate it** to apply a more **intelligent filter** to select URLs for the `llms.txt` file.  \n- Consider modifying the **description** in the classifier node to specify the type of URLs you want to include.  \n\n## How to Use This Workflow  \n\n1. **Crawl the website** you want to generate an `llms.txt` file for using **Screaming Frog**.  \n2. **Export the \"internal_html\"** section in CSV format.  \n   ![Screaming Frog internal html export](https:\/\/i.imgur.com\/M0nJQiV.png)  \n3. In **n8n**, click **\"Test Workflow\"**, fill in the form, and **upload** the `internal_html.csv` file.  \n4. Once the workflow is complete, go to the **\"Export to File\"** node and **download the output**.  \n\n**That's it! You now have your llms.txt file!**  \n\n\n\n**Recommended Usage:**  \nUse this workflow **directly in the n8n UI by clicking** 'Test Workflow' and uploading the file in the form."
            },
            "typeVersion": 1
        },
        {
            "id": "e33104af-802a-43f2-b26d-f368f7de2fd7",
            "name": "Form - Screaming frog internal_html.csv upload",
            "type": "n8n-nodes-base.formTrigger",
            "position": [
                460,
                60
            ],
            "webhookId": "8791f39a-3d81-405c-b177-0a733ebf74cb",
            "parameters": {
                "options": {
                    "buttonLabel": "Get the llms.txt file"
                },
                "formTitle": "llms.txt Generator - From Screaming Frog export",
                "formFields": {
                    "values": [
                        {
                            "fieldLabel": "What is the name of your website?",
                            "placeholder": "Example : The best website ever",
                            "requiredField": true
                        },
                        {
                            "fieldLabel": "Can you provide a short description of your website? (in the language of the website)",
                            "placeholder": "Example : This is the best website ever because all the content is engaging and valuable.",
                            "requiredField": true
                        },
                        {
                            "fieldType": "file",
                            "fieldLabel": "screaming_frog_export",
                            "multipleFiles": false,
                            "requiredField": true,
                            "acceptFileTypes": ".csv"
                        }
                    ]
                },
                "responseMode": "lastNode",
                "formDescription": "Generate a simple llms.txt file from a Screaming Frog Export\nIt is recommended to use the internal_html.csv export, although internal_all.csv will also work.\n\nFill in the fields in this form.Just fill in the fields in this form  \ud83d\ude04"
            },
            "typeVersion": 2.20000000000000017763568394002504646778106689453125
        },
        {
            "id": "f6b17fdd-a098-411e-8d53-3f6e638cc3ba",
            "name": "Extract data from Screaming Frog file",
            "type": "n8n-nodes-base.extractFromFile",
            "position": [
                900,
                60
            ],
            "parameters": {
                "options": [],
                "operation": "xls",
                "binaryPropertyName": "screaming_frog_export"
            },
            "typeVersion": 1
        },
        {
            "id": "6bbd8d1f-3322-4c6d-af08-c842386239ce",
            "name": "Generate llms.txt file",
            "type": "n8n-nodes-base.convertToFile",
            "position": [
                4220,
                40
            ],
            "parameters": {
                "options": {
                    "encoding": "utf8",
                    "fileName": "llms.txt"
                },
                "operation": "toText",
                "sourceProperty": "llmsTxtFile"
            },
            "typeVersion": 1.100000000000000088817841970012523233890533447265625
        }
    ],
    "active": false,
    "pinData": [],
    "settings": {
        "executionOrder": "v1"
    },
    "versionId": "",
    "connections": {
        "Filter URLs": {
            "main": [
                [
                    {
                        "node": "Text Classifier",
                        "type": "main",
                        "index": 0
                    }
                ]
            ]
        },
        "Text Classifier": {
            "main": [
                [
                    {
                        "node": "Set Field - llms.txt Row",
                        "type": "main",
                        "index": 0
                    }
                ],
                [
                    {
                        "node": "No Operation, do nothing",
                        "type": "main",
                        "index": 0
                    }
                ]
            ]
        },
        "OpenAI Chat Model": {
            "ai_languageModel": [
                [
                    {
                        "node": "Text Classifier",
                        "type": "ai_languageModel",
                        "index": 0
                    }
                ]
            ]
        },
        "Set useful fields": {
            "main": [
                [
                    {
                        "node": "Filter URLs",
                        "type": "main",
                        "index": 0
                    }
                ]
            ]
        },
        "Generate llms.txt file": {
            "main": [
                []
            ]
        },
        "Summarize - Concatenate": {
            "main": [
                [
                    {
                        "node": "Set Fields - llms.txt Content",
                        "type": "main",
                        "index": 0
                    }
                ]
            ]
        },
        "Set Field - llms.txt Row": {
            "main": [
                [
                    {
                        "node": "Summarize - Concatenate",
                        "type": "main",
                        "index": 0
                    }
                ]
            ]
        },
        "Set Fields - llms.txt Content": {
            "main": [
                [
                    {
                        "node": "Generate llms.txt file",
                        "type": "main",
                        "index": 0
                    }
                ]
            ]
        },
        "Extract data from Screaming Frog file": {
            "main": [
                [
                    {
                        "node": "Set useful fields",
                        "type": "main",
                        "index": 0
                    }
                ]
            ]
        },
        "Form - Screaming frog internal_html.csv upload": {
            "main": [
                [
                    {
                        "node": "Extract data from Screaming Frog file",
                        "type": "main",
                        "index": 0
                    }
                ]
            ]
        }
    }
}
Back to Workflows
Related Workflows

Wait Splitout Create Webhook

View
Manual Dropbox Automation Webhook

View
Code Postgres Automate Triggered

View
Functionitem Zendesk Create Scheduled

View
[n8n] - Shopify Orders to D365 Business Central Sales Orders / Sales Invoices

View
Splitout Schedule Create Scheduled

View
Stopanderror Extractfromfile Send Webhook

View
🤖 AI Powered RAG Chatbot for Your Docs + Google Drive + Gemini + Qdrant

View
Automation

View
Open Deep Research - AI-Powered Autonomous Research Workflow

View