Workflow: Filter Summarize Create

Workflow Details

{
    "id": "",
    "meta": {
        "instanceId": "",
        "templateCredsSetupCompleted": true
    },
    "name": "Generate AI-Ready llms.txt Files from Screaming Frog Website Crawls",
    "tags": [],
    "nodes": [
        {
            "id": "ca701618-b2d5-48ee-a503-d3513d018a65",
            "name": "Sticky Note",
            "type": "n8n-nodes-base.stickyNote",
            "position": [
                360,
                -500
            ],
            "parameters": {
                "color": 7,
                "width": 360,
                "height": 860,
                "content": "## Form - Screaming Frog internal_html.csv upload  \n\nThis form node is used to trigger the workflow.  \n\nIt contains **three input fields**:  \n- Name of the website  \n- Short description of the website  \n- **Screaming Frog** export containing the internal URLs  \n\n\n\nIt is recommended to use the **internal_html.csv** export, but **internal_all.csv** will also work, as the workflow includes a filter to process only indexable URLs.\n"
            },
            "typeVersion": 1
        },
        {
            "id": "bc040ca0-f38d-4458-a60c-17f71dbfd1ea",
            "name": "Sticky Note1",
            "type": "n8n-nodes-base.stickyNote",
            "position": [
                780,
                -500
            ],
            "parameters": {
                "color": 7,
                "width": 360,
                "height": 860,
                "content": "## Extract data from Screaming Frog file\n\nThis node extracts data from the **CSV file** provided by the user.  \n\nIt produces an output that is **easily usable** in the following nodes.  \n\n\u26a0\ufe0f **Caution:**  \nIf the uploaded file is **not** the expected Screaming Frog export, the workflow will still proceed but will likely **fail in the next steps** due to missing required fields.  \n\n"
            },
            "typeVersion": 1
        },
        {
            "id": "f71a7d10-847d-48e7-8820-ec0c1e7ea055",
            "name": "Sticky Note2",
            "type": "n8n-nodes-base.stickyNote",
            "position": [
                1200,
                -500
            ],
            "parameters": {
                "color": 7,
                "width": 360,
                "height": 860,
                "content": "## Set Useful Fields  \n\nThis node sets **7 key fields** from the Screaming Frog export:  \n\n- `url` \u2192 from the **\"Address\"** column  \n- `title` \u2192 from the **\"Title 1\"** column  \n- `description` \u2192 from the **\"Meta Description 1\"** column  \n- `status` \u2192 from the **\"Status Code\"** column  \n- `indexability` \u2192 from the **\"Indexability\"** column  \n- `content_type` \u2192 from the **\"Content Type\"** column  \n- `word_count` \u2192 from the **\"Word Count\"** column  \n\n\n**Multi-language compatibility**  \nIf you're using Screaming Frog in **French, Italian, German, or Spanish**, the column names will be different.  \nHowever, the workflow is designed to handle this, so it will **still work correctly**! \ud83e\udd73\n"
            },
            "typeVersion": 1
        },
        {
            "id": "6f6546b8-adeb-4998-ae19-d93525337eb7",
            "name": "Set useful fields",
            "type": "n8n-nodes-base.set",
            "position": [
                1340,
                60
            ],
            "parameters": {
                "options": [],
                "assignments": {
                    "assignments": [
                        {
                            "id": "0e7d4a06-83fc-4834-93fe-2e758cbe2307",
                            "name": "url",
                            "type": "string",
                            "value": "={{ $json.Address || $json.Adresse || $json.Direcci\u00f3n || $json.Indirizzo }}"
                        },
                        {
                            "id": "c82f4d4c-9d0b-4c7d-9647-5d0240b58643",
                            "name": "title",
                            "type": "string",
                            "value": "={{ $json['Title 1'] || $json['Titolo 1'] || $json['Titolo 1'] || $json['T\u00edtulo 1'] || $json['Titel 1'] }}"
                        },
                        {
                            "id": "abea81db-ce3b-4ac1-bd21-09ccfffb567a",
                            "name": "description",
                            "type": "string",
                            "value": "={{ $json['Meta Description 1'] || $json['Meta description 1'] }}"
                        },
                        {
                            "id": "2ca75d74-70f8-400b-b862-9da186135915",
                            "name": "statut",
                            "type": "string",
                            "value": "={{ $json['Status Code'] || $json['Code HTTP'] || $json['Status-Code'] || $json['C\u00f3digo de respuesta'] || $json['Codice di stato']}}"
                        },
                        {
                            "id": "754d3202-38b0-4d79-ba24-8078b3244307",
                            "name": "indexability",
                            "type": "string",
                            "value": "={{ $json.Indexability || $json.Indexabilit\u00e9 || $json.Indicizzabilit\u00e0 || $json.Indexabilidad || $json.Indexierbarkeit}}"
                        },
                        {
                            "id": "8bc6583d-bb34-4d22-b310-fe79bb8ac85d",
                            "name": "content_type",
                            "type": "string",
                            "value": "={{ $json['Content Type'] || $json['Type de contenu'] || $json['Tipo di contenuto'] || $json['Tipo de contenido'] || $json['Inhaltstyp']}}"
                        },
                        {
                            "id": "c874ba1a-769e-43d3-9555-8c9914ca9b76",
                            "name": "word_count",
                            "type": "string",
                            "value": "={{ $json['Word Count'] || $json['Nombre de mots'] || $json['Conteggio delle parole'] || $json['Conteggio delle parole'] || $json['Recuento de palabras'] || $json['Wortanzahl'] }}"
                        }
                    ]
                }
            },
            "typeVersion": 3.399999999999999911182158029987476766109466552734375
        },
        {
            "id": "1a9af7a0-d2d5-44cb-9770-2d5a1e5706f4",
            "name": "Text Classifier",
            "type": "@n8n\/n8n-nodes-langchain.textClassifier",
            "disabled": true,
            "position": [
                2260,
                60
            ],
            "parameters": {
                "options": [],
                "inputText": "=url : {{ $json.url }}\ntitle : {{ $json.title }}\ndescription : {{ $json.description }}\nwords count : {{ $json.word_count }}",
                "categories": {
                    "categories": [
                        {
                            "category": "useful_content",
                            "description": "Pages that are likely to contain high-quality content, making them suitable for inclusion in a file that aids content discovery for an LLM. "
                        },
                        {
                            "category": "other_content",
                            "description": "Pages that should not be included (e.g., pagination, or low-value content)."
                        }
                    ]
                }
            },
            "typeVersion": 1
        },
        {
            "id": "74a4e378-4228-4142-92ca-e541efde2b15",
            "name": "OpenAI Chat Model",
            "type": "@n8n\/n8n-nodes-langchain.lmChatOpenAi",
            "position": [
                2180,
                240
            ],
            "parameters": {
                "model": {
                    "__rl": true,
                    "mode": "list",
                    "value": "gpt-4o-mini"
                },
                "options": []
            },
            "credentials": {
                "openAiApi": {
                    "id": "",
                    "name": "OpenAi Connection"
                }
            },
            "typeVersion": 1.1999999999999999555910790149937383830547332763671875
        },
        {
            "id": "63dc6cfe-bc73-43b5-8c7d-4f5fd6501d3b",
            "name": "No Operation, do nothing",
            "type": "n8n-nodes-base.noOp",
            "position": [
                2580,
                200
            ],
            "parameters": [],
            "typeVersion": 1
        },
        {
            "id": "cb555b99-9e63-4b6b-a1fc-512b5467d666",
            "name": "Sticky Note3",
            "type": "n8n-nodes-base.stickyNote",
            "position": [
                1620,
                -500
            ],
            "parameters": {
                "color": 7,
                "width": 360,
                "height": 860,
                "content": "## Filter URLs \n\nThis **filter node** is used to keep only the URLs that meet the following conditions:  \n- `status` = **200**  \n- `indexability` = **indexable**  \n- `content_type` contains **text\/html**  \n\n\nThese filters are even **more useful** if the uploaded file is an **internal_all.csv** instead of an **internal_html.csv**.  \n\n### **Tips:**  \nYou can **add more filters** to refine the URLs included in your `llms.txt` file.  \n\n\ud83d\udca1 **Examples:**  \n- **Filter by word count** \u2192 Ensure pages contain **enough text content**.  \n- **Filter by URL path** \u2192 Keep only **specific folders or categories** in the `llms.txt` file.  \n- **Filter by meta description** \u2192 Exclude URLs **without a meta description**, as this field will be used in the `llms.txt` file to describe each piece of content.  \n"
            },
            "typeVersion": 1
        },
        {
            "id": "e34e56e2-5cc8-4e50-bfb0-3aa2e5e04abf",
            "name": "Filter URLs",
            "type": "n8n-nodes-base.filter",
            "position": [
                1740,
                60
            ],
            "parameters": {
                "options": [],
                "conditions": {
                    "options": {
                        "version": 2,
                        "leftValue": "",
                        "caseSensitive": true,
                        "typeValidation": "strict"
                    },
                    "combinator": "and",
                    "conditions": [
                        {
                            "id": "cef4feaa-1c46-45b1-92b7-f5c2051b1dc5",
                            "operator": {
                                "type": "number",
                                "operation": "equals"
                            },
                            "leftValue": "={{ Number($json.statut) }}",
                            "rightValue": 200
                        },
                        {
                            "id": "bb821656-9740-4da4-8aa9-f65ad098c470",
                            "operator": {
                                "type": "boolean",
                                "operation": "true",
                                "singleValue": true
                            },
                            "leftValue": "={{ [\"Indexable\", \"Indicizzabile\", \"Indexierbar\"].includes($json.indexability) }}",
                            "rightValue": "={{ \"Indexable\" || \"Indicizzabile\" }}"
                        },
                        {
                            "id": "5c93ddb8-8091-406a-bc04-fa14e8b73fb9",
                            "operator": {
                                "type": "string",
                                "operation": "contains"
                            },
                            "leftValue": "={{ $json.content_type }}",
                            "rightValue": "text\/html"
                        }
                    ]
                }
            },
            "typeVersion": 2.20000000000000017763568394002504646778106689453125
        },
        {
            "id": "b98f19a8-afd3-4d26-8063-dee3ee75055f",
            "name": "Sticky Note4",
            "type": "n8n-nodes-base.stickyNote",
            "position": [
                2040,
                -800
            ],
            "parameters": {
                "color": 2,
                "width": 740,
                "height": 1160,
                "content": "## Text Classifier\n\n\ud83d\udeab **This node is deactivated by default** in the template.  \n\nYou can **enable it** if you want to add a more **\"intelligent\" \ud83e\udd13 filter** to refine the URLs included in the `llms.txt` file, helping LLMs discover and prioritize valuable content.\n\n### How It Works:\nThis node has **two outputs**:  \n- **`useful_content`** \u2192 Pages that are **likely to contain high-quality content**, making them suitable for inclusion in a file that **aids content discovery for an LLM**.  \n- **`other_content`** \u2192 Pages that should **not** be included (e.g., pagination or low-value content).  \n\n\nYou can **modify the description** in the node to fine-tune the classification according to your needs.  \n\n### Input Fields:\n- **url** \u2192 `{{ $json.url }}`  \n- **title** \u2192 `{{ $json.title }}`  \n- **description** \u2192 `{{ $json.description }}`  \n- **word_count** \u2192 `{{ $json.word_count }}`  \n\n### Why use an LLM?  \nA **language model (LLM)** can **analyze** the **URL, title, and description** to identify pages that **most likely contain meaningful and relevant content**.  \nThis allows it to **prioritize valuable pages** and structure the data for **better content discovery and training purposes**. \n\n### **For large websites**  \nIf you have a **very large website**, consider using a **Loop Over Items** node to make the workflow **more robust** and ensure all pages are processed.  \nAlso, using a **Loop Over Items** node make it **easier** to handle:  \n- **Timeouts** \n- **API quotas** \n- **Other scalability issues**\n\n### Tokens usage\nFinally, keep in mind that **more pages mean more tokens and more billed LLM API calls**.\n\n\n\n\n\n\n\n"
            },
            "typeVersion": 1
        },
        {
            "id": "63e3ea7a-cec3-442c-9812-771def0a9949",
            "name": "Sticky Note5",
            "type": "n8n-nodes-base.stickyNote",
            "position": [
                2840,
                -500
            ],
            "parameters": {
                "color": 7,
                "width": 360,
                "height": 860,
                "content": "## Set Field - llms.txt Row\n\nThis node **sets** the row format for the `llms.txt` file.  \n\n### Row Structure:\nEach row follows this format:  \n\n- `- [title](link): description`  \n\nIf the URL **has no description** (from the **Meta Description** in the Screaming Frog export), the row will be:  \n\n- `- [title](link)`  \n"
            },
            "typeVersion": 1
        },
        {
            "id": "78f58220-feb5-4044-b994-39a0e4f1e9e4",
            "name": "Sticky Note6",
            "type": "n8n-nodes-base.stickyNote",
            "position": [
                3260,
                -500
            ],
            "parameters": {
                "color": 7,
                "width": 360,
                "height": 860,
                "content": "## Summarize - Concatenate\n\nThis node concatenates all the output from the previous node, ensuring each row is on a separate line."
            },
            "typeVersion": 1
        },
        {
            "id": "7a119633-7cd3-4de5-a1cd-7f708e1abf4a",
            "name": "Sticky Note7",
            "type": "n8n-nodes-base.stickyNote",
            "position": [
                3680,
                -500
            ],
            "parameters": {
                "color": 7,
                "width": 360,
                "height": 860,
                "content": "## Set Fields - llms.txt Content\n\nThis node sets the content of the `llms.txt` file using:\n\n- The **website title** provided in the form (first node).\n- The **website description** provided in the form (first node).\n- The output from the previous node, which includes all the URLs, their titles, and their descriptions that will appear in the `llms.txt` file.\n"
            },
            "typeVersion": 1
        },
        {
            "id": "554f6858-68e8-4b35-a6c4-21bed6832323",
            "name": "Sticky Note8",
            "type": "n8n-nodes-base.stickyNote",
            "position": [
                4100,
                -500
            ],
            "parameters": {
                "color": 7,
                "width": 360,
                "height": 860,
                "content": "## Generate llms.txt file\n\nThis node **creates** the `llms.txt` file, which can be **downloaded directly** within n8n. \n"
            },
            "typeVersion": 1
        },
        {
            "id": "24bdefba-e2f2-41f0-93e7-9f8d2fc11f43",
            "name": "Sticky Note9",
            "type": "n8n-nodes-base.stickyNote",
            "position": [
                4520,
                -500
            ],
            "parameters": {
                "color": 7,
                "width": 360,
                "height": 860,
                "content": "## upload file anywhere\n\nInstead of downloading the file directly from the n8n workflow, you can **replace this node node** with a Drive node (e.g., **Google Drive** or **OneDrive**) to upload the `llms.txt` file to a folder of your choice.  \n  \n**Name the file properly** (e.g., include the website name) to make it easier to find and distinguish between files when working on multiple websites.  \n"
            },
            "typeVersion": 1
        },
        {
            "id": "a3be51e3-810c-40a7-a996-98a3d383c2b9",
            "name": "Summarize - Concatenate",
            "type": "n8n-nodes-base.summarize",
            "position": [
                3380,
                40
            ],
            "parameters": {
                "options": [],
                "fieldsToSummarize": {
                    "values": [
                        {
                            "field": "llmTxtRow",
                            "separateBy": "\n",
                            "aggregation": "concatenate"
                        }
                    ]
                }
            },
            "typeVersion": 1.100000000000000088817841970012523233890533447265625
        },
        {
            "id": "8d3a892a-3d11-4d8a-8ec6-84f8f3af1183",
            "name": "Set Fields - llms.txt Content",
            "type": "n8n-nodes-base.set",
            "position": [
                3820,
                40
            ],
            "parameters": {
                "options": [],
                "assignments": {
                    "assignments": [
                        {
                            "id": "97062a99-e944-4e1e-89b1-62cf9e3462dd",
                            "name": "llmsTxtFile",
                            "type": "string",
                            "value": "=# {{ $('Form - Screaming frog internal_html.csv upload').item.json['What is the name of your website?'] }}\n> {{ $('Form - Screaming frog internal_html.csv upload').item.json['Can you provide a short description of your website? (in the language of the website)'] }}\n\n{{ $json.concatenated_llmTxtRow }}"
                        }
                    ]
                }
            },
            "typeVersion": 3.399999999999999911182158029987476766109466552734375
        },
        {
            "id": "bc2a692a-47ea-4bf1-a102-e607fd544158",
            "name": "upload file anywhere",
            "type": "n8n-nodes-base.noOp",
            "position": [
                4640,
                40
            ],
            "parameters": [],
            "typeVersion": 1
        },
        {
            "id": "404510a2-35b2-44cf-9d02-eb0abcf4e9b3",
            "name": "Set Field - llms.txt Row",
            "type": "n8n-nodes-base.set",
            "position": [
                2960,
                40
            ],
            "parameters": {
                "options": [],
                "assignments": {
                    "assignments": [
                        {
                            "id": "95e75caa-8110-476b-9cb1-73c15361fa56",
                            "name": "llmTxtRow",
                            "type": "string",
                            "value": "=- [{{ $json.title }}]({{ $json.url }}){{ $json.description ? ': ' + $json.description : '' }}"
                        }
                    ]
                }
            },
            "typeVersion": 3.399999999999999911182158029987476766109466552734375
        },
        {
            "id": "f54d51f2-17bc-4c58-b177-0e77e16f7b72",
            "name": "Sticky Note10",
            "type": "n8n-nodes-base.stickyNote",
            "position": [
                -420,
                -1020
            ],
            "parameters": {
                "color": 5,
                "width": 700,
                "height": 1380,
                "content": "# Generate AI-Ready llms.txt Files from Screaming Frog Website Crawls  \n\nThis workflow helps you generate an **llms.txt** file (if you're unfamiliar with it, check out [this article](https:\/\/towardsdatascience.com\/llms-txt-414d5121bcb3\/)) using a **Screaming Frog export**.  \n\n[Screaming Frog](https:\/\/www.screamingfrog.co.uk\/seo-spider\/) is a well-known website crawler.  \nYou can easily crawl a website. Then, export the **\"internal_html\"** section in CSV format.  \n\n## How It Works: \n\nA **form** allows you to enter:  \n- The **name of the website**  \n- A **short description**  \n- The **internal_html.csv** file from your Screaming Frog export  \n\n\nOnce the form is submitted, the **workflow is triggered automatically**, and you can **download the llms.txt file directly from n8n**. \n\n## Downloading the File\nSince the last node in this workflow is **\"Convert to File\"**, you will need to **download the file directly from the n8n UI**.  \nHowever, you can easily **add a node** (e.g., Google Drive, OneDrive) to automatically upload the file **wherever you want**.  \n\n## AI-Powered Filtering (Optional):  \nThis workflow includes a **text classifier node**, which is **deactivated by default**.  \n- You can **activate it** to apply a more **intelligent filter** to select URLs for the `llms.txt` file.  \n- Consider modifying the **description** in the classifier node to specify the type of URLs you want to include.  \n\n## How to Use This Workflow  \n\n1. **Crawl the website** you want to generate an `llms.txt` file for using **Screaming Frog**.  \n2. **Export the \"internal_html\"** section in CSV format.  \n   ![Screaming Frog internal html export](https:\/\/i.imgur.com\/M0nJQiV.png)  \n3. In **n8n**, click **\"Test Workflow\"**, fill in the form, and **upload** the `internal_html.csv` file.  \n4. Once the workflow is complete, go to the **\"Export to File\"** node and **download the output**.  \n\n**That's it! 
You now have your llms.txt file!**  \n\n\n\n**Recommended Usage:**  \nUse this workflow **directly in the n8n UI by clicking** 'Test Workflow' and uploading the file in the form."
            },
            "typeVersion": 1
        },
        {
            "id": "e33104af-802a-43f2-b26d-f368f7de2fd7",
            "name": "Form - Screaming frog internal_html.csv upload",
            "type": "n8n-nodes-base.formTrigger",
            "position": [
                460,
                60
            ],
            "webhookId": "8791f39a-3d81-405c-b177-0a733ebf74cb",
            "parameters": {
                "options": {
                    "buttonLabel": "Get the llms.txt file"
                },
                "formTitle": "llms.txt Generator - From Screaming Frog export",
                "formFields": {
                    "values": [
                        {
                            "fieldLabel": "What is the name of your website?",
                            "placeholder": "Example : The best website ever",
                            "requiredField": true
                        },
                        {
                            "fieldLabel": "Can you provide a short description of your website? (in the language of the website)",
                            "placeholder": "Example : This is the best website ever because all the content is engaging and valuable.",
                            "requiredField": true
                        },
                        {
                            "fieldType": "file",
                            "fieldLabel": "screaming_frog_export",
                            "multipleFiles": false,
                            "requiredField": true,
                            "acceptFileTypes": ".csv"
                        }
                    ]
                },
                "responseMode": "lastNode",
                "formDescription": "Generate a simple llms.txt file from a Screaming Frog Export\nIt is recommended to use the internal_html.csv export, although internal_all.csv will also work.\n\nFill in the fields in this form.Just fill in the fields in this form  \ud83d\ude04"
            },
            "typeVersion": 2.20000000000000017763568394002504646778106689453125
        },
        {
            "id": "f6b17fdd-a098-411e-8d53-3f6e638cc3ba",
            "name": "Extract data from Screaming Frog file",
            "type": "n8n-nodes-base.extractFromFile",
            "position": [
                900,
                60
            ],
            "parameters": {
                "options": [],
                "operation": "xls",
                "binaryPropertyName": "screaming_frog_export"
            },
            "typeVersion": 1
        },
        {
            "id": "6bbd8d1f-3322-4c6d-af08-c842386239ce",
            "name": "Generate llms.txt file",
            "type": "n8n-nodes-base.convertToFile",
            "position": [
                4220,
                40
            ],
            "parameters": {
                "options": {
                    "encoding": "utf8",
                    "fileName": "llms.txt"
                },
                "operation": "toText",
                "sourceProperty": "llmsTxtFile"
            },
            "typeVersion": 1.100000000000000088817841970012523233890533447265625
        }
    ],
    "active": false,
    "pinData": [],
    "settings": {
        "executionOrder": "v1"
    },
    "versionId": "",
    "connections": {
        "Filter URLs": {
            "main": [
                [
                    {
                        "node": "Text Classifier",
                        "type": "main",
                        "index": 0
                    }
                ]
            ]
        },
        "Text Classifier": {
            "main": [
                [
                    {
                        "node": "Set Field - llms.txt Row",
                        "type": "main",
                        "index": 0
                    }
                ],
                [
                    {
                        "node": "No Operation, do nothing",
                        "type": "main",
                        "index": 0
                    }
                ]
            ]
        },
        "OpenAI Chat Model": {
            "ai_languageModel": [
                [
                    {
                        "node": "Text Classifier",
                        "type": "ai_languageModel",
                        "index": 0
                    }
                ]
            ]
        },
        "Set useful fields": {
            "main": [
                [
                    {
                        "node": "Filter URLs",
                        "type": "main",
                        "index": 0
                    }
                ]
            ]
        },
        "Generate llms.txt file": {
            "main": [
                []
            ]
        },
        "Summarize - Concatenate": {
            "main": [
                [
                    {
                        "node": "Set Fields - llms.txt Content",
                        "type": "main",
                        "index": 0
                    }
                ]
            ]
        },
        "Set Field - llms.txt Row": {
            "main": [
                [
                    {
                        "node": "Summarize - Concatenate",
                        "type": "main",
                        "index": 0
                    }
                ]
            ]
        },
        "Set Fields - llms.txt Content": {
            "main": [
                [
                    {
                        "node": "Generate llms.txt file",
                        "type": "main",
                        "index": 0
                    }
                ]
            ]
        },
        "Extract data from Screaming Frog file": {
            "main": [
                [
                    {
                        "node": "Set useful fields",
                        "type": "main",
                        "index": 0
                    }
                ]
            ]
        },
        "Form - Screaming frog internal_html.csv upload": {
            "main": [
                [
                    {
                        "node": "Extract data from Screaming Frog file",
                        "type": "main",
                        "index": 0
                    }
                ]
            ]
        }
    }
}
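
The data path described by the workflow above (map the Screaming Frog columns, keep only indexable HTML pages that return 200, format one row per URL, then prepend the site name and description) can be sketched outside n8n. The following is a minimal Python illustration, not part of the workflow itself: the function names `pick`, `keep`, `llms_row`, and `build_llms_txt`, as well as the sample CSV, are hypothetical. The column aliases and filter conditions are copied from the "Set useful fields" and "Filter URLs" nodes, and the disabled Text Classifier step is left out.

```python
import csv
import io

# Column-name fallbacks copied from the "Set useful fields" node.
COLUMN_ALIASES = {
    "url": ["Address", "Adresse", "Dirección", "Indirizzo"],
    "title": ["Title 1", "Titolo 1", "Título 1", "Titel 1"],
    "description": ["Meta Description 1", "Meta description 1"],
    "statut": ["Status Code", "Code HTTP", "Status-Code",
               "Código de respuesta", "Codice di stato"],
    "indexability": ["Indexability", "Indexabilité", "Indicizzabilità",
                     "Indexabilidad", "Indexierbarkeit"],
    "content_type": ["Content Type", "Type de contenu", "Tipo di contenuto",
                     "Tipo de contenido", "Inhaltstyp"],
}
# Values accepted by the indexability condition of the "Filter URLs" node.
INDEXABLE_VALUES = {"Indexable", "Indicizzabile", "Indexierbar"}


def pick(row, names):
    """Return the first non-empty value among the candidate column names."""
    for name in names:
        value = row.get(name)
        if value:
            return value
    return ""


def keep(fields):
    """Apply the three AND-ed conditions of the "Filter URLs" node."""
    try:
        status_ok = float(fields["statut"]) == 200
    except ValueError:
        status_ok = False  # empty or non-numeric status never matches
    return (status_ok
            and fields["indexability"] in INDEXABLE_VALUES
            and "text/html" in fields["content_type"])


def llms_row(fields):
    """Format one row as '- [title](url): description' (description optional)."""
    row = f"- [{fields['title']}]({fields['url']})"
    if fields["description"]:
        row += f": {fields['description']}"
    return row


def build_llms_txt(site_name, site_description, csv_text):
    """Build the llms.txt content from the form inputs and the CSV export."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        fields = {key: pick(raw, names) for key, names in COLUMN_ALIASES.items()}
        if keep(fields):
            rows.append(llms_row(fields))
    return f"# {site_name}\n> {site_description}\n\n" + "\n".join(rows)


# Hypothetical three-row export used only for illustration.
SAMPLE_CSV = (
    "Address,Title 1,Meta Description 1,Status Code,Indexability,Content Type\n"
    "https://example.com/,Home,Welcome page,200,Indexable,text/html; charset=UTF-8\n"
    "https://example.com/old,Old,,301,Non-Indexable,text/html; charset=UTF-8\n"
    "https://example.com/about,About,,200,Indexable,text/html; charset=UTF-8\n"
)

if __name__ == "__main__":
    print(build_llms_txt("Example", "A sample site.", SAMPLE_CSV))
```

In this sample, the redirected URL is dropped by the filter, and the page without a meta description is written as a bare `- [title](link)` row, matching the row format described in the "Set Field - llms.txt Row" note.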