{"id":81,"date":"2023-11-06T17:44:12","date_gmt":"2023-11-06T17:44:12","guid":{"rendered":"https:\/\/blog.dataplatform.lt\/?p=81"},"modified":"2023-11-06T17:44:12","modified_gmt":"2023-11-06T17:44:12","slug":"index-and-crawling-pipelines","status":"publish","type":"post","link":"https:\/\/blog.dataplatform.lt\/?p=81","title":{"rendered":"Index and crawling pipelines"},"content":{"rendered":"\n<p>For use cases such as listing portals and e-commerce sites (eshops), a typical pattern is having a very large number of pages in the system. In such cases, write one pipeline that crawls the entire website and produces URLs, and another that populates data using those URLs.<\/p>\n\n\n\n<p>In many cases, most of the required information can already be found at the indexing stage, so fewer individual pages need to be fetched and the process is faster. Furthermore, this puts less load on the target website and decreases the chance of being blocked.<\/p>\n\n\n\n<p class=\"has-light-blue-background-color has-background\">Tip: please refer to the section on how to reuse data from another pipeline in our platform.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>For use cases such as listing portals and e-commerce sites (eshops), a typical pattern is having a very large number of pages in the system. In such cases, write one pipeline that crawls the entire website and produces URLs, and another that populates data using those URLs. 
In many cases, most of the required information can already be found at the indexing stage, so &hellip; <a href=\"https:\/\/blog.dataplatform.lt\/?p=81\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Index and crawling pipelines&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11],"tags":[],"class_list":["post-81","post","type-post","status-publish","format-standard","hentry","category-patterns-and-best-practises"],"_links":{"self":[{"href":"https:\/\/blog.dataplatform.lt\/index.php?rest_route=\/wp\/v2\/posts\/81","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.dataplatform.lt\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.dataplatform.lt\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.dataplatform.lt\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.dataplatform.lt\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=81"}],"version-history":[{"count":2,"href":"https:\/\/blog.dataplatform.lt\/index.php?rest_route=\/wp\/v2\/posts\/81\/revisions"}],"predecessor-version":[{"id":84,"href":"https:\/\/blog.dataplatform.lt\/index.php?rest_route=\/wp\/v2\/posts\/81\/revisions\/84"}],"wp:attachment":[{"href":"https:\/\/blog.dataplatform.lt\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=81"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.dataplatform.lt\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=81"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.dataplatform.lt\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=81"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}