{"id":109,"date":"2023-11-27T09:48:25","date_gmt":"2023-11-27T09:48:25","guid":{"rendered":"https:\/\/blog.dataplatform.lt\/?p=109"},"modified":"2023-11-27T09:48:25","modified_gmt":"2023-11-27T09:48:25","slug":"reading-xls-files","status":"publish","type":"post","link":"https:\/\/blog.dataplatform.lt\/?p=109","title":{"rendered":"Reading XLS files"},"content":{"rendered":"\n<p>Sometimes data is provided as the XLS file.<\/p>\n\n\n\n<p>Example: Borrowing rates at Lithuanian Bank is provided as <a href=\"https:\/\/www.lb.lt\/lt\/paskolu-palukanu-normos\">statistics page.<\/a> It<br>has data structured as table, but it is always recommended to use structured data, in this case download as XLS.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"626\" src=\"https:\/\/blog.dataplatform.lt\/wp-content\/uploads\/2023\/11\/download-xls-lb-1024x626.png\" alt=\"\" class=\"wp-image-110\" srcset=\"https:\/\/blog.dataplatform.lt\/wp-content\/uploads\/2023\/11\/download-xls-lb-1024x626.png 1024w, https:\/\/blog.dataplatform.lt\/wp-content\/uploads\/2023\/11\/download-xls-lb-300x184.png 300w, https:\/\/blog.dataplatform.lt\/wp-content\/uploads\/2023\/11\/download-xls-lb-768x470.png 768w, https:\/\/blog.dataplatform.lt\/wp-content\/uploads\/2023\/11\/download-xls-lb-1200x734.png 1200w, https:\/\/blog.dataplatform.lt\/wp-content\/uploads\/2023\/11\/download-xls-lb.png 1347w\" sizes=\"auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><\/figure>\n\n\n\n<p>The best strategy to fetch data is to use Python pandas library, as it has quite advanced XLS reading capabilities. Here is example how to read structured data for this example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import requests\nimport json\nimport pandas as pd\nfrom pprint import pprint\n\nexcel_url = \"https:\/\/www.lb.lt\/lt\/m_statistika\/t-paskolu-palukanu-normos\/?export=xlsx\"\ndf = pd.read_excel(excel_url)\n\nselected_data = df.iloc&#91;12:, &#91;0, 6]]\n\nselected_data.columns = &#91;'date', 'value']\njson_data = selected_data.to_dict(orient='records')\n\nresults = &#91;{\n  'date': row.get('date').replace(' ', '') + '-01',\n  'value': row.get('value'),\n} for row in json_data]\n\nprint(json.dumps(results, indent=2))                                        <\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Reading structured data from XLS format<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[12],"tags":[13],"class_list":["post-109","post","type-post","status-publish","format-standard","hentry","category-code-examples","tag-xls"],"_links":{"self":[{"href":"https:\/\/blog.dataplatform.lt\/index.php?rest_route=\/wp\/v2\/posts\/109","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.dataplatform.lt\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.dataplatform.lt\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.dataplatform.lt\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.dataplatform.lt\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=109"}],"version-history":[{"count":1,"href":"https:\/\/blog.dataplatform.lt\/index.php?rest_route=\/wp\/v2\/posts\/109\/revisions"}],"predecessor-version":[{"id":111,"href":"https:\/\/blog.dataplatform.lt\/index.php?rest_route=\/wp\/v2\/posts\/109\/revisions\/111"}],"wp:attachment":[{"href":"https:\/\/blog.dataplatform.lt\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=109"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.dataplatform.lt\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=109"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.dataplatform.lt\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=109"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}