{"id":640,"date":"2020-04-13T19:33:28","date_gmt":"2020-04-13T19:33:28","guid":{"rendered":"http:\/\/jackofalltradesmasterofsome.com\/blog\/?p=640"},"modified":"2020-04-14T17:27:05","modified_gmt":"2020-04-14T17:27:05","slug":"modern-data-architecture-part-7-staging-data-into-data-lake","status":"publish","type":"post","link":"https:\/\/jackofalltradesmasterofsome.com\/blog\/2020\/04\/13\/modern-data-architecture-part-7-staging-data-into-data-lake\/","title":{"rendered":"Modern Data Architecture \u2013 Part 7 \u2013 Staging Data into Data Lake"},"content":{"rendered":"\n<p>Modern Data Architecture \u2013 Part 7 \u2013 Staging Data into Data Lake<\/p>\n\n\n\n<p>In Lab 5, we demonstrated loaded data from our sample SQL server into our data lake. Although this tool is very handy, it creates a new service and set of tools that need to be provisioned and monitored. It is best to use ADF more as an workflow orchestration tool and use a single tool to handle all of your ETL. Data Brinks provides a robust tool to handle this task. We will investigate how to query data from your SQL Server and load your data lake. <\/p>\n\n\n\n<p>For this lab we\nwill be hard coding the access keys and credentials directly into the code. The\noptional section at the bottom will demonstrate how to configure Azure\ncredentials secret keys to configure this the correct way as you would in\nproduction. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Connect and Query SQL Server<\/h3>\n\n\n\n<ol class=\"wp-block-list\"><li>Create a new workbook in your Data Bricks and name it \u201cstaging_customers\u201d<\/li><\/ol>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"408\" height=\"226\" src=\"http:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-64.png\" alt=\"\" class=\"wp-image-714\" srcset=\"https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-64.png 408w, https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-64-300x166.png 300w\" sizes=\"auto, (max-width: 408px) 100vw, 408px\" \/><\/figure>\n\n\n\n<p>Copy the Query below and paste it\ninto your Data Bricks Window. Replace the highlighted sections with your\ndatabase information. Most of our code will be writing using Scala to start. We\nwill place each piece of code into its own cells to make the job a bit easer to\nread and test. Copy and paste the following code into your notebook. We will be\nusing hard coded credentials and keys for the lab but in real world, be sure to\nconfigure secret key access only.<\/p>\n\n\n\n<p><strong>Example<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"506\" height=\"380\" src=\"http:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-65.png\" alt=\"\" class=\"wp-image-715\" srcset=\"https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-65.png 506w, https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-65-300x225.png 300w\" sizes=\"auto, (max-width: 506px) 100vw, 506px\" \/><\/figure>\n\n\n\n<p><strong>Block 1<\/strong> <\/p>\n\n\n\n<p><strong>Description &#8211;<\/strong> This sets the\ninitial driver for our database connection<\/p>\n\n\n\n<p><strong>Code<\/strong><\/p>\n\n\n\n<p>%scala <\/p>\n\n\n\n<p>Class.forName(&#8220;com.microsoft.sqlserver.jdbc.SQLServerDriver&#8221;)<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"538\" height=\"133\" src=\"http:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-66.png\" alt=\"\" class=\"wp-image-716\" srcset=\"https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-66.png 538w, https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-66-300x74.png 300w\" sizes=\"auto, (max-width: 538px) 100vw, 538px\" \/><\/figure>\n\n\n\n<p><strong>Block 2<\/strong><\/p>\n\n\n\n<p><strong>Description \u2013<\/strong> We configure\nand test the connection. Replace the items highlighted below with you SQL\nServer name, database name, username and password respectively. <\/p>\n\n\n\n<p><strong>Code<\/strong><\/p>\n\n\n\n<p>%scala <\/p>\n\n\n\n<p>val jdbcHostname = &#8220;training-sqlserver-vimal.database.windows.net&#8221;&nbsp; <\/p>\n\n\n\n<p>val jdbcPort = 1433&nbsp; <\/p>\n\n\n\n<p>val jdbcDatabase = &#8220;training_sqldatabase_vimal&#8221;&nbsp; <\/p>\n\n\n\n<p>\/\/ Create the JDBC URL without\npassing in the user and password parameters.&nbsp;\n<\/p>\n\n\n\n<p>val jdbcUrl =\ns&#8221;jdbc:sqlserver:\/\/${jdbcHostname}:${jdbcPort};database=${jdbcDatabase}&#8221;&nbsp; <\/p>\n\n\n\n<p>import java.util.Properties&nbsp; <\/p>\n\n\n\n<p>val connectionProperties = new\nProperties()&nbsp; <\/p>\n\n\n\n<p>connectionProperties.put(&#8220;user&#8221;,\n&#8220;sqlserveradmin&#8221;)&nbsp; <\/p>\n\n\n\n<p>connectionProperties.put(&#8220;password&#8221;,\n&#8220;Password123&#8221;)<\/p>\n\n\n\n<p>val driverClass =\n&#8220;com.microsoft.sqlserver.jdbc.SQLServerDriver&#8221;&nbsp; <\/p>\n\n\n\n<p>connectionProperties.setProperty(&#8220;Driver&#8221;, driverClass)<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"624\" height=\"291\" src=\"http:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-67.png\" alt=\"\" class=\"wp-image-717\" srcset=\"https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-67.png 624w, https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-67-300x140.png 300w\" sizes=\"auto, (max-width: 624px) 100vw, 624px\" \/><\/figure>\n\n\n\n<p><strong>Block 4<\/strong><\/p>\n\n\n\n<p><strong>Description \u2013<\/strong> This allows\nus to review and test the connection to our sales.customers table in the\ndatabase<\/p>\n\n\n\n<p><strong>Code<\/strong><\/p>\n\n\n\n<p>%scala <\/p>\n\n\n\n<p>val customer = spark.read.jdbc(jdbcUrl, &#8220;sales.customers&#8221;, connectionProperties)<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"704\" height=\"260\" src=\"http:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-68.png\" alt=\"\" class=\"wp-image-718\" srcset=\"https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-68.png 704w, https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-68-300x111.png 300w\" sizes=\"auto, (max-width: 704px) 100vw, 704px\" \/><\/figure>\n\n\n\n<p><strong>Block 5<\/strong><\/p>\n\n\n\n<p><strong>Description \u2013<\/strong> Select the\ndata from the table for display<\/p>\n\n\n\n<p><strong>Code<\/strong><\/p>\n\n\n\n<p>%scala <\/p>\n\n\n\n<p>customer.select(&#8220;*&#8221;).show()<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"624\" height=\"204\" src=\"http:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-69.png\" alt=\"\" class=\"wp-image-719\" srcset=\"https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-69.png 624w, https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-69-300x98.png 300w\" sizes=\"auto, (max-width: 624px) 100vw, 624px\" \/><\/figure>\n\n\n\n<p><strong>Block 6<\/strong><\/p>\n\n\n\n<p><strong>Description \u2013<\/strong> This section\nconnects your data bricks storage to your data lake so that we can push data to\nthe data lake in the next step. &nbsp;Insert\nyour blob name, your storage account name, the destionation folder, the\ndestination folder, the storage account name and your access key respectively. <\/p>\n\n\n\n<p><strong>Code<\/strong><\/p>\n\n\n\n<p>%scala <\/p>\n\n\n\n<p>dbutils.fs.mount(&nbsp; <\/p>\n\n\n\n<p>source = &#8220;wasbs:\/\/root@trainingsavimal.blob.core.windows.net\/raw&#8221;,&nbsp; <\/p>\n\n\n\n<p>mountPoint = &#8220;\/mnt\/raw&#8221;,&nbsp; <\/p>\n\n\n\n<p>extraConfigs =\nMap(&#8220;fs.azure.account.key.trainingsavimal.blob.core.windows.net&#8221; -&gt; &#8220;<em> youraccesskey<\/em>\n&#8220;))<\/p>\n\n\n\n<p>To access \u201c<em>youraccesskey<\/em>\u201d, we are going to need some additional information. From the azure portal, navigate back to your Storage Account \u201ctrainingsa<em>yourname<\/em>\u201d and select \u201cAccess Keys\u201d<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"360\" height=\"295\" src=\"http:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-70.png\" alt=\"\" class=\"wp-image-720\" srcset=\"https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-70.png 360w, https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-70-300x246.png 300w\" sizes=\"auto, (max-width: 360px) 100vw, 360px\" \/><figcaption><br><\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"623\" height=\"127\" src=\"http:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-71.png\" alt=\"\" class=\"wp-image-721\" srcset=\"https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-71.png 623w, https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-71-300x61.png 300w\" sizes=\"auto, (max-width: 623px) 100vw, 623px\" \/><\/figure>\n\n\n\n<p><strong>Block 7<\/strong><\/p>\n\n\n\n<p><strong>Description \u2013<\/strong> This last\nsection pushed your data to a csv on to the data lake. Upon completion, you can\nnow see you new data file on your storage via the storage explorer. There will\nalso be a folder containing some of the data tranfer metadata.<\/p>\n\n\n\n<p><strong>Code<\/strong><\/p>\n\n\n\n<p>%scala <\/p>\n\n\n\n<p>customer.coalesce(1)&nbsp; <\/p>\n\n\n\n<p>.write.format(&#8220;com.databricks.spark.csv&#8221;)&nbsp; <\/p>\n\n\n\n<p>.option(&#8220;header&#8221;,\n&#8220;true&#8221;)&nbsp; <\/p>\n\n\n\n<p>.save(&#8220;\/mnt\/raw\/customers.csv&#8221;)<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"365\" height=\"168\" src=\"http:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-72.png\" alt=\"\" class=\"wp-image-722\" srcset=\"https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-72.png 365w, https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-content\/uploads\/2020\/04\/image-72-300x138.png 300w\" sizes=\"auto, (max-width: 365px) 100vw, 365px\" \/><\/figure>\n\n\n\n<p> Be sure to check out my full online class on the topic. A hands on walk through of a Modern Data Architecture using Microsoft Azure. For beginners and experienced business intelligence experts alike, learn the basic of navigating the Azure Portal to building an end to end solution of a modern data warehouse using popular technologies such as SQL\u00a0Database, Data Lake, Data Factory, Data Bricks, Azure Synapse Data Warehouse and Power BI. Link to the class can be <a href=\"http:\/\/jackofalltradesmasterofsome.com\/blog\/2020\/04\/08\/modern-data-architecture-using-microsoft-azure-online-class-and-free-ebook\/\">found here<\/a> or directly <a href=\"https:\/\/www.udemy.com\/course\/modern-data-architecture-using-microsoft-azure\/learn\/lecture\/18527998#overview\">here<\/a>.   <\/p>\n\n\n<p><br>\n<br>\n<!--StartFragment--><\/p>\n\n\n<p><a href=\"http:\/\/jackofalltradesmasterofsome.com\/blog\/2020\/04\/08\/modern-data-architecture-part-1-navigating-the-azure-portal\/\">Part 1 &#8211; Navigating the Azure Portal<\/a><\/p>\n\n\n\n<p><a href=\"http:\/\/jackofalltradesmasterofsome.com\/blog\/2020\/04\/13\/modern-data-architecture-part-2-resource-groups-and-subscriptions\/\">Part 2 &#8211; Resource Groups and Subscriptions<\/a><\/p>\n\n\n\n<p><a href=\"http:\/\/jackofalltradesmasterofsome.com\/blog\/2020\/04\/13\/modern-data-architecture-part-3-creating-data-lake-storage\/\">Part 3 &#8211; Creating Data Lake Storage<\/a><\/p>\n\n\n\n<p><a href=\"http:\/\/jackofalltradesmasterofsome.com\/blog\/2020\/04\/13\/modern-data-architecture-part-4-setting-up-a-sql-server\/\">Part 4 &#8211; Setting up an Azure SQL Server<\/a><\/p>\n\n\n\n<p><a href=\"http:\/\/jackofalltradesmasterofsome.com\/blog\/2020\/04\/13\/modern-data-architecture-part-5-loading-data-lake-with-data-factory\/\">Part 5 &#8211; Loading Data Lake with Azure Data Factory<\/a><\/p>\n\n\n\n<p><a href=\"http:\/\/jackofalltradesmasterofsome.com\/blog\/2020\/04\/13\/modern-data-architecture-part-6-configuring-and-setting-up-data-bricks\/\">Part 6 &#8211; Configuring and Setting up Data Bricks<\/a><\/p>\n\n\n\n<p><a href=\"http:\/\/jackofalltradesmasterofsome.com\/blog\/2020\/04\/13\/modern-data-architecture-part-7-staging-data-into-data-lake\/\">Part 7 &#8211; Staging data into Data Lake<\/a><\/p>\n\n\n\n<p><a href=\"http:\/\/jackofalltradesmasterofsome.com\/blog\/2020\/04\/14\/modern-data-architecture-part-8-provisioning-a-synapsis-sql-data-warehouse\/\">Part 8 = Provisioning a Synapse SQL Data Warehouse<\/a><\/p>\n\n\n\n<p><a href=\"http:\/\/jackofalltradesmasterofsome.com\/blog\/2020\/04\/14\/modern-data-architecture-part-9-loading-data-synapse-data-warehouse\/\">Part 9 &#8211; Loading Data into Azure Data Synapse Data Warehouse<\/a><\/p>\n\n\n<p><!--EndFragment--><br>\n<br>\n<\/p>\n\n\n<p>Modern Data Architecture \u2013 Part 7 \u2013 Staging Data into Data Lake<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Modern Data Architecture \u2013 Part 7 \u2013 Staging Data into Data Lake In Lab 5, we demonstrated loaded data from our sample SQL server into our data lake. Although this tool is very handy, it creates a new service and set of tools that need to be provisioned and monitored. It is best to use [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,28,27,11,85],"tags":[],"class_list":["post-640","post","type-post","status-publish","format-standard","hentry","category-analytics","category-azure","category-big-data","category-data-warehouse","category-sql-server"],"_links":{"self":[{"href":"https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-json\/wp\/v2\/posts\/640","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-json\/wp\/v2\/comments?post=640"}],"version-history":[{"count":6,"href":"https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-json\/wp\/v2\/posts\/640\/revisions"}],"predecessor-version":[{"id":770,"href":"https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-json\/wp\/v2\/posts\/640\/revisions\/770"}],"wp:attachment":[{"href":"https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-json\/wp\/v2\/media?parent=640"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-json\/wp\/v2\/categories?post=640"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jackofalltradesmasterofsome.com\/blog\/wp-json\/wp\/v2\/tags?post=640"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}