Big Data – jack of all trades master of some
http://jackofalltradesmasterofsome.com/blog

Create a NodeJS Client Application to Submit Data to Event Hubs
http://jackofalltradesmasterofsome.com/blog/2019/04/09/create-a-nodejs-client-application-to-submit-data-to-event-hubs/
April 9, 2019

Now that we have provisioned an Event Hub in Azure, let's create a NodeJS client application to submit data to Event Hubs.

Prerequisites

  • Visual Studio 2017
  • Install the NodeJS SDK
  • From Visual Studio, select "Tools -> Get Tools and Features" and, from the window, select "Node.js development" to add the appropriate libraries to your Visual Studio instance.
  1. Start a command prompt window in the location where your script will reside. For this example we will use C:\EventHub
  2. Create a package for the application using "npm init"
    1. Accept all defaults in the setup and select "yes"
    2. You will be presented with the package.json that will be created for you
  • Install the Azure SDK package by running the command "npm install azure-event-hubs"
  • Navigate back to Azure, open the shared access policy you created, and copy the connection string. Place it in the js script file for the connection. This script just intermittently sends data to Event Hubs as a JSON string; a minimal sketch of such a script follows this list.
  • Run the application from the command prompt with "node eventclient.js" to begin sending messages to the Event Hub.
  • If you navigate back to Azure, you will see the events being recorded in the Event Hub.
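For reference, a minimal eventclient.js could look roughly like the sketch below. It assumes the azure-event-hubs v2 package installed above; the connection string, event hub name, and payload fields are placeholders, so swap in your own values.

// eventclient.js - minimal sketch, not production code
const { EventHubClient } = require("azure-event-hubs");

// Placeholder connection string from your shared access policy and placeholder hub name
const connectionString = "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<policy>;SharedAccessKey=<key>";
const eventHubName = "<your-event-hub>";

const client = EventHubClient.createFromConnectionString(connectionString, eventHubName);

// Send a small JSON payload every 5 seconds
setInterval(() => {
  const payload = { deviceId: 1, reading: Math.random() * 100, timestamp: new Date().toISOString() };
  client.send({ body: payload })
    .then(() => console.log("Sent:", JSON.stringify(payload)))
    .catch(err => console.error("Send failed:", err));
}, 5000);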

Reading Data in Event Hubs

  1. Follow the same commands from the previous section to set up node and the package.json file via command prompt.
    1. npm init
    2. npm install azure-event-hubs
  2. Update the script with the connection string from the shared access policy that was set up with send and listen rights. A minimal sketch of a reader script follows this list.
  • Run the command "node eventreader.js" to begin reading the messages going into the Event Hub.
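As a companion to the sender, a minimal eventreader.js might look like the sketch below. It again assumes the azure-event-hubs v2 package and a connection string from a policy with listen rights; the names in angle brackets are placeholders.

// eventreader.js - minimal sketch, not production code
const { EventHubClient, EventPosition } = require("azure-event-hubs");

// Placeholder connection string and event hub name
const connectionString = "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<policy>;SharedAccessKey=<key>";
const eventHubName = "<your-event-hub>";

async function main() {
  const client = EventHubClient.createFromConnectionString(connectionString, eventHubName);
  const partitionIds = await client.getPartitionIds();

  // Start a receiver on every partition, reading only events enqueued from this point forward
  partitionIds.forEach(partitionId => {
    client.receive(
      partitionId,
      eventData => console.log(`Partition ${partitionId}:`, eventData.body),
      error => console.error("Receive error:", error),
      { eventPosition: EventPosition.fromEnqueuedTime(Date.now()) }
    );
  });
}

main().catch(console.error);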

Part 1 – Provisioning Event Hubs

Provisioning an Azure Event Hub to capture real time streaming data
http://jackofalltradesmasterofsome.com/blog/2019/04/09/provisioning-an-azure-event-hub-to-capture-real-time-streaming-data/
April 9, 2019

Provisioning an Azure Event Hub to capture real-time streaming data is fairly easy once you have an Azure account. Event Hubs can be used to capture data from many different sources, including databases and IoT devices. As we look at building a CDC streaming ETL, let's take a look at the basics of Event Hubs.

  1. Create a new Event Hubs resource in Azure by finding it in the search bar.
  • Create a new namespace.
    • Name it something unique and use the Basic pricing tier to limit cost, since our needs are fairly limited and do not need the full horsepower of Azure Event Hubs.
    • Select your subscription and create a resource group if you do not already have one.
    • Select "Create" to begin the deployment process.
  • Once the deployment completes, navigate to your new namespace and select "Add Event Hub".
    • Give it a name and leave all settings as is to keep it small.
    • Once you hit create, the deployment process will start.


  • After completion you should now have an active event hub in your namespace.

Granting Access to the Event Hub

  1. In the namespace window, select "Shared Access Policies".
  • Add a new policy and give it "Manage" access rights so it may send and listen to messages going to and leaving the event hub. As a best practice, different policies can be used for different applications. Creating the policy generates the keys and connection string used to send and receive messages; the general shape of that string is shown below.
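For reference, the connection string copied from a shared access policy generally follows the shape below (the EntityPath segment typically appears only when the policy is created on the event hub itself rather than on the namespace); all bracketed values are placeholders for your own namespace, policy, and key.

Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<policy-name>;SharedAccessKey=<key>;EntityPath=<event-hub-name>

The NodeJS client built in Part 2 uses this string, together with the event hub name, to authenticate when sending and receiving messages.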

Part 2 – Build a client app in NodeJS to send data to Event Hubs

The Modern Data Warehouse; Azure Data Lake and U-SQL to combine data
http://jackofalltradesmasterofsome.com/blog/2018/12/17/the-modern-data-warehouse-azure-data-lake-and-u-sql-to-combine-data/
December 17, 2018

The modern data warehouse will need to use Azure Data Lake and U-SQL to combine data. Begin by navigating to your Azure Portal and searching for the Data Lake Analytics resource. Let's start by creating a new Data Lake. Don't worry, this service only charges when you run jobs, not for simply remaining on like an HDInsight cluster, so you should not be charged anything, and we will not need to spin services up and down like we did earlier.

You will need to give it a unique name as well as tie it to a pay-as-you-go subscription. The data lake will also need a Data Lake storage layer. You can keep the naming as is or rename it if you wish. It is recommended to keep the storage encrypted. Deployment may take a few minutes.

Let's navigate to the Data Explorer and create a new folder to hold our sample data. Upload the same 3 happiness index files to the new folder.

Take a look through the Data Explorer at the catalog. It should contain a master database, much as if you were running a traditional SQL Server.

Create a new U-SQL Job that creates the new database in the catalog as well as a schema and a table.

CREATE DATABASE IF NOT EXISTS testdata;

USE DATABASE testdata;

CREATE SCHEMA IF NOT EXISTS happiness;

CREATE TABLE happiness.placeholderdata
(
    Region string,
    HappinessRank float,
    HappinessScore float,
    LowerConfidenceInterval float,
    UpperConfidenceInterval float,
    Economy_GDPperCapita float,
    Family float,
    Health_LifeExpectancy float,
    FreedomTrust_GovernmentCorruption float,
    GenerosityDystopiaResidual float,
    INDEX clx_Region CLUSTERED(Region ASC) DISTRIBUTED BY HASH(Region)
);

While the job runs, and once it completes, Azure will present the execution tree. With the new table created, we can now load data into it.

USE DATABASE testdata;

@log =
    EXTRACT Region string,
            HappinessRank float,
            HappinessScore float,
            LowerConfidenceInterval float,
            UpperConfidenceInterval float,
            Economy_GDPperCapita float,
            Family float,
            Health_LifeExpectancy float,
            FreedomTrust_GovernmentCorruption float,
            GenerosityDystopiaResidual float
    FROM "/sampledata/{*}.csv"
    USING Extractors.Text(',', silent:true);

INSERT INTO happiness.placeholderdata
SELECT * FROM @log;

Once the data is loaded, we can query it in the Data Explorer to see what it looks like. The script will run the select and output the data to a file that can be browsed.

@table = SELECT * FROM [testdata].[happiness].[placeholderdata];

OUTPUT @table
    TO "/OUTPUTS/Sampledataquery.csv"
    USING Outputters.Csv();



Be sure to check out my full online class on the topic: a hands-on walkthrough of a modern data architecture using Microsoft Azure. For beginners and experienced business intelligence experts alike, learn the basics of navigating the Azure Portal through building an end-to-end modern data warehouse using popular technologies such as SQL Database, Data Lake, Data Factory, Databricks, Azure Synapse Data Warehouse and Power BI. Link to the class can be found here or directly here.

Part 1 – Navigating the Azure Portal

Part 2 – Resource Groups and Subscriptions

Part 3 – Creating Data Lake Storage

Part 4 – Setting up an Azure SQL Server

Part 5 – Loading Data Lake with Azure Data Factory

Part 6 – Configuring and Setting up Data Bricks

Part 7 – Staging data into Data Lake

Part 8 – Provisioning a Synapse SQL Data Warehouse

Part 9 – Loading Data into Azure Data Synapse Data Warehouse



The Modern Data Warehouse; Running Hive Queries in Visual Studio to combine data
http://jackofalltradesmasterofsome.com/blog/2018/12/04/the-modern-data-warehouse-running-hive-queries-in-visual-studio-to-combine-data/
December 4, 2018

In previous posts we have looked at storing data files in blob storage and using PowerShell to spin up an HDInsight Hadoop cluster. We have also installed some basic software that will help us get going once the services are provisioned. Now that the basics are ready, it is time to process some of that data using Hive and Visual Studio. In this scenario, we will load our Happiness Index data files into Hive tables and then consolidate that data into a single file.

For this tutorial we are going to leverage Azure Storage Browser to view our storage files as well as create folders. You can use this tool as well as the shell to create new folders and upload your data files. The actual folder will not be created until you upload files using this tool. In a real-world scenario, you would use a file transfer task or data factory to stage your files.

Open the tool and sign in to your Azure account. Once signed in, navigate to your HDInsight cluster's storage, create a new folder in your Hive storage called "data", and upload all 3 of our files to this location.

Open Visual Studio with the newly installed "Azure Data Lake Tools". If you look at the Server Explorer and ensure you are signed in with the same Azure account where your cluster is located, you will see the cluster listed in the menu.

You can create a new table directly from the Server Explorer. In the tool you can either script or use the wizard to create the new table. Create it so that it has the same column names as the file. For the example below we will just run the create table and the load table steps as one Hive script. Update the script to load the 2016 and 2017 files as well.

CREATE TABLE IF NOT EXISTS default.sourcedata2015(
       Country string,
       Region string,
       HappinessRank float,
       HappinessScore float,
       LowerConfidenceInterval float,
       UpperConfidenceInterval float,
       Economy_GDPperCapita float,
       Family float,
       Health_LifeExpectancy float,
       FreedomTrust_GovernmentCorruption float,
       GenerosityDystopiaResidual float)
ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ','
        COLLECTION ITEMS TERMINATED BY '\002'
        MAP KEYS TERMINATED BY '\003'
STORED AS TEXTFILE;

LOAD DATA INPATH '/data/2015.csv' INTO TABLE sourcedata2015;

Once you have created your 3 new Hive tables, it is time to consolidate them into one result using a simple SQL-like UNION statement, adding a year column to the end.

SELECT * FROM
(
    SELECT *, '2015' AS Year FROM sourcedata2015 b
    UNION ALL
    SELECT *, '2016' AS Year FROM sourcedata2016 c
    UNION ALL
    SELECT *, '2017' AS Year FROM sourcedata2017 d
) CombinedTable

Once the job is complete, you should be able to view the results by clicking "Job Output". All we have done here is run a select statement, but this data could also have been inserted into a new Hive table for either more processing or a push to a targeted data warehouse; a quick sketch of that follows.
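If you did want to persist the combined result rather than just view it in the job output, a minimal sketch could wrap the same union in a CREATE TABLE AS statement. The table name below is purely illustrative.

-- Illustrative only: materialize the combined years into a new Hive table
CREATE TABLE default.sourcedatacombined AS
SELECT * FROM
(
    SELECT *, '2015' AS Year FROM sourcedata2015
    UNION ALL
    SELECT *, '2016' AS Year FROM sourcedata2016
    UNION ALL
    SELECT *, '2017' AS Year FROM sourcedata2017
) CombinedTable;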

I hope this introduction helps you get off the ground with the basics of running Hive queries. In later posts we will look at how HDInsight intersects with Azure Data Lake and Data Factory. Before you leave, be sure to delete your HDInsight cluster. Since this service is billed hourly rather than per job, it will continue to charge your account. You can delete it either via the Azure dashboard or using the PowerShell script from our previous posts.

Setting up tools to work with HDInsights and run Hive Queries – Azure Data Lake Tools and Azure Storage Browser
http://jackofalltradesmasterofsome.com/blog/2018/11/27/setting-up-visual-studio-to-work-with-hdinsights-and-run-hive-queries/
November 27, 2018

Two tools that are going to make life a bit simpler if you are going to be working with HDInsight and Azure Blob storage are "Azure Data Lake and Stream Analytic Tools for Visual Studio" and Azure Storage Explorer.

Azure Data Lake and Stream Analytic Tools for Visual Studio

  • To run Hive queries, you're going to need to install Azure Data Lake and Stream Analytic Tools for your version of Visual Studio, sometimes referred to as HDInsight Tools for Visual Studio or Azure Data Lake Tools for Visual Studio. You can install directly from Visual Studio by selecting Tools -> Get Tools and Features.

  • Once you have Visual Studio open, navigate to your Server Explorer and verify you are connected to the right Azure subscription. You should now see your cluster information as created in the previous blog post on "Provisioning HDInsight Clusters using PowerShell".

You should be able to browse and interact with your storage from Visual Studio as well as the Azure dashboard at this point, but another helpful tool to install is "Azure Storage Explorer". It can be found and installed from the following location.

Storage Explorer can be used to create new folders and upload your data files, and it does make that job a little easier. Please note, when creating new folders, the actual folder will not be created until you upload files using this tool.

That should be it! From the Server Explorer, you can now see the HDInsight cluster that was spun up as well as the storage accounts where files will be stored. In the next post we will look at building and running Hive queries against your data files to get them ready for reporting.

Big Data for The Rest of Us. Affordable and Modern Business Intelligence Architecture – Adding Lifecycles to your S3 buckets to save cost and retain data forever!
http://jackofalltradesmasterofsome.com/blog/2018/09/25/big-data-for-the-rest-of-us-affordable-and-modern-business-intelligence-architecture-adding-lifecycles-to-your-s3-buckets-to-save-cost-and-retain-data-forever/
September 25, 2018

I wanted to keep this post short since, as I mentioned in the previous post about cloud storage, our use case is already an affordable one, but it still makes sense to touch on a file movement strategy to other tiers of storage to make sure we are maximizing our cost savings against our base-level requirements. In S3, you can easily define lifecycle rules that allow you to move your files from standard storage to infrequent access and eventually cold storage. The different pricing structures can be found in AWS's documentation located here.

Let's quickly discuss the different storage classes offered by AWS. For more details, please see here.

Standard

Amazon S3 Standard offers high durability, availability, and performance object storage for frequently accessed data. Because it delivers low latency and high throughput, S3 Standard is perfect for a wide variety of use cases including cloud applications, dynamic websites, content distribution, mobile and gaming applications, and Big Data analytics.

Infrequent Access

Amazon S3 Standard-Infrequent Access (S3 Standard-IA) is an Amazon S3 storage class for data that is accessed less frequently but requires rapid access when needed. S3 Standard-IA offers the high durability, high throughput, and low latency of S3 Standard, with a low per GB storage price and per GB retrieval fee. This combination of low cost and high performance make S3 Standard-IA ideal for long-term storage, backups, and as a data store for disaster recovery.

Glacier (Cold Storage)

Amazon Glacier is a secure, durable, and extremely low-cost storage service for data archiving. You can reliably store any amount of data at costs that are competitive with or cheaper than on-premises solutions. To keep costs low yet suitable for varying retrieval needs, Amazon Glacier provides three options for access to archives, from a few minutes to several hours.

In our basic scenario, where we receive files representing a whole year's worth of data, we can consider moving data older than 3 years to infrequent access and data older than 7 years to cold storage. This way, basic day-to-day analytics and analysis runs against S3 Standard, whereas less frequent analysis for directional items such as company strategy (maybe once a quarter) can run from the infrequent access tier. In the rare case that an audit is required during an event such as a merger or acquisition, data can then be pulled and provided from cold storage to meet those needs as well. This allows you to maximize cost savings while having an infinite data retention policy.

To set up a lifecycle policy on your data, just follow these few simple steps.

  • From your bucket, navigate to Management -> Add Lifecycle Rule.
  • Give your rule a name and a scope. You can use tags if you need the policy to apply to a certain set of files.
  • In the Transition and Expiration sections, define the time period for when data is moved to another tier and when to expire the file, if that is required as well. It is a good idea to tag files on upload so the same tags can be used in the lifecycle rules.
  • That's it, you now have an automated policy on your data without an ETL or file task process. You can create as many policies as you need, and each can be unique to how you wish to retain your data. For those who prefer scripting, an equivalent lifecycle configuration is sketched after this list.
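If you would rather script the same rule than click through the console, it can be expressed as a lifecycle configuration JSON and applied with the AWS CLI. The sketch below assumes the three-year and seven-year thresholds from the scenario above and a hypothetical "data/" prefix; adjust both to your own bucket layout.

{
    "Rules": [
        {
            "ID": "ArchiveOldData",
            "Status": "Enabled",
            "Filter": { "Prefix": "data/" },
            "Transitions": [
                { "Days": 1095, "StorageClass": "STANDARD_IA" },
                { "Days": 2555, "StorageClass": "GLACIER" }
            ]
        }
    ]
}

Saved as lifecycle.json, it can be applied with something like: aws s3api put-bucket-lifecycle-configuration --bucket <your-bucket> --lifecycle-configuration file://lifecycle.json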

Big Data for The Rest of Us. Affordable and Modern Business Intelligence Architecture – Auto uploading and syncing your data using AWS S3
http://jackofalltradesmasterofsome.com/blog/2018/09/18/big-data-for-the-rest-of-us-affordable-and-modern-business-intelligence-architecture-auto-uploading-and-syncing-your-data-using-aws-s3/
September 18, 2018

The first process in any data warehouse project is getting the data into a staging environment. In a traditional data warehouse, this required an ETL process to pick up data files from a local folder or FTP and, in some cases, a direct SQL connection to source systems, which then loaded into a dedicated staging database. In the new process we will be defining, we will use an S3 bucket as our staging environment. Staging data will forever live as raw file data, since any analysis or query against this data will be handled via tools like Athena or Elastic MapReduce, which we will cover later. Keeping this data in S3 (the counterpart of Azure Blob storage in the Microsoft world) is a cheap and convenient way to store massive amounts of data for a relatively low cost. For example, the first 50 TB is stored at a cost of $0.023 per GB per month. In most use cases we can assume our data requirements stay under this threshold, so if we assume 1 TB of data files, we can ballpark around $23 a month for storage, or $276 a year. Pretty cheap.

S3 also offers classes of storage that get cheaper as access requirements decrease. Data can be moved from S3 Standard, which is high availability, to S3 Infrequent Access, which is cheaper to store but carries a per-GB retrieval fee. Very old data can then be moved to S3 Glacier, which is the cheapest of the classes but is also very slow on access and restore. Processes can be defined around your data rules to balance storage cost against access needs. We will walk through that process in a later post as well. For now, we will work in S3 Standard since the cost is already low.

So, let's get into it. Here are the steps required to automate data movement from a local folder to your first S3 bucket!

  • You will need to create your first S3 bucket in the AWS console. For more details, please see the AWS documentation. We will call this bucket "DataDump002". No need for anything fancy just yet.
  • The next step is setting up the right access to the bucket from a newly created IAM user. For this, navigate to IAM in AWS and create a new user. I called mine "AutoUploader", put it in a group called "AutoUploaderGroup", and set its permission to "AmazonS3FullAccess".
    1. Be sure to copy the ARN as well as the key and secret key.
  • From here, navigate back to S3, click on your bucket to get to its details, and click on Bucket Policy. This is where you will define the JSON string that grants the correct permissions for read and write. This step is a bit tricky, as the Policy Generator in the IAM tool can create some odd rules and cause errors depending on whether the bucket is empty or a dummy file is in place. Please see my JSON below for a working sample. Replace the ARNs for the IAM user and the bucket with your own. If you get errors saving this rule, remove all items from the Action section except the Get and Put, and add them back in once you have data in the bucket at a later time.
  • JSON Policy for Bucket

{
    "Version": "2012-10-17",
    "Id": "Policy1537137960662",
    "Statement": [
        {
            "Sid": "Stmt1537137433831",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::123456789:user/AutoUploader"
            },
            "Action": [
                "s3:Get*",
                "s3:Put*",
                "s3:List*",
                "s3:ListBucket",
                "s3:ListBucketVersions"
            ],
            "Resource": "arn:aws:s3:::dailydatadump02"
        }
    ]
}

  • That should be it for the setup portion on AWS. For the next section you will need to download and install the AWS Command Line Interface, which can be found with a Google search. Once installed, you will need to restart your machine.
  • Once restarted, run the command "aws configure".
  1. You will be prompted for your key and secret key from Step 2.
  2. Enter the region your AWS account was set up in (S3 bucket names are global, so this simply sets the CLI default). Mine was set to us-east-1.
  3. Set output to "text"
  • For this example, I created a local folder on my C drive and dropped in 3 big data files I downloaded containing data surrounding the World Happiness Index, which is free to download from https://www.kaggle.com/
  • From here, all you need to know are the following two commands.
    1. aws s3 sync . s3://dailydatadump to sync files from your local folder to your bucket
    2. aws s3 sync s3://dailydatadump . to sync files from your S3 bucket back to your local folder
    3. The sync command works a lot like the XCOPY command at a regular command prompt. Files that exist and are unchanged will be skipped, saving you processing time.
    • That's it! With a few small tweaks this can now be saved as a BAT file and scheduled to run every few minutes to keep data files flowing to your S3 bucket for all your future data analysis needs! A rough sketch of such a BAT file follows.
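As a rough illustration of those "small tweaks", a scheduled BAT file could be as simple as the sketch below. The local folder name is a placeholder (use whatever folder you created on your C drive), and the bucket name is the one from this example.

@echo off
REM Sync the local staging folder up to the S3 bucket; unchanged files are skipped
cd /d C:\DataDump
aws s3 sync . s3://dailydatadump

Point Windows Task Scheduler at the file to run it on whatever interval suits your data feeds.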

Big Data for The Rest of Us. Affordable and Modern Business Intelligence Architecture – An Introduction using AWS
http://jackofalltradesmasterofsome.com/blog/2018/09/17/big-data-for-the-rest-of-us-affordable-and-modern-business-intelligence-architecture-an-introduction/
September 17, 2018

If you google the use cases for Big Data, you will usually find references to scenarios such as web click analytics, streaming data, or even IoT sensor data, yet most organizations' data needs and data sources never fall into any of these categories. That does not mean, however, that they are not great candidates for a modern Big Data BI solution.

What is never mentioned in those use cases is the cost, which can get astronomical. Most businesses do not require that level of horsepower but can still leverage the new technologies to create a data lake and data warehouse for a fraction of the cost. If done right, we can have a solution that is more scalable and cheaper than traditional server-based data warehouses. This allows organizations to future-proof their data needs, as they may have "medium" data right now but expect to grow into the big data space later.

In the next series of blog posts, we're going to investigate the steps for creating the basic building blocks of a modern Business Intelligence solution in AWS, while keeping an eye on cost and resourcing. A typical production server VM with 1 TB of space and 8 GB of RAM, including database licensing, runs around $20K a year, so we will set this as a baseline to see if we can build out a solution that is close to or cheaper than that. Check out the first blog post in the series!

Accelerating the Staging Process for your Data Warehouse
http://jackofalltradesmasterofsome.com/blog/2018/06/27/real-estate-data-warehouse-accelerating-the-staging-process/
June 27, 2018

Real Estate Data Warehouse – Accelerating the Staging Process

The script below can be used to build a staging environment for any sort of industry, not just real estate related databases. The specifics of a real estate data warehouse will be covered in future blog posts. The script will allow you to accelerate the staging process for your data warehouse.

When starting the process of capturing data analytics, whether you are planning to eventually build a data warehouse or feed a big data Hadoop cluster, it helps to stage your data away from your source systems. This provides many benefits, the primary one being a copy of the data to work with and process that is no longer in the transactional operational system. Having large processes or queries running against your transactional operational system introduces unnecessary risk, can create bottlenecks or slowdowns, and can even open security holes that are not needed. With a staging environment, all you need is one service-level account, managed by IT security, that has read access to the source systems. From there, a scheduled refresh process that loads this data as a blanket truncate and reload can be set up easily. Many ETL tools can be used, and robust auditing and scheduling should be set up, but getting off the ground quickly to start prototyping and profiling your data will let you get moving a lot sooner and provide value to the business.

For this reason, I wrote the SQL script below a while back to help me on new projects. Running this script against a linked server connection or a replicated database will quickly build a staging database with procedures to load all the data as truncate and reloads. These can then be wrapped in a master SQL procedure and scheduled, giving you a full staging ETL process without needing ETL tools. Remember, this is just an accelerator and will require some tweaking and optimization to get to a final state, but it should get you off the ground with your basic SQL-based source systems. A sketch of such a master procedure follows the script.

/***********************
Start of Script
************************/

/***********************
Configuration
************************/

DECLARE @sqlCommand varchar(1000)
DECLARE @DatabaseNameStaging varchar(75)
DECLARE @DatabaseNameSource varchar(75)

SET @DatabaseNameStaging = 'Staging'
SET @DatabaseNameSource = 'SourceDB'

-- Add all tables to ignore to this list
IF OBJECT_ID('tempdb..#TablestoIgnore') IS NOT NULL DROP TABLE #TablestoIgnore

CREATE TABLE #TablestoIgnore
(
    TableName varchar(255)
)

INSERT INTO #TablestoIgnore
SELECT ''
--UNION
--SELECT ''

-- Table to store the list of all tables in the source database
IF OBJECT_ID('tempdb..#TableList') IS NOT NULL DROP TABLE #TableList

CREATE TABLE #TableList
(
    TableName varchar(255)
)

/***********************
Create Staging Database
************************/

SET @sqlCommand = 'IF NOT EXISTS(SELECT * FROM sys.databases WHERE NAME = '''+@DatabaseNameStaging+''')
BEGIN
    CREATE DATABASE '+@DatabaseNameStaging+'

    -- Set logging to Simple
    USE master ;
    ALTER DATABASE '+@DatabaseNameStaging+' SET RECOVERY SIMPLE
END'

EXEC (@sqlCommand)

/***********************
Get List of All Tables
************************/

SET @sqlCommand = 'INSERT INTO #TableList
SELECT DISTINCT T.name AS Table_Name
FROM '+@DatabaseNameSource+'.sys.objects AS T
WHERE T.type_desc = ''USER_TABLE''
AND T.name NOT IN (SELECT TableName FROM #TablestoIgnore)
ORDER BY 1'

EXEC (@sqlCommand)

-- Create drop and create statements for each table
SELECT 'IF OBJECT_ID(''' + @DatabaseNameStaging + '.dbo.' + TableName + ''', ''U'') IS NOT NULL DROP TABLE ' + @DatabaseNameStaging + '.dbo.' + TableName + ';' AS DropStatement,
       'SELECT TOP 1 * INTO ' + @DatabaseNameStaging + '.dbo.' + TableName + ' FROM ' + @DatabaseNameSource + '.dbo.' + TableName AS CreateStatement
INTO #DatabaseStatements
FROM #TableList

-- Run drop commands
DECLARE @MyCursor CURSOR;
DECLARE @MyField varchar(500);

BEGIN
    SET @MyCursor = CURSOR FOR
    SELECT DropStatement FROM #DatabaseStatements

    OPEN @MyCursor
    FETCH NEXT FROM @MyCursor INTO @MyField

    WHILE @@FETCH_STATUS = 0
    BEGIN
        EXEC (@MyField)
        FETCH NEXT FROM @MyCursor INTO @MyField
    END;

    CLOSE @MyCursor;
    DEALLOCATE @MyCursor;
END;

-- Run create commands
BEGIN
    SET @MyCursor = CURSOR FOR
    SELECT CreateStatement FROM #DatabaseStatements

    OPEN @MyCursor
    FETCH NEXT FROM @MyCursor INTO @MyField

    WHILE @@FETCH_STATUS = 0
    BEGIN
        EXEC (@MyField)
        FETCH NEXT FROM @MyCursor INTO @MyField
    END;

    CLOSE @MyCursor;
    DEALLOCATE @MyCursor;
END;

/***********************
Create All Stored Procedures to Load Staging

*** THIS SECTION MUST BE RUN AGAINST THE STAGING ENVIRONMENT ***
*** This step may result in errors for identity tables; those ETLs will need to be created manually
************************/

USE Staging

-- Create a truncate-and-reload procedure for every staged table
BEGIN
    SET @MyCursor = CURSOR FOR
    SELECT TableName FROM #TableList

    OPEN @MyCursor
    FETCH NEXT FROM @MyCursor INTO @MyField

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Drop the load procedure if it already exists
        EXEC ('IF OBJECT_ID(''spLoad'+@MyField+''', ''P'') IS NOT NULL DROP PROC spLoad'+@MyField)

        -- Recreate it as a truncate and reload of the staging table from the source table
        EXEC ('CREATE PROCEDURE dbo.spLoad'+@MyField+'
        AS
        BEGIN
            SET NOCOUNT ON;

            TRUNCATE TABLE '+@DatabaseNameStaging+'.dbo.'+@MyField+'

            INSERT INTO '+@DatabaseNameStaging+'.dbo.'+@MyField+'
            SELECT * FROM '+@DatabaseNameSource+'.dbo.'+@MyField+'
        END')

        FETCH NEXT FROM @MyCursor INTO @MyField
    END;

    CLOSE @MyCursor;
    DEALLOCATE @MyCursor;
END;

DROP TABLE #DatabaseStatements

/***********************
End of Script
************************/
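As a rough sketch of the "master SQL procedure" mentioned above, the generated spLoad procedures can be called from a single wrapper and scheduled with SQL Server Agent. The procedure name and the table names below are illustrative placeholders; they simply assume the spLoad<TableName> naming convention produced by the script.

-- Illustrative wrapper only; replace the placeholder table names with the procedures generated above
CREATE PROCEDURE dbo.spLoadStagingMaster
AS
BEGIN
    SET NOCOUNT ON;

    -- Call each generated truncate-and-reload procedure in turn
    EXEC dbo.spLoadCustomers;     -- placeholder
    EXEC dbo.spLoadProperties;    -- placeholder
    EXEC dbo.spLoadTransactions;  -- placeholder
END

A SQL Server Agent job pointed at dbo.spLoadStagingMaster then gives you the scheduled truncate-and-reload refresh described earlier, without any separate ETL tooling.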

 



Be sure to check out my full online class on the topic: a hands-on walkthrough of a modern data architecture using Microsoft Azure. For beginners and experienced business intelligence experts alike, learn the basics of navigating the Azure Portal through building an end-to-end modern data warehouse using popular technologies such as SQL Database, Data Lake, Data Factory, Databricks, Azure Synapse Data Warehouse and Power BI. Link to the class can be found here or directly here.

Part 1 – Navigating the Azure Portal

Part 2 – Resource Groups and Subscriptions

Part 3 – Creating Data Lake Storage

Part 4 – Setting up an Azure SQL Server

Part 5 – Loading Data Lake with Azure Data Factory

Part 6 – Configuring and Setting up Data Bricks

Part 7 – Staging data into Data Lake

Part 8 – Provisioning a Synapse SQL Data Warehouse

Part 9 – Loading Data into Azure Data Synapse Data Warehouse


