Saturday, April 24, 2021

Azure Data Lake Storage Gen2 Query Acceleration

Ahsan Siddique

Introduction

 
Recently, I was upgrading the architecture of my IoT application, which is based on IoT Hub and Event Hub. I was dumping the devices' telemetry data into an on-premises Postgres database, but the data was growing rapidly day by day, so I decided to move all my old data to Azure Data Lake Storage Gen2. However, I still needed that old data in my application. While searching for the best way to access specific data from the data lake, I found an optimal solution for querying specific data from my files.
 

Azure Data Lake Storage Query Acceleration

 
Query acceleration is used to retrieve a subset of data from your storage account. Query acceleration supports CSV and JSON formatted data as input to each request. Query acceleration enables applications and analytics frameworks to dramatically optimize data processing by retrieving only the data that they require to perform a given operation.
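Under the hood, you pass an ANSI SQL-like expression and the storage service evaluates the filter server-side, so only the matching rows travel over the network. As a hedged sketch (account, container, and blob names are placeholders; `az storage blob query` operates on CSV-formatted blobs by default and may require a recent Azure CLI version):

```shell
# Run a server-side query against a single blob; only matching rows are returned
az storage blob query \
    --account-name <STORAGEACCOUNTNAME> \
    --container-name <CONTAINERNAME> \
    --name data.csv \
    --query-expression "SELECT * FROM BlobStorage" \
    --auth-mode login
```

A WHERE clause in the query expression is what actually narrows the result down to the subset you need.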
 

Data flow

 
The following diagram illustrates how a typical application uses query acceleration to process data.
 
 
Prerequisites
  1. Create a general-purpose v2 storage account
  2. Enable query acceleration
  3. Azure Storage .Net SDK
Road Map
  • Azure Login
  • Enable Query Acceleration
  • Create an Azure Resource Group
  • Create Azure Data Lake Storage Gen2 Account
  • Create Azure Data Lake Container
  • Upload JSON File
  • Create a new Console App in Visual Studio 2019
  • Add References to NuGet Packages
  • Query Acceleration Main Logic

Step 1 - Login Using Azure Portal OR Azure CLI

 
Log in to the Azure portal or through the Azure CLI; I will show you both ways. Open your command prompt and use the following command to log in to Azure. Make sure you have already installed the Azure CLI on your local machine.
az login
 
After logging in, you will see all of your active subscriptions in the output, and you need to set the subscription for the current context. To do so, use the following command. I have multiple subscriptions, and I will use my Visual Studio Enterprise subscription for this context.
az account set --subscription "Visual Studio Enterprise Subscription"  # write your subscription name
 

Step 2 - Enable Query Acceleration

Use the following command to register the query acceleration (blob query) feature for your subscription.
az feature register --namespace Microsoft.Storage --name BlobQuery
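Feature registration is not instant; you can check its state, and then re-register the resource provider so the change takes effect. A quick sketch using standard Azure CLI commands:

```shell
# Check whether the feature has finished registering
az feature show --namespace Microsoft.Storage --name BlobQuery --query properties.state
# Propagate the feature registration to the Microsoft.Storage resource provider
az provider register --namespace Microsoft.Storage
```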

Step 3 - Create an Azure Resource Group Using Portal and CLI

 
As you know, we are already logged in via the CLI. Now we will create a resource group for the Azure Data Lake Storage Gen2 account. We will keep all our resources in this resource group. Use the following command to create it.
az group create --name "datalake-rg" --location "centralus"
 

Step 4 - Create Azure Data Lake Storage Gen2 Account

 
Click Create a resource, search for "storage account", and choose Storage account from the results. You can also use the following command to create the storage account with the Azure CLI.
az storage account create --name <STORAGEACCOUNTNAME> \
    --resource-group <RESOURCEGROUPNAME> \
    --location eastus --sku Standard_LRS \
    --kind StorageV2 --hierarchical-namespace true
 
Choose your subscription, resource group, and storage account name, then click Next: Networking >
 
 
On the Networking tab, you may want different options, but for this demo please keep the default selections and hit Next: Data protection >
 
 
On the Data protection tab, the settings are optional; for this demo please also keep the default selections and hit Next: Advanced >
 
 
On the Advanced tab, keep the default selections for this demo, but set the Data Lake Storage Gen2 option to Enabled. Then hit Next: Tags > and then Review + create.
 
 
After the deployment completes, open the newly created Azure Data Lake Storage Gen2 account.
 
 

Step 5 - Create Azure Data Lake Container

 
Select the container option from the left navigation. Click on the + Container button and create a new container.
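The container can also be created from the CLI; a sketch with placeholder names:

```shell
# Create a blob container in the Data Lake Storage Gen2 account
az storage container create --account-name <STORAGEACCOUNTNAME> --name <CONTAINERNAME> --auth-mode login
```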
 
 

Step 6 - Upload JSON File

 
Open the newly created container and upload a JSON file into it. I have also attached a sample JSON file at the top of this article; you can download and use it for your learning.
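If you prefer the CLI, you can also create and upload a small line-delimited JSON file yourself. The field names below are assumptions chosen to match the query used later in this article, not the contents of the attached sample:

```shell
# Create a tiny line-delimited JSON file (hypothetical telemetry records)
cat > sample.json <<'EOF'
{"measuringpointid": 547, "temperature": 21.5}
{"measuringpointid": 548, "temperature": 19.8}
EOF
# Upload it to the container created in the previous step
az storage blob upload --account-name <STORAGEACCOUNTNAME> --container-name <CONTAINERNAME> --name sample.json --file sample.json --auth-mode login
```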
 

Step 7 - Create a new Console App in Visual Studio 2019

 
Open Visual Studio, create a new project, and choose Console App (.NET Core) with C#.
 
 

Step 8 - Add References to NuGet Packages

 
First of all, add references to the Azure.Storage.Blobs and Newtonsoft.Json packages using the NuGet Package Manager, as shown below.
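If you prefer the command line, the same packages can be added from the project directory with the .NET CLI:

```shell
# Add the Azure Blob Storage client library (includes the query acceleration types)
dotnet add package Azure.Storage.Blobs
# Add Newtonsoft.Json for working with the returned JSON
dotnet add package Newtonsoft.Json
```

Note that the query acceleration types (BlobQueryOptions, BlobQueryJsonTextOptions) ship in the Azure.Storage.Blobs package.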
 
 

Step 9 - Query Acceleration Main Logic

 
Open the Program class and replace its contents with the following code, keeping an appropriate namespace. Grab the connection string from your storage account and paste it into the code, and also set your container name.
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;
using Azure.Storage.Blobs.Specialized;
using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

namespace AzureDataLake.QueryAcceleration
{
    class Program
    {
        static void Main(string[] args)
        {
            MainAsync().Wait();
        }

        private static async Task MainAsync()
        {
            var connectionString = "DefaultEndpointsProtocol=https;AccountName=ahsan;AccountKey=51sarUyYuhNLCv/+w9LjcbT7914Q==;EndpointSuffix=core.windows.net";
            var blobServiceClient = new BlobServiceClient(connectionString);
            var containerClient = blobServiceClient.GetBlobContainerClient("ContainerName");

            // List the blobs in the container, filtered by an optional name prefix
            await foreach (var blobItem in containerClient.GetBlobsAsync(BlobTraits.Metadata, BlobStates.None, "File Prefix"))
            {
                var blobClient = containerClient.GetBlockBlobClient(blobItem.Name);

                // Both input and output are JSON for this demo
                var options = new BlobQueryOptions
                {
                    InputTextConfiguration = new BlobQueryJsonTextOptions(),
                    OutputTextConfiguration = new BlobQueryJsonTextOptions()
                };

                // The filter runs inside the storage service; only matching rows are returned
                var result = await blobClient.QueryAsync(@"SELECT * FROM BlobStorage WHERE measuringpointid = 547", options);

                var jsonString = await new StreamReader(result.Value.Content).ReadToEndAsync();

                Console.WriteLine(jsonString);
                Console.ReadLine();
            }
        }
    }
}
Run the console app and you will see the query results in the console output.
 
 

Monday, February 15, 2021

Build CI/CD Pipeline For Azure Kubernetes

Ahsan Siddique

 


Introduction

 
In my three recent articles, I developed a REST API with Azure Functions using a SQL database, containerized the Azure Functions app using Docker Desktop, and created a CI/CD pipeline for the containerized app. In this article, I will create a CI/CD pipeline for an Azure Kubernetes cluster, because I want an automatic build and release process: whenever a developer changes the code, the pipelines will run automatically based on a branch-name trigger. I have already pushed the application source code to Azure Repos. It is a public repo, and anyone can access and clone the code. I will use Azure Container Registry for Docker image hosting and Azure Kubernetes Service for container deployments.
  1. Develop A REST API With Azure Functions Using SQL
  2. Containerized Azure Functions Apps
Prerequisites
 
You are required to have intermediate-level knowledge of Azure Container Registry, Azure Kubernetes Service, and Azure DevOps.
 
Scheme
  • Create Azure Resource Group
  • Create Azure Container Registry
  • Create Azure Kubernetes Cluster
  • Create a Pipeline For Deployment to Kubernetes
Let's get started,
 
Log in to the Azure portal or through the Azure CLI; I will show you both ways. Open your command prompt and use the following command to log in to Azure. Make sure you have already installed the Azure CLI on your local machine.
 
az login
 
After logging in, you will see all of your active subscriptions in the output, and you need to set the subscription for the current context. To do so, use the following command. I have multiple subscriptions, and I will use my Azure Pass sponsorship for this context.
 
 
az account set --subscription "Azure Pass - Sponsorship"
 
Step 1 - Create an Azure Resource Group
 
As you know, we are already logged in via the CLI. Now we will create a resource group for our Docker images and Kubernetes cluster. We will keep all our resources in this resource group. Use the following command to create it. If you don't have the Azure CLI installed on your local machine, you can follow the Azure portal steps instead.
 
az group create --name "azurefunction-rg" --location "centralus"
 
 
You can verify the new resource group by listing your resource groups:
az group list --output table
 
Step 2 - Create Azure Container Registry
 
We have created the resource group; from now on, every new resource we create will be added to it. Use the following command to create an Azure container registry for hosting Docker images. This is a paid service, but you can also use a free hosting service like Docker Hub.
 
(Azure Container Registry Name : azurefuncAcr)
 
az acr create --resource-group azurefunction-rg --name azurefuncAcr --sku Basic
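You can verify the registry and authenticate your local Docker client against it with the following commands.

```shell
# Show the registry's login server name (e.g. azurefuncacr.azurecr.io)
az acr show --name azurefuncAcr --query loginServer --output tsv
# Authenticate the local Docker client against the registry
az acr login --name azurefuncAcr
```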
 
 
As shown above, you can see that we have successfully created the Azure container registry.
 
 
Step 3 - Create Azure Kubernetes Cluster
 
Click on create a resource and choose Kubernetes Service. 
 
 
Select your subscription and resource group, enter your cluster name, leave the other options at their defaults, and then simply click Review + create.
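If you prefer the CLI, a comparable cluster can be created with the sketch below; the cluster name and node count are placeholders, and --attach-acr grants the cluster pull access to the container registry we just created:

```shell
# Create a single-node AKS cluster and link it to our container registry
az aks create \
    --resource-group azurefunction-rg \
    --name <CLUSTERNAME> \
    --node-count 1 \
    --generate-ssh-keys \
    --attach-acr azurefuncAcr
```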
 
 
We successfully created our Azure Kubernetes cluster.
 
 
Step 4 - Create a Pipeline For Deployment to Kubernetes
 
Log in to your Azure DevOps account and create a new project named Azure Function App.
 
 
Open this newly created project and add source code to its repo. You can also use the existing source code that I mentioned at the top of this article. I already have the source code in this project.
 
 
Next, click on the Pipelines tab and choose Pipelines. This pipeline will build the source code, create the Docker image, push it to the Azure container registry that we created in the previous step, and create a new deployment on the Azure Kubernetes cluster. Click Create pipeline and choose where your source code lives. As I mentioned earlier, I am using Azure Repos for my source code, so I will select the Azure Repos Git option.
 
 
Select your repository and then the branch. In my case, I will choose Azure Function App.
 
 
Next, select the Deploy to Azure Kubernetes Service (build and push the image to Azure Container Registry; Deploy to Azure Kubernetes Service) option from the Configure your pipeline page.
 
 
Next, select your active subscription from the pop-up and hit Continue. In my case, I have two subscriptions, and I select Azure Pass - Sponsorship. Enter your Azure account credentials and click Sign in.
 
 
Next, select your cluster name from the dropdown, choose the namespace (existing or new), select the container registry, and enter the image name that you want to use. Leave the Service Port option at its default value. Finally, click Validate and configure.
 
 
Next, you will see an azure-pipelines.yml file, a predefined template for building and pushing images to the Azure container registry and deploying them to the Kubernetes cluster. Just update the branch trigger from main to master, or to whichever branch you want to use for this build pipeline. Now we are good to go, and we can save the rest of the pipeline as-is. Click Save and run.
# Deploy to Azure Kubernetes Service
# Build and push the image to Azure Container Registry; Deploy to Azure Kubernetes Service
# https://docs.microsoft.com/azure/devops/pipelines/languages/docker

trigger:
- master

resources:
- repo: self

variables:

  # Container registry service connection established during pipeline creation
  dockerRegistryServiceConnection: '65de1c2f-d191-4f20-b6cb-15201e44d415'
  imageRepository: 'azurefunctionapp'
  containerRegistry: 'learningimagescontainerregistry.azurecr.io'
  dockerfilePath: '**/Dockerfile'
  tag: '$(Build.BuildId)'
  imagePullSecret: 'learningimagescontainerregistry1592bca3-auth'

  # Agent VM image name
  vmImageName: 'ubuntu-latest'

stages:
- stage: Build
  displayName: Build stage
  jobs:
  - job: Build
    displayName: Build
    pool:
      vmImage: $(vmImageName)
    steps:
    - task: Docker@2
      displayName: Build and push an image to container registry
      inputs:
        command: buildAndPush
        repository: $(imageRepository)
        dockerfile: $(dockerfilePath)
        containerRegistry: $(dockerRegistryServiceConnection)
        tags: |
          $(tag)

    - upload: manifests
      artifact: manifests

- stage: Deploy
  displayName: Deploy stage
  dependsOn: Build

  jobs:
  - deployment: Deploy
    displayName: Deploy
    pool:
      vmImage: $(vmImageName)
    environment: 'AzureFunctionApp-1436.default'
    strategy:
      runOnce:
        deploy:
          steps:
          - task: KubernetesManifest@0
            displayName: Create imagePullSecret
            inputs:
              action: createSecret
              secretName: $(imagePullSecret)
              dockerRegistryEndpoint: $(dockerRegistryServiceConnection)

          - task: KubernetesManifest@0
            displayName: Deploy to Kubernetes cluster
            inputs:
              action: deploy
              manifests: |
                $(Pipeline.Workspace)/manifests/deployment.yml
                $(Pipeline.Workspace)/manifests/service.yml
              imagePullSecrets: |
                $(imagePullSecret)
              containers: |
                $(containerRegistry)/$(imageRepository):$(tag)
 
In the output pop-up, you will see that three files will be added to the repository (azure-pipelines.yml, deployment.yml, service.yml). Choose the commit directly to the master branch option and hit Save and Run.
 
 
After you save and run, you will see two stages. In the first stage, the code is built, and the Docker image is created and pushed to the Azure container registry. In the second stage, it is deployed to the Kubernetes cluster.
 
 
The process of building an image and pushing it to the Azure container registry is in progress.
 
 
The stage of building an image and pushing it to the Azure container registry is done.
 
 
The process of deploying to the Kubernetes cluster is in progress. 
 
 
The stage of deploying to the Kubernetes cluster is done.
 
 
Open your Kubernetes cluster, and from the left panel, click Services and ingresses. Here you will see a new service named azurefuncapp running under the default namespace, with an external IP address that exposes it publicly.
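You can also inspect the service from the CLI; a sketch with the cluster name as a placeholder:

```shell
# Fetch the cluster's kubeconfig so kubectl can talk to it
az aks get-credentials --resource-group azurefunction-rg --name <CLUSTERNAME>
# Show the service, including its external IP address
kubectl get service azurefuncapp --namespace default
```

The EXTERNAL-IP column of the kubectl output is the address we open in the browser.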
 
 
Grab the external IP address of your service and open it in any browser. Amazing, the application is up and running on the Azure Kubernetes cluster. So, this was the process of building a CI/CD pipeline for an Azure Kubernetes cluster.