Introduction
Recently, I was upgrading the architecture of an IoT application built on IoT Hub and Event Hub. I had been dumping the devices' telemetry data into an on-premises Postgres database, but the data was growing rapidly day by day, so I decided to move all of my old data to Azure Data Lake Storage Gen2. However, I still needed that old data in my application for a few scenarios. While searching for the best way to access specific data from the data lake, I found an optimal solution for querying specific records from my files.
Azure Data Lake Storage Query Acceleration
Query acceleration retrieves a subset of the data in your storage account. It accepts CSV- and JSON-formatted data as input to each request, and it enables applications and analytics frameworks to dramatically optimize data processing by retrieving only the data they require to perform a given operation.
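To make the idea concrete, here is a minimal sketch of a server-side query issued from the Azure CLI; the account, container, and blob names are placeholders of my own, and the `az storage blob query` command requires a recent CLI version:

```shell
# Run a SQL-like filter server-side; only the matching rows leave the storage account.
# <STORAGEACCOUNTNAME>, telemetry, and readings.json are placeholder names.
az storage blob query \
    --account-name <STORAGEACCOUNTNAME> \
    --container-name telemetry \
    --name readings.json \
    --query-expression "SELECT * FROM BlobStorage WHERE measuringpointid = 547" \
    --auth-mode login
```

The important point is that the filtering happens inside the storage service, so only the subset of rows you ask for is transferred over the network.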
Data flow
The following diagram illustrates how a typical application uses query acceleration to process data.
Prerequisites
- Create a general-purpose v2 storage account
- Enable query acceleration
- Azure Storage .Net SDK
Road Map
- Azure Login
- Enable Query Acceleration
- Create an Azure Resource Group
- Create Azure Data Lake Storage Gen2 Account
- Create Azure Data Lake Container
- Upload JSON File
- Create a new Console App in Visual Studio 2019
- Add References to NuGet Packages
- Query Acceleration Main Logic
Step 1 - Login Using Azure Portal OR Azure CLI
Log in to the Azure portal or through the Azure CLI; I will show you both ways. Open your command prompt and use the following command to log in to Azure. Make sure you have already installed the Azure CLI on your local machine.
- az login
After logging in, you can see all of your active subscriptions in the output, and you need to set the subscription for the current context. To do so, use the following command. I have multiple subscriptions, and I will use my Azure Pass sponsorship for this context.
- az account set --subscription "Visual Studio Enterprise Subscription" //Write your subscription name
Step 2 - Enable Query Acceleration
Query acceleration was released as a preview feature, so you first need to register it on your subscription. Use the following command:
- az feature register --namespace Microsoft.Storage --name BlobQuery
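Feature registration can take a few minutes. As a sketch (assuming the same Microsoft.Storage namespace and BlobQuery feature name as above), you can check the registration state and then re-register the resource provider so the change takes effect:

```shell
# Show the current registration state of the BlobQuery feature.
az feature show --namespace Microsoft.Storage --name BlobQuery --query properties.state

# Once it reports "Registered", propagate the change to the provider.
az provider register --namespace Microsoft.Storage
```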
Step 3 - Create an Azure Resource Group Using Portal and CLI
As you know, we are already logged in via the CLI. Now we will create a resource group for the Azure Data Lake Storage Gen2 account; we will keep all of the resources for this demo in this resource group. Use the following command to create the resource group.
- az group create --name "datalake-rg" --location "centralus"
Step 4 - Create Azure Data Lake Storage Gen2 Account
Click on Create a resource, search for storage account, and choose Storage account from the results. You can also use the following command to create the storage account through the Azure CLI.
- az storage account create --name <STORAGEACCOUNTNAME> \
    --resource-group <RESOURCEGROUPNAME> \
    --location eastus --sku Standard_LRS \
    --kind StorageV2 --hierarchical-namespace true
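Data Lake Storage Gen2 requires the hierarchical namespace, so it is worth verifying that the flag was applied. A quick check along these lines should print true; the account and resource group names are the same placeholders used above:

```shell
# Should print "true" for a Data Lake Storage Gen2 account.
az storage account show \
    --name <STORAGEACCOUNTNAME> \
    --resource-group <RESOURCEGROUPNAME> \
    --query isHnsEnabled
```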
Choose your subscription and resource group, enter a storage account name, and click Next: Networking >.
On the Networking tab, the default options are pre-selected. You may want to change them, but for this demo please keep the defaults and hit Next: Data protection >.
On the Data protection tab, the settings are optional; for this demo, again keep the defaults and hit Next: Advanced >.
On the Advanced tab, keep the default selections, but choose Enabled for the Data Lake Storage Gen2 hierarchical namespace option. Then hit Next: Tags > and finally Review + create.
After the deployment completes, open the newly created Azure Data Lake Storage Gen2 account.
Step 5 - Create Azure Data Lake Container
Select the Containers option from the left navigation, click the + Container button, and create a new container.
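If you prefer the CLI, the container can also be created with a command like the following; the container name telemetry is a placeholder of mine:

```shell
# Create a container in the storage account; requires data-plane permissions
# for the logged-in identity (e.g. Storage Blob Data Contributor).
az storage container create \
    --account-name <STORAGEACCOUNTNAME> \
    --name telemetry \
    --auth-mode login
```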
Step 6 - Upload JSON File
Open the newly created container and upload a JSON file into it. I have also attached a sample JSON file at the top of this article; you can download and use it for your learning.
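The upload can be scripted as well. This sketch assumes a local file named readings.json and a container named telemetry, both placeholder names of mine:

```shell
# Upload the sample JSON file to the container.
az storage blob upload \
    --account-name <STORAGEACCOUNTNAME> \
    --container-name telemetry \
    --file readings.json \
    --name readings.json \
    --auth-mode login
```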
Step 7 - Create a new Console App in Visual Studio 2019
Open Visual Studio, create a new project, and choose a Console App (.NET Core) with C#.
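If you would rather work from the command line, an equivalent project can be created with the .NET CLI; the project name here is just an assumption that matches the namespace used later in the code:

```shell
# Create a new .NET Core console project and enter its directory.
dotnet new console --name AzureDataLake.QueryAcceleration
cd AzureDataLake.QueryAcceleration
```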
Step 8 - Add References to NuGet Packages
First of all, add references to the Azure.Storage.Blobs and Newtonsoft.Json packages using the NuGet Package Manager, as shown below.
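The same packages can also be added from the command line with the .NET CLI:

```shell
# Add the Blob storage SDK (which contains the query acceleration APIs) and JSON.NET.
dotnet add package Azure.Storage.Blobs
dotnet add package Newtonsoft.Json
```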
Step 9 - Query Acceleration Main Logic
Open the Program class and replace its contents with the following code, adjusting the namespace if needed. Grab the connection string from your storage account and paste it into the code, and also use your own container name and blob name prefix.
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;
using Azure.Storage.Blobs.Specialized;
using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

namespace AzureDataLake.QueryAcceleration
{
    class Program
    {
        static async Task Main(string[] args)
        {
            // Replace with the connection string of your own storage account.
            var connectionString = "<YOUR-STORAGE-ACCOUNT-CONNECTION-STRING>";
            var blobServiceClient = new BlobServiceClient(connectionString);

            // Replace with your own container name.
            var containerClient = blobServiceClient.GetBlobContainerClient("ContainerName");

            // Enumerate the blobs in the container that match the given name prefix.
            await foreach (var blobItem in containerClient.GetBlobsAsync(BlobTraits.Metadata, BlobStates.None, "File Prefix"))
            {
                var blobClient = containerClient.GetBlockBlobClient(blobItem.Name);

                // Both the input blob and the query output are JSON-formatted.
                var options = new BlobQueryOptions
                {
                    InputTextConfiguration = new BlobQueryJsonTextOptions(),
                    OutputTextConfiguration = new BlobQueryJsonTextOptions()
                };

                // Only the rows matching the predicate are returned by the service.
                var result = await blobClient.QueryAsync(@"SELECT * FROM BlobStorage WHERE measuringpointid = 547", options);
                var jsonString = await new StreamReader(result.Value.Content).ReadToEndAsync();
                Console.WriteLine(jsonString);
            }

            Console.ReadLine();
        }
    }
}
Run your console app, and you will see the queried records in the console output.