Saturday, January 30, 2021

Azure Data Lake Storage Gen2 Reading Avro Files Using .Net Core

Ahsan Siddique


Introduction

 
In this article, I will demonstrate how to list all the files in an Azure Data Lake Storage Gen2 container and read the data from Avro files in a .NET Core app.
 
Prerequisites
 
You need basic knowledge of Azure Data Lake Storage Gen2, beginner-level knowledge of .NET Core, and a basic idea of the Avro format.
 
Required NuGet Packages
  1. WindowsAzure.Storage
  2. Microsoft.Avro.Core
  3. Newtonsoft.Json 

Road Map

  • Login Using Azure
  • Create an Azure Resource Group
  • Create Azure Data Lake Storage Gen2 Account
  • Create Azure Data Lake Container
  • Upload Avro File
  • Create a new Console App in Visual Studio 2019
  • Add References to NuGet Packages
  • Create Model Class
  • Reading Avro Files Main Logic

Let's get started.
 

Step 1 - Login Using Azure Portal OR Azure CLI 

 
Log in to the Azure portal or log in through the Azure CLI. I will show you both ways. Open your command prompt and run the following command to log in to Azure. Make sure the Azure CLI is already installed on your local machine.
    az login
 
After logging in, the output lists all of your active subscriptions, and you need to set the subscription for the current context. To do so, use the following command. I have multiple subscriptions, and I will use my Azure Pass sponsorship for this context.
    az account set --subscription "Azure Pass - Sponsorship"
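To confirm which subscription the CLI is now using, the active account can be queried. This is only a small sketch; the function name show_active_subscription is my own and is not part of the Azure CLI.

```shell
# Hypothetical helper: print the name of the subscription
# that the Azure CLI is currently using.
show_active_subscription() {
  az account show --query name --output tsv
}

# Example call:
# show_active_subscription
```

If the printed name is not the subscription you expected, run the az account set command again before continuing.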
 

Step 2 - Create an Azure Resource Group Using Portal and CLI

 
We are already logged in through the CLI. Now we will create a resource group for the Data Lake storage account. We will keep all of our resources in this resource group. Use the following command to create it.
    az group create --name "datalake-rg" --location "centralus"
 

Step 3 - Create Azure Data Lake Storage Gen2 Account

 
Click Create a resource, search for "storage account", and choose Storage account from the results. Alternatively, you can create the storage account with the following Azure CLI commands.
    # Create managed identity
    az identity create -g <RESOURCEGROUPNAME> -n <MANAGEDIDENTITYNAME>

    az extension add --name storage-preview

    az storage account create --name <STORAGEACCOUNTNAME> \
        --resource-group <RESOURCEGROUPNAME> \
        --location eastus --sku Standard_LRS \
        --kind StorageV2 --hierarchical-namespace true
 
If you use the portal instead, choose your subscription, resource group, and storage account name on the Basics tab, and click Next: Networking >.
 
 
On the Networking tab, the default options are preselected. You may want different settings in your own environment, but for this demo keep the defaults and click Next: Data protection >.
 
 
On the Data protection tab, all of the settings are optional. For this demo, keep the defaults here as well and click Next: Advanced >.
 
 
On the Advanced tab, keep the defaults but choose Enabled for the Data Lake Storage Gen2 option. Then click Next: Tags >, and finally Review + create.
 
 
After the Azure Data Lake Storage Gen2 account is created, open it.
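Whichever way the account was created, it can be worth checking that the Data Lake Storage Gen2 option (hierarchical namespace) is really enabled on it. A minimal sketch follows; the function name is_hns_enabled is hypothetical, and the account and resource group names are whatever you chose earlier.

```shell
# Hypothetical helper: print the value of the isHnsEnabled property,
# which indicates whether hierarchical namespace is turned on.
# $1 = storage account name, $2 = resource group name
is_hns_enabled() {
  az storage account show \
    --name "$1" \
    --resource-group "$2" \
    --query isHnsEnabled \
    --output tsv
}

# Example call (placeholder account name):
# is_hns_enabled "<STORAGEACCOUNTNAME>" "datalake-rg"
```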
 
 

Step 4 - Create Azure Data Lake Container

 
Select Containers in the left navigation. Click the + Container button and create a new container.
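The container can also be created from the Azure CLI instead of the portal. The sketch below is one way to do it, assuming your signed-in user has a data-plane role such as Storage Blob Data Contributor on the account (needed for --auth-mode login). The function name create_container is my own.

```shell
# Hypothetical helper: create a container in the storage account.
# $1 = storage account name, $2 = container name
create_container() {
  az storage container create \
    --account-name "$1" \
    --name "$2" \
    --auth-mode login
}

# Example call (placeholder account name, container name used later in this article):
# create_container "<STORAGEACCOUNTNAME>" "avrocontainer"
```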
 
 

Step 5 - Upload Avro File

 
Open the newly created container and upload an Avro file into it. I have also attached a sample Avro file at the top of this article. You can download it and use it for your learning.
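If you prefer the command line, the upload can be sketched with the Azure CLI as well. The function name upload_avro_file and the file name sample.avro are assumptions for illustration, and --auth-mode login again assumes your user has a suitable data-plane role on the account.

```shell
# Hypothetical helper: upload a local Avro file as a blob.
# $1 = storage account name, $2 = container name,
# $3 = local file path,      $4 = blob name in the container
upload_avro_file() {
  az storage blob upload \
    --account-name "$1" \
    --container-name "$2" \
    --file "$3" \
    --name "$4" \
    --auth-mode login
}

# Example call (placeholder names):
# upload_avro_file "<STORAGEACCOUNTNAME>" "avrocontainer" "./sample.avro" "sample.avro"
```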
 
 

Step 6 - Create a new Console App in Visual Studio 2019

 
Open Visual Studio, create a new project, and choose the Console App (.NET Core) template with C#.
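If you work without Visual Studio, an equivalent project can be created with the .NET CLI. This is a sketch, assuming the .NET Core SDK is installed; the function name create_console_app is hypothetical, and the project name simply matches the namespace used later in this article.

```shell
# Hypothetical helper: create a new C# console project in a folder named after it.
# $1 = project name
create_console_app() {
  dotnet new console --name "$1"
}

# Example call:
# create_console_app "AzureDataLakeAvroReader"
```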
 
 

Step 7 - Add References to NuGet Packages

 
First of all, add references to WindowsAzure.Storage, Microsoft.Avro.Core, and Newtonsoft.Json using the NuGet Package Manager.
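The same three packages can be added from a terminal instead of the NuGet Package Manager. The sketch below assumes it is run from the project folder; the function name add_avro_reader_packages is my own.

```shell
# Hypothetical helper: add the three NuGet packages this article uses.
# Run it from the folder that contains the .csproj file.
add_avro_reader_packages() {
  for package in WindowsAzure.Storage Microsoft.Avro.Core Newtonsoft.Json; do
    # Stop at the first package that fails to install
    dotnet add package "$package" || return 1
  done
}

# Example call:
# add_avro_reader_packages
```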
 
 

Step 8 - Add Sensor Model Class

 
Add a new class to your project with the name SensorModel.cs. Add the following properties, which hold the fields read from each Avro record, inside an appropriate namespace.
(FileName: SensorModel.cs)
    public class SensorModel
    {
        public string refid { get; set; }
        public long measuringpointid { get; set; }
        public long dateid { get; set; }
        public long timeid { get; set; }
        public string timestamp { get; set; }
        public double value { get; set; }

        public override string ToString()
        {
            // timestamp is a string, so a date format such as :HH:mm:ss would be ignored; print it as-is
            return $"DateTime: {timestamp} | MeasuringPointId: {measuringpointid} | "
                 + $"Date: {dateid} | Serialnumber: {refid} | SensorValue: {value} ";
        }
    }

Step 9 - Reading Avro Files Main Logic

 
Open the Program class and replace its contents with the following code, using an appropriate namespace. Copy the connection string from your storage account into the code, and set your own container name as well.
    using Microsoft.Hadoop.Avro;
    using Microsoft.Hadoop.Avro.Container;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Threading.Tasks;

    namespace AzureDataLakeAvroReader
    {
        class Program
        {
            // TODO: Enter the connection string of your storage account here (never commit a real account key)
            const string storageConnectionString = "DefaultEndpointsProtocol=https;AccountName=<STORAGEACCOUNTNAME>;AccountKey=<ACCOUNTKEY>;EndpointSuffix=core.windows.net";

            // TODO: Enter the blob container name here
            const string containerName = "avrocontainer";

            static void Main(string[] args)
            {
                // Block until the async work completes; GetResult() rethrows the original exception
                MainAsync().GetAwaiter().GetResult();
            }

            private static async Task MainAsync()
            {
                var storageAccount = CloudStorageAccount.Parse(storageConnectionString);
                var blobClient = storageAccount.CreateCloudBlobClient();
                var blobContainer = blobClient.GetContainerReference(containerName);

                // Listing is paged: keep requesting segments until no continuation token is returned
                BlobContinuationToken continuationToken = null;
                do
                {
                    var resultSegment =
                        await blobContainer.ListBlobsSegmentedAsync(null, true, BlobListingDetails.All,
                        null, continuationToken, null, null);

                    // This assumes every block blob in the container is an Avro file
                    foreach (var cloudBlockBlob in resultSegment.Results.OfType<CloudBlockBlob>())
                    {
                        await ProcessCloudBlockBlobAsync(cloudBlockBlob);
                    }

                    continuationToken = resultSegment.ContinuationToken;
                } while (continuationToken != null);

                Console.ReadLine();
            }

            private static async Task ProcessCloudBlockBlobAsync(CloudBlockBlob cloudBlockBlob)
            {
                var avroRecords = await DownloadAvroRecordsAsync(cloudBlockBlob);

                PrintSensorDatas(avroRecords);
            }

            private static void PrintSensorDatas(List<AvroRecord> avroRecords)
            {
                // Map each generic Avro record to the strongly typed model
                var sensorDatas = avroRecords.Select(avroRecord =>
                    CreateSensorData(avroRecord));

                foreach (var sensorData in sensorDatas)
                {
                    Console.WriteLine(sensorData);
                }
            }

            private static async Task<List<AvroRecord>> DownloadAvroRecordsAsync(CloudBlockBlob cloudBlockBlob)
            {
                // Download the whole blob into memory, then rewind so the Avro reader starts at the header
                var memoryStream = new MemoryStream();
                await cloudBlockBlob.DownloadToStreamAsync(memoryStream);
                memoryStream.Seek(0, SeekOrigin.Begin);
                List<AvroRecord> avroRecords;
                using (var reader = AvroContainer.CreateGenericReader(memoryStream))
                {
                    using (var sequentialReader = new SequentialReader<dynamic>(reader))
                    {
                        avroRecords = sequentialReader.Objects.OfType<AvroRecord>().ToList();
                    }
                }

                return avroRecords;
            }

            private static SensorModel CreateSensorData(AvroRecord avroRecord)
            {
                // Field names must match the names in the Avro schema of the uploaded file
                var model = new SensorModel
                {
                    refid = avroRecord.GetField<string>("refid"),
                    dateid = avroRecord.GetField<long>("dateid"),
                    measuringpointid = avroRecord.GetField<long>("measuringpointid"),
                    timeid = avroRecord.GetField<long>("timeid"),
                    timestamp = avroRecord.GetField<string>("timestamp"),
                    value = avroRecord.GetField<double>("value"),
                };
                return model;
            }
        }
    }

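The connection string that the program needs can be copied from the portal, or it can be fetched with the Azure CLI. A minimal sketch of the CLI route follows; the function name get_connection_string is hypothetical, and the account and resource group names are the ones you created earlier.

```shell
# Hypothetical helper: print the connection string of a storage account.
# $1 = storage account name, $2 = resource group name
get_connection_string() {
  az storage account show-connection-string \
    --name "$1" \
    --resource-group "$2" \
    --query connectionString \
    --output tsv
}

# Example call (placeholder account name):
# get_connection_string "<STORAGEACCOUNTNAME>" "datalake-rg"
```

The printed value contains the account key, so treat it as a secret and avoid checking it into source control.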
Run your console app. It prints one line per sensor record to the console.
 