Introduction
In this article, I will demonstrate how to list all the files in Azure Data Lake Storage Gen2 and read the data from Avro files in a .NET Core app.
Prerequisites
You should have basic knowledge of Azure Data Lake Storage Gen2, beginner-level knowledge of .NET Core, and a basic idea of the Avro format.
Required NuGet Packages
- WindowsAzure.Storage
- Microsoft.Avro.Core
- Newtonsoft.Json
Road Map
- Login Using Azure
- Create an Azure Resource Group
- Create Azure Data Lake Storage Gen2 Account
- Create Azure Data Lake Container
- Upload Avro File
- Create a new Console App in Visual Studio 2019
- Add References to NuGet Packages
- Create Model Class
- Reading Avro Files Main Logic
Let's get started.
Step 1 - Log in Using the Azure Portal or Azure CLI
Log in to the Azure portal or log in through the Azure CLI; I will show you both ways. Open your command prompt and use the following command to log in to Azure. Make sure you already have the Azure CLI installed on your machine.
az login
After logging in, you can see all your active subscriptions in the output, and you need to set the subscription for the current context. To do so, use the following command. I have multiple subscriptions and will use my Azure Pass sponsorship for this context.
az account set --subscription "Azure Pass - Sponsorship"
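Optionally, you can verify which subscription is now active. These are standard Azure CLI commands and are not part of the original walkthrough; they simply list your subscriptions and show the currently selected one.
az account list --output table
az account show --output table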
Step 2 - Create an Azure Resource Group Using Portal and CLI
Since we are already logged in through the CLI, we will now create a resource group for the Data Lake storage account. We will keep all the resources for this demo in this resource group. Use the following command to create it.
az group create --name "datalake-rg" --location "centralus"
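To confirm that the resource group was created, you can query it with another standard Azure CLI command (not part of the original steps):
az group show --name "datalake-rg" --output table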
Step 3 - Create Azure Data Lake Storage Gen2 Account
Click on Create a resource, search for "storage account", and choose Storage account from the results. You can also use the following commands to create the storage account with the Azure CLI.
# Create managed identity
az identity create -g <RESOURCEGROUPNAME> -n <MANAGEDIDENTITYNAME>

az extension add --name storage-preview

az storage account create --name <STORAGEACCOUNTNAME> \
    --resource-group <RESOURCEGROUPNAME> \
    --location eastus --sku Standard_LRS \
    --kind StorageV2 --hierarchical-namespace true
Choose your subscription, resource group, and storage account name, then click Next: Networking >.
On the Networking tab, keep the default options for this demo (you may choose different ones for your own setup) and click Next: Data protection >.
On the Data protection tab, the settings are optional; keep the defaults for this demo and click Next: Advanced >.
On the Advanced tab, keep the default options, enable the Data Lake Storage Gen2 (hierarchical namespace) option, then click Next: Tags > and finally Review + create.
After the storage account is created, open the newly created Azure Data Lake Storage Gen2 account.
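Later, the console app will need the storage account connection string. In the portal it is available under Access keys; with the CLI it can be retrieved using the following standard command (an addition to the original walkthrough; substitute your own account and resource group names):
az storage account show-connection-string --name <STORAGEACCOUNTNAME> --resource-group <RESOURCEGROUPNAME>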
Step 4 - Create Azure Data Lake Container
Select the Containers option from the left navigation. Click on the + Container button and create a new container.
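If you prefer the CLI, the container can also be created from the command line. The container name avrocontainer below is just an example that matches the name used later in the code, and --auth-mode login assumes your account has the required data-plane role on the storage account.
az storage container create --name avrocontainer --account-name <STORAGEACCOUNTNAME> --auth-mode login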
Step 5 - Upload Avro File
Open the newly created container and upload an Avro file into it. A sample Avro file is attached at the top of this article; you can download it and use it for your learning.
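The upload can also be done from the command line with the Azure CLI; the file name below is only a placeholder for whatever Avro file you are using.
az storage blob upload --account-name <STORAGEACCOUNTNAME> --container-name avrocontainer --name sample.avro --file ./sample.avro --auth-mode login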
Step 6 - Create a new Console App in Visual Studio 2019
Open Visual Studio, create a new project, and choose the Console App (.NET Core) template with C#.
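If you prefer the command line, the same project can be created with the dotnet CLI. The project name below is only a suggestion that matches the namespace used later in this article.
dotnet new console -n AzureDataLakeAvroReader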
Step 7 - Add References to NuGet Packages
First of all, add references to the WindowsAzure.Storage, Microsoft.Avro.Core, and Newtonsoft.Json packages using the NuGet Package Manager, as shown below.
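The same packages can also be installed from the command line with the dotnet CLI, for example:
dotnet add package WindowsAzure.Storage
dotnet add package Microsoft.Avro.Core
dotnet add package Newtonsoft.Json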
Step 8 - Add Sensor Model Class
Add a new class to your project with the name SensorModel.cs. Add the following properties, under an appropriate namespace, to hold the data read from each Avro record.
(FileName: SensorModel.cs)
public class SensorModel
{
    public string refid { get; set; }
    public long measuringpointid { get; set; }
    public long dateid { get; set; }
    public long timeid { get; set; }
    public string timestamp { get; set; }
    public double value { get; set; }

    public override string ToString()
    {
        return $"DateTime: {timestamp:HH:mm:ss} | MeasuringPointId: {measuringpointid} | "
            + $"Date: {dateid} | Serialnumber: {refid} | SensorValue: {value} ";
    }
}
Step 9 - Reading Avro Files Main Logic
Open the Program class and replace its contents with the following code, under an appropriate namespace. Grab the connection string from your storage account and put it in the code, and also use your own container name.
using Microsoft.Hadoop.Avro;
using Microsoft.Hadoop.Avro.Container;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

namespace AzureDataLakeAvroReader
{
    class Program
    {
        // TODO: Enter the connection string of your storage account here
        const string storageConnectionString = "DefaultEndpointsProtocol=https;AccountName=datalack;AccountKey=51sa4q45jtnUfUnjEMuVx9Rn9EhW04ATFXcLliyYuhNLCv/+w9LjcbT7914Q==;EndpointSuffix=core.windows.net";

        // TODO: Enter the blob container name here
        const string containerName = "avrocontainer";

        static void Main(string[] args)
        {
            MainAsync().Wait();
        }

        private static async Task MainAsync()
        {
            var storageAccount = CloudStorageAccount.Parse(storageConnectionString);
            var blobClient = storageAccount.CreateCloudBlobClient();
            var blobContainer = blobClient.GetContainerReference(containerName);

            var resultSegment =
                await blobContainer.ListBlobsSegmentedAsync(null, true, BlobListingDetails.All,
                    null, null, null, null);

            foreach (var cloudBlockBlob in resultSegment.Results.OfType<CloudBlockBlob>())
            {
                await ProcessCloudBlockBlobAsync(cloudBlockBlob);
            }

            Console.ReadLine();
        }

        private static async Task ProcessCloudBlockBlobAsync(CloudBlockBlob cloudBlockBlob)
        {
            var avroRecords = await DownloadAvroRecordsAsync(cloudBlockBlob);
            PrintSensorDatas(avroRecords);
        }

        private static void PrintSensorDatas(List<AvroRecord> avroRecords)
        {
            var sensorDatas = avroRecords.Select(avroRecord => CreateSensorData(avroRecord));

            foreach (var sensorData in sensorDatas)
            {
                Console.WriteLine(sensorData);
            }
        }

        private static async Task<List<AvroRecord>> DownloadAvroRecordsAsync(CloudBlockBlob cloudBlockBlob)
        {
            var memoryStream = new MemoryStream();
            await cloudBlockBlob.DownloadToStreamAsync(memoryStream);
            memoryStream.Seek(0, SeekOrigin.Begin);

            List<AvroRecord> avroRecords;
            using (var reader = AvroContainer.CreateGenericReader(memoryStream))
            {
                using (var sequentialReader = new SequentialReader<dynamic>(reader))
                {
                    avroRecords = sequentialReader.Objects.OfType<AvroRecord>().ToList();
                }
            }

            return avroRecords;
        }

        private static SensorModel CreateSensorData(AvroRecord avroRecord)
        {
            var model = new SensorModel
            {
                refid = avroRecord.GetField<string>("refid"),
                dateid = avroRecord.GetField<long>("dateid"),
                measuringpointid = avroRecord.GetField<long>("measuringpointid"),
                timeid = avroRecord.GetField<long>("timeid"),
                timestamp = avroRecord.GetField<string>("timestamp"),
                value = avroRecord.GetField<double>("value"),
            };
            return model;
        }
    }
}
Run your console app, and you will see output in the console similar to the example below.
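The exact values depend on your Avro file. Based on the SensorModel.ToString format above, each record prints as a single line shaped roughly like this (the values here are placeholders, not real output):
DateTime: 2021-01-01T10:15:30Z | MeasuringPointId: 1001 | Date: 20210101 | Serialnumber: SN-0001 | SensorValue: 23.5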