We were overwhelmed during the launch webinar itself, when Microsoft introduced Azure Purview as its next-generation data catalog, or rather a unified governance workspace, and we have been meaning to explore it ever since to figure out whether it is the right time to plan for it. So, let’s find out together what it offers, what its shortcomings are and how it differs from the already mature Azure Data Catalog –
First things first: to provision a Purview account, Microsoft.Purview must be a registered resource provider under the subscription –
It has two more prerequisites that must be registered as well: Microsoft.Storage and Microsoft.EventHub. If any of these is not registered, provisioning the Purview account fails in the validation phase itself.
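A quick pre-flight check can save a failed deployment. The sketch below is only an illustration: it takes a mapping of provider namespace to registration state (which you could fill in by hand or from the output of `az provider list`) and prints the `az provider register` commands for whatever is missing. The function names are hypothetical, and `Microsoft.Purview` is assumed to be the namespace of the Purview provider itself.

```python
# Providers that need to be registered before a Purview account can be created.
# "Microsoft.Purview" is assumed to be the namespace of the Purview provider;
# the other two are the prerequisites named in this post.
REQUIRED_PROVIDERS = ("Microsoft.Purview", "Microsoft.Storage", "Microsoft.EventHub")


def missing_providers(registration_states):
    """Return the required providers that are not in the 'Registered' state.

    registration_states maps a provider namespace to its state string,
    e.g. {"Microsoft.Storage": "Registered"}. Providers absent from the
    mapping are treated as not registered.
    """
    return [
        namespace
        for namespace in REQUIRED_PROVIDERS
        if registration_states.get(namespace, "NotRegistered").lower() != "registered"
    ]


def registration_commands(registration_states):
    """Build one Azure CLI command per provider that still needs registering."""
    return [
        f"az provider register --namespace {namespace}"
        for namespace in missing_providers(registration_states)
    ]


if __name__ == "__main__":
    # Hypothetical subscription state, for illustration only.
    states = {"Microsoft.Storage": "Registered", "Microsoft.EventHub": "NotRegistered"}
    for command in registration_commands(states):
        print(command)
```

Run the printed commands against the subscription and wait for each provider to reach the registered state before retrying the deployment.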
You may choose 4 or 6 compute units as the underlying infrastructure to scan all your configured sources for metadata. For testing purposes, we selected the minimum, i.e. 4 compute units –
Apart from the Purview account itself, it also creates a managed resource group to host the storage account and Event Hubs namespace that are prerequisites for it. As of now, you do not get an option to name it your own way, so you will have to live with its default name –
During provisioning, you have the flexibility to choose the region in which to host your Purview account. This is in contrast to Azure Data Catalog, which is by default provisioned in the region your tenant was provisioned in. This is a real advantage in terms of compliance.
Purview Studio is where you will mostly be working to scan resources, browse assets and govern access to your assets –
After provisioning it, these are the differences we found from Azure Data Catalog, which is now termed ADC Gen1 –
There are a few limitations that we quickly ran into as we started scanning the sources, for example:
- Views from [Azure] SQL Server database are not supported
- Azure Analysis Services as a source is not there
- Azure Databricks is also not there as a source
- So the lineage is broken if your data crosses such resources
- You cannot delete assets from the UI once they are added. You will have to do so through its API
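To give an idea of what deleting an asset through the API could look like, here is a minimal sketch that only builds the HTTP request and does not send it. It rests on assumptions that are not confirmed in this post: that the catalog exposes the Apache Atlas v2 entity endpoint (`DELETE .../api/atlas/v2/entity/guid/{guid}`), that the host follows the pattern `{account}.catalog.purview.azure.com`, and that you already hold an Azure AD bearer token. The function name and the sample values are hypothetical; check the current API reference before relying on any of it.

```python
import urllib.request
from urllib.parse import quote


def build_delete_request(account_name, asset_guid, bearer_token):
    """Build (but do not send) a DELETE request for a single catalog asset.

    ASSUMPTIONS: the host pattern and the Atlas v2 path below are illustrative
    and may not match your account's endpoint.
    """
    url = (
        f"https://{account_name}.catalog.purview.azure.com"
        f"/api/atlas/v2/entity/guid/{quote(asset_guid, safe='')}"
    )
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network here.
    request = build_delete_request("contoso", "abc-123", "<token>")
    print(request.get_method(), request.full_url)
```

Sending it would be a matter of passing the request to `urllib.request.urlopen`, ideally only after confirming the GUID belongs to the asset you actually want gone.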
So, if we have to take a call, our view is that it is too early to plan on using Azure Purview for production workloads; in any case, preview features are not recommended for production use. However, the roadmap looks quite promising as a unified place for all data governance needs, so keep an eye on it and continue to explore it, but wait to see what exactly is on the table when it becomes Generally Available.
If you have any thoughts, please feel free to share them in the comments section below.