Azure Data Factory – Data Wrangling at Scale with Power Query – Practical Aspects

In the previous post, we looked at how Wrangling Data Flows was moved out of its original place under Data Flows and reintroduced as Power Query. Let us now explore what it brings to the table –

Point No. – 1 This is called “at scale” because the Power Query executes on Spark, a node-based, clustered, MPP (massively parallel processing) distributed architecture that lets you scale out compute as the need arises.
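As a rough illustration, the size of that Spark compute can be controlled in the activity's JSON definition. The fragment below is a sketch in the general ADF authoring format; the activity name is made up, and the exact property shape may vary across service versions:

```json
{
  "name": "RunPowerQuery",
  "type": "ExecuteWranglingDataflow",
  "typeProperties": {
    "compute": {
      "computeType": "General",
      "coreCount": 8
    }
  }
}
```

Raising `coreCount` (or choosing a memory-optimized `computeType`) is how you buy more parallelism when the data volume grows.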

Point No. – 2 Power Query runs against a dataset, and at the moment it supports ONLY Azure Storage (Blob Storage and Data Lake Storage) and Azure relational databases (Azure SQL Database and Azure Synapse Analytics).

Point No. – 3 When you are creating the dataset for a relational data store, you need to ensure that a schema is specified there, i.e. under the dataset you need to explicitly import the schema. You cannot leave the schema set to “None”; if you do, you will get the error shown in the snapshot below –
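For context, an imported schema shows up as a `schema` array inside the dataset's JSON definition. The sketch below uses hypothetical table and column names, but follows the general shape of an Azure SQL dataset in ADF:

```json
{
  "name": "AzureSqlCustomers",
  "properties": {
    "type": "AzureSqlTable",
    "linkedServiceName": {
      "referenceName": "AzureSqlLinkedService",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "tableName": "dbo.Customers"
    },
    "schema": [
      { "name": "CustomerId", "type": "int" },
      { "name": "CustomerName", "type": "nvarchar" }
    ]
  }
}
```

If the `schema` array is empty or absent, Power Query has no column metadata to build the mashup against, which is what triggers the error.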

Point No. – 4 Next is an amazing thing: as soon as you select the dataset and click OK to create the Power Query, you see the same Power Query Editor interface that you see in Power BI.

Point No. – 5 If you observed the previous snapshot, it displays a warning that not all functions are supported, even though they are available in the editor. Be cognizant of this fact: if you find something not working, it may be because that function is not yet supported.

Point No. – 6 As mentioned in point no. 1, this executes on a Spark cluster, so while testing under Debug mode, a cluster needs to be spun up first.

Point No. – 7 Further, you can store the result of all the transformations done in Power Query in a target by specifying a sink dataset, and you have a number of options for the sink, not just limited to Azure Storage or relational data stores –
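Wiring a sink to the Power Query activity again happens in the pipeline's JSON. The sketch below is approximate: the activity, data flow, and dataset names are made up, and the exact property names for sinks may differ from what the service emits; it is meant only to show where the sink dataset reference sits:

```json
{
  "name": "PowerQueryToSink",
  "type": "ExecuteWranglingDataflow",
  "typeProperties": {
    "dataflow": {
      "referenceName": "PowerQuery1",
      "type": "DataFlowReference"
    },
    "sinks": {
      "Sink1": {
        "dataset": {
          "referenceName": "CuratedOutputDataset",
          "type": "DatasetReference"
        }
      }
    }
  }
}
```

The sink dataset itself can point at a broader range of stores than the sources listed in point no. 2.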

With this basic understanding, you can explore it further to see how it fits your scenario. It is definitely a good attempt by Microsoft to bring consistency to working with data across its own products.