My colleagues have often asked me why Blob Storage and Data Lake Store can’t be used interchangeably when they share the same platform, and whether it really matters which one they go with. So, let’s try to document the answer today 🙂
When Azure Data Lake Store Gen1 was released, there were probably a few more differences to learn than there are with ADLS Gen2. Since Data Lake Gen1 is an old story now, we will not touch upon it in this article.
In fact, before comparing them, the one thing we need to understand is that Azure Data Lake Store Gen2 is built on top of Azure Blob Storage. Knowing just this will answer many questions and clear up doubts you may still have from reading about ADLS Gen1 and assuming the same holds for ADLS Gen2. Note that ADLS Gen1 was not built on Azure Blob Storage.
When comparing the features of Azure Blob Storage and ADLS Gen2, you only need to look at the additional capabilities the data lake store provides, because the features of Azure Blob Storage are available in ADLS too.
ADLS Gen2 is optimized for analytics workloads, i.e. in addition to just storing files there, you can also query them efficiently. It is more scalable too: there is no limit on the file size or on the number of files that can be stored in ADLS. Further, ADLS gives you a mechanism to authenticate using Azure Active Directory (AAD) instead of the account keys. In terms of authorization, beyond RBAC, you get granular control at the file and folder level through its POSIX-compliant access control lists (ACLs).
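To make the file- and folder-level control a bit more concrete, here is a minimal Python sketch of how a POSIX-style ACL string can be read. It is only an illustration and does not call any Azure SDK. The helper names `parse_acl` and `is_allowed`, the sample ACL and the `analyst-id` placeholder (standing in for an AAD object ID) are all assumptions made up for this example, and the check is deliberately simplified: it ignores masks, default entries and the order in which a real service evaluates entries.

```python
from typing import Dict, Tuple


def parse_acl(acl: str) -> Dict[Tuple[str, str], str]:
    """Parse a POSIX-style ACL string into {(scope, qualifier): permissions}.

    Each entry has the form scope:qualifier:permissions, e.g. "user::rwx"
    (owning user, empty qualifier) or "user:analyst-id:r-x" (named user).
    Entries prefixed with "default:" are not handled in this sketch.
    """
    entries = {}
    for entry in acl.split(","):
        scope, qualifier, perms = entry.strip().split(":")
        entries[(scope, qualifier)] = perms
    return entries


def is_allowed(acl: str, scope: str, qualifier: str, action: str) -> bool:
    """Return True if the matching entry grants the action ('r', 'w' or 'x').

    Simplified on purpose: a missing entry is treated as no permissions.
    """
    perms = parse_acl(acl).get((scope, qualifier), "---")
    return action in perms


# Hypothetical ACL on a folder; "analyst-id" is a placeholder identifier.
folder_acl = "user::rwx,group::r-x,other::---,user:analyst-id:r-x"

print(is_allowed(folder_acl, "user", "analyst-id", "r"))  # True
print(is_allowed(folder_acl, "user", "analyst-id", "w"))  # False
print(is_allowed(folder_acl, "other", "", "r"))           # False
```

The point is simply that each file or folder carries its own list of who can read, write or execute, which is finer-grained than a role assigned on the whole account or container.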
When deciding between Azure Blob Storage and Azure Data Lake Store, you also need to consider the cost factor. This is what Microsoft says:
If you read this, you might get the impression that ADLS Gen2 and Azure Blob Storage have the same pricing model and probably the same bill too. No, that’s not the case. But Microsoft is not lying either. If you read it carefully, it says the same low-cost model applies to storage, i.e. the per-GB cost is the same. But the storage bill is not just the disk cost. You are billed for I/O operations on the storage too, and this is where the huge difference between Azure Blob Storage and ADLS Gen2 comes from. Let’s see it through some screenshots captured from pricing calculator estimates:
The two pictures below show that Azure Blob Storage and ADLS Gen2 bear the same cost for the same capacity. There is no difference at all, and this is what Microsoft means by the same storage cost model. True!
After adding the operations cost, based on an assumed number of operations per month, the final estimate stands at around $21.84 for Azure Blob Storage, as indicated in the screenshot below:
However, the final estimate for Azure Data Lake Store Gen2 stands at around $116.90, as indicated in the screenshot below. A huge difference!
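The arithmetic behind such estimates can be sketched in a few lines of Python. Every number below is a made-up placeholder, not an actual Azure rate and not the input behind the estimates quoted above, and `monthly_cost` is a hypothetical helper. The sketch only shows how two services with an identical per-GB price can end up with very different bills once a different per-operation price is applied.

```python
def monthly_cost(capacity_gb, price_per_gb, operations, price_per_10k_ops):
    """Estimate a monthly bill as storage cost plus operations cost."""
    storage_cost = capacity_gb * price_per_gb
    operations_cost = operations / 10_000 * price_per_10k_ops
    return round(storage_cost + operations_cost, 2)


# All values are hypothetical placeholders chosen for easy arithmetic.
CAPACITY_GB = 1_000        # same capacity for both services
PRICE_PER_GB = 0.02        # same per-GB storage price for both services
OPERATIONS = 10_000_000    # same number of operations per month

# Only the per-operation price differs between the two estimates.
blob_like = monthly_cost(CAPACITY_GB, PRICE_PER_GB, OPERATIONS, price_per_10k_ops=0.05)
adls_like = monthly_cost(CAPACITY_GB, PRICE_PER_GB, OPERATIONS, price_per_10k_ops=0.10)

print("storage-only cost:", CAPACITY_GB * PRICE_PER_GB)
print("blob-like total:  ", blob_like)
print("adls-like total:  ", adls_like)
```

With the storage part held constant, the whole gap between the two totals comes from the operations line, so for a real decision it is worth plugging your own expected operation counts into the pricing calculator.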
So, it might be tempting to use ADLS because of its features and because both share the same storage cost model, but the operations cost is something you cannot ignore. If you are not worried about the dollars, then yes, go ahead and play with ADLS Gen2; you will enjoy it. But if you have to make a neutral decision, use it only when it’s needed, i.e. when you plan to perform some analytics over the data.
So, now you know that each has its own purpose and, despite the same low-cost storage model, where the huge difference between Blob Storage and ADLS billing comes from.