DBFS

DBFS stands for Databricks File System. It is a distributed file system mounted into an Azure Databricks workspace, and it gives you a place to keep your data in Databricks. DBFS is an abstraction on top of object storage: it lets you interact with that storage using directory and file semantics instead of storage URLs.

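The default storage location in DBFS is known as the DBFS root and is referred to as “/”. In a new workspace, the DBFS root already contains a number of default folders, such as /FileStore. Folders are referred to by their DBFS paths, with or without the dbfs: scheme, for example dbfs:/FileStore/ or /mnt/.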
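The same DBFS location can be spelled in a few ways. The sketch below is a small pure-Python helper that normalises them. The function names `to_dbfs_uri` and `to_fuse_path` are hypothetical, and the code assumes the cluster exposes DBFS through the local FUSE mount at /dbfs, which is what lets ordinary file APIs such as open() read a dbfs:/ path on the driver.

```python
# Sketch: convert between common spellings of a DBFS location.
# Assumption: DBFS is exposed locally through a FUSE mount at /dbfs, so
# dbfs:/FileStore/x.csv can be opened with plain file APIs as /dbfs/FileStore/x.csv.

DBFS_SCHEME = "dbfs:"
FUSE_ROOT = "/dbfs"


def to_dbfs_uri(path: str) -> str:
    """Return the dbfs:/... form of a path given as dbfs:/..., /dbfs/... or a bare /..."""
    if path.startswith(DBFS_SCHEME):
        rest = path[len(DBFS_SCHEME):]
    elif path == FUSE_ROOT or path.startswith(FUSE_ROOT + "/"):
        rest = path[len(FUSE_ROOT):]
    else:
        rest = path  # assumed: a bare path such as /FileStore/tables is relative to the DBFS root
    if not rest.startswith("/"):
        rest = "/" + rest
    return DBFS_SCHEME + rest


def to_fuse_path(path: str) -> str:
    """Return the /dbfs/... form that local file APIs such as open() expect."""
    rest = to_dbfs_uri(path)[len(DBFS_SCHEME):]
    return FUSE_ROOT if rest == "/" else FUSE_ROOT + rest


if __name__ == "__main__":
    for p in ["dbfs:/FileStore/tables/sales.csv",
              "/FileStore/tables/sales.csv",
              "/dbfs/FileStore/tables/sales.csv"]:
        print(to_dbfs_uri(p), to_fuse_path(p))
```

All three inputs in the usage loop name the same file, so each line prints the same pair of paths.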

Data written to mount point paths, i.e. under /mnt, is stored outside the DBFS root. Even though the DBFS root is writable, meaning you can store data in its folders, it is highly recommended that you store your data in mounted object storage rather than in the DBFS root.
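To make the split between the DBFS root and mount points concrete, here is a minimal sketch of how a DBFS path could be resolved to the storage that actually holds its bytes. The mount table, the container names and the storage account `examplestorage` are all hypothetical; on a real cluster the existing mounts can be listed with dbutils.fs.mounts().

```python
# Hypothetical mount table: DBFS mount point -> backing object-storage URL.
# Container and storage-account names are invented for illustration only.
MOUNTS = {
    "/mnt/raw": "abfss://raw@examplestorage.dfs.core.windows.net",
    "/mnt/curated": "abfss://curated@examplestorage.dfs.core.windows.net/gold",
}


def resolve(dbfs_path, mounts=MOUNTS):
    """Return ("mount", storage_url) for paths under a mount point,
    or ("dbfs_root", path) for everything else."""
    path = dbfs_path[len("dbfs:"):] if dbfs_path.startswith("dbfs:") else dbfs_path
    # Check longer mount points first so the most specific mount wins.
    for mount in sorted(mounts, key=len, reverse=True):
        if path == mount or path.startswith(mount + "/"):
            return ("mount", mounts[mount] + path[len(mount):])
    # Not under any mount point: the data lives in the workspace's DBFS root.
    return ("dbfs_root", path)


if __name__ == "__main__":
    print(resolve("dbfs:/mnt/raw/sales/2021.csv"))
    print(resolve("/FileStore/tables/sales.csv"))
```

Matching on `mount + "/"` rather than a plain prefix keeps a folder such as /mnt/rawdata from being mistaken for the /mnt/raw mount.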

You can write data to DBFS either by uploading a file or by saving a data frame to it. When you upload a file, by default it goes into the /FileStore/tables folder, as shown in the snapshot below.

Any object, i.e. a file in your storage, whether it sits in the DBFS root or in a mounted folder, can be registered as a table for easy querying. This is done by reading the file content into a data frame and then saving the data frame as a table, for example in Parquet format. Data frames can also be registered as local or global temporary views (temp tables). Registering a table simply makes future access easier: you don't need to connect to your underlying data sources with passwords or tokens again and again. We will look at all these options in a different post.
