Default Apache Spark Variables in Azure Databricks Notebook

I came across some Python code in one of my notebooks and noticed that it referenced a variable called “sc” that was not defined anywhere, yet the code ran fine. From the naming convention I guessed that sc stood for SparkContext, but I still wondered how it worked without ever being declared or defined.

So I did some research and discovered that whenever a notebook is attached to a cluster, a SparkContext is instantiated and bound to the variable sc by default. I also found a caveat in Databricks’ official documentation: if you create a SparkContext explicitly in your own code, it may lead to inconsistent behavior.

This is good fundamental information for Azure Databricks beginners; such basic concepts are often of the utmost importance, so I am happy to share it. Besides sc, a few other variables, such as spark (the SparkSession) and sqlContext, are also instantiated when a notebook is attached to a cluster, as shown below.

Image source: Databricks official documentation