A new Era of SPARK and PANDAS Unification


BY MA Raza / ON Nov 10, 2021

When it comes to distributed processing frameworks, Spark is the de facto choice for professionals and large data processing hubs. Databricks open-sourced a library called Koalas that implements the pandas API on a Spark backend. The library is under active development and covers more than 80% of the pandas API. With the release of Spark 3.2.0, Koalas is integrated into PySpark as a submodule named pyspark.pandas. This seamless integration of pandas with Spark is one of the key upgrades in the release.
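As a rough illustration of what this unification means in practice, the sketch below runs one function on a plain pandas DataFrame and, optionally, on a pyspark.pandas DataFrame. The sample data and the names mean_sales_by_city and run_on_spark are hypothetical, made up for this example. The Spark part assumes PySpark 3.2.0 or later and a working Java runtime, so it is wrapped in a function and not called automatically.

```python
import pandas as pd


def mean_sales_by_city(df):
    """Average `sales` per `city`, written only against the pandas API."""
    return df.groupby("city")["sales"].mean().sort_index()


# Hypothetical sample data, used only for illustration.
data = {
    "city": ["Oslo", "Pune", "Oslo", "Pune"],
    "sales": [10.0, 20.0, 30.0, 40.0],
}

# Plain pandas: runs on a single machine.
local_result = mean_sales_by_city(pd.DataFrame(data))


def run_on_spark():
    """Same logic on Spark; assumes PySpark >= 3.2.0 and a Java runtime."""
    import pyspark.pandas as ps  # the submodule that absorbed Koalas

    psdf = ps.DataFrame(data)  # pandas-like DataFrame backed by Spark
    # to_pandas() collects the (small) result back to the driver.
    return mean_sales_by_city(psdf).to_pandas()
```

Calling run_on_spark() on a machine with Spark installed should give the same per-city averages as local_result. The point of the sketch is that the function body does not change, only the DataFrame type passed to it.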

To read the complete article, follow the Medium link below.

A new Era of SPARK and PANDAS Unification
