Overview
Packages available inside the Notebook
By default, a set of basic packages comes pre-installed in the notebook (unless your system administrator has configured otherwise): NumPy, pandas, scikit-learn, and findspark.
To install any additional packages, contact your technology team and follow the appropriate process as per their guidelines.
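Before contacting your technology team, it can help to confirm which packages are actually importable in your environment. A minimal sketch using only the standard library (the package names checked are the pre-installed defaults listed above; note scikit-learn is imported as `sklearn`):

```python
import importlib.util

def is_installed(pkg: str) -> bool:
    # A package is importable if Python can locate a module spec for it.
    return importlib.util.find_spec(pkg) is not None

# scikit-learn's import name is "sklearn"
for pkg in ["numpy", "pandas", "sklearn", "findspark"]:
    status = "available" if is_installed(pkg) else "missing"
    print(f"{pkg}: {status}")
```

This avoids triggering a full import (and its side effects) just to check availability.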
Creating a large pandas DataFrame gives me memory errors - how can I get more memory?
While the Platform itself imposes no additional limitations, the infrastructure setup for your installation may impose constraints.
pandas requires all the data being operated on to fit in memory. For larger datasets it may be beneficial to use PySpark, which is designed for data that does not fit on a single machine - but this may not always be possible.
Contact the technology team to understand how to handle your use case - whether by increasing the infrastructure pool available for your notebooks or by an alternative method.
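When PySpark is not an option, one common workaround is to process the data in chunks with pandas so that only a bounded number of rows is resident in memory at any time. A minimal sketch (the in-memory CSV here stands in for a large file on disk):

```python
import io
import pandas as pd

# Stand-in for a large CSV file on disk.
csv_data = "a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(10_000))

# Instead of loading everything at once, read the file in chunks so that
# only `chunksize` rows are held in memory at a time.
total = 0
for chunk in pd.read_csv(io.StringIO(csv_data), chunksize=1_000):
    total += chunk["b"].sum()

print(total)
```

This pattern works for aggregations and filters that can be computed chunk by chunk; it does not help for operations that genuinely need the whole DataFrame at once (e.g. a global sort).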
When I try to import pyspark, I get a "pyspark not found" error
There are multiple ways to use PySpark in Notebooks, depending on how the Platform installation was set up.
In our experience, the most robust way to find and use Spark is:
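One such approach (shown as a sketch, not necessarily the setup used in your installation) relies on findspark, listed among the pre-installed packages above: it locates a local Spark installation and adds it to `sys.path` so that `import pyspark` succeeds. The `spark_home` override below is an assumed parameter for environments where `SPARK_HOME` is not set:

```python
import importlib.util

def init_pyspark(spark_home=None):
    """Return the pyspark module if it can be made importable, else None."""
    if importlib.util.find_spec("pyspark") is not None:
        import pyspark  # already on sys.path; nothing more to do
        return pyspark
    if importlib.util.find_spec("findspark") is not None:
        import findspark
        # findspark discovers the Spark installation (via SPARK_HOME by
        # default) and appends its Python libraries to sys.path.
        if spark_home:
            findspark.init(spark_home)
        else:
            findspark.init()
        import pyspark
        return pyspark
    return None  # neither available: contact your technology team

pyspark_mod = init_pyspark()
print("pyspark available" if pyspark_mod else "pyspark not found")
```

If this returns None, or `findspark.init()` cannot locate Spark, the installation likely does not ship Spark locally and you should ask your technology team which setup applies.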