Month: September 2020

Reading S3 data from a local PySpark session

For the impatient

To read data from S3 into a local PySpark dataframe using temporary security credentials, you need to:

1. Download a Spark distribution bundled with Hadoop 3.x
2. Build and install the pyspark package
3. Tell PySpark to use the hadoop-aws library
4. Configure the credentials

The problem

When you attempt to read S3 data from a local […]
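As a rough illustration of the last two steps in the list above (using the hadoop-aws library and configuring the credentials), here is a minimal sketch. It is not taken from the post itself. The helper names `s3a_session_conf` and `build_local_session`, the default hadoop-aws version `3.2.0`, and the bucket path are all assumptions for illustration; the hadoop-aws version should match the Hadoop version bundled with your Spark distribution.

```python
import os


def s3a_session_conf(access_key, secret_key, session_token,
                     hadoop_aws_version="3.2.0"):
    """Return Spark config entries for reading s3a:// paths with temporary credentials.

    hadoop_aws_version is an assumption; it should match the Hadoop version
    bundled with the Spark distribution in use.
    """
    return {
        # Ask Spark to fetch the hadoop-aws library (and its dependencies).
        "spark.jars.packages": f"org.apache.hadoop:hadoop-aws:{hadoop_aws_version}",
        # Use the S3A provider that understands session tokens.
        "spark.hadoop.fs.s3a.aws.credentials.provider":
            "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider",
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        "spark.hadoop.fs.s3a.session.token": session_token,
    }


def build_local_session(conf):
    """Create a local SparkSession with the given config entries applied."""
    # Imported lazily so s3a_session_conf can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master("local[*]").appName("s3-local-read")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def read_example():
    """Read a Parquet dataset from a hypothetical bucket and print its schema."""
    conf = s3a_session_conf(
        os.environ["AWS_ACCESS_KEY_ID"],
        os.environ["AWS_SECRET_ACCESS_KEY"],
        os.environ["AWS_SESSION_TOKEN"],
    )
    spark = build_local_session(conf)
    # Hypothetical bucket and path; note the s3a:// scheme.
    df = spark.read.parquet("s3a://example-bucket/path/to/data/")
    df.printSchema()
```

Calling `read_example()` assumes the three standard AWS environment variables hold a valid set of temporary credentials. Keeping the configuration in a plain dictionary makes it easy to inspect before a session is created.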
