September, 2020 - David's blog

Reading S3 data from a local PySpark session

By David Lindelöf. Posted on September 25, 2020. Posted in Python.

For the impatient: to read data on S3 into a local PySpark dataframe using temporary security credentials, you need to: download a Spark distribution bundled with Hadoop 3.x; build and install the pyspark package; tell PySpark to use the hadoop-aws library; and configure the credentials. The problem: when you attempt to read S3 data from a local […]
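
As a rough sketch of the last two steps (pointing PySpark at hadoop-aws and configuring temporary credentials), something like the following may work. It is not taken from the post itself: the helper names `s3a_spark_conf` and `build_session`, the hadoop-aws version `3.2.0`, and the bucket path are all assumptions for illustration. The hadoop-aws version should match the Hadoop version bundled with your Spark distribution.

```python
def s3a_spark_conf(access_key, secret_key, session_token,
                   hadoop_aws_version="3.2.0"):
    """Return Spark properties for reading s3a:// paths with temporary credentials.

    hadoop_aws_version is an assumed default; use the version that matches
    the Hadoop libraries shipped with your Spark distribution.
    """
    return {
        # Ask Spark to fetch the hadoop-aws library (and its dependencies).
        "spark.jars.packages":
            f"org.apache.hadoop:hadoop-aws:{hadoop_aws_version}",
        # Credentials provider that understands a session token.
        "spark.hadoop.fs.s3a.aws.credentials.provider":
            "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider",
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        "spark.hadoop.fs.s3a.session.token": session_token,
    }


def build_session(conf, app_name="s3-local"):
    """Create a local SparkSession from a dict of Spark properties."""
    # Imported here so the configuration helper works without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master("local[*]").appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Example usage (hypothetical bucket and path; credentials read from the
# standard AWS environment variables):
#
#   import os
#   conf = s3a_spark_conf(os.environ["AWS_ACCESS_KEY_ID"],
#                         os.environ["AWS_SECRET_ACCESS_KEY"],
#                         os.environ["AWS_SESSION_TOKEN"])
#   spark = build_session(conf)
#   df = spark.read.parquet("s3a://my-bucket/some/path/")
```

Keeping the property dictionary separate from session creation makes it easy to inspect or print the configuration before Spark starts.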

Month: September 2020
