PowerBI has quickly become the default dashboarding tool for many business users, while S3 remains the default object store for technical users. Connecting the two seems like it should be effortless, yet it is surprisingly awkward. Let us spare you the pain and take you right to our 5-minute solution.
Why PowerBI? Microsoft PowerBI is fast becoming the dashboarding tool of choice because the desktop version is free, sharing dashboards costs only $10/user/mo, the library of free crowdsourced visualizations (such as word clouds and box plots) is growing rapidly, and its data querying capabilities are comprehensive enough to replace a separate data engineering tool.
Why S3? Amazon S3 is the most popular object store for small and medium businesses because uploading files to S3 is free, storing and retrieving files costs only ~$0.02/GB/mo, access management is easy to set up, and its programmatic access makes it convenient to embed within enterprise-ready applications.
How would we connect the two?
While most online solutions suggest setting up a database connection to Redshift or Athena, here is a workaround that is much simpler:
Step 1: Create an Amazon S3 bucket in your AWS account.
Step 2: Store your data file in this bucket.
Step 3: Create access credentials (an access key ID and a secret access key) in AWS IAM to access the data.
Step 4: Open PowerBI Desktop and select Get data > Other > Python script.
Step 5: Insert a Python script that reads your data file from S3, specifying your AWS keys, bucket name and file path. Embedding credentials in code is not ideal (even within PowerBI), so in practice it is better to store them securely, for example as environment variables.
Step 6: Hit OK and watch your data file come through as a data table. From here on, your PowerBI queries should work as they would with any data table. Enjoy!
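The script for Step 5 might look something like the sketch below. It is a minimal illustration, not a definitive implementation, and it makes several assumptions: the data file is a CSV, the `boto3` and `pandas` packages are installed in the Python environment that PowerBI uses, and the credentials are stored in the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables. The bucket name, the file path and the helper name `read_csv_from_s3` are placeholders invented for this example.

```python
import io
import os

import pandas as pd


def read_csv_from_s3(s3_client, bucket, key):
    """Download one CSV object from S3 and parse it into a DataFrame."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # response["Body"] is a stream; read() returns the file's raw bytes.
    return pd.read_csv(io.BytesIO(response["Body"].read()))


# Placeholder values (assumptions): replace with your own bucket and file path.
BUCKET_NAME = "my-example-bucket"
FILE_PATH = "data/sales.csv"

# Read the credentials from environment variables instead of hard-coding them.
access_key = os.environ.get("AWS_ACCESS_KEY_ID")
secret_key = os.environ.get("AWS_SECRET_ACCESS_KEY")

if access_key and secret_key:
    import boto3  # imported here so the helper above works without boto3

    s3 = boto3.client(
        "s3",
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    # PowerBI offers pandas DataFrames defined by the script as tables to load.
    df = read_csv_from_s3(s3, BUCKET_NAME, FILE_PATH)
else:
    print("Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY before running.")
```

If the environment variables were set after PowerBI Desktop was opened, you will likely need to restart it so the script can see them. For other file types, swapping `pd.read_csv` for the matching pandas reader (for example `pd.read_excel`) should be the only change needed in the helper.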