Your business is growing, and keeping track of your structured and unstructured data is becoming more difficult. You have decided to use AWS Lake Formation to build a data lake because it allows you to control and audit access to the data stored there.
In this lab, you use Lake Formation to set up a data lake for the Amazon Customer Reviews Dataset. After creating the data lake, you set up an AWS Glue crawler to determine the schema and create a table in the AWS Glue Data Catalog. Once you have crawled the data, you grant access to the table and use Amazon Athena to query the data.
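The crawler step can also be expressed outside the console. Below is a minimal sketch, assuming the boto3 SDK, that builds the keyword arguments for the AWS Glue `create_crawler` call. The crawler name, IAM role ARN, database name, and S3 path are hypothetical placeholders, not values from this lab. The code only builds the request, so it runs without AWS credentials.

```python
"""Sketch: request parameters for an AWS Glue crawler over review data in S3.

All names used in the demo (crawler, role ARN, database, bucket, prefix)
are hypothetical placeholders, not values from the lab.
"""


def crawler_request(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Build keyword arguments for boto3's glue.create_crawler()."""
    if not s3_path.startswith("s3://"):
        raise ValueError("s3_path must be an s3:// URI")
    return {
        "Name": name,
        "Role": role_arn,  # IAM role the crawler assumes to read the S3 data
        "DatabaseName": database,  # Data Catalog database that receives the table
        "Targets": {"S3Targets": [{"Path": s3_path}]},  # where the crawler looks
    }


if __name__ == "__main__":
    params = crawler_request(
        name="reviews-crawler",  # hypothetical
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",  # hypothetical
        database="reviews_db",  # hypothetical
        s3_path="s3://example-reviews-bucket/data/",  # hypothetical
    )
    print(params)
    # With credentials configured, the request could be sent like this:
    #   import boto3
    #   glue = boto3.client("glue")
    #   glue.create_crawler(**params)
    #   glue.start_crawler(Name=params["Name"])
```

Keeping request construction separate from the API call makes the parameters easy to inspect before anything is created in your account.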
AWS Lake Formation is a service that simplifies setting up a secure data lake, a process that AWS says can take days. A data lake is a centralized, curated, and secured repository that stores all your data, both in its original form and prepared for analysis. By consolidating data in one place, a data lake lets you break down data silos and combine different types of analytics to gain insights that guide business decisions.
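Access control in Lake Formation starts with registering an S3 location and then granting permissions on Data Catalog resources. The sketch below, assuming boto3, builds the keyword arguments for `lakeformation.register_resource` and for a table-level `SELECT` grant via `lakeformation.grant_permissions`. The bucket, principal ARN, database, and table names are hypothetical, and the lab's console steps may choose different options (for example, a custom role instead of the service-linked role).

```python
"""Sketch: Lake Formation requests for registering S3 storage and granting table access.

Bucket, principal, database, and table names are hypothetical placeholders.
"""


def bucket_arn(bucket: str) -> str:
    """Return the ARN of an S3 bucket (S3 ARNs omit region and account)."""
    return f"arn:aws:s3:::{bucket}"


def register_location_request(bucket: str) -> dict:
    """Build keyword arguments for boto3's lakeformation.register_resource()."""
    return {"ResourceArn": bucket_arn(bucket), "UseServiceLinkedRole": True}


def grant_select_request(principal_arn: str, database: str, table: str) -> dict:
    """Build keyword arguments for lakeformation.grant_permissions() granting SELECT on one table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
    }


if __name__ == "__main__":
    print(register_location_request("example-reviews-bucket"))  # hypothetical bucket
    print(grant_select_request(
        "arn:aws:iam::123456789012:user/example-analyst",  # hypothetical user
        "reviews_db",  # hypothetical database
        "reviews",  # hypothetical table
    ))
    # With credentials configured:
    #   import boto3
    #   lf = boto3.client("lakeformation")
    #   lf.register_resource(**register_location_request("example-reviews-bucket"))
    #   lf.grant_permissions(**grant_select_request(...))
```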
AWS Glue is a fully managed extract, transform, and load (ETL) service that simplifies preparing and loading data for analytics. You can create and run an ETL job from the AWS Management Console in a few steps.
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
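An Athena query needs a SQL string, a Data Catalog database to run against, and an S3 location for the query results (the setting that Task 4.1 updates). Here is a minimal sketch, assuming boto3, that builds the keyword arguments for `athena.start_query_execution`. The table name, database name, results location, and the `product_category` column are assumptions: the actual column names depend on what the crawler infers from the data.

```python
"""Sketch: an Athena query request over the crawled reviews table.

Database, table, column, and S3 output names are hypothetical placeholders.
"""


def athena_query_request(sql: str, database: str, output_location: str) -> dict:
    """Build keyword arguments for boto3's athena.start_query_execution()."""
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},  # default database for the query
        "ResultConfiguration": {"OutputLocation": output_location},  # where results are written
    }


# Hypothetical query: the ten product categories with the most reviews.
TOP_CATEGORIES_SQL = (
    "SELECT product_category, COUNT(*) AS review_count "
    "FROM reviews "
    "GROUP BY product_category "
    "ORDER BY review_count DESC "
    "LIMIT 10"
)

if __name__ == "__main__":
    params = athena_query_request(
        TOP_CATEGORIES_SQL,
        "reviews_db",  # hypothetical
        "s3://example-athena-results/",  # hypothetical
    )
    print(params)
    # With credentials configured:
    #   import boto3
    #   athena = boto3.client("athena")
    #   query_id = athena.start_query_execution(**params)["QueryExecutionId"]
```

Because Athena bills per query based on data scanned, a `LIMIT` clause or partition filters keep exploratory queries inexpensive.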
Table of Contents
Section | Timestamp (mm:ss) |
---|---|
Task 1: Explore the Lab Environment | 2:38 |
Task 1.1: Create folders in the S3 bucket | 2:38 |
Task 1.2: Load the AWS Cloud9 IDE | 4:35 |
Task 1.3: Copy data to the S3 bucket | 6:01 |
Task 2: Set up AWS Lake Formation | 10:04 |
Task 2.1: Register the Amazon S3 storage | 13:29 |
Task 2.2: Update permissions | 15:59 |
Task 2.3: Validate permissions for databases and tables | 18:26 |
Task 2.4: Create a database | 20:05 |
Task 3: Crawl review data with AWS Glue | 21:49 |
Task 3.1: Use a crawler to add a table | 22:30 |
Task 3.2: Run the crawler to add data to the table | 26:12 |
Task 3.3: Task validation | 27:15 |
Task 4: Use Athena to query data | 30:06 |
Task 4.1: Update the query results location | 32:14 |
Task 4.2: Run a query | 33:28 |
Task 5: Manage users with AWS Lake Formation policies | 35:21 |
Task 5.1: Grant the user access to the table | 39:05 |
Challenge Task: Add a user with restrictive permissions to access data | 42:27 |
Conclusion | 50:07 |