Your business is growing, and keeping track of your structured and unstructured data is becoming more difficult. You have decided to use AWS Lake Formation to build a data lake because it allows you to control and audit access to the data stored there.

In this lab, you use Lake Formation to set up a data lake for the Amazon Customer Reviews Dataset. After creating the data lake, you set up an AWS Glue crawler to determine the schema and create a table in the AWS Glue Data Catalog. Once you have crawled the data, you grant access to the table and use Amazon Athena to query the data.
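The crawler step can also be scripted instead of clicked through. The sketch below is a minimal illustration, not the lab's own procedure. It assumes boto3 and a Glue client created with `boto3.client("glue")`. The database name, crawler name, IAM role ARN, and S3 path are hypothetical placeholders. In the lab you create the database through Lake Formation. When Lake Formation governs the catalog, the crawler's IAM role also needs Lake Formation permissions on that database, and this sketch does not grant them.

```python
"""Minimal sketch: catalog the reviews data with an AWS Glue crawler.

Assumes `glue` is a boto3 Glue client, e.g. boto3.client("glue").
All names and ARNs passed in are hypothetical placeholders.
"""
from typing import Any, Dict


def crawler_request(name: str, role_arn: str, database: str, s3_path: str) -> Dict[str, Any]:
    """Build keyword arguments for Glue CreateCrawler.

    The crawler scans s3_path, infers a schema, and writes the resulting
    table definition(s) into `database` in the Data Catalog.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def crawl_reviews(glue: Any, name: str, role_arn: str, database: str, s3_path: str) -> None:
    # Create the Data Catalog database that will hold the crawled table.
    glue.create_database(DatabaseInput={"Name": database})
    # Define the crawler, then start it. start_crawler returns immediately;
    # the crawl itself runs asynchronously.
    glue.create_crawler(**crawler_request(name, role_arn, database, s3_path))
    glue.start_crawler(Name=name)
```

Because the crawl is asynchronous, a script would poll `glue.get_crawler(Name=...)` until the crawler's `State` is `READY` again before it queries the new table.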

AWS Lake Formation is a service that makes it easy to set up a secure data lake in days. A data lake is a centralized, curated, and secured repository that stores all your data, both in its original form and prepared for analysis. A data lake enables you to break down data silos and combine different types of analytics to gain insights and guide better business decisions.
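Lake Formation's access control rests on two API operations that match later tasks in this lab: registering an Amazon S3 location and granting permissions on catalog tables. The sketch below shows the request shapes. It assumes boto3 and a client from `boto3.client("lakeformation")`, and it registers the location using the Lake Formation service-linked role. The bucket, prefix, principal ARN, database, and table names are hypothetical.

```python
"""Minimal sketch: register S3 storage with Lake Formation and grant SELECT.

Assumes `lf` is a boto3 Lake Formation client, e.g. boto3.client("lakeformation").
Bucket, prefix, principal, database, and table names are hypothetical.
"""
from typing import Any, Dict


def s3_location_arn(bucket: str, prefix: str = "") -> str:
    # Lake Formation identifies S3 locations by ARN: arn:aws:s3:::bucket[/prefix]
    prefix = prefix.strip("/")
    return f"arn:aws:s3:::{bucket}/{prefix}" if prefix else f"arn:aws:s3:::{bucket}"


def register_location(lf: Any, bucket: str, prefix: str = "") -> None:
    # Put the S3 location under Lake Formation management, accessed through
    # the Lake Formation service-linked role.
    lf.register_resource(ResourceArn=s3_location_arn(bucket, prefix), UseServiceLinkedRole=True)


def grant_select_request(principal_arn: str, database: str, table: str) -> Dict[str, Any]:
    # GrantPermissions arguments giving one IAM principal SELECT on one table.
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
    }


def grant_select(lf: Any, principal_arn: str, database: str, table: str) -> None:
    lf.grant_permissions(**grant_select_request(principal_arn, database, table))
```

The order matters. You register the location before any data is cataloged. The table grant applies only after a crawler has created the table.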

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and load your data for analytics. You can create and run an ETL job in a few steps in the AWS Management Console.
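The console steps for an ETL job correspond to two Glue API calls: one that defines the job and one that starts a run. This lab uses a crawler rather than an ETL job, so the sketch below is only an illustration. It assumes boto3 and a client from `boto3.client("glue")`. The job name, role ARN, script location, Glue version, and worker settings are hypothetical choices. The ETL script itself, a PySpark script stored in S3, is not shown.

```python
"""Minimal sketch: define and start an AWS Glue Spark ETL job.

Assumes `glue` is a boto3 Glue client, e.g. boto3.client("glue").
Job name, role ARN, script path, and worker settings are hypothetical.
"""
from typing import Any, Dict


def etl_job_request(name: str, role_arn: str, script_s3_path: str) -> Dict[str, Any]:
    # Keyword arguments for Glue CreateJob. "glueetl" selects a Spark ETL job;
    # the script at script_s3_path contains the actual transform logic.
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",
        "WorkerType": "G.1X",
        "NumberOfWorkers": 2,
    }


def create_and_start_job(glue: Any, name: str, role_arn: str, script_s3_path: str) -> Any:
    glue.create_job(**etl_job_request(name, role_arn, script_s3_path))
    # StartJobRun returns a response whose "JobRunId" identifies this run.
    return glue.start_job_run(JobName=name)
```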

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
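Because Athena is serverless, running a query from code amounts to submitting SQL and polling until the query finishes. The sketch below assumes boto3 and a client from `boto3.client("athena")`. The database, table, and S3 output path are hypothetical. `OutputLocation` is the S3 path where Athena writes result files, the same setting the console calls the query result location.

```python
"""Minimal sketch: run an Athena SQL query and wait for its results.

Assumes `athena` is a boto3 Athena client, e.g. boto3.client("athena").
Database, table, and output location are hypothetical placeholders.
"""
import time
from typing import Any, Dict

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def query_request(sql: str, database: str, output_location: str) -> Dict[str, Any]:
    # OutputLocation is the S3 path where Athena writes query result files.
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(athena: Any, sql: str, database: str, output_location: str,
              poll_seconds: float = 1.0) -> Dict[str, Any]:
    query_id = athena.start_query_execution(
        **query_request(sql, database, output_location))["QueryExecutionId"]
    # Poll until the query reaches a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")
    # First page of results only; rows are under ResultSet -> Rows.
    return athena.get_query_results(QueryExecutionId=query_id)
```

For example, you could call `run_query(boto3.client("athena"), "SELECT * FROM reviews LIMIT 10", "reviews_db", "s3://example-bucket/athena-results/")`, where the table and database names are hypothetical. The function returns only the first page of results. To read further pages, pass the response's `NextToken` back to `get_query_results`.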

Table of Contents

Section (timestamp)
Task 1: Explore the Lab Environment (2:38)
  Task 1.1: Create folders in the S3 bucket (2:38)
  Task 1.2: Load the AWS Cloud9 IDE (4:35)
  Task 1.3: Copy data to the S3 bucket (6:01)
Task 2: Set up AWS Lake Formation (10:04)
  Task 2.1: Register the Amazon S3 storage (13:29)
  Task 2.2: Update permissions (15:59)
  Task 2.3: Validate permissions for databases and tables (18:26)
  Task 2.4: Create a database (20:05)
Task 3: Crawl review data with AWS Glue (21:49)
  Task 3.1: Use a Crawler to add a table (22:30)
  Task 3.2: Run the Crawler to add data to the table (26:12)
  Task 3.3: Task validation (27:15)
Task 4: Use Athena to query data (30:06)
  Task 4.1: Update the query results location (32:14)
  Task 4.2: Run a query (33:28)
Task 5: Manage users with AWS Lake Formation policies (35:21)
  Task 5.1: Grant the user access to the table (39:05)
Challenge Task: Add a user with restrictive permissions to access data (42:27)
Conclusion (50:07)