AWS Glue ETL Jobs
An AWS Glue ETL job is the code that runs in a managed Apache Spark environment to perform the required data transformations. You can write the ETL code in Python or Scala; AWS Glue manages the underlying Spark infrastructure for you. Based on the metadata stored in the Data C…
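To make the transformation step concrete, here is a rough local sketch in plain Python of the kind of cleansing a Glue job performs with transforms such as ApplyMapping and DropNullFields. The sample records and field names are hypothetical; a real Glue script would import the awsglue library and operate on DynamicFrames read from S3 via the Data Catalog rather than on in-memory dictionaries:

```python
import json

# Hypothetical sample records, standing in for the JSON objects a Glue
# job would read from S3 using a Data Catalog table definition.
raw_records = [
    {"user_id": "u1", "score": "42", "region": None},
    {"user_id": "u2", "score": "17", "region": "eu-west-1"},
]

def cleanse(record):
    """Mimics a DropNullFields + ApplyMapping step:
    remove null-valued fields and cast 'score' to an integer."""
    cleaned = {k: v for k, v in record.items() if v is not None}
    cleaned["score"] = int(cleaned["score"])
    return cleaned

cleaned_records = [cleanse(r) for r in raw_records]
print(json.dumps(cleaned_records))
```

In an actual job, this logic would be expressed declaratively (for example, `ApplyMapping.apply(frame=..., mappings=[...])`) so Spark can distribute the work across the cluster.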
AWS Certified Solutions Architect - AWS Glue ETL Jobs Example Questions
Test your knowledge of AWS Glue ETL Jobs
Question 1
You have an ETL pipeline that processes online gaming data stored in Amazon S3 using AWS Glue jobs. The games are popular worldwide, and data is continuously ingested via Kinesis Data Firehose. You want to reduce the processing delay for gaming events. What can you do to optimize your ETL pipeline?
Question 2
As a Solutions Architect, you are building an ETL pipeline that processes customer data in JSON format stored in Amazon S3. Each day, you receive about 10 GB of new data. You need to cleanse and store this data in Amazon Redshift for immediate querying. Which is the most cost-effective solution?
Question 3
You are using AWS Glue to process a large dataset stored in Amazon S3. The dataset is primarily composed of small files. The ETL job is taking longer than expected. How would you optimize the ETL job performance?