Source: https://www.linkedin.com/feed/update/urn%3Ali%3Ashare%3A6955606826549743616
How to Use #Apache #Hudi, #Delta #Lake, and #Apache #Iceberg on the #AWS #Glue Platform: https://lnkd.in/gzbqVi2Y : #Demonstration #Tutorials on how each format works with a #Glue #Studio #Notebook: https://lnkd.in/gmPZbE-P : You can start using these data lake formats easily with #Spark #DataFrames and #Spark #SQL in #Glue jobs or #GlueStudio notebooks. This post focuses on #interactive #coding and #querying in notebooks.
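To make "using the formats in Spark DataFrames and Spark SQL" a little more concrete, here is a minimal sketch of the Spark session settings each format commonly expects once its libraries are on the classpath. The helper name `lake_format_spark_conf`, the catalog name `glue_catalog`, and the warehouse path are illustrative assumptions, not values from the post; the class names are the ones documented by the Hudi, Delta Lake, and Iceberg projects, and the exact set you need may differ by library version.

```python
def lake_format_spark_conf(fmt, catalog_name="glue_catalog",
                           warehouse="s3://example-bucket/warehouse/"):
    """Return Spark conf key/value pairs commonly set for a data lake format.

    catalog_name and warehouse are hypothetical placeholders and are only
    used for Iceberg.
    """
    if fmt == "hudi":
        # Hudi asks for the Kryo serializer.
        return {"spark.serializer": "org.apache.spark.serializer.KryoSerializer"}
    if fmt == "delta":
        return {
            "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
            "spark.sql.catalog.spark_catalog":
                "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        }
    if fmt == "iceberg":
        prefix = f"spark.sql.catalog.{catalog_name}"
        return {
            "spark.sql.extensions":
                "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
            prefix: "org.apache.iceberg.spark.SparkCatalog",
            f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
            f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
            f"{prefix}.warehouse": warehouse,
        }
    raise ValueError(f"unknown format: {fmt!r}")


delta_conf = lake_format_spark_conf("delta")
iceberg_conf = lake_format_spark_conf("iceberg")
```

In a notebook, each pair could then be passed to `SparkSession.builder.config(key, value)` before the session is created.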
IN DEPTH: #AWS #Glue: https://lnkd.in/gsDW6SUe :
3 Options
There are currently three options for bringing the #datalake format libraries to the #AWS #Glue job #platform: #Marketplace #connectors, #custom #connectors (BYOC), and #extra #library #dependencies.
Marketplace connectors
IN DEPTH: #AWS #Glue #Connector: https://lnkd.in/gwfjdKbY :
AWS Glue Connector Marketplace is the centralized repository that catalogs the Glue connectors provided by multiple vendors. As of today, you can subscribe to more than 60 connectors offered there, including marketplace connectors for Apache Hudi, Delta Lake, and Apache Iceberg. The marketplace connectors are hosted in an Amazon Elastic Container Registry (Amazon ECR) repository and downloaded to the Glue job system at runtime. If you prefer the simple experience of subscribing to a connector and using it in your Glue ETL jobs, the marketplace connector is a good option.
Custom connectors as bring-your-own-connector (BYOC)
IN DEPTH: #AWS #Glue #Custom #Connector: https://lnkd.in/gMA5Vp7j :
An AWS Glue custom connector lets you upload and register your own libraries, stored in Amazon S3, as Glue connectors. This gives you more control over library versions, patches, and dependencies. Because the libraries live in your own S3 bucket, you can, for example, configure the bucket policy to share them only with specific users, or configure private network access for downloading them through VPC endpoints. If you prefer having more control over those configurations, the custom connector as BYOC is a good option.
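As a rough illustration of the bucket-policy point, the sketch below builds a policy document that grants read access on the connector JAR prefix to a list of principals. The helper name `connector_bucket_policy`, the bucket, prefix, account ID, and role name are all hypothetical. Note that an Allow statement like this only grants access to the listed principals; locking out everyone else typically needs additional controls, which are not shown here.

```python
import json


def connector_bucket_policy(bucket, prefix, principal_arns):
    """Build an S3 bucket policy allowing the given principals to read JARs.

    All argument values used below are placeholders, not real resources.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowConnectorJarDownload",
                "Effect": "Allow",
                "Principal": {"AWS": list(principal_arns)},
                "Action": "s3:GetObject",
                # Limit the grant to objects under the connector prefix.
                "Resource": f"arn:aws:s3:::{bucket}/{prefix.strip('/')}/*",
            }
        ],
    }


policy = connector_bucket_policy(
    bucket="example-connector-bucket",
    prefix="glue/connectors/",
    principal_arns=["arn:aws:iam::111122223333:role/ExampleGlueJobRole"],
)
policy_json = json.dumps(policy, indent=2)
```

The resulting JSON string is what you would attach to the bucket as its policy.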
Extra library dependencies
IN DEPTH: #AWS #Glue #Library #Dependencies: https://lnkd.in/gayt9MXK :
There is another option: download the data lake format libraries, upload them to your S3 bucket, and add them to the job as extra library dependencies. With this option, you add the libraries directly to the job and use them without a connector. In a Glue job, you configure this in the Dependent JARs path. In the API, it is the --extra-jars parameter. In a Glue Studio notebook, you configure it with the %extra_jars magic. To download the relevant JAR files, see the library locations in the section Create a Custom connection (BYOC).
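The extra-library option can be sketched as follows: a small helper that turns a list of S3 JAR locations into the `--extra-jars` job argument, which takes a comma-separated list of complete S3 paths. The helper name `extra_jars_argument` and the bucket and JAR file names are made up for illustration; substitute the JARs you actually uploaded.

```python
def extra_jars_argument(jar_uris):
    """Return a Glue job argument dict with a comma-separated --extra-jars value."""
    cleaned = [uri.strip() for uri in jar_uris if uri.strip()]
    for uri in cleaned:
        # Each entry is expected to be a full S3 path to a .jar file.
        if not (uri.startswith("s3://") and uri.endswith(".jar")):
            raise ValueError(f"not an S3 JAR path: {uri!r}")
    return {"--extra-jars": ",".join(cleaned)}


# Hypothetical JAR locations in your own bucket.
job_args = extra_jars_argument([
    "s3://example-bucket/jars/format-library.jar",
    "s3://example-bucket/jars/format-dependency.jar",
])
```

A dict like `job_args` could be merged into a job's default arguments when creating it through the API, and the same comma-separated string could follow `%extra_jars` in a Glue Studio notebook cell.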
AWS Partner-NYS Cloud VC-PE: Silicon Valley-Wall Street-Pentagon-Global Risk Management Network, LLC