How to Use #Apache #Hudi, #Delta #Lake, and #Apache #Iceberg on #AWS #Glue Platform

Source: https://www.linkedin.com/feed/update/urn%3Ali%3Ashare%3A6955606826549743616

How to Use #Apache #Hudi, #Delta #Lake, and #Apache #Iceberg on #AWS #Glue Platform: https://lnkd.in/gzbqVi2Y

#Demonstration #Tutorials on how each format works with a #Glue #Studio #Notebook: https://lnkd.in/gmPZbE-P

You can start using these data lake formats easily in #Spark #DataFrames and #Spark #SQL on #Glue jobs or #GlueStudio notebooks. This post focuses on #interactive #coding and #querying on notebooks.

IN DEPTH: #AWS #Glue: https://lnkd.in/gsDW6SUe :

3 Options

There are currently three options for bringing libraries for the #datalake formats to the #AWS #Glue job #platform: #Marketplace #connectors, #custom #connectors (BYOC), and #extra #library #dependencies.

Marketplace connectors

IN DEPTH: #AWS #Glue #Connector: https://lnkd.in/gwfjdKbY :

AWS Glue Connector Marketplace is the centralized repository that catalogs the Glue connectors provided by multiple vendors. At the time of writing, you can subscribe to more than 60 connectors offered there, including marketplace connectors for Apache Hudi, Delta Lake, and Apache Iceberg. The marketplace connectors are hosted in an Amazon Elastic Container Registry (Amazon ECR) repository and downloaded to the Glue job system at runtime. If you prefer the simple experience of subscribing to connectors and using them in your Glue ETL jobs, the marketplace connector is a good option.

Custom connectors as bring-your-own-connector (BYOC)

IN DEPTH: #AWS #Glue #Custom #Connector: https://lnkd.in/gMA5Vp7j :

An AWS Glue custom connector lets you upload and register your own libraries, stored in Amazon S3, as Glue connectors. This gives you more control over library versions, patches, and dependencies. Because the libraries live in your own S3 bucket, you can configure the bucket policy to share them only with specific users, configure private network access for downloading them through VPC endpoints, and so on. If you prefer having more control over those configurations, the custom connector as BYOC is a good option.
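As an illustration of the bucket-policy control mentioned above, here is a minimal Python sketch that builds a policy document granting read access on the connector JARs to specific IAM principals, with an optional restriction to one VPC endpoint. The helper name, bucket name, prefix, role ARN, and endpoint ID are all hypothetical placeholders, not values from the original post, and the policy shape is only one plausible way to do this. Note that an Allow statement alone does not block principals who already have access through their own IAM policies.

```python
import json


def build_jar_bucket_policy(bucket, prefix, principal_arns, vpc_endpoint_id=None):
    """Build an S3 bucket policy dict for sharing connector JARs.

    Hypothetical helper: all argument values are placeholders that you
    would replace with your own bucket, prefix, IAM ARNs, and endpoint ID.
    """
    prefix = prefix.strip("/")
    # Cover either one prefix or the whole bucket when no prefix is given.
    key_pattern = f"{prefix}/*" if prefix else "*"
    resource = f"arn:aws:s3:::{bucket}/{key_pattern}"

    statements = [
        {
            "Sid": "AllowConnectorJarRead",
            "Effect": "Allow",
            "Principal": {"AWS": list(principal_arns)},
            "Action": "s3:GetObject",
            "Resource": resource,
        }
    ]

    if vpc_endpoint_id:
        # Optional: deny reads that do not arrive through the given VPC endpoint.
        statements.append(
            {
                "Sid": "DenyReadOutsideVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": resource,
                "Condition": {
                    "StringNotEquals": {"aws:SourceVpce": vpc_endpoint_id}
                },
            }
        )

    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    policy = build_jar_bucket_policy(
        bucket="example-bucket",  # placeholder bucket name
        prefix="connectors",  # placeholder key prefix holding the JARs
        principal_arns=["arn:aws:iam::111122223333:role/ExampleGlueRole"],
        vpc_endpoint_id="vpce-0123456789abcdef0",  # placeholder endpoint ID
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON could then be reviewed and attached to the bucket through the S3 console or your usual infrastructure tooling. Test the Deny statement carefully before relying on it, since it also blocks reads from outside the endpoint, such as from the console.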

Extra library dependencies

IN DEPTH: #AWS #Glue #Library #Dependencies: https://lnkd.in/gayt9MXK :

There is another option: download the data lake format libraries, upload them to your S3 bucket, and reference them as extra library dependencies. With this option, you add the libraries directly to the job and use them without a connector. In a Glue job, you configure this in the Dependent JARs path. In the API, it is the --extra-jars parameter. In a Glue Studio notebook, you configure it with the %extra_jars magic. To download the relevant JAR files, see the library locations in the section Create a Custom connection (BYOC).
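The two non-console settings above can be sketched in Python as follows. This is a minimal illustration, assuming that both --extra-jars and %extra_jars take a comma-separated list of full S3 paths. The helper names and the S3 paths are hypothetical, and the JAR file names are placeholders rather than real library versions.

```python
def _clean_jar_paths(jar_paths):
    """Strip whitespace, drop empty entries, and validate each S3 JAR path."""
    cleaned = [p.strip() for p in jar_paths if p and p.strip()]
    if not cleaned:
        raise ValueError("at least one JAR path is required")
    for path in cleaned:
        if not path.startswith("s3://"):
            raise ValueError(f"not an S3 path: {path}")
        if not path.endswith(".jar"):
            raise ValueError(f"not a JAR file: {path}")
    return cleaned


def build_default_arguments(jar_paths):
    """Return job arguments that set the --extra-jars parameter."""
    return {"--extra-jars": ",".join(_clean_jar_paths(jar_paths))}


def build_notebook_magic(jar_paths):
    """Return the %extra_jars line to put in a notebook cell."""
    return "%extra_jars " + ",".join(_clean_jar_paths(jar_paths))


if __name__ == "__main__":
    # Placeholder paths: replace with the JARs you uploaded to your bucket.
    jars = [
        "s3://example-bucket/jars/example-format-bundle.jar",
        "s3://example-bucket/jars/example-dependency.jar",
    ]
    print(build_default_arguments(jars))
    print(build_notebook_magic(jars))
```

The dictionary from build_default_arguments is shaped so that it could be merged into a job's default arguments when the job is created through the API. The string from build_notebook_magic would go in a notebook cell, typically before the session starts.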

AWS Partner-NYS Cloud VC-PE: Silicon Valley-Wall Street-Pentagon-Global Risk Management Network, LLC

August 2023: This post was reviewed and updated for accuracy. AWS Glue supports native integration with Apache Hudi, Delta Lake, and Apache Iceberg. Refer to Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 2: AWS Glue Studio Visual Editor to learn more. Cloud data lakes [?]

Global Post AI-Quantum Finance & Trading Networks Pioneer Dr.-Eng.-Prof. Yogesh Malhotra is the “Singular Post AI-Quantum Pioneer” identified by Grok AI with R&D impact recognized among Artificial Intelligence (AI) and Quantitative Finance Nobel Laureates. As MIT-Princeton AI-ML-Cyber-Crypto-Quantum Finance & Trading and FinTech-Crypto Faculty-Industry Expert, and U.S. and Global Hedge Funds Advisory & Venture Capital CEO-CTO Teams Mentor, he has pioneered Silicon Valley-Wall Street-Pentagon Digital CEO-CTO Practices, Technologies, and Networks from world’s first-foremost-largest Global Digital Transformation Networks to New York State IDEA Award recognized Pentagon-USAF MVP Global Post AI-Quantum Networks pioneering Future of Finance and Trading practices as Trillion-Dollar Wall Street Hedge Funds and Investment Banks leader.