Parse an S3 Protocol URI (s3://)
Parse the s3:// protocol URI format used by AWS CLI, SDKs, and tools like Spark and Hadoop. Understand how it maps to HTTP endpoints.
S3 Protocol
Detailed Explanation
The s3:// Protocol URI
An s3:// URI is not an HTTP URL. It is a URI scheme used by AWS tools (CLI, SDKs), Apache Spark, Hadoop, and other data processing frameworks to reference S3 objects. It gives a concise, human-readable form that names only the bucket and key; the tool that consumes the URI supplies the HTTP endpoint details.
URI Structure
s3://BUCKET/KEY
Example
s3://data-lake-prod/raw/events/2024/01/15/events.parquet
Parsed Components
| Component | Value |
|---|---|
| Bucket | data-lake-prod |
| Key | raw/events/2024/01/15/events.parquet |
| Region | (not embedded in URI) |
| Style | S3 Protocol |
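The split shown in the table can be sketched in a few lines of Python. This is a minimal illustration, not the parser of any particular AWS tool; the names `S3Location` and `parse_s3_uri` are hypothetical. It splits on the first `/` after the bucket rather than using a general URL parser, on the assumption that characters such as `?` and `#` in the key should stay part of the key.

```python
from typing import NamedTuple


class S3Location(NamedTuple):
    """Bucket and key extracted from an s3:// URI (hypothetical helper type)."""
    bucket: str
    key: str


def parse_s3_uri(uri: str) -> S3Location:
    """Split s3://BUCKET/KEY into its two components."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # Everything up to the first "/" is the bucket; the rest is the key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return S3Location(bucket=bucket, key=key)


location = parse_s3_uri("s3://data-lake-prod/raw/events/2024/01/15/events.parquet")
print(location.bucket)  # data-lake-prod
print(location.key)     # raw/events/2024/01/15/events.parquet
```

A URI with no key, such as s3://data-lake-prod, yields an empty key string in this sketch.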
Where s3:// Is Used
| Context | Example |
|---|---|
| AWS CLI | aws s3 cp s3://bucket/key ./local-file |
| AWS SDK | Used in SDK configuration to specify S3 locations |
| Apache Spark | spark.read.parquet("s3://bucket/path") |
| Hadoop | hadoop fs -ls s3://bucket/prefix/ |
| AWS Glue | Data catalog table locations |
| AWS Athena | Query result locations |
| Terraform | S3 backend state storage |
Region Resolution
The s3:// URI does not include region information. The region is resolved by:
- The AWS_DEFAULT_REGION environment variable
- The ~/.aws/config file profile
- The EC2 instance metadata (when running on AWS)
- The bucket's actual region (via a HEAD request)
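The first two sources in the list above can be checked with the Python standard library alone. The sketch below is an assumption-laden illustration rather than the lookup logic of the AWS CLI or any SDK: the function name `resolve_region` and its parameters are hypothetical, and the EC2 metadata and HEAD request steps are left out because they need network access.

```python
import configparser
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_region(
    env: Optional[Mapping[str, str]] = None,
    config_path: Optional[Path] = None,
    profile: str = "default",
) -> Optional[str]:
    """Return a region from the environment or the config file, else None."""
    env = os.environ if env is None else env

    # 1. Environment variable.
    region = env.get("AWS_DEFAULT_REGION")
    if region:
        return region

    # 2. Profile in the shared config file. Named profiles use a
    #    "[profile NAME]" section; the default profile uses "[default]".
    path = config_path or Path.home() / ".aws" / "config"
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file is ignored
    section = "default" if profile == "default" else f"profile {profile}"
    if parser.has_section(section):
        return parser.get(section, "region", fallback=None)

    # 3. and 4. (instance metadata, HEAD request) are not covered here.
    return None


print(resolve_region(env={"AWS_DEFAULT_REGION": "eu-west-1"}))  # eu-west-1
```

Passing `env` and `config_path` explicitly keeps the function easy to test without touching the real environment.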
Related URI Schemes
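- s3a:// is used by Hadoop 2.x+ for improved S3 access through the S3A filesystem connector.
- s3n:// is the legacy Hadoop S3 native filesystem scheme (deprecated).
- s3-external-1 is rarely seen. It is part of the legacy us-east-1 endpoint hostname s3-external-1.amazonaws.com, used for explicit us-east-1 routing, rather than a separate URI scheme.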
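When URIs from Hadoop or Spark jobs need to be handed to a tool that only understands s3://, the scheme can be rewritten while the bucket and key stay untouched. A minimal sketch, assuming that s3a:// and s3n:// URIs address the same bucket and key as their s3:// form; `normalize_s3_scheme` is a hypothetical helper name.

```python
S3_SCHEMES = ("s3", "s3a", "s3n")


def normalize_s3_scheme(uri: str) -> str:
    """Rewrite s3a:// and s3n:// URIs to the plain s3:// form."""
    scheme, separator, rest = uri.partition("://")
    if not separator or scheme.lower() not in S3_SCHEMES:
        raise ValueError(f"not an S3-style URI: {uri!r}")
    # Only the scheme changes; bucket and key are copied verbatim.
    return "s3://" + rest


print(normalize_s3_scheme("s3a://bucket/path/part-0000.parquet"))
# s3://bucket/path/part-0000.parquet
```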
Use Case
Converting between s3:// URIs used in AWS Glue job scripts and HTTPS URLs needed for browser-based access or API calls in a web application frontend.
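One direction of that conversion can be sketched as follows. The example builds a virtual-hosted-style URL of the form https://BUCKET.s3.REGION.amazonaws.com/KEY. Because the s3:// URI carries no region, the caller has to supply one. The function name `s3_uri_to_https` is hypothetical, and the sketch assumes a standard AWS partition and a bucket name without dots.

```python
from urllib.parse import quote


def s3_uri_to_https(uri: str, region: str) -> str:
    """Map s3://BUCKET/KEY to a virtual-hosted-style HTTPS URL."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an s3:// URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    # Percent-encode the key, keeping "/" so the path structure survives.
    encoded_key = quote(key, safe="/")
    return f"https://{bucket}.s3.{region}.amazonaws.com/{encoded_key}"


print(s3_uri_to_https(
    "s3://data-lake-prod/raw/events/2024/01/15/events.parquet",
    region="us-east-1",
))
# https://data-lake-prod.s3.us-east-1.amazonaws.com/raw/events/2024/01/15/events.parquet
```

The resulting URL only loads in a browser if the object is publicly readable or the request is otherwise authorized, for example through a presigned URL generated by an SDK.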