AWS Athena: Adding yyyy/mm/dd partitions to an S3 table
Assume data in S3:
s3://mybucket/foo/2021/07/27/data.gz
s3://mybucket/foo/2021/07/28/data.gz
s3://mybucket/foo/2021/07/29/data.gz
s3://mybucket/foo/2021/07/30/data.gz
And AWS Athena table linked to the S3 bucket, with the partitions Year
, Month
, and Day
.
CREATE external TABLE foos (...)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT SERDE ...
LOCATION 's3://mybucket/foo';
After creating a table, partitions must be created. The docs suggests the following command:
MSCK REPAIR TABLE foos;
However, this won't work. The query will succeed but with warnings:
Partitions not in metastore:
foo:2021/07/27,
foo:2021/07/28,
foo:2021/07/29,
foo:2021/07/30.
This is because MSCK REPAIR TABLE foos
supports only partitioning with key-value pairs in file paths, such as:
s3://mybucket/foo/year=2021/month=07/day=27/data.gz
s3://mybucket/foo/year=2021/month=07/day=28/data.gz
s3://mybucket/foo/year=2021/month=07/day=29/data.gz
s3://mybucket/foo/year=2021/month=07/day=30/data.gz
Partitions without the key-value pairs must be added manually:
ALTER TABLE foos ADD
PARTITION (year='2021',month='07',day='27')
location 's3://mybucket/foo/2021/07/27'
PARTITION (year='2021',month='07',day='28')
location 's3://mybucket/foo/2021/07/28'
PARTITION (year='2021',month='07',day='29')
location 's3://mybucket/foo/2021/07/29'
PARTITION (year='2021',month='07',day='30')
location 's3://mybucket/foo/2021/07/30';
More info:
- Amazon Forum answer from 2017
- StackOverflow
- AWS Partitioning Docs
- Alter Table Add Partition Synopsis
- ← Previous post: PostgreSQL vs Aurora PostgreSQL vs Aurora PostgreSQL Serverless
- → Next post: Functional programming: driving team adoption
This blog is written by Marcel Krcah, an independent consultant for product-oriented software engineering. If you like what you read, sign up for my newsletter