96

What is the simplest way to get a list of all items within an S3 bucket using Java?

List<S3ObjectSummary> s3objects = s3.listObjects(bucketName, prefix).getObjectSummaries();

This example only returns the first 1000 items.


14 Answers

140

It might be a workaround, but this solved my problem:

ObjectListing listing = s3.listObjects(bucketName, prefix);
List<S3ObjectSummary> summaries = listing.getObjectSummaries();

while (listing.isTruncated()) {
    listing = s3.listNextBatchOfObjects(listing);
    summaries.addAll(listing.getObjectSummaries());
}
  • 44
Doesn't look like a workaround to me; that seems to be the intended use of the API. Commented Nov 7, 2011 at 8:24
  • s3.listObjects has a default limit of 1000 elements per listing, so as @JoachimSauer said this is the intended use of the API
    – Fgblanch
    Commented Aug 2, 2012 at 12:04
  • 3
This makes the dangerous assumption that the List returned by getObjectSummaries() is mutable (a defensive-copy variant is sketched after these comments).
    – Steve Kuo
    Commented Oct 24, 2015 at 18:39
  • 2
May I know what the prefix is here? Commented Jul 22, 2019 at 9:49
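To sidestep the mutability concern raised above, here is a minimal defensive-copy sketch of the same approach. This is not part of the original answer; it assumes the same s3, bucketName, and prefix variables:

ObjectListing listing = s3.listObjects(bucketName, prefix);
// Copy into a list we own instead of mutating the one the SDK returned.
List<S3ObjectSummary> summaries = new ArrayList<>(listing.getObjectSummaries());

while (listing.isTruncated()) {
    listing = s3.listNextBatchOfObjects(listing);
    summaries.addAll(listing.getObjectSummaries());
}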
68

For those who are reading this in 2018+: there are two new pagination-hassle-free APIs available, one in the AWS SDK for Java 1.x and another in 2.x.

1.x

There is a new API in the Java SDK that allows you to iterate through objects in an S3 bucket without dealing with pagination:

AmazonS3 s3 = AmazonS3ClientBuilder.standard().build();

S3Objects.inBucket(s3, "the-bucket").forEach((S3ObjectSummary objectSummary) -> {
    // TODO: Consume `objectSummary` the way you need
    System.out.println(objectSummary.getKey());
});

This iteration is lazy:

The list of S3ObjectSummarys will be fetched lazily, a page at a time, as they are needed. The size of the page can be controlled with the withBatchSize(int) method.
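If you also need prefix filtering, the same 1.x iterable API exposes S3Objects.withPrefix alongside withBatchSize. A minimal sketch, with "the-bucket" and "the-prefix" as placeholder names:

// Iterate only over keys starting with "the-prefix", fetching 500 summaries per page.
S3Objects.withPrefix(s3, "the-bucket", "the-prefix")
        .withBatchSize(500)
        .forEach(objectSummary -> System.out.println(objectSummary.getKey()));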

2.x

The API changed, so here is an SDK 2.x version:

S3Client client = S3Client.builder().region(Region.US_EAST_1).build();
ListObjectsV2Request request = ListObjectsV2Request.builder().bucket("the-bucket").prefix("the-prefix").build();
ListObjectsV2Iterable response = client.listObjectsV2Paginator(request);

for (ListObjectsV2Response page : response) {
    page.contents().forEach((S3Object object) -> {
        // TODO: Consume `object` the way you need
        System.out.println(object.key());
    });
}

ListObjectsV2Iterable is lazy as well:

When the operation is called, an instance of this class is returned. At this point, no service calls are made yet and so there is no guarantee that the request is valid. As you iterate through the iterable, SDK will start lazily loading response pages by making service calls until there are no pages left or your iteration stops. If there are errors in your request, you will see the failures only after you start iterating through the iterable.
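If you do not care about page boundaries at all, the paginator can also be flattened with contents(). A minimal sketch reusing the client and request built above:

// Iterate over S3Object instances across all pages without handling pages yourself.
client.listObjectsV2Paginator(request)
        .contents()
        .forEach(object -> System.out.println(object.key()));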

  • 1
Awesome answer, helped a lot, but I'd like to ask for more information. I want to iterate over the pages like Spring Pageable, e.g. request the first 20 objects and, if needed, request the next 20 as a second page. Is it possible? (A continuation-token sketch follows these comments.) Commented Sep 25, 2020 at 21:02
  • 2
    FWIW in the 1.x API replace the inBucket call with withPrefix if required.
    – RTF
    Commented Mar 21, 2022 at 15:28
  • 1
    is it worth noting that the listObjectsV2Paginator method on the S3Client behaves as you have stated, BUT the same method on the S3AsyncClient actually gives you back a org.reactivestreams.Publisher<ListObjectsV2Response> which you can convert easily into a Flux and flatMapIterable into the S3 objects. Why is this good? Because it can pre-fetch pages on demand and back off on back-pressure. So once you finish a page you don't wait for the next page to be requested internally. It's "quicker" or "smoother". (N.B. they had a bug with their own flatMapIterable method I reported)
    – iZian
    Commented Mar 21, 2023 at 16:41
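Regarding the Pageable-style question in the comments above: the plain listObjectsV2 call can be driven one page at a time with maxKeys and a continuation token. A minimal 2.x sketch with a hypothetical fetchPage helper (not from the answer; pass null as the token for the first page):

// Fetch one page of up to 20 keys; pass the previous response's token for the next page.
static ListObjectsV2Response fetchPage(S3Client client, String token) {
    ListObjectsV2Request.Builder builder = ListObjectsV2Request.builder()
            .bucket("the-bucket")
            .maxKeys(20);
    if (token != null) {
        builder.continuationToken(token);
    }
    ListObjectsV2Response page = client.listObjectsV2(builder.build());
    // page.nextContinuationToken() is set only while page.isTruncated() is true.
    return page;
}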
32

This is direct from AWS documentation:

AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider());

ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
    .withBucketName(bucketName)
    .withPrefix("m");
ObjectListing objectListing;

do {
    objectListing = s3client.listObjects(listObjectsRequest);
    for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
        System.out.println(" - " + objectSummary.getKey() +
                "  (size = " + objectSummary.getSize() + ")");
    }
    // Advance the marker so the next request resumes after the last key returned.
    listObjectsRequest.setMarker(objectListing.getNextMarker());
} while (objectListing.isTruncated());
13

I am processing a large collection of objects generated by our system; we changed the format of the stored data and needed to check each file, determine which ones were in the old format, and convert them. There are other ways to do this, but this one relates to your question.

    ObjectListing list = amazonS3Client.listObjects(contentBucketName, contentKeyPrefix);

    while (true) {
        for (S3ObjectSummary summary : list.getObjectSummaries()) {

            String summaryKey = summary.getKey();

            /* Retrieve object */

            /* Process it */

        }
        // Check before fetching the next batch; a do/while that fetches after
        // processing would exit without processing the final page's contents.
        if (!list.isTruncated()) {
            break;
        }
        list = amazonS3Client.listNextBatchOfObjects(list);
    }
10

Listing Keys Using the AWS SDK for Java

http://docs.aws.amazon.com/AmazonS3/latest/dev/ListingObjectKeysUsingJava.html

import java.io.IOException;
import com.amazonaws.AmazonClientException;
import com.amazonaws.AmazonServiceException;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.ListObjectsV2Request;
import com.amazonaws.services.s3.model.ListObjectsV2Result;
import com.amazonaws.services.s3.model.S3ObjectSummary;

public class ListKeys {
    private static String bucketName = "***bucket name***";

    public static void main(String[] args) throws IOException {
        AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider());
        try {
            System.out.println("Listing objects");
            final ListObjectsV2Request req = new ListObjectsV2Request().withBucketName(bucketName);
            ListObjectsV2Result result;
            do {
                result = s3client.listObjectsV2(req);

                for (S3ObjectSummary objectSummary : result.getObjectSummaries()) {
                    System.out.println(" - " + objectSummary.getKey() +
                            "  (size = " + objectSummary.getSize() + ")");
                }
                System.out.println("Next Continuation Token : " + result.getNextContinuationToken());
                req.setContinuationToken(result.getNextContinuationToken());
            } while (result.isTruncated());

         } catch (AmazonServiceException ase) {
            System.out.println("Caught an AmazonServiceException, " +
                    "which means your request made it " +
                    "to Amazon S3, but was rejected with an error response " +
                    "for some reason.");
            System.out.println("Error Message:    " + ase.getMessage());
            System.out.println("HTTP Status Code: " + ase.getStatusCode());
            System.out.println("AWS Error Code:   " + ase.getErrorCode());
            System.out.println("Error Type:       " + ase.getErrorType());
            System.out.println("Request ID:       " + ase.getRequestId());
        } catch (AmazonClientException ace) {
            System.out.println("Caught an AmazonClientException, " +
                    "which means the client encountered " +
                    "an internal error while trying to communicate" +
                    " with S3, " +
                    "such as not being able to access the network.");
            System.out.println("Error Message: " + ace.getMessage());
        }
    }
}
8

As a slightly more concise solution to listing S3 objects when they might be truncated:

ListObjectsRequest request = new ListObjectsRequest().withBucketName(bucketName);
ObjectListing listing = null;

while ((listing == null) || (request.getMarker() != null)) {
    listing = s3Client.listObjects(request);
    // do stuff with listing
    request.setMarker(listing.getNextMarker());
}
4

Gray, your solution was strange, but you seem like a nice guy.

AmazonS3Client s3Client = new AmazonS3Client(new BasicAWSCredentials( ....

ObjectListing images = s3Client.listObjects(bucketName); 

List<S3ObjectSummary> list = images.getObjectSummaries();
for(S3ObjectSummary image: list) {
    S3Object obj = s3Client.getObject(bucketName, image.getKey());
    writeToFile(obj.getObjectContent());
}
  • 5
As far as I can tell, this solution will only take the first 1000 keys/files and print them, but it does not iterate further for more files. Commented Oct 5, 2013 at 14:59
3

I know this is an old post, but this still might be useful to anyone: the Java/Android SDK, as of version 2.1, provides a method called setMaxKeys. Like this:

s3objects.setMaxKeys(arg0)
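For context, a minimal sketch of the same idea on the v1 request object, where withMaxKeys is the builder-style setter (the value 10 is an arbitrary example, not from the answer):

// Ask S3 for at most 10 keys per response instead of the default 1000.
ListObjectsRequest request = new ListObjectsRequest()
        .withBucketName(bucketName)
        .withMaxKeys(10);
ObjectListing listing = s3.listObjects(request);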

You probably found a solution by now, but please mark one answer as correct so that it might help others in the future.

3

This worked for me.

Thread thread = new Thread(new Runnable(){
    @Override
    public void run() {
        try {
            List<String> listing = getObjectNamesForBucket(bucket, s3Client);
            Log.e(TAG, "listing "+ listing);

        }
        catch (Exception e) {
            e.printStackTrace();
            Log.e(TAG, "Exception found while listing "+ e);
        }
    }
});

thread.start();

private List<String> getObjectNamesForBucket(String bucket, AmazonS3 s3Client) {
    ObjectListing objects = s3Client.listObjects(bucket);
    List<String> objectNames = new ArrayList<String>(objects.getObjectSummaries().size());
    Iterator<S3ObjectSummary> oIter = objects.getObjectSummaries().iterator();
    while (oIter.hasNext()) {
        objectNames.add(oIter.next().getKey());
    }
    while (objects.isTruncated()) {
        objects = s3Client.listNextBatchOfObjects(objects);
        oIter = objects.getObjectSummaries().iterator();
        while (oIter.hasNext()) {
            objectNames.add(oIter.next().getKey());
        }
    }
    return objectNames;
}
1

You don't want to list all 1000 objects in your bucket at a time. A more robust solution is to fetch a maximum of 10 objects at a time. You can do this with the withMaxKeys method.

The following code creates an S3 client, fetches 10 or fewer objects at a time, filters the results on a key prefix client-side, and generates a pre-signed URL for each fetched object:

import com.amazonaws.HttpMethod;
import com.amazonaws.SdkClientException;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.*;

import java.net.URL;
import java.util.Date;

/**
 * @author shabab
 * @since 21 Sep, 2020
 */
public class AwsMain {

    static final String ACCESS_KEY = "";
    static final String SECRET = "";
    static final Regions BUCKET_REGION = Regions.DEFAULT_REGION;
    static final String BUCKET_NAME = "";

    public static void main(String[] args) {
        BasicAWSCredentials awsCreds = new BasicAWSCredentials(ACCESS_KEY, SECRET);

        try {
            final AmazonS3 s3Client = AmazonS3ClientBuilder
                    .standard()
                    .withRegion(BUCKET_REGION)
                    .withCredentials(new AWSStaticCredentialsProvider(awsCreds))
                    .build();

            ListObjectsV2Request req = new ListObjectsV2Request().withBucketName(BUCKET_NAME).withMaxKeys(10);
            ListObjectsV2Result result;

            do {
                result = s3Client.listObjectsV2(req);

                result.getObjectSummaries()
                        .stream()
                        .filter(s3ObjectSummary -> {
                            return s3ObjectSummary.getKey().contains("Market-subscriptions/")
                                    && !s3ObjectSummary.getKey().equals("Market-subscriptions/");
                        })
                        .forEach(s3ObjectSummary -> {

                            GeneratePresignedUrlRequest generatePresignedUrlRequest =
                                    new GeneratePresignedUrlRequest(BUCKET_NAME, s3ObjectSummary.getKey())
                                            .withMethod(HttpMethod.GET)
                                            .withExpiration(getExpirationDate());

                            URL url = s3Client.generatePresignedUrl(generatePresignedUrlRequest);

                            System.out.println(s3ObjectSummary.getKey() + " Pre-Signed URL: " + url.toString());
                        });

                String token = result.getNextContinuationToken();
                req.setContinuationToken(token);

            } while (result.isTruncated());
        } catch (SdkClientException e) {
            e.printStackTrace();
        }

    }

    private static Date getExpirationDate() {
        Date expiration = new java.util.Date();
        long expTimeMillis = expiration.getTime();
        expTimeMillis += 1000 * 60 * 60; // expire one hour from now
        expiration.setTime(expTimeMillis);

        return expiration;
    }
}
0

Using the SDK v2 reactive streams integration with automatic pagination, as described in the docs.

This example uses the Project Reactor implementation of the reactive streams standard, but it also works with other implementations (e.g., RxJava):

    ListObjectsV2Request listObjects = ListObjectsV2Request
            .builder()
            .bucket("<bucketName>")
            .maxKeys(100) // Number of items per page. Using pagination to get all objects in the bucket.
            .build();

    // https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/pagination.html
    // Auto-pagination method that makes multiple service calls to get the next page of results automatically.
    // Publish messages by batches to sqs as they come from s3 pagination result.
    return Flux.from(s3Client.listObjectsV2Paginator(listObjects))
            .flatMap(list -> Flux.fromIterable(list.contents())
                    .map(s3Object -> transformObject(s3Object))
                    .collectList()
                    .flatMap(sqsPublisher::publishBatch))
            .doOnError(e -> log.error("Failed to blabla", e))
            .then();
0
AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
                             .withRegion(Regions.US_EAST_1)
                             .withCredentials(new AWSStaticCredentialsProvider(new BasicAWSCredentials(getAccessKey(), getSecretKey())))
                             .build();

// Note: listObjects returns at most the first 1000 object summaries; paginate to list more.
List<String> s3Keys = s3Client.listObjects("bucketName")
            .getObjectSummaries().stream().map(S3ObjectSummary::getKey)
            .collect(Collectors.toList());
0
ListObjectsV2Request request = ListObjectsV2Request.builder().bucket("bucketName").prefix("your-prefix").build();
ListObjectsV2Iterable response = s3.listObjectsV2Paginator(request);

for (ListObjectsV2Response page : response) {
    page.contents().forEach((S3Object object) -> {
        // TODO: Consume `object` the way you need
    });
}

This uses the AWS SDK v2 listObjectsV2Paginator; see the ListObjectsPaginated example in the AWS docs repo (https://github.com/awsdocs/aws-doc-sdk-examples/blob/main/javav2/example_code/s3/src/main/java/com/example/s3/ListObjectsPaginated.java).

-3

Try this one out:

public void getObjectList() {
    System.out.println("Listing objects");
    ObjectListing objectListing = s3.listObjects(new ListObjectsRequest()
            .withBucketName(bucketName)
            .withPrefix("ads"));
    for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
        System.out.println(" - " + objectSummary.getKey() + "  " +
                "(size = " + objectSummary.getSize() + ")");
    }
}

You can list all the objects within the bucket with a specific prefix.

No you can't; there's a 1000-file limit. Did you not read above? Your solution has the same issue.
    – ninja
    Commented Jun 26, 2019 at 0:28
