A thread-safe performant URL Shortener in Java

Question

This is a popular interview question. It is meant to be done within 45 minutes. The task is to implement a thread-safe performant in-memory URL Shortener.

My implementation is based on two maps that store shortened url to original mapping and original to shortened mapping. Given the selected alphabet and keyLength are large enough, there should be very few collisions. If they still occur, I try to mitigate them by retrying generating a new key 3 times as well as ensuring there is no race condition at the time I add the key (by executing the ConcurrentHashMap.compute() block which is atomic and blocking).

Questions: I am interested to hear any feedback about the concurrency aspects of the solution as well as the general feedback. Did I overengineer anything? Is it thread safe? Are there possible concurrency issues? Would you expect anybody to write something like this in 45 minutes during an interview (what if there are also tests required)?

public final class UrlShortener {
    private final Map<String, String> urlToKey = new ConcurrentHashMap<>();
    private final Map<String, String> keyToUrl = new ConcurrentHashMap<>();
    private final String alphabet;
    private final Random random;
    private final int keyLength;

    public UrlShortener(String alphabet, Random random, int keyLength) {
        if (alphabet == null || alphabet.isEmpty())
            throw new IllegalArgumentException("Alphabet must not be empty");
        if (random == null) throw new IllegalArgumentException("Random must not be null");
        if (keyLength <= 0) throw new IllegalArgumentException("keyLength must be positive");
        this.alphabet = alphabet;
        this.random = random;
        this.keyLength = keyLength;
    }

    public String addUrl(String url) {
        if (url == null) throw new IllegalArgumentException("URL must not be null");
        urlToKey.computeIfAbsent(url, (urlToAdd) -> {
            var key = newUniqueKey(3);
            keyToUrl.compute(key, (keyToAdd, existingValue) -> {
                if (existingValue != null) throw new IllegalStateException("Failed to add url");
                return url;
            });
            return key;
        });
        return urlToKey.get(url);
    }

    public String getUrl(String key) {
        if (key == null) throw new IllegalArgumentException("requested key must not be null");
        return keyToUrl.get(key);
    }

    String generateRandomKey() {
        var sb = new StringBuilder();
        for (int i = 0; i < keyLength; i++) {
            int index = random.nextInt(alphabet.length());
            sb.append(alphabet.charAt(index));
        }
        return sb.toString();
    }

    String newUniqueKey(int attempts) {
        var key = generateRandomKey();
        var attemptsMade = 1;
        while (keyToUrl.containsKey(key) && attemptsMade < attempts) {
            key = generateRandomKey();
            attemptsMade++;
        }
        if (keyToUrl.containsKey(key))
            throw new IllegalStateException("Failed to generate a key");
        return key;
    }
}

In terms of collision probability, if the alphabet is for instance 62 chars (uppercase and lowercase English letters and numbers) and the key length is 10 it would produce a space of more than 8 * 10^17 unique keys. Assuming there are let's say 1 000 000 stored keys, the probability of generating a collision would be less than one per 8 * 10^11 for a single attempt, but then I perform three attempts which reduces the probability significantly.

EDIT: Adding tests as was requested in answers:

class UrlShortenerTest {

    UrlShortener urlShortener;
    Random random;

    @BeforeEach
    void before() {
        var alphabet = "ab";
        var keyLength = 2;
        random = mock(Random.class);
        urlShortener = new UrlShortener(alphabet, random, keyLength);
    }

    @Test
    void shouldGenerateRandomKey() {
        given(random.nextInt(2)).willReturn(1, 0);

        var key = urlShortener.generateRandomKey();

        assertEquals("ba", key);
    }

    @Test
    void shouldGetNewUniqueKey() {
        given(random.nextInt(2)).willReturn(1, 0);
        var attempts = 3;

        var key = urlShortener.newUniqueKey(attempts);

        assertEquals("ba", key);
    }


    @Test
    void shouldFailToGetNewUniqueKeyGivenCollision() {
        given(random.nextInt(2)).willReturn(1, 0, 1, 0, 1, 0, 1, 0);
        urlShortener.addUrl("https://example.com");

        var exception = assertThrows(IllegalStateException.class, () -> urlShortener.newUniqueKey(3));

        assertEquals("Failed to generate a key", exception.getMessage());
    }

    @Test
    void shouldGetNewUniqueKeyFromThirdAttempt() {
        given(random.nextInt(2)).willReturn(1, 0, 1, 0, 1, 0, 0, 1);
        urlShortener.addUrl("https://example.com");
        var attempts = 3;

        var uniqueKey = urlShortener.newUniqueKey(attempts);

        assertEquals("ab", uniqueKey);
    }

    @Test
    void shouldAddUrl() {
        given(random.nextInt(2)).willReturn(1, 0);

        var key = urlShortener.addUrl("https://example.com");

        assertEquals("ba", key);
    }

    @Test
    void shouldGetUrl() {
        given(random.nextInt(2)).willReturn(1, 0);
        var key = urlShortener.addUrl("https://example.com");

        var url = urlShortener.getUrl(key);

        assertEquals("https://example.com", url);
    }

    @Test
    void shouldFailToGenerateUniqueKeyWhenKeySpaceExhausted() {
        given(random.nextInt(2)).willReturn(0, 0, 1, 1, 0, 1, 1, 0);
        urlShortener.addUrl("https://example1.com");
        urlShortener.addUrl("https://example2.com");
        urlShortener.addUrl("https://example3.com");
        urlShortener.addUrl("https://example4.com");

        var exception = assertThrows(IllegalStateException.class, () -> urlShortener.addUrl("https://example5.com"));

        assertEquals("Failed to generate a key", exception.getMessage());
    }
}

As an interviewer I would use this question as a gotcha to find premature optimizers. As a candidate I would concentrate on an API that can be easily changed to support concurrent retrieval if need arises and just simply synchronize the read and write. If we use 15 minutes dev time to shave a nanosecond, we'll be in the green after 900000000000 requests. Assuming computer time is as expensive as developer time. Which it isn't. And as you can see from the answers, concurrency isn't easy so leave the error vectors to where it counts. ;) — TorbenPutkonen, Commented Jun 2 at 14:15
And again, in an interview question it's more important to clearly explain why you chose to do what you did than it is to create the most perfect answer. Always write comments that explain why you chose the given solution over some other option. — TorbenPutkonen, Commented Jun 2 at 14:22
FWIW, I'm slightly surprised that nobody's apparently written an efficient concurrent implementation of Guava's BiMap interface yet. (I mean, there kind of is one, but it uses synchronized for modifications and still doesn't appear to be 100% race-free.) That said, while possibly an interesting exercise in itself, this would be a lot more work than just implementing a URL shortener. — Ilmari Karonen, Commented Jun 2 at 14:37

Toby Speight · Accepted Answer · 2024-06-01 11:05:02Z

The concurrency looks okay to me, given that the compute() calls lock urlToKey and keyToUrl (always in the same order, so avoiding deadlock). I'd probably add some comments to the non-public functions to record what locks are held when they are called.

It might be better to explicitly lock keyToUrl before we call newUniqueKey() so that we don't have to test whether the candidate key is still absent when we insert it (and is it right that this test just fails immediately, without retrying?). Alternatively, I think it might be better to put the retry loop within the compute function:

while (attempts-- > 0) {
    var key = generateRandomKey();
    if (keyToUrl.putIfAbsent(key, urlToAdd) == null) {
        return key;
    }
}

It seems strange to ignore the return value from computeIfAbsent() and then immediately re-retrieve it as urlToKey.get(url). I would probably

    public String addUrl(String url) {
        if (url == null) {
            throw new IllegalArgumentException("URL must not be null");
        }
        return urlToKey.computeIfAbsent(url, (urlToAdd) -> {
            attempts = 3;  // fixme: magic number.
            while (attempts-- > 0) {
                var key = generateRandomKey();
                if (keyToUrl.putIfAbsent(key, urlToAdd) == null) {
                    return key;
                }
            }
            throw new IllegalStateException("Failed to create unique key");
        });
    }

I know there are no unit tests with this code, but I would expect to see some tests that specify a small alphabet and short key length to show what happens when the key space is exhausted. You might even be able to do some calculations to determine how full the mapping is relative to the entire key space.

Thank you for the answer (+1). Great suggestions. Somewhat I didn't notice that I could use the result of computeIfAbsent. When the key space is exhausted, it wouldn't be possible to obtain a unique key after three attempts and it will always fail. I did write unit test for this class and added one for testing the exhaustion case, however it closely mimics my other test, that checks if an error is thrown after three attempts. I edited the question adding tests. I am actually not very confident about my tests. What do you think about them(in general and in the context of a 45 mins interview)? — Sasha Shpota, Commented Jun 1 at 14:19

Alexander Ivanchenko · Accepted Answer · 2024-06-02 14:14:25Z

Thread-safety of `addUrl()`

Method computeIfAbsent() is guaranteed to be executed as an atomic action when invoked on a ConcurrentHashMap. Here's a quote from the documentation of ConcurrentHashMap.computeIfAbsent():

The entire method invocation is performed atomically, so the function is applied at most once per key.

Let's say, that two concurrent threads are trying to add the same URL simultaneously (i.e. addUrl() was invoked in both threads with the same argument), only one of these threads will execute both urlToKey.computeIfAbsent() and keyToUrl.compute(). The second thread will obtain from urlToKey.computeIfAbsent() the key computed by the first thread.

Hence, your implementation of addUrl() is thread-safe.

Avoid magic numbers

There's a magic number of 3 in your method addUrl() as a maximum number of attempt to generate a key.

Instead, it should be an explicit notion:

private static final int ATTEMPT_THRESHOLD = 3;

And method newUniqueKey() becomes parameterless.

Null-checks / Foolproof API

When a parameter is null in many cases it fine to blow up with a NullPointerException, and you can leverage Objects.requireNonNull(). This method has an overloaded version expecting a string message as its second argument.

But I want to raise a question, whether there's a real need to perform a null-check on every reference-type parameter? If it UrlShortener is a part of a public library, then it's fine, but if we imagine that it's intended only for you and your teammates, I doubt if it's really needed.

Why? Because if your teammates fill perfectly comfortable doing calls someMethod(null) and the application is suffering from null-infestation, then there's a more serious problem to solve than ensuring that a certain parameter isn't null.

Impure lambda / Nested lambda / Multiline lambda

There are several issues with the lambda expression below:

urlToKey.computeIfAbsent(url, (urlToAdd) -> {
    var key = newUniqueKey(3);
    keyToUrl.compute(key, (keyToAdd, existingValue) -> {
        if (existingValue != null) throw new IllegalStateException("Failed to add url");
        return url;
    });
    return key;
});

While usage of computeIfAbsent is a good idea (since it would be wasteful to generate a key if the URL already exits), this code snippet doesn't look very nice, to put mildly.

Lambda expressions are functional programming constructs, hence it's better to stay away from performing side effects inside lambdas. Always try to implement your lambdas a pure functions.

The call keyToUrl.compute() performs a side effect, updating the map keyToUrl.

Also, lambda expressions might be handy in making code more concise and expressive. But the lambda above is neither easy to read, nor succinct, to the contrary, it bears quite a bit of cognitive load.

Avoid multiline lambdas. As Venkat Subramaniam, a Java Champion and author of several books on functional programming, once said:

Lambda expressions should be glue code. Two lines may be too many.

And it's even worse than multiline lambda, it's a nested one.

The whole method addUrl differently warrants a refactoring.

We need to preserve lazy computation of keys, so computeIfAbsent stays in place, but its second argument should be changed.

public String addUrl(String url) {
    requireNonNull(url);

    var key = urlToKey.computeIfAbsent(url, urlToAdd -> uniqueKey());
    keyToUrl.putIfAbsent(key, url);
    return key;
}

There's a subtle bug, though.

As Sasha Shpota and Ilmari Karonen pointed out in the comments, there's a possibility that addUrl() might be invoked from two different threads with different arguments (i.e. we have two distinct URLs) that will be mapped to different buckets of urlToKey map. So, there will be no contention urlToKey.computeIfAbsent() will be executed simultaneously in both threads and for some reason (short key or not diverse enough alphabetic string) method uniqueKey() yields an identical key in both threads.

In this scenario, only one thread updates keyToUrl map, another thread ends up with a new entry in urlToKey that has no counterpart inside keyToUrl.

To handle this case, we need to ensure that a new entry was into keyToUrl:

public String addUrl(String url) {
    requireNonNull(url);
    
    var key = urlToKey.computeIfAbsent(url, urlToAdd -> uniqueKey());
    var previousUrl = keyToUrl.putIfAbsent(key, url);
    ensureUrlAdded(previousUrl, url);
    return key;
}

private void ensureUrlAdded(String previousUrl, String urlToAdd) {
    if (previousUrl != null) {
        urlToKey.remove(urlToAdd);
        throw new UrlRegistrationException(urlToAdd); // to be handled downstream to form a proper responce
    }
}

The compensatory action urlToKey.remove(urlToAdd) is necessary and has a cost, but if the key isn't short and the alphabet-string is sufficiently diverse the performance impact will be neglectable since such rollbacks will be very rare.

If we choose a path of optimizing the implementation for the worst case scenario (short key, alphabet-string with a few unique characters), then in this case we need to give up on using pure of functions.

Here's how it might look like

public String addUrl(String url) {
    requireNonNull(url);
    return urlToKey.computeIfAbsent(url, this::tryAddUrl);
}

private String tryAddUrl(String url) {
    var key = uniqueKey();
    var previousUrl = keyToUrl.putIfAbsent(key, url);
    if (previousUrl != null) {
        throw new UrlRegistrationException(url);
    }
    return key;
}

Refactoring `newUniqueKey`

Firstly, I advise shortening the name to uniqueKey and make it parameterless (as explained above).

Instead of doing multiple invocations of generateRandomKey(), you can make use of the do-while loop. But in this case we can leverage Stream API quite nicely, here how it might look like:

private String uniqueKey() {
    return Stream.generate(this::generateRandomKey)
        .limit(ATTEMPT_THRESHOLD)
        .filter(this::isUniqueKey)
        .findFirst()
        .filter(this::isUniqueKey)
        .orElseThrow(UrlShortener::keyGenerationFailure);
}

private boolean isUniqueKey(String key) {
    return !keyToUrl.containsKey(key);
}

private static RuntimeException keyGenerationFailure() {
    return new KeyGenerationException(); // to be handled downstream to form a proper responce
}

Tests

Leaking tests

The implementation details are leaked to the tests in two ways:

you're testing internal methods (in some cases it might be useful, I'll return to that);
the knowledge of how generateRandomKey() and newUniqueKey() are implemented is reflected in the tests.

For instance, you need to know the internal logic of generateRandomKey() in order to assert that if random produces 0 and then 1, then the generated key will be "ba".

Leaking tests are brittle, and you need to have a very strong reason to resort to them.

The tests dialing with generateRandomKey() and newUniqueKey() doesn't bring much, very dump hard-coded implementations are sufficient to pass them.

If you wanted to ensure that generated keys are well dispersed, then instead of stubbing random and using two-character alphabet and two-character key, you can write the tests against a more realistic setup (like the one you mentioned: 62-character alphabet, ten-character key) and mandate that, let's say, all 10,000 first generated keys are unique.

Just dump the required number of keys into a Set and assert that its size is equal to the number of generated keys.

Such a test does not require stubbing, as well as making assumptions about the internal mechanics of constructing keys, and will be more robust and useful.

^{You can supply a seed while creating an instance of Random to ensure that the sequence of numbers it produces is deterministic and tests are repeatable.}

Testing public API of UrlShortener class

Testing the public methods is mandatory, and it's more helpful when it is done using realistic settings and as fewer mocks as possible. Mocks are useful when you need to isolate complex components of your system from each other, mocking Random while testing getUrl() is unnecessary.

Asserting exception messages

Tests asserting exception messages are brittle.

When asserting exception type is insufficient because there are several cases in which exception of the same type can carry different messages, consider introducing a custom exception for each use case, or they are closely related a single custom exception expecting some sort of status code (implemented as an enum) as parameter. It will allow writing assertions against the exception type or these status codes.

Thank you for the answer (+1). While I agree with keeping lambdas short, I believe splitting them into two separate calls (like in your snippet) introduces a race condition that is not precent in the original code. Two concurrent calls to urlToKey.computeIfAbsent(url, urlToAdd -> uniqueKey()) with different URLs can technically produce the same key. Then, only one of the concurrent keyToUrl.computeIfAbsent(key, keyToAdd -> url) calls would insert a row, thus one of the callers will receive a key that is referring to the value different from the one they were trying to add. — Sasha Shpota, Commented Jun 1 at 14:01
Could you please elaborate on how exactly there can be a race condition in the original version? Assuming two threads execute urlToKey.computeIfAbsent() in parallel and both generate the same key, the inner keyToUrl.compute() call will allow only one of the callers entering the block (because they have the same key). The second one would receive an IllegalStateException which would also cause the urlToKey.computeIfAbsent() to exit with an exception, thus not polluting the state. How can there be a race condition? Am I missing anything? — Sasha Shpota, Commented Jun 1 at 19:22
I'm not upvoting this otherwise good answer because all the addUrl implementations currently in it have a race condition that can cause two URLs to be incorrectly mapped to the same key. (Basically, nothing prevents two concurrent calls to uniqueKey from returning the same key.) The OP's version at least checks for this and throws a "Failed to add url" error, but Toby Speight's answer actually solves this issue properly, so he gets the +1 instead. — Ilmari Karonen, Commented Jun 1 at 21:49
@SashaShpota I apologize for causing confusion, for some reason I misread your message. Sure, your initial version is free from a race condition (I meant thread contention). And you're right, there's a possibility of that urlToKey is updated while keyToUrl is not. Fixed the answer. — Alexander Ivanchenko, Commented Jun 2 at 14:00

Stack Exchange Network

A thread-safe performant URL Shortener in Java

2 Answers 2

Thread-safety of `addUrl()`

Avoid magic numbers

Null-checks / Foolproof API

Impure lambda / Nested lambda / Multiline lambda

Refactoring `newUniqueKey`

Tests

Not the answer you're looking for? Browse other questions tagged
java
multithreading
interview-questions
concurrency
or ask your own question.

Hot Network Questions

A thread-safe performant URL Shortener in Java

2 Answers 2

Thread-safety of addUrl()

Avoid magic numbers

Null-checks / Foolproof API

Impure lambda / Nested lambda / Multiline lambda

Refactoring newUniqueKey

Tests

Not the answer you're looking for? Browse other questions tagged javamultithreadinginterview-questionsconcurrency or ask your own question.

Related

Hot Network Questions

Thread-safety of `addUrl()`

Refactoring `newUniqueKey`

Not the answer you're looking for? Browse other questions tagged
java
multithreading
interview-questions
concurrency
or ask your own question.