1

I'm trying to find the longest word in a given text, but it has to be done "old school": without using split, enumerate, etc. (I'm using Python but what I'm looking for is more of a pseudocode or a general algorithm). I'm struggling with the last word (or the last character).

As a precondition, there are no leading or trailing spaces.

So I came up with two options that do the job but I feel like they're breaking some best practices:

Option 1

longestWord=""
lenlongestWord=0
currentWord=""
for i in range(len(text)):
    if text[i]==" " or i==len(text)-1:
        if i==len(text)-1:
            currentWord+=text[i]
        if len(currentWord) > lenlongestWord:
            longestWord=currentWord
            lenlongestWord=len(currentWord)
        currentWord=""
    else:
        currentWord+=text[i]
print("Longest word: ", longestWord)

From this one I dislike this part (but I had to do it or I would be missing the last word in the string:

if i==len(text)-1:
    currentWord+=text[i]

because I'm duplicating the concatenation, both in the "if" block and the "else" block.

Option 2:

text+=" "
longestWord=""
lenlongestWord=0
currentWord=""
for i in range(len(text)):
    if text[i]==" ":
        if len(currentWord) > lenlongestWord:
            longestWord=currentWord
            lenlongestWord=len(currentWord)
        currentWord=""
    else:
        currentWord+=text[i]
print("Longest word: ", longestWord)

What I don't like from this one is the fact that I'm manipulating the original text by adding a trailing whitespace so I can read it as a word separator.

Is there a third option I'm missing? Or maybe one of these two is not as bad practice as I believe?

8
  • 4
    I'm not really sure this is on-topic here. But regarding your first approach - if you don't like concatenating so much, why are you bothering with remembering the current longest word, rather than its start and end position?
    – Ordous
    Commented Mar 11, 2019 at 11:47
  • 1
    Note that your examples should reset currentWord when the find a space, rather than when they mark something as longer
    – Caleth
    Commented Mar 11, 2019 at 12:02
  • Thanks @Caleth it was a typo. Fixed it :)
    – Floella
    Commented Mar 11, 2019 at 12:18
  • 2
    You're essentially just reimplementing string.split. How is that really accomplishing anything with the artificial requirement not to use it? Commented Mar 11, 2019 at 15:21
  • 2
    Let me give you some doesn't-answer-the-question advice: Just about every language's standard library has functions for doing these things. They're well-solved problems and often available in source form. If you want to know how they've been implemented by the more-experinced, study them.
    – Blrfl
    Commented Mar 11, 2019 at 15:22

3 Answers 3

27

String concatenation can be slow. If you just want to go "old school", and speed is important, then forget storing the current word, and continually concatenating letters onto it.

Just store the start of the current word, as an index into the array, and the start of the longest word. Store the length of the current word, and the length of the longest.

Whenever the length of the current word exceeds the length of the (previous) longest, update the start and length of the longest.

At the end, you will have the start index and length of the longest word. You can just extract that word from the string.

PS: There's an assumption in your code that words are always separated by spaces, and that there's no other form of punctuation. Is this deliberate?

2

If you feel like trying recursion, you could give this a go. This is by no means elegant, and it does assume that your string is separated by a single space between words, but it's pretty quick. Other considerations you'll need to take into account is how you break ties (in this case, unless a word is strictly longer, it goes with the first one it finds). I've also built a helper function to recursively build the next word in the string.

def buildWord(string):
  if(len(string) == 0 or string[0]==' '):
    return "";
  else:
    return string[0] + buildWord(string[1:]);

def findLongest(string, word = ""):
  #stop if the string is empty
  if(len(string) <= 0):
    return word;
  #build the next word in the string
  curWord = buildWord(string);
  #check the current longest word    
  longestWord = curWord if len(curWord) > len(word) else word;
  #keep checking
  return findLongest(string[len(curWord)+1:], longestWord);

print(findLongest("this is a string with a very long antediluvian word in it"));
5
  • How is this different to OPs Option 1, except done as recursion, rather than iteration? In fact, since this is tail recursion, it's trivial to show that it does the exact same thing.
    – Ordous
    Commented Mar 11, 2019 at 15:44
  • Other than the fact that it removes the duplicate concatenation code, which is what he was concerned about, I suppose it isn't different. Commented Mar 11, 2019 at 15:48
  • Except it still does letter-by-letter concatenation of each word in buildWord?
    – Ordous
    Commented Mar 11, 2019 at 15:50
  • It does concatenation once, in one place. In option 1, it does concatenation once, in two places (in both the if and else statement). The goal here wasn't to remove concatenation entirely (and indeed, without using split, there's only a few options for that - one of which is in the other answer provided by Simon B). I was trying to address the duplicate code issue from Option 1, and I suppose I could have been more clear about that. Commented Mar 11, 2019 at 15:56
  • 1
    Sorry, you are right! I misread the question and thought OPs problem was unnecessary work, not code duplication. Your variant does solve that.
    – Ordous
    Commented Mar 11, 2019 at 18:06
1

How about the straightforward approach?

string = 'abc defg hij klm ytrda nopasdas'

st, n = 0, 0 # start and length of longest word
m = 0        # start of current word

# instead of enumerate, you can just increment i in the loop
for i, x in enumerate(string): 
    if x == ' ':
        if i > m+n:
            st = m
            n = i-st
        m = i+1

# this is the special case for the last word
if i > m+n:
    st = m
    n = i-st+1

print(string[st:st+n])

This will not copy strings and only go through your string once. If you did this in C you would probably use pointers. For C++ you would use iterators - sadly, in Python, these are not as powerful.

7
  • 3
    I think the "straightforward approach" would use split... Commented Mar 11, 2019 at 19:06
  • The split approach would be extremely slow and wasteful for any string of non-trivial length. I wouldn't dare describe it as “straightforward” at all because of all the pointless work that will have to be done in the background. Commented Mar 12, 2019 at 2:37
  • @PatriceLevesque well, in a lazy language like Haskell the split approach (maximumBy (comparing length) . words) would be performant and nice to read :)
    – michi7x7
    Commented Mar 12, 2019 at 11:51
  • 1
    @PatriceLevesque: You're micro-optimizing. I'd expect the split approach to be less than 1 order of magnitude slower than the faster approach, which even with a multi-MB string (which it sounds like is not OP's case) would be a matter of milliseconds. Commented Mar 12, 2019 at 18:24
  • To the last paragraph: In C/C++ indices are usually considered better style then pointers/iterators, they are simpler, more readable, and often also more robust (vector/string iterators are broken on modifications).
    – Frax
    Commented Mar 12, 2019 at 19:16

Not the answer you're looking for? Browse other questions tagged or ask your own question.