Find longest word in a string: are any of these algorithms good? [closed]

Question

Closed. This question needs to be more focused. It is not currently accepting answers.

Want to improve this question? Update the question so it focuses on one problem only by editing this post.

Closed 5 years ago.

I'm trying to find the longest word in a given text, but it has to be done "old school": without using split, enumerate, etc. (I'm using Python but what I'm looking for is more of a pseudocode or a general algorithm). I'm struggling with the last word (or the last character).

As a precondition, there are no leading or trailing spaces.

So I came up with two options that do the job but I feel like they're breaking some best practices:

Option 1

longestWord=""
lenlongestWord=0
currentWord=""
for i in range(len(text)):
    if text[i]==" " or i==len(text)-1:
        if i==len(text)-1:
            currentWord+=text[i]
        if len(currentWord) > lenlongestWord:
            longestWord=currentWord
            lenlongestWord=len(currentWord)
        currentWord=""
    else:
        currentWord+=text[i]
print("Longest word: ", longestWord)

From this one I dislike this part (but I had to do it or I would be missing the last word in the string:

if i==len(text)-1:
    currentWord+=text[i]

because I'm duplicating the concatenation, both in the "if" block and the "else" block.

Option 2:

text+=" "
longestWord=""
lenlongestWord=0
currentWord=""
for i in range(len(text)):
    if text[i]==" ":
        if len(currentWord) > lenlongestWord:
            longestWord=currentWord
            lenlongestWord=len(currentWord)
        currentWord=""
    else:
        currentWord+=text[i]
print("Longest word: ", longestWord)

What I don't like from this one is the fact that I'm manipulating the original text by adding a trailing whitespace so I can read it as a word separator.

Is there a third option I'm missing? Or maybe one of these two is not as bad practice as I believe?

I'm not really sure this is on-topic here. But regarding your first approach - if you don't like concatenating so much, why are you bothering with remembering the current longest word, rather than its start and end position? — Ordous, Commented Mar 11, 2019 at 11:47
Note that your examples should reset currentWord when the find a space, rather than when they mark something as longer — Caleth, Commented Mar 11, 2019 at 12:02
You're essentially just reimplementing string.split. How is that really accomplishing anything with the artificial requirement not to use it? — whatsisname, Commented Mar 11, 2019 at 15:21
Let me give you some doesn't-answer-the-question advice: Just about every language's standard library has functions for doing these things. They're well-solved problems and often available in source form. If you want to know how they've been implemented by the more-experinced, study them. — Blrfl, Commented Mar 11, 2019 at 15:22

Simon B · Accepted Answer · 2019-03-11 11:56:03Z

String concatenation can be slow. If you just want to go "old school", and speed is important, then forget storing the current word, and continually concatenating letters onto it.

Just store the start of the current word, as an index into the array, and the start of the longest word. Store the length of the current word, and the length of the longest.

Whenever the length of the current word exceeds the length of the (previous) longest, update the start and length of the longest.

At the end, you will have the start index and length of the longest word. You can just extract that word from the string.

PS: There's an assumption in your code that words are always separated by spaces, and that there's no other form of punctuation. Is this deliberate?

GnoveltyGnome · Accepted Answer · 2019-03-11 15:29:41Z

2

If you feel like trying recursion, you could give this a go. This is by no means elegant, and it does assume that your string is separated by a single space between words, but it's pretty quick. Other considerations you'll need to take into account is how you break ties (in this case, unless a word is strictly longer, it goes with the first one it finds). I've also built a helper function to recursively build the next word in the string.

def buildWord(string):
  if(len(string) == 0 or string[0]==' '):
    return "";
  else:
    return string[0] + buildWord(string[1:]);

def findLongest(string, word = ""):
  #stop if the string is empty
  if(len(string) <= 0):
    return word;
  #build the next word in the string
  curWord = buildWord(string);
  #check the current longest word    
  longestWord = curWord if len(curWord) > len(word) else word;
  #keep checking
  return findLongest(string[len(curWord)+1:], longestWord);

print(findLongest("this is a string with a very long antediluvian word in it"));

edited Mar 11, 2019 at 15:29

answered Mar 11, 2019 at 15:22

GnoveltyGnome

1294 bronze badges

How is this different to OPs Option 1, except done as recursion, rather than iteration? In fact, since this is tail recursion, it's trivial to show that it does the exact same thing.
– Ordous
Commented Mar 11, 2019 at 15:44
Other than the fact that it removes the duplicate concatenation code, which is what he was concerned about, I suppose it isn't different.
– GnoveltyGnome
Commented Mar 11, 2019 at 15:48
Except it still does letter-by-letter concatenation of each word in buildWord?
– Ordous
Commented Mar 11, 2019 at 15:50
It does concatenation once, in one place. In option 1, it does concatenation once, in two places (in both the if and else statement). The goal here wasn't to remove concatenation entirely (and indeed, without using split, there's only a few options for that - one of which is in the other answer provided by Simon B). I was trying to address the duplicate code issue from Option 1, and I suppose I could have been more clear about that.
– GnoveltyGnome
Commented Mar 11, 2019 at 15:56
1

Sorry, you are right! I misread the question and thought OPs problem was unnecessary work, not code duplication. Your variant does solve that.
– Ordous
Commented Mar 11, 2019 at 18:06

Add a comment |

michi7x7 · Accepted Answer · 2019-03-11 16:36:48Z

1

How about the straightforward approach?

string = 'abc defg hij klm ytrda nopasdas'

st, n = 0, 0 # start and length of longest word
m = 0        # start of current word

# instead of enumerate, you can just increment i in the loop
for i, x in enumerate(string): 
    if x == ' ':
        if i > m+n:
            st = m
            n = i-st
        m = i+1

# this is the special case for the last word
if i > m+n:
    st = m
    n = i-st+1

print(string[st:st+n])

This will not copy strings and only go through your string once. If you did this in C you would probably use pointers. For C++ you would use iterators - sadly, in Python, these are not as powerful.

answered Mar 11, 2019 at 16:36

michi7x7

1191 bronze badge

3

I think the "straightforward approach" would use split...
– BlueRaja - Danny Pflughoeft
Commented Mar 11, 2019 at 19:06
The split approach would be extremely slow and wasteful for any string of non-trivial length. I wouldn't dare describe it as “straightforward” at all because of all the pointless work that will have to be done in the background.
– Patrice Levesque
Commented Mar 12, 2019 at 2:37
@PatriceLevesque well, in a lazy language like Haskell the split approach (maximumBy (comparing length) . words) would be performant and nice to read :)
– michi7x7
Commented Mar 12, 2019 at 11:51
1

@PatriceLevesque: You're micro-optimizing. I'd expect the split approach to be less than 1 order of magnitude slower than the faster approach, which even with a multi-MB string (which it sounds like is not OP's case) would be a matter of milliseconds.
– BlueRaja - Danny Pflughoeft
Commented Mar 12, 2019 at 18:24
To the last paragraph: In C/C++ indices are usually considered better style then pointers/iterators, they are simpler, more readable, and often also more robust (vector/string iterators are broken on modifications).
– Frax
Commented Mar 12, 2019 at 19:16

| Show 2 more comments

Stack Exchange Network

Find longest word in a string: are any of these algorithms good? [closed]

Option 1

Option 2:

3 Answers 3

Not the answer you're looking for? Browse other questions tagged
strings
or ask your own question.

Hot Network Questions

Find longest word in a string: are any of these algorithms good? [closed]

Option 1

Option 2:

3 Answers 3

Not the answer you're looking for? Browse other questions tagged strings or ask your own question.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
strings
or ask your own question.