
I need to extract and remove a word from a string. The word should be upper-case and should follow one of the delimiters /, ;, (, - or a space.

Some examples:

  1. "this is test A/ABC"
    Expected output: "this is test A" and "ABC"

  2. "this is a test; ABC/XYZ"
    Expected output: "this is a test; ABC" and "XYZ"

  3. "This TASK is assigned to ANIL/SHAM in our project"
    Expected output: "This TASK is assigned to ANIL in our project" and "SHAM"

  4. "This TASK is assigned to ANIL/SHAM in OUR project"
    Expected output: "This TASK is assigned to ANIL/SHAM in project" and "OUR"

  5. "this is test AWN.A"
    Expected output: "this is test" and "AWN.A"

  6. "XETRA-DAX"
    Expected output: "XETRA" and "DAX"

  7. "FTSE-100"
    Expected output: "-100" and "FTSE"

  8. "ATHEX"
    Expected output: "" and "ATHEX"

  9. "Euro-Stoxx-50"
    Expected output: "Euro-Stoxx-50" and ""

How can I achieve that?

  • Why do you need to do this with LINQ and not, say, a regex?
    – MrKWatkins
    Commented Mar 11, 2011 at 13:42
  • Is this homework? Sounds a bit like it.
    Commented Mar 11, 2011 at 13:43
  • sukumar - I've gone ahead and edited the question for you, trying to keep only what's relevant (LINQ isn't relevant, nor is the blurb about intelligence and optimization); I narrowed it down to the heart of the question. If you don't like it, you can (and should) roll back, but I suggest you edit it extensively. Good luck.
    – Kobi
    Commented Mar 15, 2011 at 6:48
  • AWN.A raises a new issue - you are no longer dealing with upper-case words... Can you please explain what sorts of characters you are trying to extract?
    – Kobi
    Commented Mar 15, 2011 at 7:38
  • @DanielHilgarth: Homework or work-work, it doesn't really matter. You are assigned one to learn and the other to get paid; either way you are assigned something and are expected to complete it.
    – user1228
    Commented Mar 15, 2011 at 14:31

5 Answers

14

An "intelligent" version:

    string strValue = "this is test A/ABC";
    int ix = strValue.LastIndexOfAny(new[] { '/', ' ', ';', '(', '-' });
    var str1 = strValue.Substring(0, ix);
    var str2 = strValue.Substring(ix + 1);

A "stupid LINQ" version:

    var str3 = new string(strValue.Reverse().SkipWhile(p => p != '/' && p != ' ' && p != ';' && p != '(' && p != '-').Skip(1).Reverse().ToArray());
    var str4 = new string(strValue.Reverse().TakeWhile(p => p != '/' && p != ' ' && p != ';' && p != '(' && p != '-').Reverse().ToArray());

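Both versions are WITHOUT checks. For example, in the first version, if the string contains none of the delimiters, LastIndexOfAny returns -1 and Substring(0, ix) throws. The OP can add checks if he wants them.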
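Here is a minimal sketch of the "intelligent" version with the missing check added. The helper name SplitAtLastDelimiter is hypothetical. Returning ("", value) when there is no delimiter is an assumption, chosen to match example 8 ("ATHEX"). Like the original, this version does not check for upper case. It uses top-level statements, so it needs C# 9 or later:

```csharp
using System;

// Hypothetical helper: splits at the last delimiter, adding the check the original omits.
static (string Before, string After) SplitAtLastDelimiter(string value)
{
    int ix = value.LastIndexOfAny(new[] { '/', ' ', ';', '(', '-' });
    if (ix < 0)
        return ("", value); // assumption: no delimiter means the whole string is the word (example 8)
    return (value.Substring(0, ix), value.Substring(ix + 1));
}

var (before, after) = SplitAtLastDelimiter("this is test A/ABC");
Console.WriteLine($"\"{before}\" and \"{after}\""); // "this is test A" and "ABC"
```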

For the second question, using LINQ is REALLY too difficult. With a Regex it's "easily doable":

    var regex = new Regex("^(.*[A-Z]+)([-/ ;(]+)([A-Z]+)(.*?)$");

    var strValueWithout = regex.Replace(strValue, "$1$4");
    var extractedPart = regex.Replace(strValue, "$3");

For the third question:

    var regex = new Regex("^(.*?)([A-Z.]*)([-/ ;(]+)([A-Z.]+)(.*?)$", RegexOptions.RightToLeft);

    var strValueWithout = regex.Replace(strValue, "$1$2$5");
    var extractedPart = regex.Replace(strValue, "$4");

With code sample: http://ideone.com/5OSs0

Another update (it's becoming BORING):

    // str is the input string
    Regex regex1 = new Regex(@"^(?<1>.*?)(?<2>[-/ ;(]*)(?<=\b)(?<3>[A-Z.]+)(?=\b)(?<4>.*?)$|^(?<1>.*)$", RegexOptions.RightToLeft);
    Regex regex2 = new Regex(@"^(?<1>.*?)(?<2>[-/ ;(]*)(?<=\b)(?<3>(?:\p{Lu}|\.)+)(?=\b)(?<4>.*?)$|^(?<1>.*)$", RegexOptions.RightToLeft);

    var str1 = regex1.Replace(str, "$1$4");
    var str2 = regex1.Replace(str, "$3");

The difference between the two is that the first treats only A-Z as upper-case characters. The second uses \p{Lu}, so it also accepts other upper-case letters, for example ÀÈÉÌÒÙ.

With code sample: http://ideone.com/FqcmY
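Here is a quick check of the [A-Z] versus \p{Lu} difference described above. It uses À as the test character, but any upper-case letter outside A-Z would do. It needs C# 9 or later for top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

// [A-Z] is a plain ASCII range; \p{Lu} is the Unicode "uppercase letter" category.
bool asciiRange = Regex.IsMatch("À", "^[A-Z]$");
bool unicodeUpper = Regex.IsMatch("À", @"^\p{Lu}$");

Console.WriteLine($"[A-Z] matches: {asciiRange}; Lu matches: {unicodeUpper}"); // [A-Z] matches: False; Lu matches: True
```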

  • Let's consider another string: string strSentence = "This TASK is assigned to ANIL/SHAM in our project"; I need two substrings: 1) This TASK is assigned to ANIL in our project 2) SHAM. Does the above code give the solution (split at upper-case words with a delimiter) that I expect, as mentioned?
    – venkat
    Commented Mar 11, 2011 at 14:10
  • @sukumar You should learn to pose better questions; in none of your posts did you mention that you needed to split AND extract "couples" of upper-case words.
    – xanatos
    Commented Mar 11, 2011 at 14:14
  • @sukumar Perhaps you have problems with your computer, because I just retested them with your examples and they work correctly. And if you want to check it online: ideone.com/1yY3y
    – xanatos
    Commented Mar 14, 2011 at 10:58
  • +1 for a comrade in battle. I suspect you can greatly simplify it for the new requirements, something like ^(.*)[-/ ;(]([A-Z.]+)(.*)$ should be enough - it seems you can simply ignore the word before the separator.
    – Kobi
    Commented Mar 15, 2011 at 7:51
  • @xanatos +1 for going all out on this one. You deserve a badge or something for all the effort you've put in.
    – Todd Main
    Commented Mar 19, 2011 at 15:43
6

This should work according to the new requirements: it should find the last separator that is wrapped with uppercase words:

    Match lastSeparator = Regex.Match(strExample,
                                      @"(?<=\b\p{Lu}+)[-/ ;(](\p{Lu}+)\b",
                                      RegexOptions.RightToLeft); // last match
    string main = lastSeparator.Result("$`$'");  // before and after the match
    string word = lastSeparator.Groups[1].Value; // word after the separator

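This regex is a little tricky. Main tricks:

  • RegexOptions.RightToLeft makes Regex.Match return the right-most (last) match in the string.
  • The lookbehind (?<=\b\p{Lu}+) checks that an upper-case word comes right before the separator, without making that word part of the match.
  • Result("$`$'") joins the text before the match ($`) and the text after it ($'), which gives the original string without the separator and the extracted word.
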
  • If the word shouldn't follow an upper case word, you can simplify the regex to:

    @"[-/ ;(](\p{Lu}+)\b"  
    
  • If you want other characters as well, you can use a character class (and maybe remove \b). For example:

    @"[-/ ;(]([\p{Lu}.,]+)"
    

Working example: http://ideone.com/U9AdK
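Here is a minimal sketch, not taken from the thread, that puts the pieces together:

  • It uses RightToLeft and Result("$`$'"), as in the code above.
  • It uses a variant of the character class from the second bullet, [\p{Lu}.] without the comma, so AWN.A counts as one word.
  • It adds a (?:^|…) alternative so that a word at the very start of the string also qualifies (examples 7 and 8). This alternative is an assumption added here.

The helper name ExtractLastWord is hypothetical. The helper checks Success first, because calling Result on a failed match throws. It needs C# 9 or later for top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text without the extracted word, plus the word.
static (string Rest, string Word) ExtractLastWord(string input)
{
    // The word must start the string or follow one of - / space ; (
    // RightToLeft makes Regex.Match return the right-most such word.
    Match m = Regex.Match(input, @"(?:^|[-/ ;(])([\p{Lu}.]+)\b", RegexOptions.RightToLeft);
    if (!m.Success)
        return (input, ""); // nothing to extract (example 9)

    // $` = text before the match, $' = text after it; the delimiter itself is dropped.
    return (m.Result("$`$'"), m.Groups[1].Value);
}

foreach (string s in new[] { "This TASK is assigned to ANIL/SHAM in OUR project", "FTSE-100" })
{
    var (rest, word) = ExtractLastWord(s);
    Console.WriteLine($"\"{rest}\" and \"{word}\"");
}
```

It gives the expected pair for all nine examples in the question. Like the other regexes here, though, it only encodes the rules those examples show. A case they don't cover, such as digits inside a word, would need a different character class.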

  • This code doesn't give the expected result for both scenarios (scenario 1 and 2).
    – venkat
    Commented Mar 14, 2011 at 8:22
  • @sukumar - How is that? I renamed a variable so the code would compile. Aside from that, it looks right according to both examples: ideone.com/nVs2a . What am I missing? Can you please edit the question and explain?
    – Kobi
    Commented Mar 14, 2011 at 8:31
  • For this string, string strSentence = "This TASK is assigned to ANIL/SHAM in OUR project"; the output should be two strings: "This TASK is assigned to ANIL/SHAM in project" as one string and "OUR" as the other.
    – venkat
    Commented Mar 15, 2011 at 6:13
  • Here the last capitalized word is taken into consideration when splitting the whole string. This is also a condition to check, and the split has to respect it. The above code doesn't give the result I expected. Only if there are no upper-case words after 'SHAM' should the SHAM part be separated. Hope you get it now. Please provide code that gives the output considering this condition as well.
    – venkat
    Commented Mar 15, 2011 at 6:15
  • @sukumar - That's because . isn't an upper-case character (which is a new requirement again ಠ_ಠ ). You can specify the characters you need in the regex, for example, @"[-/ ;(]([\p{Lu}.,]+)", but then it affects \b. What if you have hello CRUEL... world, or hello CRUEL...DARK, world?
    – Kobi
    Commented Mar 15, 2011 at 7:21
4

Use a List of strings and put all the words of the sentence into it, with the / as a word of its own.

Find the index of the /, then use ElementAt() to get the word to split off, which is "SHAM" in your question.

In the sentence of yours below, the index of / will be 6:

    string strSentence = "This TASK is assigned to ANIL/SHAM in our project";

index is the index of the / in your List<string> (ToList and ElementAt need System.Linq):

    var words = strSentence.Replace("/", " / ").Split(' ').ToList(); // ..., ANIL, /, SHAM, ...
    int index = words.IndexOf("/");           // 6
    string word = words.ElementAt(index + 1); // "SHAM"

This will return SHAM.

    words.RemoveRange(index, 2);           // removes "/" and "SHAM"
    string rest = string.Join(" ", words); // "This TASK is assigned to ANIL in our project"

This will delete SHAM (and the /); then just print the sentence without SHAM.

If you don't want to use a list of strings, you could scan for " " to find the words in your sentence, I think, but that would be a long way to go.

I think my idea is right, but the code may not be flawless.

  • It seems you still haven't understood exactly what I need. Here I gave a sentence as an example, but I want to split dynamically just after the delimiter (/), checking the upper-case condition, etc. Please read my question from the beginning, go through both scenarios, and reply.
    – venkat
    Commented Mar 14, 2011 at 8:29
  • I did. I still believe what I wrote is a very easy way to do it; also, I think Paolo's code should work.
    – Bastardo
    Commented Mar 14, 2011 at 10:04
3

You can use a combination of the string.Split() method and the Regex class. A simple Split is suitable for simple cases, such as splitting according to the character /. Regular expressions are perfect for matching more complicated patterns.
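Here is a minimal sketch of that combination, using the sentence from the comments. The pattern /(\p{Lu}+)\b, meaning an upper-case word right after a /, is an assumption for illustration; this answer does not give a pattern. It needs C# 9 or later for top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "This TASK is assigned to ANIL/SHAM in our project";

// Simple case: string.Split on a single character.
string[] parts = input.Split('/');
Console.WriteLine(parts[0]); // This TASK is assigned to ANIL
Console.WriteLine(parts[1]); // SHAM in our project

// More complicated pattern: an upper-case word directly after '/'.
Match m = Regex.Match(input, @"/(\p{Lu}+)\b");
string word = m.Success ? m.Groups[1].Value : "";
string rest = m.Success ? input.Remove(m.Index, m.Length) : input;
Console.WriteLine(word); // SHAM
Console.WriteLine(rest); // This TASK is assigned to ANIL in our project
```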

  • @sukumar Are you only looking for a LINQ-based approach? If you're looking for a Split- and Regex-based technique, I can help you!
    – Venki
    Commented Mar 15, 2011 at 21:45
3

As a proof of concept, you could re-implement Split in LINQ using TakeWhile and SkipWhile:

    string strValue = "this is test A/ABC";
    var s1 = new string(
        strValue
        .TakeWhile(c => c != '/')
        .ToArray());
    var s2 = new string(
        strValue
        .SkipWhile(c => c != '/')
        .Skip(1)
        .ToArray());

I think the resulting code is so mind-blowingly ugly that I hope you'll decide not to use LINQ.
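For comparison, here is the built-in string.Split with a count of 2, which splits at the first / into at most two parts. For this input, it gives the same two pieces as the LINQ version above. Unlike the LINQ version, Split returns a single element when there is no /, so parts[1] would need a length check. The example needs C# 9 or later for top-level statements:

```csharp
using System;
using System.Linq;

string strValue = "this is test A/ABC";

// LINQ version from above.
var s1 = new string(strValue.TakeWhile(c => c != '/').ToArray());
var s2 = new string(strValue.SkipWhile(c => c != '/').Skip(1).ToArray());

// Built-in equivalent: split at the first '/', into at most two parts.
string[] parts = strValue.Split(new[] { '/' }, 2);

Console.WriteLine(s1 == parts[0] && s2 == parts[1]); // True
```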

  • I updated the question with two scenarios to make it easier to understand. Please provide a solution for the query.
    – venkat
    Commented Mar 11, 2011 at 14:57
