Split the string with different conditions using Linq in C#

Question

I need to extract and remove a word from a string. The word should be upper-case and should follow one of the delimiters /, ;, (, - or a space.

Some Examples:

"this is test A/ABC"
Expected output: "this is test A" and "ABC"
"this is a test; ABC/XYZ"
Expected output: "this is a test; ABC" and "XYZ"
"This TASK is assigned to ANIL/SHAM in our project"
Expected output: "This TASK is assigned to ANIL in our project" and "SHAM"
"This TASK is assigned to ANIL/SHAM in OUR project"
Expected output: "This TASK is assigned to ANIL/SHAM in project" and "OUR"
"this is test AWN.A"
Expected output: "this is test" and "AWN.A"
"XETRA-DAX"
Expected output: "XETRA" and "DAX"
"FTSE-100"
Expected output: "-100" and "FTSE"
"ATHEX"
Expected output: "" and "ATHEX"
"Euro-Stoxx-50"
Expected output: "Euro-Stoxx-50" and ""

How can I achieve that?

sukumar - I've went ahead and edited the question for you, trying to keep only what's relevant (Linq isn't relevant, nor is the blurb about intelligence and optimization), I narrowed it down to the heart of the question. If you don't like it you can (and should) rollback, but I suggest you edit it extensively. Good luck. — Kobi, Commented Mar 15, 2011 at 6:48
AWN.A raises a new issue - you are no longer dealing with uppercase words... Can you please explain what sorts of characters are you trying to extract? — Kobi, Commented Mar 15, 2011 at 7:38
@DanielHilgarth: Homework or work-work it doesn't really matter. You are assigned one to learn and you are assigned the other to get paid; either way you are assigned something and are expected to complete it. — user1228, Commented Mar 15, 2011 at 14:31

xanatos · Accepted Answer · 2011-03-15 14:04:58Z

An "intelligent" version:

    string strValue = "this is test A/ABC";
    int ix = strValue.LastIndexOfAny(new[] { '/', ' ', ';', '(', '-' });
    var str1 = strValue.Substring(0, ix);
    var str2 = strValue.Substring(ix + 1);

A "stupid LINQ" version:

    var str3 = new string(strValue.Reverse().SkipWhile(p => p != '/' && p != ' ' && p != ';' && p != '(' && p != '-').Skip(1).Reverse().ToArray());
    var str4 = new string(strValue.Reverse().TakeWhile(p => p != '/' && p != ' ' && p != ';' && p != '(' && p != '-').Reverse().ToArray());

Both versions are WITHOUT checks (for example, if no delimiter is found, Substring will throw). The OP can add checks if he wants them.
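For reference, here is a minimal sketch of the LastIndexOfAny version with the missing checks added. The class and method names are hypothetical, and returning the whole input with an empty second part when no delimiter is found is an assumption, not something the question specifies.

```csharp
using System;

public static class LastDelimiterSplit
{
    // Same delimiter set as in the answer above.
    private static readonly char[] Delimiters = { '/', ' ', ';', '(', '-' };

    // Splits at the LAST delimiter. Assumption: with no delimiter present,
    // the input is returned unchanged and the second part is empty.
    public static (string Head, string Tail) SplitAtLastDelimiter(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));

        int ix = value.LastIndexOfAny(Delimiters);
        if (ix < 0) return (value, string.Empty);

        return (value.Substring(0, ix), value.Substring(ix + 1));
    }
}
```

Calling `LastDelimiterSplit.SplitAtLastDelimiter("this is test A/ABC")` gives "this is test A" and "ABC". Note that this still does not check for upper-case words; it only guards against the missing-delimiter case.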

For the second question, using LINQ is REALLY too difficult. With a Regex it's "easily doable":

    var regex = new Regex("^(.*[A-Z]+)([-/ ;(]+)([A-Z]+)(.*?)$");

    var strValueWithout = regex.Replace(strValue, "$1$4");
    var extractedPart = regex.Replace(strValue, "$3");
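One caveat with the two Replace calls: when the pattern does not match, Regex.Replace returns the input unchanged, so both results would contain the whole string. A rough sketch that uses a single Match and reports failure explicitly could look like this (the class and method names are hypothetical; the pattern is the one from this answer):

```csharp
using System;
using System.Text.RegularExpressions;

public static class UpperPairSplit
{
    // Pattern copied from the answer: upper-case word, delimiter(s), upper-case word.
    private static readonly Regex Pattern =
        new Regex("^(.*[A-Z]+)([-/ ;(]+)([A-Z]+)(.*?)$");

    // Returns false (and leaves the input untouched) when nothing matches.
    public static bool TrySplit(string value, out string rest, out string extracted)
    {
        Match m = Pattern.Match(value);
        if (!m.Success)
        {
            rest = value;
            extracted = string.Empty;
            return false;
        }

        rest = m.Groups[1].Value + m.Groups[4].Value; // same as "$1$4"
        extracted = m.Groups[3].Value;                // same as "$3"
        return true;
    }
}
```

For "This TASK is assigned to ANIL/SHAM in our project" this yields "This TASK is assigned to ANIL in our project" and "SHAM"; for "Euro-Stoxx-50" it returns false.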

For the third question:

    var regex = new Regex("^(.*?)([A-Z.]*)([-/ ;(]+)([A-Z.]+)(.*?)$", RegexOptions.RightToLeft);

    var strValueWithout = regex.Replace(strValue, "$1$2$5");
    var extractedPart = regex.Replace(strValue, "$4");

With code sample: http://ideone.com/5OSs0

Another update (it's becoming BORING)

    Regex Regex = new Regex(@"^(?<1>.*?)(?<2>[-/ ;(]*)(?<=\b)(?<3>[A-Z.]+)(?=\b)(?<4>.*?)$|^(?<1>.*)$", RegexOptions.RightToLeft);
    Regex Regex2 = new Regex(@"^(?<1>.*?)(?<2>[-/ ;(]*)(?<=\b)(?<3>(?:\p{Lu}|\.)+)(?=\b)(?<4>.*?)$|^(?<1>.*)$", RegexOptions.RightToLeft);

    var str1 = Regex.Replace(str, "$1$4");
    var str2 = Regex.Replace(str, "$3");

The difference between the two is that the first treats only A-Z as upper-case characters, while the second (using \p{Lu}) also accepts other upper-case letters, for example ÀÈÉÌÒÙ.

With code sample: http://ideone.com/FqcmY

Lets consider another string as below: string strSentence ="This TASK is assigned to ANIL/SHAM in our project"; I need two sub-strings as follows: 1) This TASK is assigned to ANIL in our project 2) SHAM Does the above code gives the solution(Split at Uppercase words with delimiter) which i expects as mentioned? — venkat, Commented Mar 11, 2011 at 14:10
@sukumar You should learn to pose better questions, in all your post you din't mention you needed to split AND extract "couples" of uppercase words. — xanatos, Commented Mar 11, 2011 at 14:14
@sukumar Perhaps you have problems with your computer, because I just retested them with your examples and they work correctly. And if you want to check it online: ideone.com/1yY3y — xanatos, Commented Mar 14, 2011 at 10:58
+1 for a comrade in battle. I suspect you can greatly simplify it for the new requirements, something like ^(.*)[-/ ;(]([A-Z.]+)(.*)$ should be enough - it seems you can simply ignore the word before the separator. — Kobi, Commented Mar 15, 2011 at 7:51
@xanatos. +1 for going all out on this one. you deserve a badge or something for all the effort you've put in. — Todd Main, Commented Mar 19, 2011 at 15:43

Kobi · Accepted Answer · 2011-03-15 07:47:06Z

This should work according to the new requirements. It finds the last separator that is surrounded by uppercase words:

    Match lastSeparator = Regex.Match(strExample,
                                      @"(?<=\b\p{Lu}+)[-/ ;(](\p{Lu}+)\b",
                                      RegexOptions.RightToLeft); // last match
    string main = lastSeparator.Result("$`$'");  // before and after the match
    string word = lastSeparator.Groups[1].Value; // word after the separator

This regex is a little tricky. Main tricks:

Use RegexOptions.RightToLeft to find the last match.
Use of Match.Result for a replace.
$`$' as replacement string: http://www.regular-expressions.info/refreplace.html
\p{Lu} for upper-case letters; you can change that to [A-Z] if you're more comfortable with that.

If the word shouldn't follow an upper case word, you can simplify the regex to:
```
@"[-/ ;(](\p{Lu}+)\b"  
```
If you want other characters as well, you can use a character class (and maybe remove \b). For example:
```
@"[-/ ;(]([\p{Lu}.,]+)"
```

Working example: http://ideone.com/U9AdK
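A small sketch of the Match.Result approach wrapped in a helper, for anyone who wants to try it locally. The class and method names are hypothetical. The Success check is an addition: Match.Result throws on a failed match, so inputs without a matching separator are returned unchanged here (that fallback is an assumption).

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastUpperWord
{
    // First pattern from this answer: a separator between two upper-case words.
    private static readonly Regex Pattern = new Regex(
        @"(?<=\b\p{Lu}+)[-/ ;(](\p{Lu}+)\b",
        RegexOptions.RightToLeft); // RightToLeft makes Match return the last match

    public static bool TryExtract(string input, out string main, out string word)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
        {
            // Assumption: with no match, keep the input and extract nothing.
            main = input;
            word = string.Empty;
            return false;
        }

        main = m.Result("$`$'");    // text before the match + text after the match
        word = m.Groups[1].Value;   // the upper-case word after the separator
        return true;
    }
}
```

With "This TASK is assigned to ANIL/SHAM in our project" this produces "This TASK is assigned to ANIL in our project" and "SHAM". With "Euro-Stoxx-50" nothing matches and the method returns false.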

edited Mar 15, 2011 at 7:47

answered Mar 11, 2011 at 17:42

This code doesn't give the expected result as mentioned with both scenarios(scenario 1 and 2)
– venkat
Commented Mar 14, 2011 at 8:22
@sukumar - How is that? I renamed a variable so the code would compile. Aside from that, it looks right according to both examples: ideone.com/nVs2a . What am I missing? Can you please edit the question and explain?
– Kobi
Commented Mar 14, 2011 at 8:31
For this string, string strSentence ="This TASK is assigned to ANIL/SHAM in OUR project"; The output should be in two strings as below. This TASK is assigned to ANIL/SHAM in project as one string and OUR as another string
– venkat
Commented Mar 15, 2011 at 6:13
Here the last capitalized word is taken into consideration while split from the whole string. This is also a condition to check and require the split appropriately. The above code doesn't give the result as i mentioned which i expected. in case if there is no uppercase words after 'SHAM' then only SHAM part would be separated. Hope you got it now. Provide me the code which gives the output by considering this condition as well.
– venkat
Commented Mar 15, 2011 at 6:15
@sukumar - That's because . isn't an upper case character (which is a new requirement again ಠ_ಠ ). You can specify the characters you need in the regex, for example, @"[-/ ;(]([\p{Lu}.,]+)", but then it affects \b. What if you have hello CRUEL... world, or hello CRUEL...DARK, world?
– Kobi
Commented Mar 15, 2011 at 7:21

Bastardo · Accepted Answer · 2011-05-23 09:07:47Z

Use a List of strings and put all the words of the sentence into it.

Find the index of the / and then use ElementAt() to get the word to split off, which is "SHAM" in your question.

In the sentence of yours below, the index of / will be 6.

    string strSentence ="This TASK is assigned to ANIL/SHAM in our project";

Then use ElementAt() with that index plus one, where index is the index of the / in your List<string> (called str here):

    string word = str.ElementAt(index + 1);

This will return SHAM.

    str.RemoveAt(index + 1);

This will delete SHAM from the list; then just print the sentence without SHAM.

If you don't want to use a list of strings, you can use " " to determine the words in your sentence, I think, but that would be a long way to go.

I think the idea is right, but the code may not be flawless.
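To make the idea concrete, here is a rough sketch of the word-list approach. It assumes the / is stored as its own list element (which is the only way its index comes out as 6 for the example sentence), and the class and method names are hypothetical. It only handles the / delimiter and does no upper-case check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TokenListSplit
{
    public static (string Rest, string Word) ExtractAfterSlash(string sentence)
    {
        // Pad the slash with spaces so that it becomes a separate token.
        List<string> words = sentence.Replace("/", " / ")
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .ToList();

        int index = words.IndexOf("/");
        if (index < 0 || index + 1 >= words.Count)
            return (sentence, string.Empty); // assumption: nothing to extract

        string word = words.ElementAt(index + 1); // the word right after the slash
        words.RemoveRange(index, 2);              // drop both the slash and the word

        return (string.Join(" ", words), word);
    }
}
```

For "This TASK is assigned to ANIL/SHAM in our project" the slash lands at index 6, the extracted word is "SHAM", and the remaining words join back to "This TASK is assigned to ANIL in our project".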

edited May 23, 2011 at 9:07

answered Mar 14, 2011 at 8:20

It seems you are still not getting what exactly I need. Here I gave a sentence as an example, but I want to split dynamically just after the delimiter (/), checking the uppercase condition etc. Please read my question from the beginning, go through both scenarios, and reply.
– venkat
Commented Mar 14, 2011 at 8:29
i did i still believe what i wrote is a very easy way to do that also Paolo's code must be working i think
– Bastardo
Commented Mar 14, 2011 at 10:04

escargot agile · Accepted Answer · 2011-03-11 13:44:37Z

You can use a combination of the string.Split() method and the Regex class. A simple Split is suitable for simple cases, such as splitting according to the character /. Regular expressions are perfect for matching more complicated patterns.
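As a small illustration of combining the two, the sketch below splits on / with string.Split and then uses a Regex to check whether a part is an upper-case word. The class and method names are made up for the example, and [A-Z] is only one possible definition of "upper-case".

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitThenCheck
{
    // Simple case: split on a single known character.
    public static string[] SplitOnSlash(string value)
    {
        return value.Split('/');
    }

    // More complicated condition: the whole part must consist of A-Z letters.
    public static bool IsUpperWord(string part)
    {
        return Regex.IsMatch(part, "^[A-Z]+$");
    }
}
```

For "this is test A/ABC", SplitOnSlash returns "this is test A" and "ABC", and IsUpperWord is true only for "ABC".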

answered Mar 11, 2011 at 13:44

@sukumar are you only looking for LINQ based approach?? If you're looking split and regex based technique i can help you!
– Venki
Commented Mar 15, 2011 at 21:45

Paolo Falabella · Accepted Answer · 2011-03-11 14:06:13Z

As a proof of concept, you could re-implement Split in LINQ using TakeWhile and SkipWhile:

    string strValue  = "this is test A/ABC";
    var s1=new string(
        strValue
        .TakeWhile(c => c!= '/')
        .ToArray());
    var s2=new string(
        strValue
        .SkipWhile(c => c!= '/')
        .Skip(1)
        .ToArray());

I think the resulting code is so mind-blowingly ugly that I hope you'll decide not to use LINQ.

answered Mar 11, 2011 at 14:06

I updated the question in better understanding with two scenarios. Please provide solution for the query.
– venkat
Commented Mar 11, 2011 at 14:57
