I need to extract all login IDs from a 17k file. How do I delete all text except for what's between two strings?

For example:

<USER_LOGIN_ID>user1</USER_LOGIN_ID>

<USER_LOGIN_ID>user2</USER_LOGIN_ID>

<USER_LOGIN_ID>user3</USER_LOGIN_ID>

<USER_LOGIN_ID>user4</USER_LOGIN_ID>

should leave:

user1

user2

user3

user4

This needs a slightly complex regular expression:

Find: <USER_LOGIN_ID>([^<]*)</USER_LOGIN_ID>

Replace: $1

Here you are matching <USER_LOGIN_ID>, followed by any number of characters that are not <, followed by </USER_LOGIN_ID>. The parentheses () capture the central text as a group, and $1 in the replacement string expands to that captured group only.
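
As a rough illustration outside the editor, the same find/replace can be sketched with Python's standard re module. This is an assumption-laden sketch, not the editor's own behaviour: Python writes the back-reference as \1 rather than $1, and the sample text stands in for the real file.

```python
import re

# Sample input modelled on the question; in practice you would read the file.
text = """<USER_LOGIN_ID>user1</USER_LOGIN_ID>

<USER_LOGIN_ID>user2</USER_LOGIN_ID>
"""

# [^<]* matches any run of characters that are not '<',
# so the captured group can never run past the closing tag.
pattern = re.compile(r"<USER_LOGIN_ID>([^<]*)</USER_LOGIN_ID>")

# r"\1" in Python's replacement string plays the role of $1 in the editor.
result = pattern.sub(r"\1", text)
print(result)
```

Each tag pair is replaced by the text it wrapped, and the blank lines between entries are left alone.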

For the case you show, with only one match per line, you can use the slightly simpler find string <USER_LOGIN_ID>(.*)</USER_LOGIN_ID>, but this will fail if there are two log-ins on one line: the greedy .* matches from the first opening tag all the way to the last closing tag.
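
The failure mode is easy to see on a hypothetical line holding two log-ins (this line is made up for illustration, not taken from the question's file). A minimal Python sketch comparing the two find strings:

```python
import re

# Hypothetical line containing two log-ins separated by ", ".
line = "<USER_LOGIN_ID>user1</USER_LOGIN_ID>, <USER_LOGIN_ID>user2</USER_LOGIN_ID>"

# Greedy: .* runs from the first opening tag to the LAST closing tag,
# so the inner closing/opening tags end up inside the captured group.
greedy = re.sub(r"<USER_LOGIN_ID>(.*)</USER_LOGIN_ID>", r"\1", line)

# [^<]* stops at the first '<', so each log-in is matched separately.
safe = re.sub(r"<USER_LOGIN_ID>([^<]*)</USER_LOGIN_ID>", r"\1", line)

print(greedy)  # user1</USER_LOGIN_ID>, <USER_LOGIN_ID>user2
print(safe)    # user1, user2
```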

If there are several log-ins on a single line, the first find string will run the IDs together unless the source text has punctuation or spaces between them. If it doesn't, add a separator to the replacement string, e.g. "$1 " (with a trailing space).
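
A small Python sketch of the separator trick, assuming a hypothetical line where two tags sit back to back with nothing between them (again using \1 where the editor uses $1):

```python
import re

# Hypothetical line with two log-ins and no text between the tags.
line = "<USER_LOGIN_ID>user1</USER_LOGIN_ID><USER_LOGIN_ID>user2</USER_LOGIN_ID>"
pattern = r"<USER_LOGIN_ID>([^<]*)</USER_LOGIN_ID>"

joined = re.sub(pattern, r"\1", line)   # IDs run together
spaced = re.sub(pattern, r"\1 ", line)  # a space follows every ID

print(joined)        # user1user2
print(repr(spaced))  # 'user1 user2 '
```

The trailing space after the last ID is a side effect of appending the separator to every match.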

You will, of course, need to tick the regular expressions option (and probably match case) in the search options.

Search for:

<USER_LOGIN_ID>(.*?)</USER_LOGIN_ID>

Replace with:

\1

The expression .*? matches the shortest possible text between the two specified tags. (The question mark makes the match non-greedy.)
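
A search-and-replace like this only strips the tags; any other text in the file stays where it was. If the file contains other markup and you really want nothing but the IDs, a minimal Python sketch (the sample input is hypothetical) can collect the non-greedy matches with re.findall and write them one per line:

```python
import re

# Hypothetical input: login tags mixed with other markup.
text = """<USER>
  <USER_LOGIN_ID>user1</USER_LOGIN_ID> <NAME>Ann</NAME>
  <USER_LOGIN_ID>user2</USER_LOGIN_ID><USER_LOGIN_ID>user3</USER_LOGIN_ID>
</USER>"""

# .*? is non-greedy: each match ends at the first closing tag after its opening tag.
ids = re.findall(r"<USER_LOGIN_ID>(.*?)</USER_LOGIN_ID>", text)

# Keep only the captured IDs, one per line; everything else is dropped.
output = "\n".join(ids)
print(output)
```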
