I have used \b[A-Z]{2,4}\b with the Match case option to match acronyms, but I cannot work out how to delete all the text around the acronyms so that only a list of the matched entries remains. Can a stop-word list also be used to exclude capitalized words that are not acronyms, such as NOTE?

Please see example of the text below:

GUIDANCE NOTE:
Provision of transportation services to the UN RC and RCO staff
As part of the turn-key solutions, governed by the MoU signed between UN and UNDP on 21 December 2018, UNDP is required to provide transportation services (para 2.2. and 2.3., Annex 1) to the UN RCS offices as follows: which include the following services:
- Provision to the UN RC of 1 (one) vehicle on a full-time basis
- Provision to the UN RC office (RCO) of vehicle(s) on a part-time basis 
Para 2.3. of the MoU states that all services, whether “turnkey” or “pay-as-you-go”, should be provided by UNDP in accordance with UNDP rules, policies and procedures. Therefore, all transportation services provided to the UN RCS offices by UNDP and the use of UNDP vehicles by the UN RC offices will follow the rules and procedures outlined in UNDP Vehicle Management Policy. The Policy can be accessed through the link HERE.
UNDP Country Offices are advised to consider obtaining comprehensive insurance coverage for the full-time vehicle allocated to the UN RC in order to mitigate legal, financial and other risks.
Cost recovery methodology for vehicles provided to RCO on the full-time or part-time basis is provided below.

2 Answers

  • Ctrl+H
  • Find what: .*?((?!NOTE|HERE)\b[A-Z]{2,4}\b)(?:(?![A-Z]{2,4}).)*
  • Replace with: $1\n
  • CHECK Match case
  • CHECK Wrap around
  • CHECK Regular expression
  • CHECK . matches newline
  • Replace all

Explanation:

.*?                     # 0 or more of any character, non-greedy
(                       # start group 1
    (?!NOTE|HERE)       # negative lookahead: what follows must not be NOTE or HERE
                        # (add other stop words, pipe-separated, if needed)
    \b                  # word boundary
    [A-Z]{2,4}          # 2 to 4 uppercase letters
    \b                  # word boundary
)                       # end group 1
                        # tempered greedy token:
(?:                     # non-capturing group
    (?![A-Z]{2,4})      # negative lookahead: not at the start of 2 to 4 uppercase letters
    .                   # any character
)*                      # end group, may appear 0 or more times

Replacement:

$1          # content of group 1 (i.e. the acronym)
\n          # linefeed, you can use \r\n for Windows linebreak
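
For anyone who wants to try the same find/replace outside Notepad++, here is a minimal sketch in Python. It is not part of the original answer: the function name acronyms_one_per_line and the shortened sample text are made up for illustration, and it assumes Python's re module treats these constructs the same way Notepad++ does (re.DOTALL standing in for ". matches newline", \1 for $1).

```python
import re

# Same pattern as in the "Find what" field; NOTE and HERE are the stop words.
PATTERN = re.compile(
    r".*?((?!NOTE|HERE)\b[A-Z]{2,4}\b)(?:(?![A-Z]{2,4}).)*",
    re.DOTALL,  # counterpart of the ". matches newline" checkbox
)


def acronyms_one_per_line(text: str) -> str:
    """Replace every match with the captured acronym followed by a line feed."""
    return PATTERN.sub(r"\1\n", text)


# Shortened version of the sample text from the question.
sample = (
    "GUIDANCE NOTE:\n"
    "Provision of transportation services to the UN RC and RCO staff"
)
print(acronyms_one_per_line(sample))
```

Text that comes after the last acronym is only consumed up to the next run of two or more uppercase letters, so a stop word that appears after the final acronym would be left in the output and may need to be removed by hand.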


  • Great solution indeed.
    – Sam Mouha
    Commented Feb 19, 2020 at 18:16
  • @SamMouha: You're welcome, glad it helps. Feel free to mark the answer as accepted; see How to accept an answer.
    – Toto
    Commented Feb 19, 2020 at 18:17

Alternatively, if you only want the unique matches rather than the full list:

[\s\S]*?((?!NOTE|HERE)\b[A-Z]{2,4}\b)(?:(?![A-Z]{2,4}).)*(?![\s\S]*\b\1\b[\s\S]*)


Follow the steps given by @Toto above.
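
If a script is an option, the same idea can be sketched in Python without the backreference. This is an illustration only: unique_acronyms and the sample sentence are hypothetical names and data, and the stop words are kept in a plain set instead of a lookahead. As far as I can tell, the regex above keeps the last occurrence of each acronym, whereas this sketch keeps the first.

```python
import re

ACRONYM = re.compile(r"\b[A-Z]{2,4}\b")  # the pattern from the question


def unique_acronyms(text, stop_words=frozenset({"NOTE", "HERE"})):
    """Return each acronym once, in order of first appearance."""
    found = (a for a in ACRONYM.findall(text) if a not in stop_words)
    # dict.fromkeys drops duplicates while preserving insertion order.
    return list(dict.fromkeys(found))


# Made-up sentence built from the acronyms in the question's sample text.
sample = "UN RC and UNDP provide services to the UN RC office (RCO). See NOTE."
print("\n".join(unique_acronyms(sample)))
```

Keeping the stop words in a set makes the list easy to extend without touching the pattern.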

Result: a list of unique acronyms.

  • Unique name was not asked, but I give you my vote.
    – Toto
    Commented Feb 20, 2020 at 10:48
  • @Toto, I know it wasn't asked. It's really just complementary; in my head it made sense as a follow-up =)
    – JvdV
    Commented Feb 20, 2020 at 10:55
