7
\$\begingroup\$

You work for a social media platform, and are told to create a program in a language of your choice that will automatically flag certain post titles as "spam".

Your program must take the title as a string as input and output a truthy value if the title is spam, and a falsey value if not.

To qualify as non-spam, a title must conform to all of the following rules; otherwise it is spam:

  • A title can only contain spaces and the following characters: a-z, A-Z, 0-9, -, _, ., ,, ?, !
  • A title cannot have more than one Capital Letter per word
  • A title cannot have more than one exclamation mark or question mark
  • A title cannot have more than three full-stops (.)
  • A title cannot have more than one comma (,)
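For reference, the five rules above can be sketched as an ungolfed Python function. This is only one reading of the rules; the names `ALLOWED` and `is_spam` are made up for illustration, and "word" follows the FAQ definition (split on a single space).

```python
import string

# Characters a non-spam title may contain (rule 1).
ALLOWED = set(string.ascii_letters + string.digits + " -_.,?!")

def is_spam(title: str) -> bool:
    """Return True if the title breaks any of the five rules."""
    if any(ch not in ALLOWED for ch in title):
        return True                                   # illegal character
    for word in title.split(" "):
        if sum(ch in string.ascii_uppercase for ch in word) > 1:
            return True                               # two capitals in one word
    if title.count("!") + title.count("?") > 1:
        return True                                   # more than one ! or ?
    if title.count(".") > 3:
        return True                                   # more than three full stops
    return title.count(",") > 1                       # more than one comma

print(is_spam("How on earth did this happen?"))   # False
print(is_spam("How ON earth did this happen"))    # True
```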

Test cases:

Input: How On eaRth diD tHis happeN
Output: False

Input: How on earth did this happen 🔊
Output: True

Input: How ON earth did this happen
Output: True

Input: How on earth did this happen??
Output: True

Input: How on earth did this happen?!
Output: True

Input: How on earth did this happen!!
Output: True

Input: How! on! earth! did! this! happen!
Output: True

Input: How on earth did this happen! !
Output: True

Input: How on earth did this happen?
Output: False

Input: How on earth did this happen!
Output: False

Input: How on earth did this happen...
Output: False

Input: How.on.earth.did.this.happen
Output: True

Input: How.on.earth.did this happen
Output: False

Input: How, on, earth did this happen
Output: True

Input: How, on earth did this happen
Output: False

Input: How_on_earth_did_this_happen
Output: False

Input: How-on-earth-did-this-happen
Output: False

Input: How on earth did (this) happen
Output: True

Input: How on earth did "this" happen
Output: True

Input: How on earth did 'this' happen
Output: True

Input: How on earth did *this* happen
Output: True

Input: How on earth did [this] happen
Output: True

FAQ

Q: What is a valid title?

A: Single-character titles are valid. Your program has to validate a string that is not completely whitespace against the bullet-point rules above.

Q: What is a word?

A: Split the title string by a space. Each item in the resulting array is considered a word.
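Read literally, "split by a space" corresponds to splitting on a single space character, which can produce empty words when spaces repeat. A small Python illustration (the helper name `words` is hypothetical):

```python
def words(title: str) -> list:
    # Split on exactly one space, as the FAQ describes.
    return title.split(" ")

print(words("How  ON"))      # ['How', '', 'ON']  (two spaces give an empty word)
print("How  ON".split())     # ['How', 'ON']      (whitespace-run splitting, for contrast)
```

An empty word contains no capitals, so either reading gives the same verdict for the capital-letter rule.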


This is code-golf, so the shortest answer in bytes wins.

\$\endgroup\$
1
  • \$\begingroup\$ Suggest test case: 这到底怎么回事 Most regexp based answers which support Unicode would be confused if \w is used. \$\endgroup\$
    – tsh
    Commented Feb 10, 2022 at 7:43

9 Answers

6
\$\begingroup\$

JavaScript (Node.js), 72 71 68 64 bytes

-1 byte thanks to @ThisFieldIsRequired

-3 bytes inspired by @Neil's Retina answer

-2 bytes thanks to @emanresuA (see @Neil's answer)

-2 bytes by modifying final test slightly

m=>/[^\w-.,?! ]|[A-Z]\S*[A-Z]|[!?].*[!?]|,.*,|(\..*){4}/.test(m)

Try it online!

Regex abuse FTW.

Here's how it works:

  • [^\w-.,?! ] matches any character that isn't allowed.
  • [A-Z]\S*[A-Z] matches two uppercase letters without a space in between, i.e. two capitals in a single word
  • [!?].*[!?] matches two exclamation/question marks with anything in between
  • ,.*, matches two commas with anything in between
  • (\..*){4} matches four periods with anything in between

If you put these together in a single regex as alternates, you get a spam filter that matches all criteria.
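For readers who want to try the combined pattern outside JavaScript, here is a rough Python port. Two things are assumptions on my part rather than part of the answer: the dash is moved to the end of the character class (Python's `re` rejects `\w-.` as a bad range), and `re.ASCII` is added because Python's `\w` otherwise matches non-ASCII word characters such as CJK text. The names `SPAM` and `is_spam` are illustrative.

```python
import re

# Same five alternatives as the JavaScript regex, with the dash placed last.
SPAM = re.compile(
    r"[^\w.,?! -]"        # any character that isn't allowed
    r"|[A-Z]\S*[A-Z]"     # two capitals in one word
    r"|[!?].*[!?]"        # two exclamation/question marks
    r"|,.*,"              # two commas
    r"|(\..*){4}",        # four periods
    re.ASCII,             # keep \w limited to [a-zA-Z0-9_]
)

def is_spam(title: str) -> bool:
    return SPAM.search(title) is not None

print(is_spam("How on earth did this happen!"))   # False
print(is_spam("How.on.earth.did.this.happen"))    # True
```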

\$\endgroup\$
5
  • 1
    \$\begingroup\$ -1 byte by using test. \$\endgroup\$
    – ophact
    Commented Feb 7, 2022 at 16:38
  • \$\begingroup\$ I thought I had done it switching from search to match but you got even further down. Thanks for that! \$\endgroup\$
    – nununoisy
    Commented Feb 7, 2022 at 16:41
  • \$\begingroup\$ it doesn't save any space, but for fun you can change -., to ,-., because ASCII \$\endgroup\$
    – Dave
    Commented Feb 8, 2022 at 23:35
  • \$\begingroup\$ @Dave I tried it and it broke the code???? Try It Online! \$\endgroup\$
    – Aiden Chow
    Commented Feb 9, 2022 at 4:00
  • \$\begingroup\$ @AidenChow it needs to be ,-. not .-,. The order matters! Try to figure out why 😉 \$\endgroup\$
    – Dave
    Commented Feb 9, 2022 at 18:03
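The ordering point can be checked in Python, whose `re` module treats character ranges the same way here: comma, hyphen and full stop are consecutive ASCII code points, so `,-.` is a valid ascending range covering exactly those three characters, while `.-,` is a descending range and is rejected. The helper `compiles` is a made-up name for this sketch.

```python
import re

print([hex(ord(c)) for c in ",-."])   # ['0x2c', '0x2d', '0x2e']

ascending = re.compile(r"[,-.]")      # range from ',' to '.'
print(ascending.findall("a,b-c.d"))   # [',', '-', '.']

def compiles(pattern: str) -> bool:
    """Return True if the pattern is accepted by re.compile."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(r"[.-,]"))             # False: the range runs backwards
```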
6
\$\begingroup\$

C (gcc), 251 219 215 199 197 193 189 184 181 bytes

-32 bytes thanks to @ceilingcat

-4 bytes by subtracting 43 from c

-16 bytes by moving comparisons inside ternaries

-2 bytes by removing unneeded brackets

-4 bytes by realising that ++x>1 is equivalent to x++

-4 bytes by rearranging outer ternary and adjusting subtracted amount

-5 bytes by moving checks inside loop condition

-3 bytes by using the n array for storing the output.

#define C(l,h)c>l&c<h?n[l]++
f(char*s){int c,n[64]={1};while((c=*s++)&&!(c-=41,c+9?*n=c+8&&c-22?C(55,82)<0:C(23,50):C(6,17)<0:C(4,6)>2:C(2,4):c-4&&c-54:n[1]++:(n[23]=0)));return*n;}

Try it online!

The C macro increments a counter for a character class. I used decimal literals instead of char literals. When a space is encountered the uppercase counter is reset.
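The counter idea described above (one pass over the characters, one counter per class, uppercase counter reset at each space) can be sketched in Python. This illustrates the general approach only; it does not mirror the golfed arithmetic or the macro, and `is_spam` is an illustrative name.

```python
def is_spam(title: str) -> bool:
    caps = marks = commas = stops = 0
    for ch in title:
        if ch == " ":
            caps = 0                    # new word: reset the uppercase counter
        elif "A" <= ch <= "Z":
            caps += 1
            if caps > 1:
                return True             # second capital in the same word
        elif ch in "!?":
            marks += 1
            if marks > 1:
                return True
        elif ch == ",":
            commas += 1
            if commas > 1:
                return True
        elif ch == ".":
            stops += 1
            if stops > 3:
                return True
        elif not ("a" <= ch <= "z" or "0" <= ch <= "9" or ch in "-_"):
            return True                 # character outside the allowed set
    return False
```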

\$\endgroup\$
1
  • \$\begingroup\$ Welcome to Code Golf, and nice answer! \$\endgroup\$ Commented Feb 8, 2022 at 4:43
5
\$\begingroup\$

Python, 177 171 bytes

This requires the re module, so that adds an additional 9 bytes.

lambda i:any([re.search('[^a-zA-Z0-9\-.\,\?\! ]',i),*[len(re.findall("[A-Z]",w))>1for w in i.split(" ")],len(re.findall('[?!]'))>1,i.count(".")>3,i.count(",")>1])

Attempt This Online!
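Note that, as written, the second `re.findall` call has no subject string, and the character class leaves out the underscore that the rules allow. An ungolfed sketch with both adjusted might look like this; the adjustments and the name `is_spam` are mine, not the answer's, and the byte count above does not apply to it.

```python
import re

def is_spam(i: str) -> bool:
    return any([
        re.search(r"[^a-zA-Z0-9_\-.,?! ]", i),                       # illegal character
        *[len(re.findall("[A-Z]", w)) > 1 for w in i.split(" ")],    # capitals per word
        len(re.findall("[?!]", i)) > 1,                              # ! and ? combined
        i.count(".") > 3,
        i.count(",") > 1,
    ])
```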

\$\endgroup\$
12
  • \$\begingroup\$ Use r'[^\w -.,?!]' to save some bytes (note that the underscore is allowed). Also why do you count 177 bytes when the link says 170? \$\endgroup\$ Commented Feb 7, 2022 at 14:49
  • \$\begingroup\$ @ParclyTaxel because of import re \$\endgroup\$ Commented Feb 7, 2022 at 14:55
  • \$\begingroup\$ @BgilMidol but in that case wouldn't it be 179 bytes? \$\endgroup\$ Commented Feb 7, 2022 at 15:11
  • \$\begingroup\$ @ParclyTaxel I considered that, but the punctuation marks are counted separately. With that regex, I could have, say, one comma, one question mark, and one exclamation and it would add up to be too much despite those all being under their respective limits. \$\endgroup\$
    – Ginger
    Commented Feb 7, 2022 at 15:52
  • \$\begingroup\$ I also fixed the link. \$\endgroup\$
    – Ginger
    Commented Feb 7, 2022 at 15:52
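One thing to watch with the `[^\w -.,?!]` suggestion: inside a character class, ` -.` is a range from space (0x20) to full stop (0x2E), which also covers characters such as `"`, `'`, `(`, `)` and `*`. The Python sketch below compares it with a class that puts the dash last; both variable names are made up, and `re.ASCII` is my addition.

```python
import re

suggested = re.compile(r"[^\w -.,?!]", re.ASCII)   # " -." acts as a range here
explicit = re.compile(r"[^\w .,?!-]", re.ASCII)    # dash last, so it is literal

title = "How on earth did (this) happen"
print(suggested.search(title))          # None: the brackets slip through
print(explicit.search(title).group())   # (
```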
5
\$\begingroup\$

Retina 0.8.2, 55 53 51 bytes

[^-.,?!\w ]|[?!].*[?!]|,.*,|(\..*){4}|[A-Z]\S*[A-Z]

Try it online! Link includes test cases. Edit: Saved 2 bytes thanks to @Ausername and another 2 bytes thanks to @nununoisy. Simply reports on the number of banned patterns it finds, so for some spam the truthy value might be greater than 1; if this is undesirable, 1` can be prefixed to the program which limits the count to 1. Explanation:

[^-.,?!\w ]     Check for illegal characters.
[?!].*[?!]      Check for two question or exclamation marks.
,.*,            Check for two commas.
(\..*){4}       Check for four or more full stops.
[A-Z]\S*[A-Z]   Check for two uppercase letters in the same word.
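To see how a count of banned patterns can exceed 1, here is a hedged Python approximation of the same counting behaviour. It assumes Python's `re` finds the same non-overlapping matches as Retina's engine for these inputs; `re.ASCII` and the names `BANNED` and `banned_count` are my additions.

```python
import re

BANNED = re.compile(
    r"[^-.,?!\w ]|[?!].*[?!]|,.*,|(\..*){4}|[A-Z]\S*[A-Z]", re.ASCII
)

def banned_count(title: str) -> int:
    # Count non-overlapping matches, scanning left to right.
    return sum(1 for _ in BANNED.finditer(title))

print(banned_count("How on earth did this happen?"))   # 0
print(banned_count("HI!!"))                            # 2 ("HI" and "!!")
```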
\$\endgroup\$
6
  • \$\begingroup\$ Can [^ ] be \w? \$\endgroup\$
    – emanresu A
    Commented Feb 8, 2022 at 10:09
  • \$\begingroup\$ @emanresuA The question says words are only split by spaces, so for instance "O.K." would be an illegal word. \$\endgroup\$
    – Neil
    Commented Feb 8, 2022 at 10:10
  • \$\begingroup\$ Ok. What about \S? \$\endgroup\$
    – emanresu A
    Commented Feb 8, 2022 at 10:10
  • \$\begingroup\$ @emanresuA Yes, that works, thanks! \$\endgroup\$
    – Neil
    Commented Feb 8, 2022 at 10:14
  • \$\begingroup\$ -2 bytes by changing \.(.*\.){3} to (\..*){4}. \$\endgroup\$
    – nununoisy
    Commented Feb 8, 2022 at 16:14
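A quick way to convince yourself that the two full-stop patterns agree is to compare them on titles with a varying number of full stops. This is a Python check under the assumption that titles are single-line; the names `old`, `new` and `agree` are illustrative.

```python
import re

old = re.compile(r"\.(.*\.){3}")   # a full stop, then three more
new = re.compile(r"(\..*){4}")     # four times: a full stop plus anything

def agree(n: int) -> bool:
    """Both patterns match exactly when the title has at least four full stops."""
    title = "a" + ".b" * n         # a title containing n full stops
    return bool(old.search(title)) == bool(new.search(title)) == (n >= 4)

print(all(agree(n) for n in range(8)))   # True
```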
4
\$\begingroup\$

05AB1E, 38 37 36 bytes

žj…,!?©… -.JÃÊ·IS#.uOI®S¢ā£OI'.¢;M2@

Try it online or verify all test cases.

Explanation:

žj              # "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_"
  …,!?          # Push string ",!?"
      ©         # Store it in variable `®` (without popping)
       … -.     # Push string " -."
           J    # Join all three strings on the stack together
            Ã   # Only keep those characters from the (implicit) input
             Ê  # Check if it's now NOT equal to the (implicit) input
              · # Double it (2 if truthy; 0 if falsey)
I               # Push the input
 S              # Convert it to a list of characters
  #             # Split it on spaces
   .u           # Check for each character if it's an uppercased letter
     O          # Sum those checks for each word
 ®S             # Push [",","!","?"] (variable `®` as list of characters)
I  ¢            # Count these characters in the input
    ā           # Push a list in the range [1,length] (without popping): [1,2,3]
     £          # Split the counts into those parts: [[a],[b,c],[]]
                # (a=count of ","; b=count of "!"; c=count of "?")     
      O         # Sum each inner list: [a,b+c,0]
I               # Push the input yet again
 '.¢           '# Count the amount of "." in the input
    ;           # Halve it
M               # Push the largest number of the stack (including within lists)
 2@             # Check if this max is ≥2
                # (after which it is output implicitly as result)
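The trick in the explanation above is to scale every check so that "spam" always means "at least 2", then take a single maximum. A Python sketch of that idea follows; it is an illustration, not a translation of the 05AB1E builtins, and `ALLOWED` and `is_spam` are made-up names.

```python
import string

ALLOWED = string.ascii_letters + string.digits + "_,!? -."

def is_spam(title: str) -> bool:
    # Illegal characters: doubled so that a single hit already scores 2.
    scores = [2 * any(ch not in ALLOWED for ch in title)]
    # Capitals per word: two or more in one word scores >= 2.
    scores += [sum(ch in string.ascii_uppercase for ch in word)
               for word in title.split(" ")]
    # Commas, and exclamation plus question marks combined.
    scores += [title.count(","), title.count("!") + title.count("?")]
    # Full stops: halved so that four of them score 2.
    scores.append(title.count(".") / 2)
    return max(scores) >= 2
```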
\$\endgroup\$
4
\$\begingroup\$

Charcoal, 52 bytes

⌈⟦⊙θ¬№⁺ !,-.?_⭆⁶²⍘λφι‹¹⁺№θ!№θ?‹¹№θ,‹³№θ.⊙⪪θ ‹¹LΦι№αλ

Try it online! Link is to verbose version of code. Outputs a Charcoal boolean, i.e. - for spam, nothing if not. Explanation:

⌈⟦

Output if any of the following is true.

⊙θ¬№⁺ !,-.?_⭆⁶²⍘λφι

Check whether any characters aren't contained in the permitted list including the 62 alphanumeric characters used by default for base conversion.

‹¹⁺№θ!№θ?

Check whether there is more than one exclamation or question mark.

‹¹№θ,

Check whether there is more than one comma.

‹³№θ.

Check whether there are more than three full stops.

⊙⪪θ ‹¹LΦι№αλ

Check whether any word has more than one upper case letter.

\$\endgroup\$
3
\$\begingroup\$

Burlesque (no RegEx), 103 88 81 76 74 bytes

Jwd{qsnfl2.<}aljJbc".,?!"qCNZ]^p.+1<=j1<=&&j3<=&&jNBqrifn" -_.,?!"\\z?&&&&

Try it online!

There is almost certainly a shorter answer possible. This is pretty brute force.

Jwd{qsnfl2.<}alj  # Check double caps

J       # Duplicate input
wd      # Split into words
{      
 qsn    # isUpper
 fl     # filter length
 2.<    # <2
}       #
al      # All
j       # Swap input to top of stack

Jbc".,?!"qCNZ]^p.+1<=j1<=&&j3<=&&j  # Check char counts

J       # Duplicate input
bc      # Box and repeat infinitely
".,?!"  # String
qCN     # Count occurrences
Z]      # Zip with count (return list of counts for each)
^p      # Push list to stack
.+      # Add ?s and !s
1<=j    # ?+! <= 1
1<=&&j  # , <= 1
3<=&&   # . <= 3

NBqrifn" -_.,?!"\\z?  # Check valid chars

NB          # Remove duplicates
qri         # Quoted is alphanum
fn          # Filter not
" -_.,?!"   # Valid non-letter
\\          # List diff
z?          # Empty

&&&&        # Reduce all 3 by ands
\$\endgroup\$
2
\$\begingroup\$

JavaScript (Node.js), 132 bytes

n=>n.split` `.some(w=>/[^\w-.,?!]/.test(w)+(F=C=>w.split(C).length-1,c+=F`.`,d+=F(/[?!]/),e+=F`,`,F(/[A-Z]/)>1),c=d=e=0)|c>3|d>1|e>1

Try it online!

If you want to be completely sure that the answer works, add a backslash before the dash in the first regular expression. The code above passes all test cases, but a comment on the Python answer seems to indicate that there should be a backslash or space before the dash. If anyone could confirm or disprove this, it would be very helpful.

\$\endgroup\$
1
  • \$\begingroup\$ The dash is usually used for a range, however in an ECMAScript regex you can't make a character class part of a range, so it gets treated as a dash only if it follows the \w or is at the end of the group. \$\endgroup\$
    – nununoisy
    Commented Feb 7, 2022 at 16:36
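The leniency described in this comment is engine-specific. As a point of comparison, Python's `re` module refuses a dash directly after `\w` inside a character class, so escaping the dash is the safer form if the pattern may be reused elsewhere. A small check (the helper name `compiles` is hypothetical):

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(r"[^\w-.,?!]"))    # False: "\w-." is rejected as a bad range
print(compiles(r"[^\w\-.,?!]"))   # True: the escaped dash is a literal
```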
2
\$\begingroup\$

Vyxal, 62 58 57 56 bytes

`[^\w.,?! -]`ẎL‛?!øB?ẎL1>?⌈ƛ`[A-Z]`nẎL1>;a?\.O3>?\,O1>Wa

Try it Online!

My first Vyxal answer, and I'm loving this language. So much more intuitive than Jelly. 99% sure this can be golfed more.

Explanation:

`[^\w.,?! -]`ẎL‛?!øB?ẎL1>?⌈ƛ`[A-Z]`nẎL1>;a?\.O3>?\,O1>Wa ; Takes the title as input
`[^\w.,?! -]`ẎL                                          ; Number of illegal characters matched (0 if no matches)
               ‛?!                                       ; The string '?!'
                  øB                                     ; Bracketify: converts '?!' to '[?!]'
                    ?ẎL                                  ; Find all '?' and '!' and count them
                       1>                                ; More than 1?
                         ?⌈                              ; Split the input on spaces
                           ƛ            ;                ; Mapping lambda: maps all the words using the following criteria
                            `[A-Z]`nẎL                   ; How many capital letters in the word?
                                      1>                 ; More than 1?
                                         a               ; Any truthy? (i.e. any words with more than 1 capital letter?)
                                          ?\.O           ; Count full stops in string
                                              3>         ; More than 3?
                                                ?\,O     ; Count commas in string
                                                    1>   ; More than 1?
                                                      W  ; Turn the stack into a list
                                                       a ; Any truthy? (i.e. are any of the conditions true?)

Vyxal, 56 bytes

`[^\w-.,?! ]|[A-Z]\S*[A-Z]|[!?].*[!?]|,.*,|(\..*){4}`?ẎL

Try it Online!

A different version, based on the regex from the Node.js answer.

\$\endgroup\$
