Well, this is a challenge... Let me try and build it up for you. You have heard of bits (ones and zeros). First, try not to think of those as symbols or characters or even digits; think of them as states. Mapped to a technical context, that means switches. A computer is a bunch of switches.
You cannot express much with a single binary state (yes/no, on/off, that's it), so you need to combine them into groups. Just as in natural language: you need words to say something meaningful; you cannot say much with just the language's basic building blocks, the letters.
So in computers the words are bytes: groups of 8 bits. Why 8? There are technical and economic reasons for this that we do not need to get into now.
You can tell there are 2^8 = 256 different combinations of 8 bits. That is plenty for a basic Western alphabet. Now all you need is a convention to map letters to those bytes. An early such convention is ASCII. By the way, this defines only half of the space (it uses just 7 of the 8 available bits), but that does not matter; what matters is that we have a way to express characters as bytes.
A convention like ASCII is called an encoding.
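To make this concrete, here is a tiny sketch (in Python, chosen here just for illustration) showing the ASCII convention at work:

    # Under ASCII, the letter 'A' is the byte value 65.
    print(ord('A'))                   # 65
    print(format(ord('A'), '08b'))    # 01000001 -- the eight switch states
    print(chr(65))                    # A -- and back from byte value to letter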
What you see on the screen (the characters) are views: pretty representations of letters. Try to keep thinking switches. The computer ultimately stores your text as switch states, be it in volatile memory, on disk, or whatever medium. A storage medium is an addressable array of switches, each of which can only be on or off.
Now, size. The size of a piece of text (the number of bytes needed to store it) depends on the encoding used.
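For example, in Python, the same five characters take up a different number of bytes depending on the encoding you pick:

    text = "hello"
    print(len(text.encode("ascii")))    # 5 bytes: one byte per character
    print(len(text.encode("utf-16")))   # 12 bytes: two per character plus a 2-byte marker
    print(len(text.encode("utf-32")))   # 24 bytes: four per character plus a 4-byte marker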
You use terms like "binary format" and "plain text" loosely. Let's address those.
You speak of "binary formats". I am not sure what you mean by that. When I hear people say "it's a binary format" they typically mean they cannot read it: when they open the file in a text editor they see just gibberish, because the reading program does not know about the encoding. To "store something in binary" just means to encode it in a way that is not obvious to the user, but it could be anything.
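As a sketch of what that gibberish looks like, here four raw bytes that happen to encode the number 1,000,000 are forced through a one-byte-per-character view, the way a naive text editor would display them:

    data = bytes([0x40, 0x42, 0x0F, 0x00])   # four raw bytes; as an integer they mean 1,000,000
    print(data.decode("latin-1"))            # '@B' followed by two unprintable control characters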
Plain text traditionally means ASCII-encoded text: every byte represents one character, nothing more, nothing less.
We talked about text. Another important type of data is numbers. I do not mean the characters 0123456789, I mean the kind of numbers you can add up. Those are encoded differently from text, precisely because of the need to do calculations with them. The computer does not do calculations with the characters 0123456789.
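To illustrate, compare the number 12345 stored as text characters with the same number stored as an actual integer (a Python sketch using the standard struct module):

    import struct

    as_text   = "12345".encode("ascii")    # 5 bytes: the characters '1' '2' '3' '4' '5'
    as_number = struct.pack("<i", 12345)   # 4 bytes: the integer value itself
    print(as_text)                                  # b'12345'
    print(as_number)                                # b'90\x00\x00'
    print(struct.unpack("<i", as_number)[0] + 1)    # 12346: this form you can calculate with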
Anyway, the important thing to realize is that everything that is stored on a computer is stored as bytes. What the individual bytes represent depends on the application. They can be (parts of) numbers, (parts of) characters or custom data like gender, color or a date. The term "binary" does not mean a lot in this context (ultimately everything in a computer is binary). It typically means "encoded in some unspecified way".
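A small sketch of that point: the very same four bytes give completely different results depending on what the application decides they represent:

    import struct

    data = bytes([72, 105, 0, 0])           # four bytes, i.e. 32 switch states
    print(data.decode("ascii"))             # as text: 'Hi' plus two invisible NUL characters
    print(struct.unpack("<i", data)[0])     # as a little-endian integer: 26952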
So, can binary be smaller than plain text? Yes! It can be bigger too; it all depends on your encoding. Suppose you have some chat bot that knows a number of sentences, say 10000. If you assign each sentence a number, you only need two bytes to identify each one (two bytes can hold 2^16 = 65536 different values, more than enough for 10000 sentences), regardless of how long each sentence is. So the "binary" version of a chat log of two bots talking to each other could be very small indeed.
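Here is a hypothetical sketch of such a scheme (the sentence list and the log are made up): each sentence gets a 16-bit ID, so a log entry costs two bytes no matter how long the sentence is:

    import struct

    sentences = ["Hello, how are you?",
                 "I am fine, thank you.",
                 "What is the weather like?"]   # imagine 10000 of these
    log = [0, 1, 2, 0]                          # a conversation, as sentence indexes

    encoded = b"".join(struct.pack("<H", i) for i in log)    # 2 bytes per entry
    plain   = "\n".join(sentences[i] for i in log)           # the same log as plain text

    print(len(encoded))   # 8 bytes
    print(len(plain))     # 87 bytes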
I hope this all makes sense.