-1

I can't understand one fact: source code is text, and it gets translated to assembly language (roughly). But if I can see the assembly language, that means it is text as well, and it's the same story with machine code. At what moment does this "text" become signals in the CPU?

3
  • 1
    The question probably belongs on softwareengineering.stackexchange.com.
    Commented Jun 14, 2023 at 13:31
  • 1
    @Mokubai That depends on the compiler. Some compilers output machine code. Some output assembly. Some output another high-level language (though those are often called transpilers).
    – 8bittree
    Commented Jun 14, 2023 at 17:17
  • Everything is made of bits. There's no text at the machine level.
    – phuclv
    Commented Jun 15, 2023 at 1:28

5 Answers

3

Of course it's all text; the difference is the level of abstraction. Compare Hello World in assembly and in C:

SECTION .data           ; initialised data section

Msg: db "hello world", 10           ; message to print
MsgLen: equ $ - Msg                 ; length of message


SECTION .text           ; code section

global start
start:

    ; printing message, use write()
    ; system call 4 syntax: 
    ; user_ssize_t write(int fd, user_addr_t cbuf, user_size_t nbyte)

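    push dword MsgLen       ; nbyte: number of bytes to write
    push dword Msg          ; cbuf: address of the message
    push dword 1            ; fd: 1 is standard output
    sub esp, 4              ; extra stack slot expected by this BSD-style syscall convention
    mov eax, 4              ; system call number 4 (write)
    int 80H                 ; trap into the kernel
    add esp, 16             ; clean up the stack (3 arguments + the extra slot)

    ; program exit, use sys_exit()
    push dword 0            ; exit status 0
    sub esp, 4              ; extra stack slot, as above
    mov eax, 1              ; system call number 1 (exit)
    int 80H                 ; trap into the kernel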

Versus

#include <stdio.h>

int main(void)
{
    printf("Hello, World!\n");
}

In other words, the CPU has no "print to screen" instruction, and yet C lets us write one. That means a print command such as printf in fact stands for a bunch of lower-level code that the CPU can handle. Assembly is closer to the instructions a CPU can actually execute.

We cannot tell the CPU to PRINT to the screen; we have to tell it how to do so step by step. Something like C is closer to human language, but its commands must be translated into something a CPU can actually understand. Assembly is the more direct way to tell a CPU what to do: machine language represented in a more textual form.
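To make the "step by step" point concrete, here is a minimal Python sketch of what a print command hides. The helper name low_level_print is made up for illustration; os.write is Python's thin wrapper around the operating system's write call, the same call the assembly example above invokes directly. It assumes file descriptor 1 is standard output, as on POSIX-like systems.

```python
import os


def low_level_print(message: str, fd: int = 1) -> int:
    """Write message to a file descriptor without using print().

    fd 1 is assumed to be standard output.
    Returns the number of bytes written.
    """
    data = message.encode("ascii")       # step 1: turn characters into bytes
    written = 0
    while written < len(data):           # step 2: the OS may accept only part of the data
        written += os.write(fd, data[written:])  # step 3: one write() system call
    return written
```

Calling low_level_print("Hello, World!\n") should put the same text on the terminal as the C program, but every step that printf normally hides is spelled out.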

It's the same low-level to high-level representation of data we see everywhere in computers. For example, "HELLO" (ASCII) is represented in hex as "48 45 4C 4C 4F" and in binary as "01001000 01000101 01001100 01001100 01001111". In general, the closer to "computer language", the less readable it is for us human beings.
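The three views of "HELLO" can be reproduced with a few lines of Python. This is only an illustrative sketch; the variable names are arbitrary.

```python
text = "HELLO"
raw = text.encode("ascii")                      # the bytes actually stored

hex_view = " ".join(f"{b:02X}" for b in raw)    # each byte as two hex digits
bin_view = " ".join(f"{b:08b}" for b in raw)    # each byte as eight binary digits

print(hex_view)  # 48 45 4C 4C 4F
print(bin_view)  # 01001000 01000101 01001100 01001100 01001111

# Going the other way: the same bytes decode back to the original text.
print(bytes.fromhex(hex_view).decode("ascii"))  # HELLO
```

All three lines describe the same five bytes; only the notation shown to the human changes.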

One level lower still, the zeros and ones become encoded pulses, electric signals, and charges. Simplified, and leaving some abstraction layers aside: "HELLO", when written to an SSD for example, is "01001000 01000101 01001100 01001100 01001111", which is converted to electrical charges that program individual NAND cells. Yes, 0100 etc. is readable text, but it's an interpretation/representation of analog signals.

2

Everything is text, and text is encoded binary... So whatever you see (whether source code, machine code, a music file, or anything in between) is always actually 1s and 0s (On and Off).

Machine code is simply the instructions in a language the CPU was built to read. (The CPU doesn't read it as hex or anything else; it reads the 1s and 0s and uses each group of bits to work out what to do.)

9
  • To be a little more clear, this answer is saying that the opposite of the question is the true way to look at it. In the spirit of the question about levels and meta levels, everything is in a machine form and is then translated and presented to the user as text.
    – Yorik
    Commented Jun 14, 2023 at 14:30
  • True in a sense. But the question is also, more specifically, how the CPU reads the machine code. The answer is that the CPU is built to follow the few instruction types that machine code has. The machine code is obviously supplied in binary, and each binary sequence (a 32-bit or 64-bit sequence, hence the architecture of the CPU) "means" a set of instructions.
    Commented Jun 14, 2023 at 14:34
  • The question is that if we see the text it must be stored as ascii format or whatever. But how does it magically go back to raw, pure machine code once executed???
    – qukert
    Commented Jun 14, 2023 at 15:02
  • To explain better: i think that 'one' and 'zero' as machine code are not stored the same way the ones and zeroes as text are stored
    – qukert
    Commented Jun 14, 2023 at 15:05
  • 2
    I think calling them numbers or 0s and 1s is wrong also. The CPU uses logic gates, a physical thing with low voltage state and high voltage state. A macro-world example might be a gang of light switches. Calling them numbers is also an abstraction.
    – Yorik
    Commented Jun 14, 2023 at 18:19
1

Source code is text,

True, but "source code" is ambiguous. There are high-level programming languages, and then there is assembly language. If you're writing in assembly language, then that assembly language is your source code.

it gets translated to assembly language(roughly), but if I can see the assembly language that means it is text as well, same story with machine code.

Yes, compilers typically convert a high-level programming language to assembly language, which in turn has to be converted to machine code.

Do not conflate assembly language with machine code.
Assembly language has symbolic labels and variable names.
Machine code only uses numeric addresses.
Try hand-writing a program in machine code (using just numbers, or even opcode mnemonics), and you'll start to understand the actual difference from assembly language.

A distinguishing feature of a programming language (either high-level or assembly language) is the use of statement labels and the ability to name variables. A statement label allows you to insert or delete other statements, and you let the compiler (or assembler) deal with the address changes. The ability to name a variable means that you let the compiler allocate the storage location and deal with the address references.
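The label bookkeeping can be sketched with a toy two-pass "assembler" in Python. Everything here is hypothetical: the mnemonics (LOAD, ADD, JUMP, NOP), the one-instruction-per-address machine, and the function name resolve_labels are invented for illustration and do not correspond to any real instruction set or tool.

```python
def resolve_labels(lines):
    """Replace label operands with numeric addresses (toy machine, one instruction per address)."""
    # Pass 1: record the address each label refers to.
    labels = {}
    instructions = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            labels[line[:-1]] = len(instructions)  # address of the next instruction
        else:
            instructions.append(line.split())

    # Pass 2: substitute numeric addresses for label names.
    resolved = []
    for mnemonic, *operands in instructions:
        operands = [str(labels.get(op, op)) for op in operands]
        resolved.append(" ".join([mnemonic, *operands]))
    return resolved


program = ["start:", "LOAD 0", "loop:", "ADD 1", "JUMP loop"]
print(resolve_labels(program))   # ['LOAD 0', 'ADD 1', 'JUMP 1']

# Inserting a statement moves the label; the numeric address is fixed up automatically.
program.insert(1, "NOP")
print(resolve_labels(program))   # ['NOP', 'LOAD 0', 'ADD 1', 'JUMP 2']
```

When writing raw machine code by hand, the change from JUMP 1 to JUMP 2 would have to be found and corrected manually after every insertion.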

What is the moment where this "text" becomes signals in CPU.

The "source code" needs to be compiled and assembled into machine code.

This machine code may exist as a ready-to-load-into-memory-and-execute program known as a binary executable. Such an executable file is typically a standalone program that does not require an OS (see "Would an executable need an OS kernel to run?").

Machine code for execution under an OS is typically packaged in a file format that specifies dependencies such as a dynamic linker and shared libraries. The OS will verify permissions, allocate resources, load into memory the executable, perform address relocation and/or linking with shared libraries, and then allow program execution.

Machine code (in main memory) is composed of CPU instructions, and can be displayed as numbers.
A basic CPU continuously performs a cycle of

  • fetching the (next) instruction stored at the memory address in the Program Counter register;
  • incrementing the Program Counter register;
  • decoding that instruction into an opcode and operand(s);
  • executing that instruction.

Processor caches, instruction pipelines, branch prediction, et cetera are CPU enhancements that should not detract from understanding the basic operation of a digital computer.
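The basic fetch, increment, decode, execute cycle listed above can be sketched as a tiny simulator in Python. This is a made-up 8-bit toy machine, not any real CPU: the three opcodes, the one-byte instruction format (upper 4 bits opcode, lower 4 bits operand), and the single accumulator register are all assumptions chosen to keep the loop short.

```python
HALT, LOAD, ADD = 0x0, 0x1, 0x2   # made-up opcodes for a toy 8-bit machine


def run(memory):
    """Run the toy CPU until HALT and return the accumulator."""
    pc = 0    # Program Counter register
    acc = 0   # accumulator register
    while True:
        instruction = memory[pc]          # fetch the instruction at the address in PC
        pc += 1                           # increment the Program Counter
        opcode = instruction >> 4         # decode: upper 4 bits select the operation
        operand = instruction & 0x0F      # decode: lower 4 bits are the operand
        if opcode == HALT:                # execute
            return acc
        elif opcode == LOAD:
            acc = operand
        elif opcode == ADD:
            acc = (acc + operand) & 0xFF  # keep the result within 8 bits
        else:
            raise ValueError(f"unknown opcode {opcode:#x} at address {pc - 1}")


program = bytes([0x12, 0x23, 0x00])   # LOAD 2; ADD 3; HALT
print(run(program))                   # 5
```

Note that the program is three plain numbers in memory; at no point does the loop handle text.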

1

I find the other answers confusing.

Assembly is short for assembly language; that's what I first coded with, on the first commercial CPUs. It is the text representation of machine code.

Machine code is what the assembly language gets stored as. It is binary, not text. On early CPUs it was stored as octets. Now it's stored in multiples of bytes. It is called machine code because it's machine-readable, not human-readable. When you think you are looking at it, it has generally been converted to text as assembly language. Some people refer to the ASCII representation of this (in bits, bytes or words) as machine code. But it is NOT stored as text.
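As a small illustration of the difference, the Python sketch below compares the bytes of one line of assembly text with the bytes of the corresponding instruction. It assumes 32-bit x86, where "mov eax, 4" is encoded as the opcode byte B8 followed by the value as a 4-byte little-endian integer; other architectures and operand sizes encode it differently.

```python
# Assembly text, as typed into an editor: ten ASCII characters.
assembly_text = "mov eax, 4"
text_bytes = assembly_text.encode("ascii")

# The same instruction as 32-bit x86 machine code:
# opcode 0xB8 (move an immediate value into EAX), then the value 4
# as a 4-byte little-endian integer.
machine_code = bytes([0xB8]) + (4).to_bytes(4, "little")

print(text_bytes.hex(" "))     # 6d 6f 76 20 65 61 78 2c 20 34
print(machine_code.hex(" "))   # b8 04 00 00 00
```

The assembler's job is to turn the first byte sequence into the second; a disassembler or debugger does the reverse for display.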

1

From the comments on another answer:

To explain better: i think that 'one' and 'zero' as machine code are not stored the same way the ones and zeroes as text are stored

That is correct.

Let us create a file test containing the number 0 as ASCII text:

$ echo 0 > test
$ cat test
0

Now, let's look at the file test in binary representation (cut keeps only the first byte; echo also wrote a trailing newline, 00001010):

$ xxd -b test | cut -d ' ' -f 2
00110000

This binary number can be converted to decimal: 00110000 is 48. Looking it up in the ASCII table, this is the code for the text character 0.

$ python3 -c 'print(chr(48))'
0

Now, let's write a 0 to the file in binary (the way machine code is stored) and look at that:

$ dd if=/dev/zero bs=1 count=1 of=test
1+0 records in
1+0 records out
1 byte copied, 0.000234102 s, 4.3 kB/s
$ xxd -b test | cut -d ' ' -f 2
00000000

tl;dr text is stored using an encoding (such as ASCII) to map characters to series of binary digits.
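The same contrast holds for larger numbers. As an illustrative sketch in Python, here is the number 1234 stored once as text and once as a raw 2-byte integer; the 2-byte width and big-endian byte order are arbitrary choices for the example.

```python
number = 1234

as_text = str(number).encode("ascii")    # the characters '1', '2', '3', '4'
as_binary = number.to_bytes(2, "big")    # the value itself, in 2 bytes

print(as_text.hex(" "))     # 31 32 33 34
print(as_binary.hex(" "))   # 04 d2

# Both forms decode back to the same number, using different rules.
assert int(as_text.decode("ascii")) == number
assert int.from_bytes(as_binary, "big") == number
```

Neither byte sequence is more "real" than the other; what differs is the rule used to interpret the bits.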
