Would that mean that unparallel process will run slower on the same clock
cycle CPU with 4 core 8 thread CPU than on 4 core 4 thread CPU because it
is using only half of the core?
Sort of, yes, and no.
If I run 8 independent single-threaded programs on an 8 core (8 thread) CPU, then each core will run one thread and these will run at max speed (i.e. each program has a core assigned to it all the time, ignoring things like the OS, which also wants some CPU time).
If I run 8 independent single-threaded programs on a 4 core (4 thread) CPU, then on average each core will run two of these. Each program will run half as fast.
So far no surprises.
Now with a 4 core (8 thread) CPU the OS thinks that there are 8 cores. It will treat this as the first case. However, this is not really the case; half of these are not built as regular cores, they are extra hardware threads sharing a physical core. Typically only part of the functionality is duplicated, and if you have bad luck then one of the two threads on a core will stall. It will then not be any faster than a 4c/4t CPU.
However, if you are very lucky (e.g. the ALUs are doubled and the two threads alternate between fetching information from memory and adding) then both can run at full speed.
On average, this leads to a speed increase of about 30%, though the actual gain depends heavily on the workload.
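To put rough numbers on the three cases, here is a small back-of-the-envelope sketch. The function name per_program_speed and its simple model (programs share the cores evenly, and a core running two threads does 1 + gain times the work of a core running one) are my own assumptions for illustration, not a measurement.

```shell
#!/bin/sh
# per_program_speed CORES PROGRAMS [SMT_GAIN]
# Prints the fraction of full single-core speed that each program gets,
# assuming the programs share the physical cores evenly.
# SMT_GAIN is the assumed extra throughput of a core that runs two
# threads at once (0 means no hyper-threading).
per_program_speed() {
    awk -v c="$1" -v p="$2" -v g="${3:-0}" 'BEGIN {
        # number of cores that have to run two programs at once
        doubled = (p > c) ? ((p - c < c) ? p - c : c) : 0
        # total work done per unit of time, in "full cores"
        capacity = ((p < c) ? p : c) + g * doubled
        printf "%.2f\n", capacity / p
    }'
}

per_program_speed 8 8        # 8c/8t: prints 1.00
per_program_speed 4 8        # 4c/4t: prints 0.50
per_program_speed 4 8 0.30   # 4c/8t with the 30% average gain: prints 0.65
```

So under these assumptions each of the 8 programs runs at about 65% of full speed on the 4c/8t CPU, compared to 50% on the 4c/4t one.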
To make this even a bit more complex: if your programs use large datasets, then running more than 4 of them might result in fewer cache hits. That can really slow things down.
Is there a programmatic way (say OS level) to set CPU to only 1 thread
per core?
Yes, turn off hyper-threading.
You can do this in the firmware (e.g. in BIOS or in UEFI), or from the OS.
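E.g. on Linux, for an 8 thread, 4 core CPU with CPUs 0 1 2 3 being the first thread of each core and 4 5 6 7 being the hyper-threaded set (the numbering varies between systems; /sys/devices/system/cpu/cpu0/topology/thread_siblings_list shows which CPUs share a core with cpu0), you could run as root: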
echo 0 > /sys/devices/system/cpu/cpu4/online
echo 0 > /sys/devices/system/cpu/cpu5/online
echo 0 > /sys/devices/system/cpu/cpu6/online
echo 0 > /sys/devices/system/cpu/cpu7/online
IIRC FreeBSD did the same with a sysctl. For OS X or Windows you would need to google around.
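On Linux, the hard-coded CPU numbers can be avoided by asking sysfs which logical CPUs share a core. The following is only a sketch: the function name secondary_threads is made up, and it assumes the usual thread_siblings_list format (a list such as "1,5" or a range such as "2-3"). It only prints the commands instead of running them, so you can check the result first.

```shell
#!/bin/sh
# Print the logical CPUs that are NOT the first thread of their core.
# $1: sysfs CPU directory (defaults to the standard Linux location).
secondary_threads() {
    root=${1:-/sys/devices/system/cpu}
    for dir in "$root"/cpu[0-9]*; do
        list_file=$dir/topology/thread_siblings_list
        [ -r "$list_file" ] || continue     # skip entries without topology info
        cpu=${dir##*/cpu}                   # ".../cpu5" -> "5"
        read -r siblings < "$list_file"     # e.g. "1,5" or "2-3"
        first=${siblings%%[,-]*}            # lowest-numbered sibling on this core
        [ "$cpu" = "$first" ] || echo "$cpu"
    done
}

# Dry run: print the commands; run them as root to take the CPUs offline.
for cpu in $(secondary_threads); do
    echo "echo 0 > /sys/devices/system/cpu/cpu$cpu/online"
done
```

Recent Linux kernels also have /sys/devices/system/cpu/smt/control; writing off to that file (as root) should take all the sibling threads offline in one go, if your kernel provides it.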