Questions tagged [sse]

SSE (Streaming SIMD Extensions) was the first of many similarly-named vector extensions to the x86 instruction set. At this point, SSE is more often used as a catch-all for x86 vector instructions in general than as a reference to SSE alone, without SSE2, SSE3, etc. (For Server-Sent Events, use the [server-sent-events] tag instead.)

2,364 questions

0 votes

1 answer

21 views

Why CSAPP say Gcc do not use vcvtss2sd?

Computer Systems: A Programmer's Perspective (3rd), in section 3.11.1, says "Suppose the low-order 4 bytes of %xmm0 hold a single-precision value; then it would seem straightforward to use the ...

TouXianGuan

asked yesterday

3 votes

1 answer

85 views

Twice as slow SIMD performance without extra copy

I've been optimizing some code, and stumbled across a peculiar case. Here are the two assembly codes: ; FAST lea rcx,[rsp+50h] call qword ptr [Random_get_float3] ;this function ...

Alex

asked 2 days ago

0 votes

0 answers

16 views

How to implement real-time responses in a Flask-based chatbot with OpenAI Assistants API?

I have a basic chatbot that currently waits for the backend to fully process and generate a response before displaying it to the user. During this wait, the user sees a "Typing..." message. ...

Josh

asked Jul 17 at 18:24

1 vote

0 answers

95 views

Speed-up byte signature scanning in memory using SIMD

I'm searching for various byte patterns in big memory chunks using this code: BOOLEAN Find(const unsigned char* data, SIZE_T data_size, const unsigned char* to_find, SIZE_T to_find_size, SIZE_T* index)...

Kracken

asked Jul 14 at 12:23

8 votes

0 answers

150 views

Why does removing instructions from my SSE intrinsic function make it slower?

Please note that this question is not about YUV422 to RGB conversion! I have this code for a pixel order YUV422 to RGB conversion. static void yuv422ToRGB(unsigned char* img, int width, int height, ...

Crigges

1,233

asked Jun 20 at 15:03

0 votes

0 answers

69 views

GCC generates slow code when targeting more recent sse version

I have a very simple test program like the one below. It just sums all uint8 values in an array. GCC seems to generate significantly slower code when targeting sse4 or avx2. Code is significantly faster with ssse3. Is ...

AdamF

2,561

asked May 29 at 11:14

1 vote

0 answers

97 views

How to compile with RAD Studio C++ Builder 12 BCC64 using AVX, SSE, F16C extensions?

I'm just compiling some plain C code under C++ Builder 12 for x64 (the compiler called from the IDE is BCC64.EXE) and when I enable some macros in third party headers related to CPU extensions like ...

rafastar

asked May 27 at 16:44

1 vote

0 answers

84 views

save xmm registers in windows kernel

I am working on a Windows kernel-mode driver and needed to perform floating-point operations using the xmm registers (xmm0, xmm1, and xmm2). To avoid interfering with the kernel or other drivers' state, ...

daniel

asked May 26 at 21:57

3 votes

2 answers

136 views

Zero remaining Bytes after first Zero in SSE Register

For this question, I will use the notation 1 for a byte with all ones (0xFF) and 0 for a byte with all zeros. I am looking for a way to zero the remaining bytes in a SSE register after the first zero ...

Crigges

1,233

asked May 24 at 6:54

2 votes

0 answers

56 views

Custom kernel: Stack unaligned, fault on compiler-generated SSE movaps [duplicate]

I'm seeing a weird problem with my kernel where XMM instructions fail as RSP 16 byte alignment constraint is unmet. The function frame starts with an aligned value but as it makes space for the buffer,...

Tretorn

asked Apr 26 at 9:59

1 vote

0 answers

22 views

How to identify the proportion of frequency reduction of a process caused by AVX instructions?

Different types of AVX instructions can cause a decrease in CPU frequency[1]. The proportion of this decrease can be evaluated through the PMU events called `CORE_POWER.LVL0/1/2_TURBO_LICENSE`. However, ...

Frontier_Setter

asked Apr 24 at 8:42

0 votes

0 answers

85 views

compiler generated assembler

A question about compiler generated assembler: My to-be-optimized main loop includes two memory accesses instead of register. loop: mov xmm, mem // pre-calculated value pushed on the stack pxor xmm, ...

linuxCowboy

asked Apr 23 at 20:59

2 votes

1 answer

88 views

Is there anything more I need to do before using SSE instructions?

I attempted to use an SSE instruction after I enabled the CR4 register bit 18 (OSXSAVE) and ran xsetbv, but it is not working. The CPU has triggered the INT 0x6 interrupt (#UD). Is it because I didn't do ...

sanzenyou

asked Apr 21 at 10:18

0 votes

1 answer

62 views

Set Last Value in __m128 vector register

So I have a set of data with mixed values for packing purposes that goes like this: {(Point_x, Point_y, Point_z, Scalar), (Point_x, Point_y, Point_z, Scalar), (Point_x, Point_y, Point_z, Scalar), ......

yosmo78

asked Apr 21 at 0:03

0 votes

0 answers

33 views

Vector by Scalar Division with -ffast-math

typedef float float4 __attribute__((vector_size(16))); float4 divvs(float4 vector, float scalar) { return vector / scalar; } compiles to // x86 gcc/clang -O3 shufps xmm1, xmm1, 0 divps ...

bockyboh

asked Mar 30 at 21:47
