The fastest square root calculator

Question

Write a program which outputs the square root of a given number in the shortest time possible.

Rules

It may not use any builtins or powering with non-integers.

Input format

as a function argument
as a command line argument
as a direct input on stdin or a window

Your entry hasn't to be a complete program, it may be a function or a snippet.
It doesn't need to be exact; a one percent margin of error is allowed.
The test cases will only be positive integers, but your program should output/return floats.
It may return the square root as function return value, print it to stdout or write in a consistent file, which is readable by the user..
Each program should specify the compiler/interpreter it should be run on and compiler flags. If they aren't specified, I'll use the following implementations: (just talking about mainstream langs):

C/C++: gcc/g++ on Linux, no flags
Java: OpenJDK on Linux, no flags
Python: CPython
Ruby: MRI
Assembler: x86 Linux
JS: Node.js

Examples

Input    Output
4        2 (or something between 1.98 and 2.02)
64       8 (between 7.92 and 8.08)
10000    100 (between 99 and 101)
2        bewteen 1.40001 and 1.428

Scoring

Your programs on the same computer with same conditions (Terminal and Iceweasel with this tab open). This question has the tag "Fastest code", so the program, which calculates the square root of a (not yet given) random integer between 1 and 10^10 as fast as possible will win!

The bottleneck is going to be I/O, so for practical purposes this question doesn't have a winning criterion. — Peter Taylor, Commented Jul 17, 2016 at 12:31
You need to run it multiple times with multiple random integers otherwise the scoring is totally arbitrary. You should also have challengers specify a compiler. And for languages like C or C++ you need to specify which optimization flag to enable. — noɥʇʎԀʎzɐɹƆ, Commented Jul 17, 2016 at 17:10
I don't think you've thought this out fully - 10**10 is a 34-bit integer, and the ARMv7 processor that the Raspberry Pi 2B uses only supports 32-bit integers natively. — user45941, Commented Jul 17, 2016 at 17:30
I would vote to reopen if I could. We aren't used to fastest-code, but we have to take risks. — noɥʇʎԀʎzɐɹƆ, Commented Jul 17, 2016 at 17:33
And in order to be able to compare (>0ns) you should make it accept multiple inputs just so we can compare two programs. — noɥʇʎԀʎzɐɹƆ, Commented Jul 17, 2016 at 18:29

miles · Accepted Answer · 2016-07-17 13:42:05Z

9

C

Since input is limited to positive integers between 1 and 10¹⁰, I can use a well-known fast inverse square root algorithm to find the inverse square root of the reciprocal of the input.

I'm not sure what you mean by "only Xfce and the program and a terminal running" but since you stated that functions are acceptable, I provide a function in C that will take an integer argument (that will be casted to float automatically) and outputs a float as the result.

float f(float n) {
    n = 1.0f / n;
    long i;
    float x, y;

    x = n * 0.5f;
    y = n;
    i = *(long *)&y;
    i = 0x5f3759df - (i >> 1);
    y = *(float *)&i;
    y = y * (1.5f - (x * y * y));

    return y;
}

The following is a program that makes uses of the function above to calculate the square roots of the given test cases and compares them to the values given by the sqrtf function in math.h.

Ideone

edited Jul 17, 2016 at 13:42

answered Jul 17, 2016 at 12:39

miles

17k2 gold badges28 silver badges94 bronze badges

1

\$\begingroup\$ See this blog and/or this paper for an explanation of where this magic number comes from \$\endgroup\$
– someonewithpc
Commented Jul 17, 2016 at 15:36
\$\begingroup\$ 'only Xfce and the program and a terminal running' means there are no other processes started by me. \$\endgroup\$
– univalence
Commented Jul 17, 2016 at 16:50
\$\begingroup\$ Knew this algorithm and hoped that someone will post it! ;) \$\endgroup\$
– univalence
Commented Jul 17, 2016 at 17:02
\$\begingroup\$ You need to specify the compiler. \$\endgroup\$
– noɥʇʎԀʎzɐɹƆ
Commented Jul 17, 2016 at 17:11
1

\$\begingroup\$ Using a union for type-punning would be safe. Pointer-casting this way violates strict-aliasing. \$\endgroup\$
– Peter Cordes
Commented Dec 16, 2016 at 21:04

| Show 4 more comments

Leaky Nun · Accepted Answer · 2016-07-17 12:57:38Z

4

Python 3

Why am I using Python? Never mind.

def sqrt(n):
    # 3 iterations of newton's method, hard-coded
    # normalize

    # find highest bit
    highest = 1
    sqrt_highest = 1
    while highest < n:
        highest <<= 2
        sqrt_highest <<= 1

    n /= highest+0.0

    result = (n/4) + 1
    result = (result/2) + (n/(result*2))
    result = (result/2) + (n/(result*2))

    return result*sqrt_highest

Ideone it!

answered Jul 17, 2016 at 12:57

Leaky Nun

49.9k6 gold badges108 silver badges282 bronze badges

2

\$\begingroup\$ did you just use python for a fastest-code challenge? are you out of your mind? \$\endgroup\$
– noɥʇʎԀʎzɐɹƆ
Commented Jul 17, 2016 at 17:09
2

\$\begingroup\$ @AgentCrazyPython I've used Java, so, he's good. \$\endgroup\$
– Addison Crump
Commented Jul 17, 2016 at 20:10

Add a comment |

Community · Accepted Answer · 2017-04-13 12:39:07Z

3

C

Uses unsigned integers when possible for speed.

root.c:

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define ns(t) (1000000000 * t.tv_sec + t.tv_nsec)

struct timespec then, now;

int output(uint64_t result)
{
    clock_gettime(CLOCK_REALTIME, &now);
  printf("sqrt(x) = %10lu.%2d    real time:%9ld ns\n", result / 100, ((int) (result % 100)), ns(now) - ns(then));
  return 0;
}

//real time:    60597 ns

int main(int argc, char *argv[]) {
    clock_gettime(CLOCK_REALTIME, &then);
    uint64_t num = atol(argv[1]), root = 100, old_root = 0;
    num *= 10000; //multiply by 10k because int is faster
    //and you only need 2 d.p. precision max. (for small numbers)
    while (old_root != root) {
        old_root = root;
        root = (root + num / root) / 2;
    }
    return output(root);
}

timein:

#!/bin/bash

gcc -Wall $1 -lm -o $2 $2.c

./$2 $4

r=$(seq 1 $3)

for i in $r; do ./$2 $4; done > times

awk -v it=$3 '{ sum += $7 } END { print "\n" sum / (it) " ns" }' times

rm times

Usage: ./timein '-march=native -O3' root <iterations> <number>

Inspired by this answer, uses this algorithm.

edited Apr 13, 2017 at 12:39

CommunityBot

1

answered Jul 17, 2016 at 12:45

ASCII-only

5,12921 silver badges43 bronze badges

\$\begingroup\$ Running on a Raspi, -march=native -03 doesn't work. Is this needed? \$\endgroup\$
– univalence
Commented Jul 17, 2016 at 16:53
\$\begingroup\$ @MegaMan I think you typed 'zero three' instead of 'oh three'. \$\endgroup\$
– noɥʇʎԀʎzɐɹƆ
Commented Jul 17, 2016 at 17:12
\$\begingroup\$ @AgentCrazyPython Yep, I did. \$\endgroup\$
– univalence
Commented Jul 17, 2016 at 17:13
\$\begingroup\$ @MegaMan did that fix the problem or what? \$\endgroup\$
– noɥʇʎԀʎzɐɹƆ
Commented Jul 17, 2016 at 17:17
\$\begingroup\$ @AgentCrazyPython Yep, works fine now \$\endgroup\$
– univalence
Commented Jul 17, 2016 at 17:19

| Show 3 more comments

jayprich · Accepted Answer · 2018-05-21 22:20:44Z

C (gcc)

demonstrates one Newton step after a linear approximation on mantissa speed of C is fast but not really relevant unless you have to apply it several million times then it is sensitive to overall context of SIMD vectorisation, memory access patterns & other branching causing pipeline stalls etc

float f(float x){int i=*(int*)&x;i=0x1fb90000+(i>>1);float y=*(float*)&i;return(y+x/y)/2;}
void main() {float y[4]={4,64,10000,2};
for(int j=0;j<4;j++)printf("f(%f)\t=\t%f\n",y[j],f(y[j]));}

TIO

sergiol · Accepted Answer · 2018-05-04 19:06:34Z

1

Adaptation of my own answer on Fast inverse square root

Tcl, 114 bytes

rename binary B
proc R n {B s [B f f $n] i i
B s [B f i [expr 0x5f3759df-($i>>1)]] f y
expr 1/($y*1.5-$n/2*$y**3)}

Try it online!

edited May 4, 2018 at 19:06

answered Apr 25, 2018 at 12:24

sergiol

3,4261 gold badge15 silver badges31 bronze badges

\$\begingroup\$ You don't need bytecount since this isn't code-golf \$\endgroup\$
– ASCII-only
Commented Apr 26, 2018 at 7:53

Add a comment |

Stack Exchange Network

The fastest square root calculator

Rules

Input format

Examples

Scoring

5 Answers 5

C

Python 3

C

C (gcc)

Tcl, 114 bytes

Not the answer you're looking for? Browse other questions tagged
number
arithmetic
fastest-code
integer
or ask your own question.

Linked

Hot Network Questions

Rules

Input format

Examples

Scoring

5 Answers 5

C

Python 3

C

Tcl, 114 bytes

Not the answer you're looking for? Browse other questions tagged numberarithmeticfastest-codeinteger or ask your own question.

Linked

Related

Not the answer you're looking for? Browse other questions tagged
number
arithmetic
fastest-code
integer
or ask your own question.