2
\$\begingroup\$

Write a program which outputs the square root of a given number in the shortest time possible.

Rules

It may not use any builtins or powering with non-integers.

Input format

  • as a function argument
  • as a command line argument
  • as a direct input on stdin or a window

Your entry hasn't to be a complete program, it may be a function or a snippet.
It doesn't need to be exact; a one percent margin of error is allowed.
The test cases will only be positive integers, but your program should output/return floats.
It may return the square root as function return value, print it to stdout or write in a consistent file, which is readable by the user..
Each program should specify the compiler/interpreter it should be run on and compiler flags. If they aren't specified, I'll use the following implementations: (just talking about mainstream langs):

  • C/C++: gcc/g++ on Linux, no flags
  • Java: OpenJDK on Linux, no flags
  • Python: CPython
  • Ruby: MRI
  • Assembler: x86 Linux
  • JS: Node.js

Examples

Input    Output
4        2 (or something between 1.98 and 2.02)
64       8 (between 7.92 and 8.08)
10000    100 (between 99 and 101)
2        bewteen 1.40001 and 1.428

Scoring

Your programs on the same computer with same conditions (Terminal and Iceweasel with this tab open). This question has the tag "Fastest code", so the program, which calculates the square root of a (not yet given) random integer between 1 and 10^10 as fast as possible will win!

\$\endgroup\$
20
  • 6
    \$\begingroup\$ The bottleneck is going to be I/O, so for practical purposes this question doesn't have a winning criterion. \$\endgroup\$ Commented Jul 17, 2016 at 12:31
  • 1
    \$\begingroup\$ You need to run it multiple times with multiple random integers otherwise the scoring is totally arbitrary. You should also have challengers specify a compiler. And for languages like C or C++ you need to specify which optimization flag to enable. \$\endgroup\$ Commented Jul 17, 2016 at 17:10
  • 1
    \$\begingroup\$ I don't think you've thought this out fully - 10**10 is a 34-bit integer, and the ARMv7 processor that the Raspberry Pi 2B uses only supports 32-bit integers natively. \$\endgroup\$
    – user45941
    Commented Jul 17, 2016 at 17:30
  • 1
    \$\begingroup\$ I would vote to reopen if I could. We aren't used to fastest-code, but we have to take risks. \$\endgroup\$ Commented Jul 17, 2016 at 17:33
  • 1
    \$\begingroup\$ And in order to be able to compare (>0ns) you should make it accept multiple inputs just so we can compare two programs. \$\endgroup\$ Commented Jul 17, 2016 at 18:29

5 Answers 5

9
\$\begingroup\$

C

Since input is limited to positive integers between 1 and 1010, I can use a well-known fast inverse square root algorithm to find the inverse square root of the reciprocal of the input.

I'm not sure what you mean by "only Xfce and the program and a terminal running" but since you stated that functions are acceptable, I provide a function in C that will take an integer argument (that will be casted to float automatically) and outputs a float as the result.

float f(float n) {
    n = 1.0f / n;
    long i;
    float x, y;

    x = n * 0.5f;
    y = n;
    i = *(long *)&y;
    i = 0x5f3759df - (i >> 1);
    y = *(float *)&i;
    y = y * (1.5f - (x * y * y));

    return y;
}

The following is a program that makes uses of the function above to calculate the square roots of the given test cases and compares them to the values given by the sqrtf function in math.h.

Ideone

\$\endgroup\$
9
  • 1
    \$\begingroup\$ See this blog and/or this paper for an explanation of where this magic number comes from \$\endgroup\$ Commented Jul 17, 2016 at 15:36
  • \$\begingroup\$ 'only Xfce and the program and a terminal running' means there are no other processes started by me. \$\endgroup\$
    – univalence
    Commented Jul 17, 2016 at 16:50
  • \$\begingroup\$ Knew this algorithm and hoped that someone will post it! ;) \$\endgroup\$
    – univalence
    Commented Jul 17, 2016 at 17:02
  • \$\begingroup\$ You need to specify the compiler. \$\endgroup\$ Commented Jul 17, 2016 at 17:11
  • 1
    \$\begingroup\$ Using a union for type-punning would be safe. Pointer-casting this way violates strict-aliasing. \$\endgroup\$ Commented Dec 16, 2016 at 21:04
4
\$\begingroup\$

Python 3

Why am I using Python? Never mind.

def sqrt(n):
    # 3 iterations of newton's method, hard-coded
    # normalize

    # find highest bit
    highest = 1
    sqrt_highest = 1
    while highest < n:
        highest <<= 2
        sqrt_highest <<= 1

    n /= highest+0.0

    result = (n/4) + 1
    result = (result/2) + (n/(result*2))
    result = (result/2) + (n/(result*2))

    return result*sqrt_highest

Ideone it!

\$\endgroup\$
2
  • 2
    \$\begingroup\$ did you just use python for a fastest-code challenge? are you out of your mind? \$\endgroup\$ Commented Jul 17, 2016 at 17:09
  • 2
    \$\begingroup\$ @AgentCrazyPython I've used Java, so, he's good. \$\endgroup\$ Commented Jul 17, 2016 at 20:10
3
\$\begingroup\$

C

Uses unsigned integers when possible for speed.

root.c:

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define ns(t) (1000000000 * t.tv_sec + t.tv_nsec)

struct timespec then, now;

int output(uint64_t result)
{
    clock_gettime(CLOCK_REALTIME, &now);
  printf("sqrt(x) = %10lu.%2d    real time:%9ld ns\n", result / 100, ((int) (result % 100)), ns(now) - ns(then));
  return 0;
}

//real time:    60597 ns

int main(int argc, char *argv[]) {
    clock_gettime(CLOCK_REALTIME, &then);
    uint64_t num = atol(argv[1]), root = 100, old_root = 0;
    num *= 10000; //multiply by 10k because int is faster
    //and you only need 2 d.p. precision max. (for small numbers)
    while (old_root != root) {
        old_root = root;
        root = (root + num / root) / 2;
    }
    return output(root);
}

timein:

#!/bin/bash

gcc -Wall $1 -lm -o $2 $2.c

./$2 $4

r=$(seq 1 $3)

for i in $r; do ./$2 $4; done > times

awk -v it=$3 '{ sum += $7 } END { print "\n" sum / (it) " ns" }' times

rm times

Usage: ./timein '-march=native -O3' root <iterations> <number>

Inspired by this answer, uses this algorithm.

\$\endgroup\$
8
  • \$\begingroup\$ Running on a Raspi, -march=native -03 doesn't work. Is this needed? \$\endgroup\$
    – univalence
    Commented Jul 17, 2016 at 16:53
  • \$\begingroup\$ @MegaMan I think you typed 'zero three' instead of 'oh three'. \$\endgroup\$ Commented Jul 17, 2016 at 17:12
  • \$\begingroup\$ @AgentCrazyPython Yep, I did. \$\endgroup\$
    – univalence
    Commented Jul 17, 2016 at 17:13
  • \$\begingroup\$ @MegaMan did that fix the problem or what? \$\endgroup\$ Commented Jul 17, 2016 at 17:17
  • \$\begingroup\$ @AgentCrazyPython Yep, works fine now \$\endgroup\$
    – univalence
    Commented Jul 17, 2016 at 17:19
2
\$\begingroup\$

C (gcc)

demonstrates one Newton step after a linear approximation on mantissa speed of C is fast but not really relevant unless you have to apply it several million times then it is sensitive to overall context of SIMD vectorisation, memory access patterns & other branching causing pipeline stalls etc

float f(float x){int i=*(int*)&x;i=0x1fb90000+(i>>1);float y=*(float*)&i;return(y+x/y)/2;}
void main() {float y[4]={4,64,10000,2};
for(int j=0;j<4;j++)printf("f(%f)\t=\t%f\n",y[j],f(y[j]));}

TIO

\$\endgroup\$
1
\$\begingroup\$

Adaptation of my own answer on Fast inverse square root

Tcl, 114 bytes

rename binary B
proc R n {B s [B f f $n] i i
B s [B f i [expr 0x5f3759df-($i>>1)]] f y
expr 1/($y*1.5-$n/2*$y**3)}

Try it online!

\$\endgroup\$
1
  • \$\begingroup\$ You don't need bytecount since this isn't code-golf \$\endgroup\$
    – ASCII-only
    Commented Apr 26, 2018 at 7:53

Not the answer you're looking for? Browse other questions tagged or ask your own question.