125

I think that it's fundamental for security testers to gather information about how a web application works and eventually what language it's written in.

I know that URL extensions, HTTP headers, session cookies, HTML comments and style-sheets may reveal some information but it's still hard and not assured.

So I was wondering: is there a way to determine what technology and framework are behind a website?

8
  • 17
    Try www.builtwith.com
    – SnakeDoc
    Commented Mar 11, 2016 at 15:44
  • 33
    My tomcat server returns "CERN httpd" just to mess with people Commented Mar 11, 2016 at 19:57
  • 22
    @HagenvonEitzen If HTML had been a programming language it would have been named HTPL rather than HTML.
    – kasperd
    Commented Mar 12, 2016 at 9:44
  • 10
    I think that it's fundamental for security testers to gather information about how a web application works and what language it's written in. I think that, if even a security tester can't figure out what language the site is built in, that makes it more secure because then no one will know which exploits to try. (Yes, there are occasionally valid use cases for security through obscurity.) Commented Mar 12, 2016 at 14:06
  • 6
    @MasonWheeler: figuring out what language the site is built in will only determine which exploits not to try. That won't make the site more secure. Commented Mar 12, 2016 at 19:17

6 Answers

157

There's no way to be 100% sure if you don't have access to the server, so it's about guessing. Here are some clues:

  • File extensions: login.php is most likely a PHP script.
  • HTTP headers: they may leak some information about the language which is running on the server, and some additional details like the version: X-Powered-By: PHP/7.0.0 means that the page was rendered by PHP.
  • HTTP Parameter Pollution: if you managed to guess which server is running, you can refine the guess.
  • Language limits: maximum POST data size, maximum number of variables in GET and POST data, etc. This may be useful if the webmaster kept the default values.
  • Specific input: for example, PHP had some easter eggs.
  • Errors: triggering errors may also leak the language. Warning: Division by zero in /var/www/html/index.php on line 3 is PHP, for example.
  • File uploads: libraries may add metadata if the file is being modified server-side. For example, most sites resize users' avatars, and checking for EXIF data will leak CREATOR: gd-jpeg v1.0 (using IJG JPEG v90), default quality, which may help to guess which language is used.
  • Default filenames: Check if / and /index.php are the same page.
  • Exploits: reading a backup file, or executing arbitrary code on the server.
  • Open source: the website may have been open-sourced and be available somewhere on the Internet.
  • About page: the webmaster may have thanked the language community in a "FAQ" or "About" page.
  • Jobs page: the development team may be recruiting, and they may have detailed the technologies they're using.
  • Social Engineering: ask the webmaster!
  • Public profiles: if you know who is working on the website (check LinkedIn and /humans.txt), you can check their public repos or their skills on online profiles (GitHub, LinkedIn, Twitter, ...).
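The first two clues (file extensions and HTTP headers) are easy to collect automatically. Here is a minimal Python sketch, assuming you already have the URL and the response headers as a dict. The function names and the small extension table are hypothetical and far from exhaustive, and since headers can be faked, the output is only a list of hints.

```python
import posixpath
from urllib.parse import urlparse

# Illustrative, non-exhaustive table (an assumption, not a complete ruleset).
EXTENSION_HINTS = {
    ".php": "PHP",
    ".aspx": "ASP.NET",
    ".jsp": "Java (JSP)",
}


def guess_from_url(url):
    """Return a language hint from the file extension in the URL path, or None."""
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    return EXTENSION_HINTS.get(extension)


def guess_from_headers(headers):
    """Collect the raw values of headers that commonly leak technology details."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return [
        f"{name}: {lowered[name]}"
        for name in ("x-powered-by", "server")
        if name in lowered
    ]


def collect_clues(url, headers):
    """Combine URL and header hints into one list of unverified clues."""
    clues = []
    language = guess_from_url(url)
    if language:
        clues.append(f"extension suggests {language}")
    clues.extend(guess_from_headers(headers))
    return clues
```

For example, collect_clues("http://the-site.com/login.php", {"X-Powered-By": "PHP/7.0.0"}) yields two hints that agree with each other, which is still not proof.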

You may also want to know if the website is built with a framework or a CMS, since this will give information about the language used:

  • URLs: directories and pages are specific to certain CMSs. For example, if some resources are located in the /wp-content/ directory, it suggests that WordPress has been used.
  • Session cookies: name and format.
  • CSRF tokens: name and format.
  • Rendered HTML: for example: meta tags order, comments.
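These framework clues can be checked with a small signature table. The sketch below is only an illustration: the function name fingerprint and the tables are hypothetical, and they list just a few well-known defaults (PHPSESSID, JSESSIONID, ASP.NET_SessionId, and the default CSRF field names of Django and Rails). Any of these can be renamed or faked by the site owner.

```python
import re

# Default session cookie names (small, illustrative selection).
COOKIE_HINTS = {
    "PHPSESSID": "PHP",
    "JSESSIONID": "Java servlet container",
    "ASP.NET_SessionId": "ASP.NET",
}

# Markers that tend to appear in rendered HTML (small, illustrative selection).
HTML_HINTS = [
    (re.compile(r"/wp-content/"), "WordPress"),
    (re.compile(r'name="__VIEWSTATE"'), "ASP.NET Web Forms"),
    (re.compile(r'name="csrfmiddlewaretoken"'), "Django"),
    (re.compile(r'name="authenticity_token"'), "Ruby on Rails"),
]


def fingerprint(cookie_names, html):
    """Return a sorted list of technologies hinted at by cookies and HTML."""
    found = set()
    for name in cookie_names:
        if name in COOKIE_HINTS:
            found.add(COOKIE_HINTS[name])
    for pattern, technology in HTML_HINTS:
        if pattern.search(html):
            found.add(technology)
    return sorted(found)
```

Contradictory results (for example both "PHP" and "ASP.NET Web Forms") are a sign that something has been altered or that several technologies sit behind the same site.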

Note that all information coming from the server may be altered to trick you. You should always try to use multiple sources to validate your guess.

14
  • 4
    You forgot to mention some examples from Java, which generally uses a JSESSIONID cookie for session management. The login URL can betray the underlying technology too, for instance Spring's default URL. Those examples are for Java, but the same surely holds for other platforms.
    – Walfrat
    Commented Mar 11, 2016 at 15:25
  • 27
    Just a note: just because the http headers say they're powered by php, doesn't mean the site actually is. Although this example is more about the server platform, I know of a guy who would make his nginx server return Server: Microsoft-IIS/5.0 with every request so he could trick attackers into using the wrong attacks against the server. "It's too easy!" ~ the attacker. You're right about that! (This just goes to show that you can't trust headers)
    – d0nut
    Commented Mar 11, 2016 at 15:40
  • 3
    Another good one is checking the source to see if there are tell-tale signs of the use of some templating engine specific to a language.
    – mowwwalker
    Commented Mar 11, 2016 at 20:08
  • 9
    You forgot one of the simplest - looking at the jobs page. :) Commented Mar 11, 2016 at 20:40
  • 4
    If there's a hidden field named "__VIEWSTATE", and/or if the buttons say "href=javascript:__doPostBack" it's likely asp.net. Off the top of my head I can't think of comparable "signatures" in other platforms, but, etc.
    – Jay
    Commented Mar 14, 2016 at 4:07
19

To guess the programming language, you can follow the three-step approach detailed below:

STEP 1 - Search for evidence on the site itself

Manually...

  • Look at the bottom of a site page for phrases like:

    -> "Powered by XXX"
    -> "Proudly Powered by XXX"
    -> "Running on XXX"
    -> ...

  • Check on the site whether its team will attend any conference where they could talk about the website from a technical point of view

...or with the help of a tool

  • Read the HTML code downloaded by your browser

  • Fire up the Network Tab in developer toolbar and study the exchanges made between the browser and the server.

  • Search for some known hidden page:

    wget --spider --server-response http://the-site.com/private/admin

    If you get a 200, the site may be running on publicly available (free, paid, etc.) software.
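The same probe can be scripted with Python's standard library. This is a rough sketch: the function name probe and its open_fn parameter are hypothetical (the parameter only exists so the request can be replaced in tests), and the URL is the placeholder used above.

```python
import urllib.error
import urllib.request


def probe(url, open_fn=urllib.request.urlopen, timeout=10):
    """Send a HEAD request and return the HTTP status code, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with open_fn(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx answers are raised as exceptions but still carry a status code.
        return err.code
    except urllib.error.URLError:
        # DNS failure, refused connection, etc.
        return None
```

Calling probe("http://the-site.com/private/admin") for a handful of paths that are typical of known software, and comparing the status codes, gives one more piece of evidence. Only do this against sites you are allowed to test.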

STEP 2 - Search for evidence on the web

Ask search engines for front-end errors

You can look for some errors produced by the website.

  • Some keywords to type in a search engine:

    • Error 500 site:the-site.com
    • Exception site:the-site.com
    • ...
    • <whatever> site:the-site.com
      => You can simply replace "<whatever>" with some known error message produced by the various web technologies.

Ask search engines for back-end errors

You can even guess the technologies used in the backend:

  • ORA-12170 site:the-site.com
    => If you find something, the site may be using Oracle in its backend.
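If you have a list of error markers to try, the queries can be generated rather than typed one by one. A tiny sketch follows: the function name build_queries is hypothetical, and wrapping each marker in double quotes for exact-phrase matching is an assumption about the search engine you use.

```python
def build_queries(domain, error_markers):
    """Build one 'site:' query per error marker, quoting the marker as a phrase."""
    return [f'"{marker}" site:{domain}' for marker in error_markers]


if __name__ == "__main__":
    for query in build_queries("the-site.com", ["Error 500", "Exception", "ORA-12170"]):
        print(query)
```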

Ask search engines for website competitors

  • Find what technology is popular in the website industry

  • Find what technology competitors are using

  • Find comparisons of the site with other competitors.
    Those comparisons may talk about technologies in use

Technology survey sites

Those sites can provide great info about the site you target. They may have already done part of the job for you.

STEP 3 - Analyze your results

The evidence you found in step 1 may be wrong because the site owner can alter it. Look for contradictions within that evidence and eliminate the contradictory items.

Merge the evidence from step 2 across the various sources and your own findings. Again, eliminate contradictory items.

Summarize all your findings in a table like the one below.

+-------------+---------+-----------------+-----+----------+-------+---------------+
| EVIDENCE    | ON SITE | SEARCH ENGINE 1 | ... | SOURCE n | SCORE | PCT (%)       |
+-------------+---------+-----------------+-----+----------+-------+---------------+
| PHP 7       |    X    |        X        | ... |    X     |   3   | 300/n         |
+-------------+---------+-----------------+-----+----------+-------+---------------+
| WordPress   |         |        X        | ... |    X     |   2   | 200/n         |
+-------------+---------+-----------------+-----+----------+-------+---------------+
| ...         |         |                 |     |          |       |               |
+-------------+---------+-----------------+-----+----------+-------+---------------+
| EVIDENCE m  |         |                 | ... |          |       | (100*SCORE)/n |
+-------------+---------+-----------------+-----+----------+-------+---------------+

Finally, you will be able to say "I'm confident at XX% that this site runs on YY (EVIDENCE i)".
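The SCORE and PCT columns are simple to compute once the findings are recorded. Here is a minimal sketch, assuming each piece of evidence is mapped to the set of sources that support it. The function name score_evidence and the source labels are hypothetical, and the percentage is just the share of the n sources that agree.

```python
def score_evidence(observations, source_count):
    """Return (evidence, score, pct) rows, highest score first.

    observations maps an evidence name to the set of sources supporting it;
    source_count is n, the total number of sources consulted.
    """
    rows = []
    for evidence, sources in observations.items():
        score = len(sources)
        pct = 100 * score / source_count
        rows.append((evidence, score, pct))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    findings = {
        "PHP 7": {"on site", "search engine 1", "source n"},
        "WordPress": {"search engine 1", "source n"},
    }
    for evidence, score, pct in score_evidence(findings, source_count=4):
        print(f"{evidence}: score {score}, {pct:.0f}%")
```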

3
  • This looks like a useful step by step guide, but it's probably a bad idea to present the arbitrary confidence score as a percentage. Even if a server gets a perfect score it could very well be a carefully assembled honeypot, so you shouldn't say you are a 100% confident that it isn't. Commented Jun 3, 2019 at 9:01
  • @AugustJanse How should the arbitrary confidence score be presented?
    – Stephan
    Commented Jun 3, 2019 at 20:58
  • Something like "I conclude that this site runs on YY with a confidence score of XX" perhaps? The problem is that the percentage looks a bit too much like a probability. Commented Jun 4, 2019 at 6:42
18

It's simple. Add the Wappalyzer extension, available for Chrome as well as Firefox.

It reports the programming language, server, analytics tools, CMS and frameworks a website is built on.

Give it a try, you will love it.

6
  • 2
    That seems good .. but is it reliable and accurate?
    – storm
    Commented Mar 11, 2016 at 16:10
  • Yes, it's very accurate. I have been using it for the last 4 years, even on websites I developed myself. It's always accurate. Commented Mar 11, 2016 at 18:03
  • 12
    I don't think it can be considered accurate. We purposely fake our sent headers to return IIS. Have a wp-admin.php even though we don't use Wordpress. And several other honey pots. Our site is actually a Node.js application that returns static content.
    – Bacon Brad
    Commented Mar 11, 2016 at 18:09
  • 2
    @Ahmed it works by scanning the HTML, headers, URL and JavaScript variables on a page. It's only as good as the rulesets used for detection of course, but I've found it to be right almost always. (But, of course, any web page can be set up to pretend to be running something it isn't.) Commented Mar 12, 2016 at 12:00
  • 13
    Social Engineering: ask how to identify the software used to serve web pages on StackExchange and wait for people to tell what their site runs on. Thank you, @BradMetcalf...
    – Arc
    Commented Mar 13, 2016 at 9:36
8

Besides the Wappalyzer browser extension, there are several sites that detect what technologies power a given website:

2

The answer is that you can never "be assured". 99.9% of the time the highly upvoted answers will find the "tells" of the framework behind the site, but it's never a certainty.

Basically, your browser receives the end result of the code's processing (HTML, CSS and JavaScript). Between you and the code itself sits a web server (nginx, Apache, etc.) and potentially a load balancer and a CDN. Because you're not interacting with the code directly, there is no way to be certain.

If a website is serving content from wp-uploads/, it's a safe bet that it's running WordPress, but it's not a certainty. Perhaps the site was using WordPress, but when it was migrated to something else the wp-uploads/ path was kept to avoid breaking links and bookmarks.

-2

Sometimes you can know, sometimes you cannot.

If the HTML is generated on the client-side, then you can easily tell which language by looking at the source in your web browser. These languages include: ruby on rails, javascript, java, etc. On the client-side the source is open to the user, and it must be honest about which technology it is.

If the HTML is generated on the server-side you may not know which programming language generated it. These languages include: PHP, C++, and many other languages. On the server-side, for as many ways as you can think of to guess which language it is, there are just as many ways for the technology to hide itself.

Suppose you are a web administrator that wants to hide the server-side technology. Pick one of the techniques listed in another question for attempting to identify the language. For example, the *.php extension for a file. Now, configure your web server to execute C code from a file with a *.php extension. Your users will have no way to view the source (since both languages are equally capable of producing the same output, by Turing completeness), but they will be misled into thinking you are running PHP.

Why would someone want to obfuscate the server-side choice of technology? Because CGI languages have various vulnerabilities that are easier to target if the end-users know which of those languages you are using. Misleading the users about which server-side technologies you are using is a very reasonable security measure.

2
  • 3
    I didn't downvote, but this answer neglects the numerous techniques available for determining the server-side language and tech.
    – user13750
    Commented Mar 13, 2016 at 5:11
  • 2
    For starters, Ruby on Rails and Java are perfectly capable of generating HTML entirely on the server side. Commented Mar 18, 2016 at 3:33
