W3Techs, provided by Q-Success

Our website sample set has been extended to include all of "the relevant web"

Posted by Matthias Gelbmann on 14 November 2022 in News

Summary:

We used to base the samples for our surveys on the Alexa ranking. When that ranking's service ended, we had to look for alternatives. Our solution is to use our own definition of the relevant web.

The relevant web

The relevant web consists of the websites that have some meaningful content or functionality. Most websites don't. Many of them are, for example, parked domains, registered for future use or for resale at a profit, that show only a "domain for sale" page, often loaded with ads. We often see websites that show only the default page of the web server or of the content management system in use. We also see large clusters of websites that have essentially the same content (e.g. "buy viagra") and link to each other, presumably in an attempt to fool search engines. We don't want to count these sites, because the technology they use is not representative of relevant sites, and including them would make our statistics less useful.
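To illustrate how page content can yield a negative signal, here is a minimal sketch that flags parked pages and server or CMS default pages by matching a few typical phrases. The phrase lists and the `classify_page` function are hypothetical; the signals our algorithm actually uses are not reproduced here, and a real classifier would need far more than keyword matching.

```python
import re
from typing import Optional

# Hypothetical marker phrases for parked domains.
PARKED_PATTERNS = [
    r"\bthis domain (?:is|may be) for sale\b",
    r"\bbuy this domain\b",
    r"\bdomain for sale\b",
]

# Hypothetical marker phrases for untouched web server / CMS default pages.
DEFAULT_PAGE_PATTERNS = [
    r"\bwelcome to nginx!",
    r"\bapache2 \w+ default page\b",  # e.g. "Apache2 Ubuntu Default Page"
    r"\bit works!",
    r"\bhello world!",                # WordPress default first post title
]


def classify_page(html_text: str) -> Optional[str]:
    """Return 'parked', 'default_page', or None if no negative content signal is found."""
    text = html_text.lower()
    if any(re.search(p, text) for p in PARKED_PATTERNS):
        return "parked"
    if any(re.search(p, text) for p in DEFAULT_PAGE_PATTERNS):
        return "default_page"
    return None
```

A phrase match is only one weak signal; it should be weighed together with the other signals rather than treated as a verdict on its own.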

To determine whether a website is relevant, our algorithm looks for certain signals. A website is considered relevant when positive signals are present and negative signals are absent.

  • We look at the content of the pages, which can yield positive or negative signals.
  • We look at external links to the site. Links from other relevant sites are a positive signal; links from a "bad neighborhood" may be a negative signal.
  • Duplicate websites, which have the same or very similar content as many other sites, are an indication of low-quality sites.
  • High popularity of a website is a positive signal.
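The signals listed above can be combined roughly as sketched below: near-duplicates are detected with word shingles and Jaccard similarity (a common technique, swapped in here for illustration), and a site counts as relevant if it has at least one positive signal and no negative one. The field names, the "more bad links than good links" rule and all thresholds are hypothetical assumptions, not our actual parameters.

```python
from __future__ import annotations

from dataclasses import dataclass


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Split text into overlapping k-word sequences ("shingles")."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two shingle sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicate_count(target: str, others: list[str], threshold: float = 0.9) -> int:
    """Count pages in `others` whose content is near-identical to `target`."""
    t = shingles(target)
    return sum(1 for o in others if jaccard(t, shingles(o)) >= threshold)


@dataclass
class SiteSignals:
    content_negative: bool     # e.g. parked page or server default page
    content_positive: bool     # e.g. substantial original text
    links_from_relevant: int   # inbound links from sites already judged relevant
    links_from_bad: int        # inbound links from a "bad neighborhood"
    duplicates: int            # number of other sites with near-identical content
    popular: bool              # listed in a popularity ranking


def is_relevant(s: SiteSignals, max_duplicates: int = 10) -> bool:
    """Relevant = at least one positive signal and no negative signal.
    The thresholds used here are arbitrary illustrative choices."""
    negative = (
        s.content_negative
        or s.links_from_bad > s.links_from_relevant
        or s.duplicates > max_duplicates
    )
    positive = s.content_positive or s.links_from_relevant > 0 or s.popular
    return positive and not negative
```

Comparing every page against every other page is quadratic; at web scale one would typically use MinHash or locality-sensitive hashing to find candidate duplicates first.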

Popularity ranking

To obtain the popularity of a website, we have replaced Alexa with the Chrome User Experience Report (CrUX) provided by Google. That report contains sites that reach a certain number of visitors, and it also provides a rough popularity metric. In addition, we keep using a customized version of the Tranco list as a second opinion. We use the ranking data not only as an indicator of relevance, but also for our dedicated ranking breakdown reports.
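One way to merge a coarse CrUX popularity bucket (e.g. 1000 meaning "within the top 1,000 origins") with a Tranco rank into a single estimate is sketched below. The `combined_rank` function and the choice of a log-scale (geometric) mean are assumptions for illustration; the post does not say how the two sources are actually combined.

```python
import math
from typing import Optional


def combined_rank(crux_bucket: Optional[int], tranco_rank: Optional[int]) -> Optional[float]:
    """Estimate a single popularity rank from two sources (lower = more popular).

    crux_bucket: CrUX rank magnitude, e.g. 1000 = "within the top 1,000 origins".
    tranco_rank: position in a Tranco list, or None if the site is absent.
    Returns None if neither source lists the site.
    """
    estimates = []
    if crux_bucket is not None:
        # Use the bucket's upper bound as a conservative rank estimate.
        estimates.append(float(crux_bucket))
    if tranco_rank is not None:
        estimates.append(float(tranco_rank))
    if not estimates:
        return None
    # Ranks span several orders of magnitude, so average them on a log scale.
    return math.exp(sum(math.log(e) for e in estimates) / len(estimates))
```

For example, a site in the CrUX top-10,000 bucket with Tranco rank 100 gets an estimated rank of about 1,000. Averaging on a log scale keeps one source's coarse bucket from dominating the other source's finer rank.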

The initial results show no major changes in our statistics. This indicates that our original sample, the Alexa top 10 million sites, was already very representative of the whole web. Nevertheless, I'm satisfied that we have implemented a way to distinguish between useful and useless sites that does not depend on data from a single provider.

Copyright © 2009-2024 Q-Success