Our website sample set has been extended to include all of "the relevant web"

Posted by Matthias Gelbmann on 14 November 2022 in News

The relevant web

The relevant web consists of the websites that have some meaningful content or functionality. The majority of websites don't have that. They are, for example, parked domains, registered for future use or to be sold at a profit, and showing only a "domain for sale" page, often loaded with ads. We often see websites that only show the default page of the web server or of the content management system in use. We also see large clusters of websites that have basically the same content (e.g. "buy viagra") and link to each other, presumably in an attempt to fool search engines. We don't want to count these sites, because the technology they use is not representative of relevant sites, and including them would make our statistics less useful.

In order to determine whether a website is relevant, our algorithm looks for certain signals. The presence of positive signals and the absence of negative signals makes a website relevant.
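To make the signal-based rule above more concrete, here is a minimal Python sketch. It is only an illustration: the signal names and the exact rule (at least one positive signal and no negative signal) are assumptions made for this example, not the actual signals or logic of our algorithm.

```python
from dataclasses import dataclass, field

# Hypothetical signal names, invented for illustration only.
NEGATIVE_SIGNALS = {"parked_domain", "default_server_page", "link_farm"}
POSITIVE_SIGNALS = {"listed_in_popularity_ranking", "has_original_content"}


@dataclass
class Site:
    domain: str
    signals: set = field(default_factory=set)


def is_relevant(site: Site) -> bool:
    """Assumed rule: at least one positive signal and no negative signal."""
    has_positive = bool(site.signals & POSITIVE_SIGNALS)
    has_negative = bool(site.signals & NEGATIVE_SIGNALS)
    return has_positive and not has_negative


sites = [
    Site("example.com", {"has_original_content", "listed_in_popularity_ranking"}),
    Site("parked.example", {"parked_domain"}),
    Site("mixed.example", {"has_original_content", "link_farm"}),
    Site("empty.example"),  # no signals at all
]

# Keep only the sites that pass the assumed relevance rule.
relevant = [s.domain for s in sites if is_relevant(s)]
print(relevant)
```

In this sketch a single negative signal is enough to exclude a site, and a site with no signals at all is also excluded. A real classifier could just as well weight the signals instead of treating them as hard rules.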
Popularity ranking

To determine the popularity of a website, we have replaced Alexa with the Chrome User Experience Report (CrUX) provided by Google. That report contains sites that have a certain number of visitors, and also provides a rough popularity metric. In addition to that, we keep using a customized version of the Tranco list to get a second opinion. We use the ranking data not only as an indicator of relevance, but also for our dedicated ranking breakdown reports.

The initial results show no big changes in our statistics. That is an indication that our original solution, using the Alexa top 10m sites, was already a very good sample of the whole web. Nevertheless, I'm satisfied that we have implemented a way to distinguish between useful and useless sites that does not depend on the data of a single provider.