The Wayback Machine - https://web.archive.org/web/20180129081712/https://www.riskiq.com/platform/architecture/big-data-analytics/

Big Data Analytics

Reveal Actionable Insights From Petabytes of Internet Data

Derived Data Sets

Beyond the petabytes of data collected from across the internet, RiskIQ extracts and analyzes that data to create new data sets that aid in discovering, understanding, and mitigating digital threats. One example is the RiskIQ Accomplice List, which catalogs websites and URLs that are not malicious themselves but that link or redirect to URLs hosting malware, phishing, or scams. RiskIQ also maintains its own Phish List, Scam List, and Zero-Day List, which include never-before-seen URLs and pages hosting phishing, scam, or malware content that we find while crawling the internet with our virtual users.
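To make the idea of an accomplice list concrete, here is a minimal Python sketch. It assumes a hypothetical crawl result (`links`, a mapping from each source URL to the URLs it links or redirects to) and a hypothetical set of known-malicious URLs. These names and the data shape are illustrative assumptions, not RiskIQ's actual schema or method.

```python
def accomplice_urls(links, malicious):
    """Return source URLs that are not malicious themselves but point
    (by link or redirect) to at least one known-malicious URL.

    links:     dict mapping a source URL to an iterable of target URLs
               (hypothetical crawl output)
    malicious: iterable of URLs already known to host malware, phishing,
               or scams (hypothetical input list)
    """
    malicious = set(malicious)
    return sorted(
        source
        for source, targets in links.items()
        if source not in malicious and any(t in malicious for t in targets)
    )
```

In this sketch, a page that is itself on the malicious list is excluded, which mirrors the "not malicious themselves" condition in the description above.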

White Paper: Using Internet Data Sets to Understand Digital Threats

Correlation

The RiskIQ platform utilizes correlation algorithms to constantly improve our detection capabilities and virtual user technology.

As the platform and virtual users crawl more websites every day, RiskIQ’s analytic capabilities become more tuned and confident over time. This allows for accurate, automated detection and confirmation of phishing pages, imposters, and scams without the need for any human intervention. In instances where the platform is less confident in its correlation or decision, RiskIQ security analysts step in to review and confirm or dismiss events and detections. This ensures that our customers are protected and the platform evolves intelligently.
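The hand-off between automated confirmation and analyst review described above can be sketched as a simple confidence threshold. The `Detection` record, the `triage` function, and the 0.9 threshold are all hypothetical choices made for illustration; the source does not say how the platform scores or routes detections.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    url: str
    label: str         # e.g. "phishing", "scam" (illustrative labels)
    confidence: float  # assumed to lie between 0.0 and 1.0


def triage(detections, threshold=0.9):
    """Split detections into those confirmed automatically and those
    queued for analyst review, based on an assumed confidence threshold."""
    automated, needs_review = [], []
    for detection in detections:
        if detection.confidence >= threshold:
            automated.append(detection)
        else:
            needs_review.append(detection)
    return automated, needs_review
```

Analyst decisions on the review queue could then be fed back as labeled examples, which is one plausible reading of how the platform "evolves" over time.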

Threat Research

RiskIQ’s senior security research team focuses on identifying and investigating new and emerging threats. The team’s research feeds into the RiskIQ platform to improve detection and help our customers protect themselves against these threats. Once we find and confirm a new threat, we alert our customers and simultaneously work with the RiskIQ data science team to create or modify the learning algorithms so that they automatically detect and classify pages containing the new threat without human intervention.

Outside of the RiskIQ platform, RiskIQ’s threat researchers are highly regarded in the information security community. The team publishes research and shares the information we have about threat actor groups with the broader community through news outlets, reports, and PassiveTotal public projects.

Data Science

The internet is a large place, and making sense of it and the data it contains is a daunting challenge. At RiskIQ, we use the huge amount of internet data in unique and innovative ways. At the core of many of our products is continuous improvement in the way we interact with and use this data. Our data science team focuses on bringing new insight to internet data and finding ways to connect seemingly disparate data sets. By learning from the vast amount of data, we can fine-tune our correlation algorithms to detect and alert users to malicious content and infrastructure even before a site is fully weaponized.

Mobile App Analysis

RiskIQ virtual users take inventory of mobile app stores and download the applications they encounter on the web as if they were using a mobile device. Using these techniques, the platform is able to link apps between stores and across publishers and platforms. Once the applications are downloaded and inventoried, RiskIQ analyzes them to determine whether brand-infringing elements, spyware, or malware are hiding within the code.

Web Page Comparison and Hashing

Using a technique called MinHash, the RiskIQ platform finds similarities between websites to locate brand and copyright infringement, phishing pages that look like official pages, and other malicious activity on the web. Using the similarity between pages, RiskIQ determines whether two (or more) websites share similar structure, content, or components, which, coupled with our correlation models, allows us to classify pages and generate events for customers.
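For readers unfamiliar with MinHash, the following Python sketch shows the general technique: each page is reduced to a set of word shingles, a fixed-length signature records the minimum hash value per seeded hash function, and the fraction of matching signature positions estimates the Jaccard similarity of the two sets. The shingle size, the number of hash functions, and the use of seeded BLAKE2b hashes are assumptions chosen for illustration, not details of RiskIQ's implementation.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles in the text (k=3 is an assumption)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _seeded_hash(seed, item):
    """Deterministic 64-bit hash of an item under a given seed."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=128):
    """For each seeded hash function, keep the minimum hash over the set."""
    if not items:
        raise ValueError("cannot compute a signature for an empty set")
    return [
        min(_seeded_hash(seed, item) for item in items)
        for seed in range(num_hashes)
    ]


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

A usage example: compute `minhash_signature(shingles(page_text))` once per crawled page, store the signature, and compare signatures pairwise with `estimated_similarity`. Because signatures have a fixed length, comparison cost does not grow with page size, which is the main reason the technique suits large crawls.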

Web Components

When RiskIQ virtual users crawl a website, we extract information about the framework and components of the website itself. This could include the type of CMS serving the content, the operating system of the web server hosting it, web server software such as Apache, application frameworks, and other information. This information can help identify components that might be compromised due to known vulnerabilities.
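One common way to gather this kind of component information is to read self-reported hints from an HTTP response. The sketch below looks at the `Server` and `X-Powered-By` response headers and the HTML `<meta name="generator">` tag. The function name `extract_components` and the returned field names are hypothetical, and the source does not state which signals RiskIQ actually uses. These hints can also be absent or deliberately falsified, so they are best treated as leads.

```python
from html.parser import HTMLParser


class _GeneratorParser(HTMLParser):
    """Capture the content of the first <meta name="generator"> tag."""

    def __init__(self):
        super().__init__()
        self.generator = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.generator is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "generator":
            self.generator = attributes.get("content")


def extract_components(headers, html):
    """Collect self-reported component hints from headers and markup.

    headers: dict of HTTP response headers (any letter case)
    html:    response body as a string
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    parser = _GeneratorParser()
    parser.feed(html)
    return {
        "server": lowered.get("server"),
        "powered_by": lowered.get("x-powered-by"),
        "generator": parser.generator,
    }
```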