
Discovering Digital Footprints

Discovery is the process by which WeCheck's engine actively traverses the open web to locate every publicly accessible signal linked to a subject. Unlike a standard search query that returns indexed results, WeCheck's Discovery Engine orchestrates multiple autonomous AI agents working in parallel to find, connect, and validate data across platforms.


A traditional search engine is reactive and linear — it returns pages that match a keyword. WeCheck's Discovery Engine is proactive and recursive — it follows leads, validates matches, and branches into new sources as it finds them.

The difference in practice: a Google search for "John Smith" returns thousands of unvalidated results. WeCheck's engine finds the specific John Smith you are investigating by anchoring on unique behavioral and visual signals, then maps every connected digital asset to that confirmed identity.


The Multi-Agent Architecture

When a scan is launched, WeCheck deploys four specialized agents that run in parallel:

Agent | Mission
Mapper Agent | Scans known social and professional networks (LinkedIn, X, Instagram, Facebook) for profile matches
Deep-Web Agent | Probes unindexed forum archives, public databases, news repositories, and blog platforms
Relationship Agent | Extracts entities (people, organizations, companies) mentioned in the subject's footprint and maps connections
Sentiment Agent | Runs NLP analysis on discovered content to assess tone, context, and behavioral patterns

Each agent reports its findings to a central orchestration layer that merges, deduplicates, and validates results before they appear in the report.
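The fan-out and merge pattern described above can be illustrated with a minimal sketch. It assumes a hypothetical agent interface (a function that takes a subject name and returns a list of findings) and uses stub agents instead of real scanners. The `Finding` type, the agent names, the trailing-slash dedupe rule, and the `is_valid` hook are all illustrative assumptions, not WeCheck's actual orchestration layer.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass(frozen=True)
class Finding:
    agent: str     # which agent reported the item
    platform: str  # e.g. "github", "news"
    url: str       # public location of the item
    note: str

# Hypothetical agent interface: subject name in, list of findings out.
Agent = Callable[[str], List[Finding]]

def mapper_agent(subject: str) -> List[Finding]:
    # Stub standing in for real profile-matching logic.
    return [Finding("mapper", "github", "https://github.com/jsmith_dev", f"profile match for {subject}")]

def deep_web_agent(subject: str) -> List[Finding]:
    # Stub: reports the same profile (with a trailing slash) plus a news item.
    return [
        Finding("deep_web", "github", "https://github.com/jsmith_dev/", "profile linked from a forum post"),
        Finding("deep_web", "news", "https://news.example.com/article-123", f"article naming {subject}"),
    ]

def dedupe_key(f: Finding) -> Tuple[str, str]:
    # Treat URLs that differ only by a trailing slash as the same item.
    return (f.platform, f.url.rstrip("/"))

def orchestrate(subject: str, agents: List[Agent],
                is_valid: Callable[[Finding], bool] = lambda f: True) -> List[Finding]:
    # Run all agents concurrently; map() yields results in the order agents were given.
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        per_agent = list(pool.map(lambda agent: agent(subject), agents))
    merged: Dict[Tuple[str, str], Finding] = {}
    for findings in per_agent:
        for f in findings:
            if is_valid(f):                          # placeholder validation hook
                merged.setdefault(dedupe_key(f), f)  # first report of an item wins
    return list(merged.values())
```

Calling `orchestrate("John Smith", [mapper_agent, deep_web_agent])` returns two findings: the GitHub profile (reported by both agents but kept once) and the news article. A production orchestrator would also need timeouts, error isolation per agent, and far richer validation than a single predicate.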


The Discovery Loop

Discovery doesn't stop at the first layer of results. WeCheck's agents perform a recursive investigation loop:

  1. Locate — Find a confirmed handle or profile on Platform A
  2. Extract — Pull references to other handles, platforms, or real-world identifiers
  3. Branch — Follow each new lead to its source platform
  4. Validate — Cross-reference the new find against the confirmed identity (visual + behavioral signals)
  5. Repeat — Continue until no new leads are found or the scan depth limit is reached

This loop is what allows WeCheck to surface hidden connections that a single-pass search would miss entirely — for example, linking a professional LinkedIn profile to an anonymous Reddit account through shared behavioral patterns and network overlap.
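The five steps map naturally onto a breadth-first traversal with a depth limit. The sketch below assumes two hypothetical callbacks: `extract_leads` (returns identifiers referenced by a profile) and `validate` (decides whether a lead belongs to the confirmed identity). Both stand in for the real extraction and visual/behavioral validation, which are not specified here.

```python
from collections import deque
from typing import Callable, Iterable, Set

def discovery_loop(
    seed: str,
    extract_leads: Callable[[str], Iterable[str]],  # hypothetical: references found on a profile
    validate: Callable[[str], bool],                # hypothetical: matches the confirmed identity?
    max_depth: int = 3,                             # scan depth limit
) -> Set[str]:
    confirmed = {seed}           # 1. Locate: the seed is an already-confirmed profile
    seen = {seed}                # never process the same lead twice
    queue = deque([(seed, 0)])
    while queue:
        profile, depth = queue.popleft()
        if depth >= max_depth:   # do not branch beyond the depth limit
            continue
        for lead in extract_leads(profile):   # 2. Extract references from the profile
            if lead in seen:
                continue
            seen.add(lead)
            if validate(lead):                # 3. Branch to the lead, 4. Validate it
                confirmed.add(lead)
                queue.append((lead, depth + 1))  # 5. Repeat from the new confirmed find
    return confirmed  # ends when no new leads remain or the depth limit stops branching
```

A toy run over an in-memory link graph:

```python
links = {
    "linkedin:jsmith": ["github:jsmith_dev", "x:jsmith.dev"],
    "github:jsmith_dev": ["reddit:throwaway_js", "github:other_user"],
    "x:jsmith.dev": ["reddit:throwaway_js"],
    "reddit:throwaway_js": ["forum:js_old"],
}
same_person = {"github:jsmith_dev", "x:jsmith.dev", "reddit:throwaway_js", "forum:js_old"}
found = discovery_loop("linkedin:jsmith", lambda p: links.get(p, []), same_person.__contains__, max_depth=2)
```

With `max_depth=2` the loop reaches the Reddit account through the GitHub profile but stops before following its link to the old forum handle; raising the limit to 3 picks that up as well. The `seen` set keeps shared leads (the Reddit account is referenced twice) from being processed more than once.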


Discovery Scope

What WeCheck scans:

  • Social and professional networks (LinkedIn, X/Twitter, Instagram, Facebook, TikTok)
  • Forums and communities (Reddit, StackOverflow, Quora, niche forums)
  • News and media publications (local and international press, press releases)
  • Public image repositories (publicly shared photos and videos)
  • Blogs and long-form writing platforms (Medium, Substack, personal sites)
  • Public code repositories (GitHub, GitLab — public repos and commit history)

What WeCheck explicitly does not scan:

  • Private accounts or content behind authentication walls
  • Deleted or removed content (unless archived by third parties in the public domain)
  • Dark web or encrypted networks
  • Paywalled content
  • Private messages or direct communications of any kind
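The two lists above amount to a scope policy that every candidate item must pass. As a minimal sketch, assuming each candidate carries a category label and a few boolean flags (the `Candidate` fields and category names are hypothetical, chosen to mirror the lists), the checks could look like this:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    category: str                    # hypothetical label, e.g. "forum", "news", "private_message"
    requires_auth: bool = False      # behind a login or authentication wall
    paywalled: bool = False
    deleted: bool = False
    publicly_archived: bool = False  # a copy exists in a third-party public archive
    darknet: bool = False            # dark web or encrypted network

# Hypothetical category labels for the in-scope sources listed above.
IN_SCOPE_CATEGORIES = {"social", "forum", "news", "image", "blog", "code"}

def in_scope(c: Candidate) -> bool:
    if c.category not in IN_SCOPE_CATEGORIES:
        return False  # e.g. private messages or other direct communications
    if c.requires_auth or c.paywalled or c.darknet:
        return False
    if c.deleted and not c.publicly_archived:
        return False  # deleted content only counts via public third-party archives
    return True
```

For example, `in_scope(Candidate("social", deleted=True))` is `False`, while the same post with `publicly_archived=True` passes.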

Cross-Platform Identity Linkage

One of the most powerful aspects of WeCheck's Discovery Engine is its ability to connect profiles across platforms even when the subject has not explicitly linked them. Linkage is established through:

  • Handle Similarity — Derived or variant usernames across platforms (e.g., jsmith_dev on GitHub and jsmith.dev on X)
  • Biographic Overlap — Consistent location mentions, employer references, or personal details across disconnected profiles
  • Network Proximity — The subject interacts with the same group of people or entities across multiple platforms
  • Visual Anchoring — The same face appears in profile images across unlinked accounts
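Two of these signals lend themselves to simple text and set comparisons. The sketch below is illustrative only: it assumes handle similarity can be approximated by stripping common separators and comparing the result with Python's `difflib.SequenceMatcher`, and that biographic overlap can be measured as Jaccard similarity between sets of normalized profile facts. The same set-overlap idea could apply to Network Proximity (shared contacts). Visual Anchoring requires face matching and is out of scope here.

```python
import re
from difflib import SequenceMatcher
from typing import Set

def normalize_handle(handle: str) -> str:
    # Lowercase, drop a leading "@", and remove ".", "_" and "-" separators,
    # so jsmith_dev, jsmith.dev and @JSmith-Dev all become "jsmithdev".
    return re.sub(r"[._\-]", "", handle.lower().lstrip("@"))

def handle_similarity(a: str, b: str) -> float:
    # 1.0 for identical normalized handles, lower for looser variants.
    return SequenceMatcher(None, normalize_handle(a), normalize_handle(b)).ratio()

def biographic_overlap(a: Set[str], b: Set[str]) -> float:
    # Jaccard overlap of profile facts such as locations or employers.
    a_n = {x.strip().lower() for x in a}
    b_n = {x.strip().lower() for x in b}
    if not a_n or not b_n:
        return 0.0
    return len(a_n & b_n) / len(a_n | b_n)
```

Here `handle_similarity("jsmith_dev", "jsmith.dev")` returns `1.0`, matching the example in the list, and two profiles sharing a city and an employer out of three distinct facts overlap at 2/3.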

Each cross-platform link is accompanied by a Confidence Score reflecting how strongly the evidence supports the connection. See AI Matching for details on how scores are calculated.
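WeCheck's actual scoring method is documented in AI Matching and is not reproduced here. Purely to show how several per-signal scores might be folded into one number, the sketch below uses a weighted average; the signal names and the `WEIGHTS` values are hypothetical assumptions, not WeCheck's parameters.

```python
from typing import Dict

# Hypothetical weights for illustration only; they are not WeCheck's values.
WEIGHTS: Dict[str, float] = {"handle": 0.2, "biographic": 0.3, "network": 0.2, "visual": 0.3}

def confidence_score(signals: Dict[str, float]) -> float:
    """Weighted average of per-signal scores in [0, 1]; missing signals count as 0."""
    for name, value in signals.items():
        if name not in WEIGHTS:
            raise ValueError(f"unknown signal: {name}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score out of range: {value}")
    total = sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())
    return round(total / sum(WEIGHTS.values()), 3)
```

Under these assumed weights, a perfect handle match alone yields 0.2, while a perfect handle match plus a visual match yields 0.5. Treating missing signals as zero keeps a link supported by a single signal from scoring as high as one supported by several.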


Identity Persistence

WeCheck includes a specialized capability to detect Identity Persistence — cases where a subject has attempted to reduce their digital footprint but left residual signals:

  • Archived or cached versions of deleted posts (via public web archives)
  • Mentions of the subject by name or handle in third-party content they cannot remove
  • Images published by other accounts that include the subject

This capability is particularly valuable in legal discovery and high-stakes vetting scenarios where thoroughness is essential.
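The section refers to public web archives without naming them. One such archive, the Internet Archive's Wayback Machine, offers a public availability endpoint (`https://archive.org/wayback/available`) that reports the closest archived snapshot of a URL. Using it here is an assumption for illustration, not a statement of which archives WeCheck queries. The sketch separates URL construction and response parsing from the live request so the parsing logic can be checked offline.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"

def availability_url(page_url: str, timestamp: Optional[str] = None) -> str:
    # timestamp (YYYYMMDD...) asks for the snapshot closest to that date.
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"

def closest_snapshot(response_json: str) -> Optional[str]:
    # Return the archived copy's URL, or None if no snapshot is available.
    data = json.loads(response_json)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None

def find_archived_copy(page_url: str, timestamp: Optional[str] = None) -> Optional[str]:
    # Performs a live HTTP request to the public archive.
    with urlopen(availability_url(page_url, timestamp), timeout=10) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

Given a response whose `archived_snapshots.closest` entry is marked available, `closest_snapshot` returns that snapshot's `web.archive.org` URL; an empty `archived_snapshots` object yields `None`, meaning no residual copy was found in that archive.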


Ethics & Boundaries

WeCheck's Discovery Engine is designed around a strict OSINT (Open Source Intelligence) philosophy:

  • Public domain only — Every data point collected is publicly accessible without credentials
  • Minimal footprint — The engine collects only what is relevant to the investigation, not everything it finds
  • No entrapment — WeCheck does not interact with, message, or provoke subjects during discovery
  • Human review — Discovery surfaces signals; final analytical judgment always rests with human professionals