What Is OSINT?
OSINT stands for Open Source Intelligence. It is the practice of collecting and analyzing information from publicly accessible sources. In a cybersecurity context OSINT means gathering data about a target organization using nothing more than what is already exposed to the internet. DNS records, WHOIS data, certificate transparency logs, job postings, breach databases, social media profiles and cached web pages all qualify as open source intelligence.
The term originates from military and government intelligence disciplines. The Office of the Director of National Intelligence defines OSINT as intelligence produced from publicly available information that is collected, exploited and disseminated in a timely manner. In cybersecurity we have narrowed the definition to focus on data that reveals an organization's technical footprint and potential vulnerabilities.
OSINT is not hacking. You are not exploiting vulnerabilities or bypassing access controls. You are reading data that is already public. The problem is that most organizations do not realize how much of their infrastructure is visible from the outside. Subdomains they forgot about, services running on non-standard ports, employee email addresses in breach dumps, technology stack details leaking through HTTP headers. All of it is available to anyone who knows where to look.
Why Run Recon on Yourself
Attackers perform reconnaissance before every engagement. Before a phishing campaign they enumerate your email addresses. Before exploiting a vulnerability they scan your ports and fingerprint your technology stack. Before attempting credential stuffing they check whether your employees appear in known breach databases.
If you have never run external reconnaissance against your own organization you do not know what your attack surface looks like from the outside. You are making security decisions based on an internal view of your infrastructure while attackers operate from the external view. Those two perspectives rarely match.
External recon answers specific questions. How many subdomains do you have and are any of them abandoned? Which ports are open on your public-facing hosts? What technologies and versions are visible in your HTTP response headers? Have any employee credentials appeared in data breaches? Are there old staging or development environments still accessible from the internet?
The 5-Step Recon Process
The following process is what we use at Sherlock Forensics during the reconnaissance phase of every external penetration test. Each step builds on the previous one. Start with the broadest view of the target and narrow down to specific findings.
Step 1: Domain Enumeration
Domain enumeration maps out every subdomain associated with your organization. Most companies have far more subdomains than they realize. Marketing creates landing pages. Development teams spin up staging environments. IT provisions cloud services that automatically generate subdomains. Over time these accumulate and many are forgotten.
Forgotten subdomains are a serious risk. They often run outdated software with unpatched vulnerabilities. Some point to decommissioned infrastructure that has been reassigned, creating subdomain takeover opportunities. An attacker who finds an abandoned subdomain pointing to an unclaimed cloud resource can claim that resource and serve content under your domain name.
- Certificate Transparency Logs
- Publicly trusted SSL/TLS certificates are recorded in certificate transparency (CT) logs, and major browsers require this for the certificates they accept. Querying these logs reveals every hostname that has been named in a logged certificate. Use crt.sh to search by your root domain. The results include current and expired certificates, which gives you a historical view of subdomains. Hosts covered only by a wildcard certificate will not appear by name.
- DNS Brute Forcing
- Automated tools resolve common subdomain names (mail, vpn, staging, dev, api, admin) against your domain to find active hosts. Tools like Subfinder and Amass combine multiple data sources including CT logs, search engine scraping and DNS brute forcing into a single enumeration pass.
- Search Engine Dorking
- Google and Bing index subdomains. A search for site:yourdomain.com -www reveals indexed subdomains beyond your main website. Add filters like inurl:admin or intitle:login to find administrative interfaces that should not be publicly indexed.
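The DNS brute forcing idea from the list above can be sketched in a few lines of Python. This is a minimal illustration, not how Subfinder or Amass work internally. The function names and the short label list are assumptions made for the example, and the resolver is passed in as a parameter so the logic can be tried without touching the network.

```python
import socket
from typing import Callable, Iterable, Optional

# Labels taken from the examples in the text; a real wordlist is much longer.
COMMON_LABELS = ["mail", "vpn", "staging", "dev", "api", "admin"]


def system_resolver(hostname: str) -> Optional[str]:
    """Return one IPv4 address for hostname, or None if it does not resolve."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        return None


def brute_force_subdomains(
    root_domain: str,
    labels: Iterable[str] = COMMON_LABELS,
    resolve: Callable[[str], Optional[str]] = system_resolver,
) -> dict[str, str]:
    """Try each label under root_domain and keep the names that resolve."""
    found = {}
    for label in labels:
        candidate = f"{label}.{root_domain}"
        address = resolve(candidate)
        if address is not None:
            found[candidate] = address
    return found
```

If your zone has a wildcard DNS record every candidate will resolve, so compare the returned addresses against a random label before trusting the results.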
| Tool | Type | Cost | What It Does |
|---|---|---|---|
| crt.sh | Web | Free | Searches certificate transparency logs for subdomains |
| Subfinder | CLI | Free | Passive subdomain discovery using multiple sources |
| Amass | CLI | Free | Comprehensive attack surface mapping and subdomain enumeration |
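If you prefer to script the certificate transparency lookup, the sketch below shows one way to do it. It assumes crt.sh's JSON output (the output=json query parameter) and its name_value field, which can hold several newline-separated names per record. The function names are ours, and the service may rate-limit or time out on large domains.

```python
import json
import urllib.parse
import urllib.request


def fetch_ct_records(root_domain: str, timeout: float = 30.0) -> list[dict]:
    """Download crt.sh results for %.root_domain as a list of JSON objects."""
    query = urllib.parse.urlencode({"q": f"%.{root_domain}", "output": "json"})
    with urllib.request.urlopen(f"https://crt.sh/?{query}", timeout=timeout) as resp:
        return json.load(resp)


def extract_subdomains(records: list[dict], root_domain: str) -> list[str]:
    """Collect unique hostnames under root_domain from crt.sh-style records."""
    root = root_domain.lower().rstrip(".")
    names = set()
    for record in records:
        # name_value can hold several names separated by newlines.
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower().rstrip(".")
            if name.startswith("*."):
                name = name[2:]  # a wildcard entry only proves the parent exists
            if name == root or name.endswith("." + root):
                names.add(name)
    return sorted(names)
```

Calling extract_subdomains(fetch_ct_records("yourdomain.com"), "yourdomain.com") returns a deduplicated, sorted list you can feed into the next step.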
Step 2: DNS Analysis
Once you have a list of subdomains the next step is to pull their DNS records. DNS is the backbone of your external infrastructure and it leaks more information than most teams realize. A records reveal IP addresses. MX records reveal your mail provider. TXT records reveal your SPF policy and domain verification tokens for third-party services. CNAME records reveal which cloud platforms and SaaS products you use.
Look for dangling CNAME records. A CNAME that points to a cloud service you no longer use is a subdomain takeover vulnerability. If staging.yourdomain.com has a CNAME pointing to yourapp.herokuapp.com and you deleted that Heroku app, an attacker can create a new Heroku app with that name and serve content on your subdomain.
dig yourdomain.com A +noall +answer
dig yourdomain.com AAAA +noall +answer
dig mx yourdomain.com +short
dig txt yourdomain.com +short
dig ns yourdomain.com +short
Run these queries for every subdomain you discovered in Step 1. Automate it with a simple shell loop. Pay attention to IP address ranges. If most of your infrastructure lives in one netblock but a few hosts sit in unexpected ranges those outliers warrant investigation.
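The netblock check is easy to script once you have a hostname-to-IP mapping. The sketch below is one possible approach, assuming IPv4 addresses, a /24 grouping and a 20 percent threshold. All three are arbitrary choices for illustration, so adjust them to match how your address space is actually allocated.

```python
import ipaddress
from collections import Counter


def find_netblock_outliers(
    host_ips: dict[str, str],
    prefix_len: int = 24,
    max_share: float = 0.2,
) -> dict[str, str]:
    """Return hosts whose network holds at most max_share of all hosts."""
    # Map every host to the network that contains its address.
    networks = {
        host: ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        for host, ip in host_ips.items()
    }
    counts = Counter(networks.values())
    total = len(networks)
    return {
        host: str(net)
        for host, net in networks.items()
        if counts[net] / total <= max_share
    }
```

A host flagged here is not necessarily a problem. It is simply a host that lives somewhere the rest of your infrastructure does not, which is worth a second look.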
- DNSRecon
- An open source DNS enumeration tool that automates zone transfers, reverse lookups, SRV record enumeration and Google dorking for DNS data. It consolidates multiple DNS queries into a single report.
- SecurityTrails
- A web-based DNS intelligence platform that provides current and historical DNS data. Historical records show how your DNS has changed over time which helps identify decommissioned infrastructure that still has stale records pointing to it.
| Tool | Type | Cost | What It Does |
|---|---|---|---|
| DNSRecon | CLI | Free | DNS enumeration, zone transfers and reverse lookups |
| SecurityTrails | Web | Free tier | Current and historical DNS records with API access |
Step 3: Port Discovery
Port scanning identifies which network services are running on your public-facing hosts. Every open port is a potential entry point. A web server on port 443 is expected. An SSH service on port 22 open to the entire internet is a risk. A database port like 3306 (MySQL) or 5432 (PostgreSQL) exposed to the public is a critical finding that needs immediate remediation.
Nmap is the standard tool for port scanning. For a basic external scan against your own infrastructure run the following.
nmap -sS -sV -T4 -p- --open -oN recon-output.txt your.target.ip
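The -sS flag performs a TCP SYN scan, which requires root or administrator privileges. The -sV flag probes open ports to determine the service and version. The -T4 flag selects a faster timing template. The -p- flag scans all 65535 TCP ports rather than just the default top 1000. The --open flag limits output to ports that are open or possibly open. The -oN flag writes the results to recon-output.txt. This gives you a complete view of the TCP services accessible from the outside. UDP services need a separate scan with -sU.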
Shodan provides a passive alternative. Shodan continuously scans the internet and indexes the results. Search for your IP range or domain on Shodan and you will see what ports and services it has already discovered without sending a single packet from your network. Shodan also shows banner data which reveals software versions, SSL certificate details and sometimes default credentials left in service banners.
| Tool | Type | Cost | What It Does |
|---|---|---|---|
| Nmap | CLI | Free | Port scanning, service detection and version fingerprinting |
| Shodan | Web | Free tier | Passive internet-wide port and service data |
| Masscan | CLI | Free | High-speed port scanning for large IP ranges |
Step 4: Technology Fingerprinting
Technology fingerprinting identifies what software, frameworks and platforms your public-facing systems are running. Web servers announce themselves in HTTP headers. JavaScript libraries are referenced in page source. CMS platforms leave predictable file structures and meta tags. All of this information helps an attacker select the right exploits for your stack.
Check your own HTTP response headers. Many web servers and application frameworks include version numbers by default.
curl -sI https://yourdomain.com | grep -iE "server|x-powered|x-aspnet|x-generator"
If your server responds with Server: Apache/2.4.49 or X-Powered-By: PHP/7.4.3 you have just told every attacker which exact CVEs to look up. Suppressing version information in your server configuration is a basic hardening step that most organizations skip.
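The same header check can be run across many hosts from a script. The following sketch takes a dictionary of response headers and returns the ones that appear to disclose a version. The watched names mirror the grep pattern above, while the version regex and the function name are assumptions made for this example.

```python
import re

# Header name fragments from the grep pattern, matched case-insensitively.
WATCHED = ("server", "x-powered", "x-aspnet", "x-generator")

# A digit run with at least one dot, e.g. 2.4.49 or 7.4.3.
VERSION = re.compile(r"\d+(?:\.\d+)+")


def version_leaks(headers: dict[str, str]) -> dict[str, str]:
    """Return watched headers whose values contain something that looks like a version."""
    leaks = {}
    for name, value in headers.items():
        watched = any(token in name.lower() for token in WATCHED)
        if watched and VERSION.search(value):
            leaks[name] = value
    return leaks
```

Feeding it the headers from the example above, Server: Apache/2.4.49 and X-Powered-By: PHP/7.4.3, flags both, while a bare Server: nginx passes.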
The Wappalyzer browser extension identifies technologies used on any website including CMS platforms, JavaScript frameworks, analytics tools, CDN providers and hosting platforms. BuiltWith provides similar detection with historical data showing what technologies a site has used over time.
| Tool | Type | Cost | What It Does |
|---|---|---|---|
| Wappalyzer | Browser extension | Free | Identifies web technologies, frameworks and CMS platforms |
| BuiltWith | Web | Free tier | Technology profiling with historical data |
| WhatWeb | CLI | Free | Web technology fingerprinting from the command line |
Step 5: Credential Leak Checks
Data breaches happen constantly. When a third-party service gets breached and your employees used their work email to register, those credentials end up in breach databases. Attackers use these dumps for credential stuffing attacks. They take the leaked email and password combinations and try them against your VPN, email portal, cloud admin consoles and any other login page they found during port discovery.
Have I Been Pwned is the most widely known breach notification service. It allows you to search by email address or domain to see which breaches contain your employees' credentials. The HIBP API supports domain-wide searches for organizations that verify domain ownership. This gives you a list of every employee email that has appeared in a known breach.
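For scripted checks of individual addresses, a request to the HIBP API might be built as follows. This sketch assumes the v3 breachedaccount endpoint and the hibp-api-key header described in the public API documentation. The function name and user agent string are placeholders, and you need your own API key. The function only builds the request so you can inspect it before sending anything.

```python
import urllib.parse
import urllib.request

HIBP_BASE = "https://haveibeenpwned.com/api/v3"


def build_breach_lookup(email: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a breachedaccount request for one email address."""
    # The account must be URL-encoded, so "@" becomes "%40".
    account = urllib.parse.quote(email.strip(), safe="")
    request = urllib.request.Request(f"{HIBP_BASE}/breachedaccount/{account}")
    request.add_header("hibp-api-key", api_key)
    # HIBP rejects requests without a user agent; this value is a placeholder.
    request.add_header("user-agent", "internal-recon-sketch")
    return request
```

Pass the result to urllib.request.urlopen to send it. A 404 response means the address was not found in any breach, and the API is rate-limited, so pace your requests when checking a long list.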
Do not stop at email addresses. Search for your organization's name on code repositories like GitHub. Developers sometimes commit configuration files, API keys, internal URLs and even passwords to public repositories by accident. Tools like TruffleHog scan Git repositories, including their commit history, for strings that look like secrets, such as known credential formats and high-entropy values.
| Tool | Type | Cost | What It Does |
|---|---|---|---|
| Have I Been Pwned | Web / API | Free web search, paid API key | Checks if email addresses appear in known data breaches |
| TruffleHog | CLI | Free | Scans Git repos for leaked secrets and credentials |
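To see why high-entropy strings stand out, here is a minimal Shannon entropy check. It is not TruffleHog's implementation. The token pattern, the 20-character minimum and the 4.0 threshold are assumptions chosen for illustration, and a real scanner adds format-specific detectors to cut down false positives.

```python
import math
import re
from collections import Counter

# Candidate tokens: runs of 20+ characters from a base64/hex-like alphabet.
TOKEN = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def shannon_entropy(text: str) -> float:
    """Bits of entropy per character, based on character frequencies."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def find_high_entropy_tokens(line: str, threshold: float = 4.0) -> list[str]:
    """Return long tokens whose entropy exceeds threshold (an arbitrary cutoff)."""
    return [tok for tok in TOKEN.findall(line) if shannon_entropy(tok) > threshold]
```

Random keys use many different characters roughly equally often, which pushes their entropy up. Ordinary identifiers and repeated padding score much lower and are filtered out.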
What Sherlock Recon Scanner Automates
Running all five steps manually is time-consuming. Our recon scanner automates the entire process into a single workflow. It performs subdomain enumeration across multiple data sources, pulls DNS records for every discovered host, checks for dangling CNAMEs, runs passive port discovery through Shodan integration, fingerprints web technologies on every HTTP-responding host and checks employee email addresses against known breach databases.
The output is a structured report that prioritizes findings by risk. Critical items like exposed database ports and subdomain takeover vulnerabilities appear at the top. Informational items like technology stack details are documented but flagged as lower priority. The report includes remediation guidance for every finding so your team can act on the results immediately.
We run this scanner as the first phase of every penetration test we perform. It gives both our team and the client a shared understanding of the external attack surface before active testing begins.
Sample Recon Walkthrough
Here is a condensed example of what a reconnaissance pass looks like against a fictional organization. The domain is example-corp.com.
Subdomain Enumeration Results
subfinder -d example-corp.com -silent
www.example-corp.com
mail.example-corp.com
vpn.example-corp.com
staging.example-corp.com
api.example-corp.com
dev.example-corp.com
old.example-corp.com
hr.example-corp.com
Eight subdomains discovered. The staging, dev and old subdomains warrant immediate investigation. Staging and development environments frequently run outdated code with debug modes enabled. The old subdomain suggests decommissioned infrastructure that may still be accessible.
DNS Check on Suspicious Subdomain
dig staging.example-corp.com CNAME +short
example-corp-staging.azurewebsites.net
The staging subdomain has a CNAME pointing to Azure App Service. If that App Service instance no longer exists this is a subdomain takeover vulnerability. An attacker can provision a new Azure App Service with that name and serve malicious content on staging.example-corp.com.
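A first-pass check for this condition can be scripted. The sketch below classifies a CNAME target by whether it still resolves and whether it sits under one of the two platform domains mentioned in this article. The suffix list, labels and function names are assumptions for the example, and the resolver is injectable so the logic can be exercised offline.

```python
import socket
from typing import Callable

# Suffixes for the two platforms named in the text; real tools track many more.
TAKEOVER_PRONE_SUFFIXES = (".azurewebsites.net", ".herokuapp.com")


def target_resolves(hostname: str) -> bool:
    """True if the system resolver can find an address for hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def classify_cname_target(
    cname_target: str,
    resolves: Callable[[str], bool] = target_resolves,
) -> str:
    """Classify a CNAME target as 'ok', 'dangling' or 'possible-takeover'."""
    target = cname_target.lower().rstrip(".")
    if resolves(target):
        return "ok"
    if target.endswith(TAKEOVER_PRONE_SUFFIXES):
        return "possible-takeover"
    return "dangling"
```

Treat the result as a lead rather than a verdict. Some platforms answer DNS for any name under their domain, so a target that resolves is not proof the resource still exists. Confirm by requesting the page and checking for the provider's "no such app" response.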
Port Scan on Primary Host
nmap -sS -sV -T4 --open www.example-corp.com
PORT STATE SERVICE VERSION
22/tcp open ssh OpenSSH 8.2p1
80/tcp open http nginx 1.18.0
443/tcp open https nginx 1.18.0
3306/tcp open mysql MySQL 5.7.38
MySQL on port 3306 is open to the internet. This is a critical finding. Database services should never be directly accessible from external networks. The SSH service on port 22 should be restricted to specific IP addresses via firewall rules. The nginx version is also exposed in the service banner.
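The triage applied to the scan output above can be sketched as a small parser. It reads Nmap's normal text output and flags the ports this article calls out. The port list, notes and function name are assumptions for illustration. For anything beyond a quick check, Nmap's XML output (-oX) is a sturdier format to parse.

```python
import re

# Ports the text singles out, with the severity it assigns to each.
RISKY_PORTS = {
    22: "high: SSH reachable from any source IP",
    3306: "critical: MySQL exposed to the internet",
    5432: "critical: PostgreSQL exposed to the internet",
}

# Matches lines such as "3306/tcp open mysql MySQL 5.7.38".
PORT_LINE = re.compile(r"^(\d+)/(tcp|udp)\s+open\s+(\S+)\s*(.*)$")


def triage_nmap_output(text: str) -> list[tuple[int, str, str]]:
    """Return (port, service, note) for each open port on the risky list."""
    flagged = []
    for line in text.splitlines():
        match = PORT_LINE.match(line.strip())
        if match and int(match.group(1)) in RISKY_PORTS:
            port = int(match.group(1))
            flagged.append((port, match.group(3), RISKY_PORTS[port]))
    return flagged
```

Run against the four-line result above, it flags port 22 and port 3306 and ignores the web ports.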
Credential Leak Check
A domain search on Have I Been Pwned reveals 14 employee email addresses across 6 different breaches. Three of those breaches include plaintext or weakly hashed passwords. Any employee who reused their breached password on corporate systems is a credential stuffing target.
What to Do with Your Findings
Reconnaissance data is only valuable if you act on it. Prioritize your findings by exploitability and impact.
- Critical: Fix within 24 hours
- Exposed database ports, subdomain takeover vulnerabilities, leaked credentials that are still valid and admin panels accessible from the internet without multi-factor authentication.
- High: Fix within one week
- SSH open to all source IPs, web servers leaking version information, employee credentials found in recent breach dumps and dangling DNS records pointing to decommissioned services.
- Medium: Fix within 30 days
- Forgotten subdomains running outdated software, missing HTTP security headers, technology stack details visible in response headers and expired SSL certificates on non-production hosts.
- Informational: Document and monitor
- Complete subdomain inventory, technology stack documentation, IP address ranges in use and third-party service integrations visible through DNS.
Create a tracking spreadsheet or ticket for each finding. Assign owners. Set deadlines. Rerun the reconnaissance process after remediation to verify the fixes are effective. Then schedule it to repeat quarterly at minimum.
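If you track findings in code rather than a spreadsheet, the severity levels and deadlines above map onto a small data structure. This is a sketch under our own naming. The Finding class, its fields and the prioritize function are hypothetical, and only the remediation windows come from the list above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Remediation windows from the severity list; informational items get no deadline.
REMEDIATION_WINDOW = {
    "critical": timedelta(days=1),
    "high": timedelta(days=7),
    "medium": timedelta(days=30),
    "informational": None,
}
SEVERITY_ORDER = list(REMEDIATION_WINDOW)


@dataclass
class Finding:
    title: str
    severity: str
    owner: str
    found_on: date

    @property
    def due(self) -> Optional[date]:
        """Deadline derived from severity, or None for informational items."""
        window = REMEDIATION_WINDOW[self.severity]
        return None if window is None else self.found_on + window


def prioritize(findings: list[Finding]) -> list[Finding]:
    """Sort from most to least severe, keeping input order within a level."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER.index(f.severity))
```

Each Finding carries an owner and a computed due date, so the sorted list can be exported straight into whatever ticketing system you use.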
If your reconnaissance reveals complex vulnerabilities or you are unsure how to remediate specific findings, that is exactly what a professional penetration test is for. Our penetration testing service starts with this reconnaissance process and then moves into active exploitation to demonstrate real-world impact.
Frequently Asked Questions
Is OSINT reconnaissance legal?
OSINT reconnaissance that uses publicly available data is legal in most jurisdictions. You are querying DNS records, reading public web pages and searching breach databases that aggregate already-leaked data. However, active scanning with tools like Nmap against systems you do not own may violate computer fraud laws such as the Computer Fraud and Abuse Act in the United States or the Criminal Code Section 342.1 in Canada. Always get written authorization before scanning any network or host that belongs to another organization. When in doubt limit your scope to your own assets.
What is the difference between passive and active reconnaissance?
Passive reconnaissance collects information without directly interacting with the target. Examples include reading DNS records, searching certificate transparency logs and checking breach databases. Active reconnaissance sends packets or requests directly to the target. Port scanning with Nmap and running vulnerability scanners are active techniques. Passive recon leaves little or no trace in the target's logs because the queries go to third-party services rather than to the target's own systems. Active recon may trigger alerts in intrusion detection systems and firewalls. A thorough recon process uses both but always starts passive.
How often should an organization run external reconnaissance on itself?
At minimum run external reconnaissance quarterly. Infrastructure changes constantly. New subdomains get created, cloud services get provisioned and employee credentials appear in new breaches. Organizations with frequent infrastructure changes or high-risk profiles should run automated recon weekly. The NIST Cybersecurity Framework covers this under two of its core functions: asset management in Identify and continuous monitoring in Detect. The goal is to find what attackers would find before they find it.