Public Documents and Metadata

Online Resources


# Extract metadata from public documents (pdf, doc, xls, ppt, etc.) available on the target websites

# The tool first queries Google for file types that can carry useful metadata (pdf, doc, xls, ppt, etc.).
# It then downloads those documents to disk and extracts their metadata using libraries
# that parse the different file types (Hachoir, Pdfminer, etc.).
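
# The query step above can be sketched in Python. This is a minimal illustration, not the tool's own code:
# the function name build_dork_queries is hypothetical, and it assumes the search relies on Google's
# "site:" and "filetype:" operators, one query per file type.

```python
def build_dork_queries(domain, filetypes):
    """Return one search query per file type, restricted to a single domain."""
    queries = []
    for ext in filetypes:
        # Normalise input such as " .PDF " to "pdf".
        ext = ext.strip().lower().lstrip(".")
        if ext:
            queries.append(f"site:{domain} filetype:{ext}")
    return queries


if __name__ == "__main__":
    # example.com is a placeholder domain used only for illustration.
    for query in build_dork_queries("example.com", ["pdf", "doc", "xls", "ppt"]):
        print(query)
```

# Each resulting string can be pasted into the search engine by hand to check what the target exposes.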

# Options
# -d: domain to search
# -t: filetype to download (pdf,doc,xls,ppt,odp,ods,docx,xlsx,pptx)
# -l: limit of results to search (default 200)
# -h: work with documents in directory (use "yes" for local analysis)
# -n: limit of files to download
# -o: working directory (location to save downloaded files)
# -f: output file

-d -t doc,pdf -l 10 -n 10 -o /tmp/result -f /tmp/result/result.html

Investigate Powerpoint Documents

# You can potentially get lots of data from PPTX files
# 1/ Obvious metadata such as the author
# 2/ But also embedded content such as screenshots

# - People often use shapes to hide content. If you can edit the file, delete the shapes to reveal what is underneath
# - The "Crop" feature can also be undone to recover the full content of a screenshot

# You can also simply unpack the whole document (if using Open XML Format) and get the embedded content
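
# A PPTX file in Open XML format is a ZIP archive, so the unpacking step can be sketched with the
# Python standard library. This is a minimal sketch under stated assumptions: the helper names
# read_core_properties and extract_media are hypothetical, and it assumes the usual package layout
# (author data in docProps/core.xml, pictures under ppt/media/).

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# XML namespaces used by docProps/core.xml in Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(pptx_path):
    """Return the creator and last-modified-by fields, or {} if core.xml is absent."""
    with zipfile.ZipFile(pptx_path) as package:
        if "docProps/core.xml" not in package.namelist():
            return {}
        root = ET.fromstring(package.read("docProps/core.xml"))
    return {
        "creator": root.findtext("dc:creator", default="", namespaces=NS),
        "last_modified_by": root.findtext("cp:lastModifiedBy", default="", namespaces=NS),
    }


def extract_media(pptx_path, out_dir):
    """Copy every file stored under ppt/media/ into out_dir and return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(pptx_path) as package:
        for name in package.namelist():
            if name.startswith("ppt/media/") and not name.endswith("/"):
                # Keep only the file name so nothing is written outside out_dir.
                target = out / Path(name).name
                target.write_bytes(package.read(name))
                written.append(target)
    return written
```

# The extracted pictures are the files as stored in the package, so a screenshot that was only cropped
# on the slide may still be present in full, unless the author compressed or trimmed the images.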


# Searches through git repositories for secrets, digging deep into commit history and branches.
# This is effective at finding secrets accidentally committed.

truffleHog --regex --entropy=False

truffleHog --json --max_depth 10
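
# The entropy check that the --entropy flag toggles is based on the idea that random-looking strings
# (keys, tokens) have high Shannon entropy. The sketch below illustrates that idea only; it is not
# truffleHog's code, and the function names, the threshold of 4.0 and the minimum length of 20 are
# assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def high_entropy_tokens(line, threshold=4.0, min_length=20):
    """Return whitespace-separated tokens that are long and look random.

    threshold and min_length are illustrative defaults, not values taken from any tool.
    """
    return [
        token
        for token in line.split()
        if len(token) >= min_length and shannon_entropy(token) > threshold
    ]


if __name__ == "__main__":
    sample = "password = abcdefghijklmnopqrstuvwxyz012345"
    print(high_entropy_tokens(sample))
```

# Entropy alone produces false positives (hashes, minified code), which is one reason to combine it
# with regex rules or to disable it as in the first command above.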


# Collect metadata about IP addresses
# There are two main functionalities, split into modules (gather and analyze)

# Load IP file
[>] load /path/to/ip.txt
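
# Before loading, it can help to clean the IP file. The sketch below assumes the file holds one IP
# address per line (an assumption about the expected input, not a documented requirement); the
# function name clean_ip_list is hypothetical.

```python
import ipaddress


def clean_ip_list(lines):
    """Return unique, valid IP addresses in their original order.

    Blank lines, lines starting with '#', and anything that does not parse
    as an IPv4 or IPv6 address are skipped.
    """
    seen = set()
    cleaned = []
    for raw in lines:
        candidate = raw.strip()
        if not candidate or candidate.startswith("#"):
            continue
        try:
            ip = str(ipaddress.ip_address(candidate))
        except ValueError:
            continue
        if ip not in seen:
            seen.add(ip)
            cleaned.append(ip)
    return cleaned


if __name__ == "__main__":
    # Inline sample data; in practice the lines would come from the IP file.
    sample = ["8.8.8.8\n", "", "not-an-ip", "8.8.8.8", " 1.1.1.1 "]
    print("\n".join(clean_ip_list(sample)))
```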

# List all the gather modules
[>] list gather

# You can then use the gather command to collect from any source
# Shodan is the only module that requires an API key (Just-Metadata/module/intelgathering/
[>] gather
[>] gather shodan

# List all the analysis modules
[>] list analysis

# Then you can use the analyze command
[>] analyze geoinfo
# You can get all gathered info about one IP with the following
[>] ip_info <ip>

# You can save your results and import them again later
[>] save
[>] import /path/to/file.state