# Extracts metadata from public documents (pdf, doc, xls, ppt, etc.) available on the target websites
# The tool first performs a Google query for file types that can carry useful metadata (pdf, doc, xls, ppt, etc.)
# It then downloads those documents to disk and extracts their metadata using libraries
# that parse specific file types (Hachoir, Pdfminer, etc.)
# Options
# -d: domain to search
# -t: filetype to download (pdf,doc,xls,ppt,odp,ods,docx,xlsx,pptx)
# -l: limit of results to search (default 200)
# -h: work with documents in directory (use "yes" for local analysis)
# -n: limit of files to download
# -o: working directory (location to save downloaded files)
# -f: output file
metagoofil.py -d domain.com -t doc,pdf -l 10 -n 10 -o /tmp/result -f /tmp/result/result.html
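# The search step described above can be sketched in Python as one query string per file type.
# This is a minimal illustration only: build_dork_queries is a hypothetical helper, and the exact
# query metagoofil sends is an assumption. The "site:" and "filetype:" operators are standard Google syntax.

```python
def build_dork_queries(domain, filetypes):
    """Return one Google search string per file type (hypothetical helper).

    `filetypes` uses the same comma-separated form as the -t option.
    """
    return [
        f"site:{domain} filetype:{ft.strip()}"
        for ft in filetypes.split(",")
        if ft.strip()  # ignore empty entries such as a trailing comma
    ]


if __name__ == "__main__":
    for query in build_dork_queries("domain.com", "doc,pdf"):
        print(query)
```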
Investigate PowerPoint Documents
https://medium.com/@osint/powerpoint-what-data-is-beneath-the-surface-2eb000ef95fb
https://medium.com/week-in-osint/week-in-osint-2020-21-4c92d335116a
# You can potentially get a lot of data from PPTX files
# 1/ Obvious metadata such as the author
# 2/ Embedded content such as screenshots
#    - People often use shapes to hide content. If you can edit the file, you can delete the shapes and reveal the content
#    - The "Crop" feature only hides part of a screenshot, so resetting the crop can reveal the full image
# You can also simply unpack the whole document (if it uses the Open XML format) and get the embedded content
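# Unpacking works because an Open XML file is a ZIP archive. The sketch below reads the author fields
# from docProps/core.xml and lists the embedded media under ppt/media/.
# inspect_pptx is a hypothetical helper name. It assumes the standard Open XML package layout,
# and a given file may lack either part.

```python
import sys
import xml.etree.ElementTree as ET
import zipfile

# Namespaces used by the Open XML core-properties part (docProps/core.xml)
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def inspect_pptx(path):
    """Return (author, last_modified_by, media_names) for an Open XML file."""
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
        author = last_modified_by = None
        if "docProps/core.xml" in names:
            root = ET.fromstring(archive.read("docProps/core.xml"))
            author = root.findtext("dc:creator", namespaces=NS)
            last_modified_by = root.findtext("cp:lastModifiedBy", namespaces=NS)
        # Embedded pictures (including the uncropped originals, if kept) live here
        media = [name for name in names if name.startswith("ppt/media/")]
    return author, last_modified_by, media


if __name__ == "__main__" and len(sys.argv) > 1:
    creator, modifier, media_files = inspect_pptx(sys.argv[1])
    print("author:", creator)
    print("last modified by:", modifier)
    for name in media_files:
        print("media:", name)
```

# To dump everything to disk instead, zipfile.ZipFile(path).extractall(target_dir) unpacks the whole archive.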
TruffleHog
# Searches through git repositories for secrets, digging deep into commit history and branches.
# This is effective at finding secrets accidentally committed.
https://github.com/dxa4481/truffleHog
truffleHog --regex --entropy=False https://github.com/dxa4481/truffleHog.git
truffleHog --json --max_depth 10 https://github.com/dxa4481/truffleHog.git
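# The --entropy flag toggles entropy-based detection: random-looking strings such as keys tend to score
# higher than ordinary text. The sketch below shows the general idea with Shannon entropy.
# It is not truffleHog's actual implementation. The helper names, the 4.0 threshold and the
# 20-character minimum are assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of `text`, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def high_entropy_tokens(line, threshold=4.0, min_length=20):
    """Return whitespace-separated tokens that look random (assumed cut-offs)."""
    return [
        token
        for token in line.split()
        if len(token) >= min_length and shannon_entropy(token) > threshold
    ]


if __name__ == "__main__":
    sample = "aws_key = abcdefghijklmnopqrstuvwxyz012345"
    for token in high_entropy_tokens(sample):
        print(token)
```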
Just-Metadata
# Collects metadata about IP addresses
# There are two main functionalities, divided into modules (gather and analyze)
# Load the IP file
[>] load /path/to/ip.txt
# List all the gather modules
[>] list gather
# You can then use the gather command to collect from any source
# Shodan is the only module that requires an API key (Just-Metadata/module/intelgathering/get_shodan.py)
[>] gather
[>] gather shodan
# List all the analysis modules
[>] list analysis
# Then you can use the analyze command
[>] analyze geoinfo
# You can get all gathered info about one IP with the following
[>] ip_info <ip>
# You can save your results to use them later
[>] save
[>] import /path/to/file.state