NSA's secret Google tricks revealed in declassified guidebook
The National Security Agency recently declassified a book called "Untangling the Web: A Guide to Internet Research" that contains over 600 pages of tips for researching information online.
According to Wired, the book was released following a Freedom of Information Act (FOIA) request filed in April by MuckRock, a group that specializes in processing public records requests. Sections in the guide include "Introduction to Searching," "Mastering the Art of Search" and "Uncovering the 'Invisible' Internet."
"Untangling the Web" has hundreds of pages of the what were the most advanced search tips in 2007, when the book was originally published.
A section dedicated to "Google hacking" gives tips on how to find confidential information using carefully constructed search queries. A query like "filetype:xls site:za confidential" searches for Microsoft Excel files containing the word "confidential" on sites in South Africa, which uses the top-level domain .za.
"One of the most popular Google hacking technique is to employ stock words and phrases such as proprietary, confidential, not for distribution, do not distribute, along with a search for specific file types, especially Excel spreadsheets, Word documents, and PowerPoint briefings," the author writes.
The author explains that many people believed, at least at the time of writing, that search engines index only web pages, but Google had years earlier begun indexing Word documents, Excel spreadsheets and PDF files as well.
Another tip, for international research, is to search in the native language of the country. In an example using Russian, a search in English returned 32,000 hits, compared with 1.25 million hits for the same search in Russian.
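As an illustration of the query pattern the guide describes, here is a minimal Python sketch that assembles a search string from a file type, a site or top-level domain, and one or more stock words. The function names build_query and search_url are hypothetical, not taken from the guide, and the base URL is an assumption about Google's public search address.

```python
from urllib.parse import urlencode


def build_query(terms, filetype=None, site=None):
    """Combine search operators and stock words into one query string."""
    parts = []
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict to one file type
    if site:
        parts.append(f"site:{site}")  # restrict to a domain or TLD
    for term in terms:
        # Wrap multi-word phrases in quotes so they match as exact phrases.
        parts.append(f'"{term}"' if " " in term else term)
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """Return a URL-encoded search link (base URL is an assumption)."""
    return f"{base}?{urlencode({'q': query})}"


if __name__ == "__main__":
    query = build_query(["confidential"], filetype="xls", site="za")
    print(query)
    print(search_url(query))
```

Called with the values from the article's example, build_query reproduces the string "filetype:xls site:za confidential". The sketch only formats queries; it does not send any requests.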
It appears that at least one copy of the guidebook may have already existed online at OpenSourceIntelligence.eu, the website for an organization called Reuser's Information Services (RIS).
A PDF copy of the book that does not have markings and includes the author's name and original links appears to have been online at the RIS site since at least July 2, 2011 -- when the Web page was last modified. The website's founder did not immediately respond to CBS News' request for more information.
Although the NSA's copy redacts the author's name, Wired reports that MuckRock's FOIA request shows the book was written by Robyn Winder and Charlie Speight.
The guide was originally published in 2007 by the Center for Digital Content. Winder knew that the Internet would be constantly changing and wrote in the introduction: "I cannot emphasize strongly enough that this book was already out of date by the time it was published."