I tried a `site:site.com [search terms]` query on Google, but site.com has blocked Google from indexing it via its robots.txt. How can I get around this? Can I download and index the whole site myself and then search my own private index?
-
You could likely mirror the site yourself (depending on its size, of course), but the second part, finding the relevant data, is where things could get difficult. You could in theory use a tool like `grep` to find text, assuming whatever you want to search for appears as plain text in the HTML source. But any parts of the site generated with JavaScript would likely be opaque, or would need extra work to render or otherwise investigate. – Anaksunaman Aug 07 '20 at 05:51
-
@Anaksunaman Yes, exactly, so that is why I am looking for another solution. – d-b Aug 09 '20 at 09:04
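For what it's worth, the "mirror it, then search the plain text" idea from the comments can be sketched roughly as below. This is only a minimal sketch, assuming the site has already been mirrored into a local directory (for example with a tool such as `wget --mirror`) and that the text you want is present in the static HTML. The names `search_mirror` and `_TextExtractor` are hypothetical helpers made up for illustration, and nothing here renders JavaScript, so dynamically generated content stays invisible, exactly as noted above.

```python
import html.parser
import pathlib


class _TextExtractor(html.parser.HTMLParser):
    """Collects text nodes, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth of enclosing <script>/<style> elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def search_mirror(root, term):
    """Return sorted relative paths of HTML files under `root` whose
    text contains `term` (case-insensitive substring match)."""
    term = term.lower()
    hits = []
    for path in pathlib.Path(root).rglob("*.htm*"):
        if not path.is_file():
            continue
        parser = _TextExtractor()
        parser.feed(path.read_text(encoding="utf-8", errors="replace"))
        parser.close()
        # Join text nodes and collapse whitespace so a phrase split
        # across inline tags (e.g. "Hello <b>World</b>") still matches.
        text = " ".join(" ".join(parser.parts).split()).lower()
        if term in text:
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)
```

Calling something like `search_mirror("path/to/mirror", "search terms")` would then list the mirrored pages that mention the phrase. It is a linear scan rather than a real index, which is probably fine for a small site; for a large one a proper full-text index would likely be the better choice.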