EXAMPLES:
GET URLS FROM A GOOGLE SEARCH TERM
ecasbas@cipher:~/proyectos/urldigger$ python urldigger.py -g urldigger
http://urldigger.com/
http://code.google.com/p/urldigger/
http://code.google.com/p/urldigger/updates/list
http://sniptools.com/vault/urldigger
http://www.urldigger.com/articles/81/asshole-of-the-year-nominee-abu-abdullah.html
----OUTPUT CUT-----
GET URLS FROM TWITTER HOT WORDS
ecasbas@cipher:~/proyectos/urldigger$ python urldigger.py -W
http://itunes.apple.com/us/album/now-playing/id193558513
http://sourceforge.net/projects/nnplaying/
http://vivapinkfloyd.blogspot.com/2008/06/how-to-make-simple-amarok-now-playing.html
http://vivapinkfloyd.blogspot.com/2008/05/how-to-make-simple-amarok-now-playing.html
----OUTPUT CUT-----
GET URLS FROM CRAWLING YOUR SITE
ecasbas@cipher:~/proyectos/urldigger$ python urldigger.py -c http://www.nasa.gov
http://www.nasa.gov/about/career/index.html
http://www.nasa.gov/about/highlights/bolden_bio.html
http://www.nasa.gov/about/highlights/garver_bio.html
http://www.nasa.gov/about/highlights/leadership_gallery.html
http://www.nasa.gov/about/org_index.html
http://www.nasa.gov/about/sites/index.html
http://www.nasa.gov/astronauts
----OUTPUT CUT-----
SHOW HOT URLS FROM ALEXA
ecasbas@cipher:~/proyectos/urldigger$ python urldigger.py -H
http://realestate.yahoo.com/promo/most-expensive-us-small-town-sagaponack-ny.html
http://www.realsimple.com/home-organizing/new-uses-for-old-things/new-uses-penny-00000000027632/index.html?xid=yahoobuzz-rs-012210&xid=yahoo
http://movies.yahoo.com/news/usmovies.thehollywoodreporter.com/forbes-lists-biggest-flops-last-five-years
http://health.yahoo.com/experts/drmao/23125/soup-therapy-detoxify-lose-weight-and-boost-immunity/
http://answers.yahoo.com/question/index?qid=20100111162407AATTvcJ
----OUTPUT CUT-----
BRUTE FORCE MODE
ecasbas@cipher:~/proyectos/urldigger$ python urldigger.py -b > allurls.txt
Be careful: the output currently contains about 18917 URLs.
DETECT SPAM OR SPURIOUS CODE IN YOUR SITE
ecasbas@cipher:~/proyectos/urldigger$ python urldigger.py -g "site:uclm.es"
Looking for SPAM in........http://publicaciones.uclm.es/
*Suspicious SPAM!!!-----> http://publicaciones.uclm.es/* [(viagra)]
Looking for SPAM in........http://www.uclm.es/to/cdeporte/pdf/PublicacionesProfesorado.pdf
Looking for SPAM in........http://www.uclm.es/cr/caminos/publicaciones/publicaciones.html
Looking for SPAM in........http://www.uclm.es/profesorado/ricardo/Publicaciones.htm
Looking for SPAM in........http://publicaciones.uclm.es/index.php?action=module&path_module=modules_Product_index
*Suspicious SPAM!!!-----> http://publicaciones.uclm.es/index.php?action=module&path_module=modules_Product_index*
Looking for SPAM in........http://www.uclm.es/PROFESORADO/mydiaz/_private/PUBLICACIONES.pdf
NOTE: Functional code is only available through the source in the repository.
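You can also write a separate script that filters the lines urldigger.py outputs, for example keeping only the URLs that end in .jpg
and printing each one on its own line, then use wget to download all of the filtered links.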
python urldigger.py -c http://exawarosu.net/archives/7356470.html | python isPicurl.py | while read -r line; do wget -P /home/stayhigh/mypics "$line"; done
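The post does not show the isPicurl.py script used in the pipeline above, so the following is only a minimal sketch of what such a filter might look like. It assumes the script reads one URL per line from standard input and prints the lines whose path ends in .jpg. The names is_pic_url, filter_pic_urls, and IMAGE_EXTENSIONS are hypothetical and are not taken from the original script.

```python
import sys
from urllib.parse import urlparse

# Assumption: only .jpg is mentioned in the post; extend this tuple as needed.
IMAGE_EXTENSIONS = (".jpg",)


def is_pic_url(url, extensions=IMAGE_EXTENSIONS):
    """Return True if the URL's path ends with one of the given extensions."""
    # Compare against the path only, so a query string such as "?size=large"
    # does not hide the extension. The comparison is case-insensitive.
    path = urlparse(url.strip()).path
    return path.lower().endswith(extensions)


def filter_pic_urls(lines, extensions=IMAGE_EXTENSIONS):
    """Yield the stripped lines that look like picture URLs."""
    for line in lines:
        url = line.strip()
        if url and is_pic_url(url, extensions):
            yield url


# Only read standard input when something is piped in, so running the script
# directly in a terminal does not block waiting for input.
if __name__ == "__main__" and not sys.stdin.isatty():
    for url in filter_pic_urls(sys.stdin):
        print(url)
```

Matching on the parsed path rather than the raw line is a design choice of this sketch: non-URL lines in the urldigger output, such as status messages, are simply dropped because they do not end in .jpg.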