wget login to a form (using POST) DOES NOT WORK: for some reason, fetching content with the loaded cookies fails
wget --save-cookies cookies.txt --post-data 'user=USERNAME&password=PASSWORD' --no-check-certificate https://login.example.com/index.php5
Note: look at the HTML source of https://wiki.example.com/index.php5 and you may notice that the POST form parameters are not simply user and password. (I'm also ignoring the SSL certificate; I'm not really worried that somebody is DNS-spoofing this specific site.)
MANUAL (Python) WORKAROUND SOLUTION FOR MEDIAWIKI:
# get_titles.py (Python 3)
# Browse index.php5?title=Special:AllPages and, for each group of titles,
# save the HTML source to a file, then run: python get_titles.py
# html.unescape() turns &amp; back into &; check by hand if any odd ampersands remain.
import html

filename = "all-pages-1.txt"
count = 0
with open(filename) as f:
    for line in f:
        # Keep article links only; skip links to Special: pages
        if 'a href="/index.php5?title=' in line and 'title=Special' not in line:
            target = 'title="'
            start = line.find(target)
            if start == -1:
                continue  # link without a title attribute
            start += len(target)
            end = line.find('"', start)
            # print(start, " ", end, end=" ")  # debugging aid
            print(html.unescape(line[start:end]))
            count += 1
print(count)
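The line-by-line string search above depends on each link sitting on its own line. A rough alternative sketch, assuming the same /index.php5?title= link prefix, is to let Python's standard html.parser walk the saved page. The class and function names here (TitleCollector, collect_titles) are made up for illustration.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collect the title attribute of article links in a saved AllPages listing."""

    def __init__(self, link_prefix="/index.php5?title="):
        super().__init__()
        self.link_prefix = link_prefix  # assumption: same prefix as in the notes
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        title = attributes.get("title")
        # Keep article links only; skip links to Special: pages.
        if href.startswith(self.link_prefix) and "title=Special" not in href and title:
            # The parser has already converted entities such as &amp; back to &.
            self.titles.append(title)


def collect_titles(html_text):
    """Return the list of page titles found in html_text."""
    parser = TitleCollector()
    parser.feed(html_text)
    return parser.titles
```

Usage would be something like collect_titles(open("all-pages-1.txt").read()), printing one title per line.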
After you have cleaned up all of the titles, paste them into https://wiki.example.com/index.php5?title=Special:Export to create a MediaWiki XML export.
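If pasting by hand gets tedious, the same form could probably be submitted from a script. This is only a sketch of building the POST body: the field names pages, curonly and action are what I understand Special:Export to accept, so treat them as assumptions and check them against the form source on your MediaWiki version. build_export_body is a made-up helper name.

```python
from urllib.parse import urlencode


def build_export_body(titles, current_only=True):
    """Return a URL-encoded form body for a Special:Export request."""
    fields = {
        # One title per line, exactly as they would be pasted into the text box.
        "pages": "\n".join(titles),
        "action": "submit",  # assumption: value used by the export form
    }
    if current_only:
        fields["curonly"] = "1"  # assumption: export only the current revision
    return urlencode(fields)
```

The resulting string could be handed to wget --post-data (or any HTTP client) together with the login cookies.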
Back to the login problem: MediaWiki uses a special login page, a special submit-button parameter, a hidden form token, and unusual form field names:
https://wiki.example.com/index.php5?title=Special:UserLogin
One automated way of getting the Token from a script might be:
#!/bin/bash
SERVER="https://wiki.example.com"
USER="myuser"
PASSWORD="mypassword"

# Fetch the login page so the hidden token can be read from it.
# MediaWiki ties the token to the session: if the login is rejected, try adding
# --save-cookies/--keep-session-cookies here and --load-cookies to the login request.
wget -U Mozilla --output-document loginPage.html --no-check-certificate "$SERVER/index.php5?title=Special:UserLogin"

TOKEN=$(grep Token loginPage.html | grep -o '[a-z0-9]\{32\}')
printf 'Token: %s\n' "$TOKEN"

# Log in and keep the session cookie in cookies.txt
wget --save-cookies cookies.txt --keep-session-cookies -U Mozilla --no-check-certificate \
  --post-data "wpName=$USER&wpPassword=$PASSWORD&wpLoginToken=$TOKEN&wpLoginAttempt=Log%20in&wpRemember=1" \
  "$SERVER/index.php5?title=Special:UserLogin&action=submitlogin&type=login"

cat cookies.txt

# Fetch a page with the saved cookies
wget -U Mozilla --output-document temp.html --load-cookies cookies.txt --no-check-certificate "$SERVER/index.php5?title=Main_Page"
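The token step of the script above could also be done in Python. A minimal sketch, assuming the token is a 32-character lowercase alphanumeric string inside the input tag that mentions wpLoginToken (the same assumption the grep pattern makes; other MediaWiki versions may format the token differently). extract_login_token is a hypothetical helper name.

```python
import re


def extract_login_token(html_text):
    """Return the 32-character login token from the login page, or None."""
    for tag in re.findall(r"<input[^>]*>", html_text):
        if "wpLoginToken" not in tag:
            continue
        # Same pattern as the grep in the shell script.
        match = re.search(r"[a-z0-9]{32}", tag)
        if match:
            return match.group(0)
    return None
```

Restricting the search to the wpLoginToken input avoids picking up some other 32-character string that happens to sit on a line containing the word Token.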
Login form action URL:
http://wiki.example.com/index.php?title=Special:UserLogin&action=submitlogin&type=login
Login form fields:
title=Special:UserLogin
wpLoginAttempt=Log%20in
wpRemember=1
wpName=
wpPassword=
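One thing to watch with the shell version: a password containing & or = would break the hand-built --post-data string. A small sketch of assembling the fields listed above (plus the wpLoginToken used in the script) with proper URL encoding; build_login_body is a made-up name, and the field values are the ones from these notes.

```python
from urllib.parse import quote, urlencode


def build_login_body(user, password, token):
    """Return the URL-encoded POST body for the MediaWiki login form."""
    fields = [
        ("wpName", user),
        ("wpPassword", password),
        ("wpLoginToken", token),
        ("wpLoginAttempt", "Log in"),
        ("wpRemember", "1"),
    ]
    # quote_via=quote encodes spaces as %20, matching Log%20in in the notes.
    return urlencode(fields, quote_via=quote)
```

The returned string can be passed as the --post-data argument in place of the interpolated shell string.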
# HTTP cookie file.
# Generated by Wget on 2013-07-06 11:02:19.
# Edit at your own risk.
wiki.example.com FALSE / FALSE 0 mediawiki_support__session jo5kqsk878o7eil5tpek6b3d51
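The cookie line above has seven fields: domain, include-subdomains flag, path, secure flag, expiry, name, value. An expiry of 0 marks a session cookie, which wget only writes to the file when --keep-session-cookies is given. A rough sketch for reading such a file in Python, assuming whitespace-separated fields and no empty values; Cookie and parse_cookie_file are names invented for this example.

```python
from typing import NamedTuple


class Cookie(NamedTuple):
    domain: str
    include_subdomains: bool
    path: str
    secure: bool
    expires: int  # Unix timestamp; 0 marks a session cookie
    name: str
    value: str


def parse_cookie_file(text):
    """Parse the contents of a wget cookies.txt file into Cookie tuples."""
    cookies = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        parts = line.split(None, 6)
        if len(parts) != 7:
            continue  # not a complete cookie line
        domain, subdomains, path, secure, expires, name, value = parts
        cookies.append(
            Cookie(domain, subdomains == "TRUE", path, secure == "TRUE",
                   int(expires), name, value)
        )
    return cookies
```

An empty result after a login attempt is a quick hint that no session cookie was saved.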
--no-check-certificate   # don't care about SSL
-U Mozilla               # agent is Mozilla, not wget
--output-document        # save output with a specific file name