john pfeiffer

Wget login post form cookie mediawiki export

wget login to a form (using POST) DOES NOT WORK: for some reason, fetching content afterwards with --load-cookies fails.

wget --save-cookies cookies.txt --post-data 'user=USERNAME&password=PASSWORD' --no-check-certificate https://login.example.com/index.php5

Note: look at the HTML source of https://wiki.example.com/index.php5 and you may notice that the POST form parameters are not simply user and password. (I'm also ignoring the SSL certificate; I'm not really worried that somebody is DNS spoofing this specific site.)
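To see which parameters a login form really expects, the input names can be listed from the page source rather than guessed. This is a minimal sketch using only the Python standard library; `FormFieldParser`, `list_form_fields` and the `SAMPLE` form are hypothetical names and data made up for illustration, not taken from a real wiki.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect (name, type, value) for every <input> that has a name."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if "name" in attr:
            self.fields.append(
                (attr["name"], attr.get("type", "text"), attr.get("value", ""))
            )


def list_form_fields(html_text):
    """Return the named <input> fields found in html_text, in page order."""
    parser = FormFieldParser()
    parser.feed(html_text)
    return parser.fields


# Hypothetical, simplified login form (illustration only).
SAMPLE = """
<form method="post" action="/index.php5?title=Special:UserLogin">
  <input type="text" name="wpName" />
  <input type="password" name="wpPassword" />
  <input type="checkbox" name="wpRemember" value="1" />
  <input type="submit" name="wpLoginAttempt" value="Log in" />
  <input type="hidden" name="wpLoginToken" value="0123456789abcdef0123456789abcdef" />
</form>
"""

for name, kind, value in list_form_fields(SAMPLE):
    print(name, kind, value)
```

Run against a saved copy of the real login page, this shows at a glance whether the form uses plain user/password fields or something else.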


MANUAL (Python) WORKAROUND SOLUTION FOR MEDIAWIKI:

Open index.php5?title=Special:AllPages and, for each group of titles, save the HTML source (e.g. as all-pages-1.txt), then run: python get_titles.py

You may have to manually fix any ampersands (&amp;) in the output.

    # get_titles.py (Python 3): print the title of every non-Special page link
    filename = "all-pages-1.txt"
    count = 0
    with open(filename) as f:
        for line in f:
            if 'a href="/index.php5?title=' in line and 'title=Special' not in line:
                # the page title is the value of the link's title="..." attribute
                target = 'title="'
                start = line.find(target)
                start = start + len(target)
                end = line.find('"', start)
                # print(start, " ", end)  # debug: slice positions
                print(line[start:end])
                count += 1

    print(count)
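The manual ampersand fix can probably be avoided by decoding HTML entities while extracting the titles. The sketch below is a variation on the script above, not the original method: `extract_titles`, `LINK_RE` and the `SAMPLE` markup are hypothetical, and the regular expression assumes each link looks like the ones the script above matches.

```python
import html
import re

# An <a> tag pointing at index.php5?title=..., capturing its title="..." attribute.
LINK_RE = re.compile(r'<a href="/index\.php5\?title=[^"]*"[^>]*\btitle="([^"]*)"')


def extract_titles(page_html):
    """Return decoded page titles, skipping links to Special: pages."""
    titles = []
    for match in LINK_RE.finditer(page_html):
        if "title=Special" in match.group(0):
            continue
        # html.unescape turns &amp; back into &, &quot; into ", and so on.
        titles.append(html.unescape(match.group(1)))
    return titles


# Hypothetical excerpt of a saved Special:AllPages listing.
SAMPLE = (
    '<li><a href="/index.php5?title=R%26D_notes" title="R&amp;D notes">R&amp;D notes</a></li>\n'
    '<li><a href="/index.php5?title=Special:AllPages" title="Special:AllPages">All pages</a></li>\n'
    '<li><a href="/index.php5?title=Main_Page" title="Main Page">Main Page</a></li>\n'
)

for title in extract_titles(SAMPLE):
    print(title)
```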

After you have all of the titles cleaned up, paste them into https://wiki.example.com/index.php5?title=Special:Export and submit the form; this creates a MediaWiki XML export.
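Instead of pasting, the export form could in principle be submitted from a script. The sketch below only builds the request and does not send it. The field names `pages` (newline-separated titles) and `curonly`, and the `action=submit` query parameter, are assumptions about the Special:Export form that should be checked against the form's HTML source; `build_export_request` is a hypothetical helper.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_export_request(server, titles):
    """Build (but do not send) a POST request for Special:Export."""
    # Assumed field names: "pages" holds one title per line,
    # "curonly" asks for the current revision only.
    body = urlencode({"pages": "\n".join(titles), "curonly": "1"})
    url = server + "/index.php5?title=Special:Export&action=submit"
    return Request(url, data=body.encode("ascii"))


request = build_export_request("https://wiki.example.com", ["Main Page", "R&D notes"])
print(request.get_method())
print(request.full_url)
print(request.data.decode("ascii"))
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen`; on a private wiki the login cookies would have to be attached as well.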


The simple login fails because MediaWiki uses a special login page, a special submit button parameter, a hidden form token and unusual form field names:

https://wiki.example.com/index.php5?title=Special:UserLogin

The form on that page is titled "Log in" and warns: "You must have cookies enabled to log in".

One automated way of getting the Token from a script might be:

    #!/bin/bash

    SERVER="https://wiki.example.com"
    USER="myuser"
    PASSWORD="mypassword"

    # Fetch the login page, keeping the session cookie (the token belongs to that session)
    wget -U Mozilla --no-check-certificate --save-cookies cookies.txt --keep-session-cookies \
        --output-document loginPage.html "$SERVER/index.php5?title=Special:UserLogin"
    TOKEN=$(grep Token loginPage.html | grep -o '[a-z0-9]\{32\}')
    printf 'Token: %s\n' "$TOKEN"

    # Post the login form with the session cookie and save the resulting cookies
    wget -U Mozilla --no-check-certificate --load-cookies cookies.txt \
        --save-cookies cookies.txt --keep-session-cookies \
        --post-data "wpName=$USER&wpPassword=$PASSWORD&wpLoginToken=$TOKEN&wpLoginAttempt=Log%20in&wpRemember=1" \
        "$SERVER/index.php5?title=Special:UserLogin&action=submitlogin&type=login"

    cat cookies.txt

    # Fetch a page using the saved login cookies
    wget -U Mozilla --no-check-certificate --load-cookies cookies.txt \
        --output-document temp.html "$SERVER/index.php5?title=Main_Page"
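The token extraction and the POST body can also be sketched in Python, which has the side benefit of URL-encoding the user name and password (the shell version sends them raw, so an `&` in a password would break the request). The helpers `extract_login_token` and `build_login_body` and the sample markup are hypothetical; the field names are the ones used in the script above.

```python
import re
from urllib.parse import urlencode


def extract_login_token(login_html):
    """Return the 32-character token from the wpLoginToken input, or None."""
    for tag in re.findall(r"<input[^>]*>", login_html):
        if "wpLoginToken" in tag:
            match = re.search(r"[0-9a-f]{32}", tag)
            if match:
                return match.group(0)
    return None


def build_login_body(user, password, token):
    """URL-encode the login form fields into a POST body."""
    return urlencode({
        "wpName": user,
        "wpPassword": password,
        "wpLoginToken": token,
        "wpLoginAttempt": "Log in",
        "wpRemember": "1",
    })


# Hypothetical fragment of a saved login page (illustration only).
SAMPLE = '<input type="hidden" name="wpLoginToken" value="0123456789abcdef0123456789abcdef" />'

token = extract_login_token(SAMPLE)
print("Token:", token)
print(build_login_body("myuser", "p&ss word", token))
```

The resulting string could be passed to wget via --post-data, or sent from Python with a cookie-aware opener.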


The login form posts to:

http://wiki.example.com/index.php?title=Special:UserLogin&action=submitlogin&type=login

with these parameters:

    title=Special:UserLogin
    wpLoginAttempt=Log%20in
    wpRemember=1
    wpName=
    wpPassword=

Contents of cookies.txt:

    # HTTP cookie file.
    # Generated by Wget on 2013-07-06 11:02:19.
    # Edit at your own risk.

    wiki.example.com FALSE / FALSE 0 mediawiki_support__session jo5kqsk878o7eil5tpek6b3d51
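The cookie line has seven fields: domain, include-subdomains flag, path, secure flag, expiry time, cookie name and cookie value. In the file wget writes they are separated by tabs, and an expiry of 0 marks a session cookie. A minimal sketch for checking whether the session cookie was actually saved follows; `parse_cookie_file` is a hypothetical helper and the sample text mirrors the file above with tabs restored.

```python
def parse_cookie_file(text):
    """Parse Netscape-style cookie file text into a list of dicts."""
    cookies = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # not a well-formed cookie line
        domain, subdomains, path, secure, expires, name, value = fields
        cookies.append({
            "domain": domain,
            "include_subdomains": subdomains == "TRUE",
            "path": path,
            "secure": secure == "TRUE",
            "expires": int(expires),  # 0 means a session cookie
            "name": name,
            "value": value,
        })
    return cookies


SAMPLE = (
    "# HTTP cookie file.\n"
    "# Generated by Wget on 2013-07-06 11:02:19.\n"
    "# Edit at your own risk.\n"
    "\n"
    "wiki.example.com\tFALSE\t/\tFALSE\t0\t"
    "mediawiki_support__session\tjo5kqsk878o7eil5tpek6b3d51\n"
)

for cookie in parse_cookie_file(SAMPLE):
    print(cookie["name"], cookie["value"], "session" if cookie["expires"] == 0 else "persistent")
```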

wget options used above:

    --no-check-certificate   # don't care about SSL (skip certificate verification)
    -U Mozilla               # user agent is Mozilla, not wget
    --output-document        # save output with a specific file name

REFERENCE:

http://labs.creativecommons.org/2011/04/30/using-wget-to-login-to-mediawiki/

http://www.noah.org/wiki/MediaWiki_notes#wget



Published

Jul 6, 2013

Category

linux

Tags

  • cookie 1
  • export 9
  • form 20
  • linux 249
  • login 11
  • mediawiki 1
  • post 12
  • wget 7