Thursday, June 20, 2013

How to compile libxml2 for lxml (python) - A Guide

Introduction

In this small guide I will explain how to build libxml2 for use with Python's lxml under Linux (Debian in my case). I had to do this because I wanted to run the Springer Downloader. I'm kind of a beginner with Linux, and especially with compiling things there, so I will write down the problems I had - maybe they will be helpful to someone. Comments on how to improve what I did are of course welcome.


Downloading what we need


lxml and libxml


When you go to the lxml website you will find that under Linux you can download the source of lxml and the two libraries it depends on, as stated here. Quote:
libxml2 2.6.21 or later. It can be found here: http://xmlsoft.org/downloads.html
libxslt 1.1.15 or later. It can be found here: http://xmlsoft.org/XSLT/downloads.html
-------------------------
  1. First, download lxml itself (in my case lxml 3.2.1.tgz) and unpack it.

    EDIT: You do not need steps 2 & 3. I found a nicer way - thanks to tovotu for pointing it out, which made me try again.
  2. Follow the link above to the FTP server and download libxml2-2.9.0.tar.gz. Make sure you get the .tar.gz file. I had to use 2.9.0 even though 2.9.1 was already out; with the newer one I got errors because the older version of the library was requested - no idea why. Unpack it.
  3. Then, also from the FTP server, download libxslt-1.1.28.tar.gz (or newer) and unpack it.

gcc, make, python-dev

Now make sure you have a few things installed via apt-get. Open your console and make sure you can use the sudo command. Then (for Debian-based systems like Ubuntu) type in:

  1. sudo apt-get install python2.7
  2. sudo apt-get install make
  3. sudo apt-get install gcc
  4. sudo apt-get install python-dev
  5. sudo apt-get install libxml2
  6. sudo apt-get install libxml2-dev
  7. sudo apt-get install libxslt1.1

    You might have some of those packages already installed.
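If you want to check quickly which of the build tools are already available, a minimal Python 3 sketch like the following can help. The command names in REQUIRED_TOOLS are my assumption (the interpreter might be called "python" instead of "python2.7" on your system), and the check only looks for executables on the PATH, so it says nothing about the -dev packages.

```python
import shutil

# Assumed command names: gcc and make come from the packages listed above,
# the interpreter may be installed as "python2.7" or just "python".
REQUIRED_TOOLS = ["gcc", "make", "python2.7"]


def missing_tools(names):
    """Return the commands from `names` that are not found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


def report(names=REQUIRED_TOOLS):
    """Print one line per command, saying whether it was found."""
    absent = set(missing_tools(names))
    for name in names:
        print("%-10s %s" % (name, "MISSING" if name in absent else "ok"))


# Example call:
# report()
```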

Compiling

Now we have to compile the two libraries we downloaded before.

EDIT: As I found out, you do not need steps 1 & 2 if you installed libxml2-dev. Go to step 3 (lxml 3.2.1.tgz).
  • Go to the folder where you unpacked libxml2-2.9.0.tar.gz and open a console there - you can most likely do this via right click somewhere in the folder, otherwise change into the directory with 'cd'.
    In the console enter the following (yes, I know this can be done in one line):
sudo ./configure
sudo make
sudo make install
 This might produce some errors, but most likely it will work.
 
  • Go to the folder where you unpacked libxslt-1.1.28.tar.gz and open a console there.
    Now type in the same commands as above. Do not do this step before the first one or it will not work, because libxslt builds against libxml2!

  • Go to the folder where you unpacked lxml 3.2.1.tgz and open a console there. Type in:
    sudo python setup.py install
    This will install the Python library into the Python directory.
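To see whether the installation worked, you can ask lxml which library versions it was built against and which ones it actually loads. This is only a small sketch: the function name is made up for this example, and the version constants are the ones lxml.etree exposes. If the import itself fails, the function prints the error message and returns None.

```python
def lxml_library_versions():
    """Return the versions lxml reports, or None if lxml cannot be imported."""
    try:
        from lxml import etree
    except ImportError as exc:
        # This is also where a "version ... not found" problem would show up.
        print("lxml could not be imported: %s" % exc)
        return None
    return {
        "lxml": etree.LXML_VERSION,
        "libxml2 (compiled against)": etree.LIBXML_COMPILED_VERSION,
        "libxml2 (loaded at runtime)": etree.LIBXML_VERSION,
        "libxslt (compiled against)": etree.LIBXSLT_COMPILED_VERSION,
        "libxslt (loaded at runtime)": etree.LIBXSLT_VERSION,
    }


# Example call:
# print(lxml_library_versions())
```

If the "compiled against" and "loaded at runtime" versions of libxml2 differ, lxml is picking up a different library at runtime than the one it was built with.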

Fixing an error

Now you can try to run, for example, the Springer Downloader. At least for me it failed with this error:
ImportError: /usr/lib/i386-linux-gnu/libxml2.so.2: version `LIBXML2_2.9.0' not found (required by /usr/local/lib/python2.7/dist-packages/lxml/etree.so)
This error occurs because 'make install' placed libxml2.so.2.9.0 in /usr/local/lib, while the library that actually gets loaded is the older one in /usr/lib/i386-linux-gnu.
You can see this by typing
sudo updatedb
sudo locate libxml2.so 
/usr/local is the default install prefix of ./configure, which is why the self-compiled library ended up there. For me there was also a /usr/lib/i386-linux-gnu/libxml2.so.2.8.0, probably because it was installed via the Debian package.
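If you prefer not to run updatedb, roughly the same lookup can be sketched in Python. The function name and the idea of parsing the version out of the file name are my own additions for illustration. The two directories in the example call are the ones from the error message above and will differ on other systems.

```python
import glob
import os
import re


def find_libxml2(directories):
    """Return (path, version) pairs for libxml2.so* files in the directories."""
    found = []
    for directory in directories:
        pattern = os.path.join(directory, "libxml2.so*")
        for path in sorted(glob.glob(pattern)):
            # The version is whatever follows ".so." in the file name, if anything.
            match = re.search(r"\.so\.(\d+(?:\.\d+)*)$", path)
            version = match.group(1) if match else None
            found.append((path, version))
    return found


# Example call with the directories from the error message:
# for path, version in find_libxml2(["/usr/lib/i386-linux-gnu", "/usr/local/lib"]):
#     print(path, version)
```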

So we have to copy the files to the correct location:
sudo cp /usr/local/lib/libxml2.so.2.9.0 /usr/lib/i386-linux-gnu/libxml2.so.2.9.0 
sudo cp /usr/local/lib/libxml2.so.2 /usr/lib/i386-linux-gnu/libxml2.so.2
 

End

Well, that is it. If you need additional packages - for example pyPdf and cssselect for the Springer Downloader I mentioned - download them and install them as above via 'sudo python setup.py install'.
 




Sunday, June 9, 2013

How to defeat and kill evercookie in Firefox

Introduction

You might have heard about evercookie. It is a concerning development in storing data on your local PC so that websites can identify you in the future.
In contrast to normal cookies, which have existed for years, evercookie uses various techniques to stay on your PC and is not easily deleted by a normal user.
You can see a demo of evercookie on the developer's website http://samy.pl/evercookie.

There, press the button to create an 'evercookie' and try to get rid of it. Most likely you will have no success doing so.


Example evercookie after its creation.

In this short article I will explain how you can get rid of evercookies.

 

Deleting Evercookies

Firefox settings

Within Firefox (or other browsers) you should enable a few options.
Go into your settings -> privacy and enable "Clear history when Firefox closes" and click on the extra button next to it. There you should enable at least "Cache" and "Cookies".
This will get rid of the cookieData, pngData, etagData and cacheData.

 

Session Manager settings

Using the addon SessionManager - http://sessionmanager.mozdev.org/ - will lead to problems when using the default settings because it will also back-up the evercookie. This might also be the case for TabMixPlus session manager, but I did not investigate this.

1) Go into the SessionManager settings and set the session saving options for session data to 'never' as seen in the screenshot below.




2) In the 'General' tab, disable the restoring of session cookies as seen in the screenshot below.


Now when you save your sessions with Session Manager - or use the backup after crash function - evercookie will not be restored anymore.

The harder parts

There are also several places on your PC where information can be stored that a normal user will not know about, such as Flash LSOs, Silverlight and HTML5 web storage. (Silverlight is kind of useless today, and you should not have it installed anyway unless a particular site you use needs it.)

HTML5 storage

HTML5 introduced several ways to save data on your local PC, which work like advanced cookies. This has advantages but also disadvantages, such as user tracking.
You can disable this storage completely. In Firefox, for example, type about:config into the address bar and search for "dom.storage.enabled". Double-click the value to change it to "false". This prevents evercookie's localData and sessionData from being saved.
Most websites will have no problem without this storage, but some poorly written websites like twitch.tv will produce errors.
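As an alternative to clicking through about:config, the same preference can be written into a user.js file in the profile folder, which Firefox reads at startup. The following Python sketch is not part of my original setup. The function name is made up, and you have to pass your own profile folder.

```python
import os

# Preference line in the syntax Firefox expects in user.js.
PREF_LINE = 'user_pref("dom.storage.enabled", false);'


def disable_dom_storage(profile_dir):
    """Append the preference to user.js unless it is already there.

    Returns True if the file was changed, False otherwise.
    """
    user_js = os.path.join(profile_dir, "user.js")
    existing = ""
    if os.path.exists(user_js):
        with open(user_js) as handle:
            existing = handle.read()
    if PREF_LINE in existing:
        return False  # nothing to do
    with open(user_js, "a") as handle:
        # Keep the new line separate from a last line without a newline.
        if existing and not existing.endswith("\n"):
            handle.write("\n")
        handle.write(PREF_LINE + "\n")
    return True
```

Calling the function twice is harmless, because the second call notices the existing line and leaves the file alone.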
The data is stored in your Firefox profile folder in the file "webappsstore.sqlite". It could be edited with an SQLite editor, but it is much easier to simply delete the file before starting Firefox. This has no disadvantages and all websites will still work.
After the Flash part I will give you instructions on how to delete the file easily.
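If you are curious what is inside the file before deleting it, Python's built-in sqlite3 module can list the tables and count their rows. This is just a sketch with a made-up function name, and it makes no assumptions about the table names Firefox uses. Run it on a copy of the file or while Firefox is closed, and note that sqlite3.connect creates an empty file if the path does not exist.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in the SQLite file."""
    connection = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        counts = {}
        for name in names:
            # Quote the table name so unusual names do not break the query.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = connection.execute(
                "SELECT COUNT(*) FROM " + quoted).fetchone()[0]
        return counts
    finally:
        connection.close()


# Example call (replace the path with a copy of your own file):
# print(table_row_counts("webappsstore.sqlite"))
```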

 

Flash

Adobe Flash allows websites to set LSOs (Local Shared Objects) - http://en.wikipedia.org/wiki/Local_Shared_Object - aka Flash cookies. These files do not depend on the browser you use, so whether you use Firefox, Internet Explorer or another browser, they will always be saved to the same folder on your HDD: "<drive>:\Users\<username>\AppData\Roaming\Macromedia\Flash Player\".
You can safely delete this folder before you start your browser, which gets rid of evercookie's lsoData.

Deleting HTML5 storage and Flash storage with a batch script

I recommend using a simple batch script (.bat) that deletes the HTML5 storage and the Flash storage and then starts Firefox.
You can create a very simple batch script by following steps 1 & 2 from this small tutorial.
When you have Notepad open, copy and paste the following three lines into your editor and save the file as .bat.

rmdir /S /Q "<drive>:\Users\<username>\AppData\Roaming\Macromedia\"
del /F /S /Q "<drive>:\Users\<username>\AppData\Roaming\Mozilla\Firefox\Profiles\<profilename>\webappsstore.sqlite"
"<drive>:\Program Files (x86)\Firefox\firefox.exe"

You have to replace all words in <> brackets with your own data. For example, <drive> is most likely C, <username> is your Windows user name, and <profilename> could be something like 9s8h3ask.default. Go to the folders and find your own values.

The batch file first removes the whole Flash cache directory (rmdir /S /Q), then deletes the webappsstore.sqlite file with the HTML5 storage (del /F /S /Q), and then starts Firefox.
Keep in mind that this only works when Firefox is not running, otherwise the .sqlite file will be locked.
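The same three steps can also be sketched in Python, which is handy if you want the cleanup to skip things that do not exist instead of printing errors. The function names are made up for this example, and all paths are parameters, so nothing is assumed about where Flash or Firefox live on your machine.

```python
import os
import shutil
import subprocess


def clean_tracking_data(flash_dir, storage_file):
    """Delete the Flash directory and the HTML5 storage file.

    Returns the list of paths that were actually removed.
    """
    removed = []
    if os.path.isdir(flash_dir):
        shutil.rmtree(flash_dir)
        removed.append(flash_dir)
    if os.path.isfile(storage_file):
        # On Windows this raises an error while Firefox has the file open.
        os.remove(storage_file)
        removed.append(storage_file)
    return removed


def start_firefox(firefox_exe):
    """Launch Firefox without waiting for it to exit."""
    return subprocess.Popen([firefox_exe])


# Example calls (fill in the same paths as in the batch script):
# clean_tracking_data(flash_dir, storage_file)
# start_firefox(firefox_exe)
```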


 The End


After deleting everything.

So as you can see, 'killing' evercookie manually is not that easy, but it can easily be automated, for example with the batch script.
You can (and should) also use AdBlockPlus for Firefox or Chrome and NoScript for Firefox, as these two addons also help prevent you from being tracked on the web.

Have a nice day =)