
Extracting information from the Wikileaks cable.csv file

Today the story broke that the diplomatic cables hosted by Wikileaks were leaked and are now accessible without redactions. You can download the file from many different sources, for example cryptome.org.

The problem is that this file is huge (1.6 GB) and it's hard to filter out useful cables with a normal text editor. For that purpose I wrote a little program; you can find it here: http://www.cip.ifi.lmu.de/~ateia/cableFinder.jar

Example (for Linux):

If you want to find all cables that come from the embassy in Cairo, you can use the program like this:

1. Put the cables.csv file in the same folder as cableFinder.jar
2. Open a terminal and change the directory to your folder
3. Type: java -jar cableFinder.jar cables.csv "FM AMEMBASSY CAIRO" \\n\"\\n
4. 2081 files appear in your folder: “match1, match2, … match2081”
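
If the delimiter does not seem to work on your system, it can help to see what string actually reaches Java after the shell has processed the escapes. The following is a minimal sketch of a hypothetical helper (ArgEcho is my own name, it is not part of cableFinder.jar) that prints every argument together with its length:

```java
// Hypothetical helper, not part of cableFinder.jar:
// shows each command line argument exactly as Java receives it.
class ArgEcho {

    // Returns the argument with its length, in brackets so stray spaces are visible.
    static String describe(String arg) {
        return arg.length() + " chars: [" + arg + "]";
    }

    public static void main(String[] args) {
        for (int i = 0; i < args.length; i++) {
            System.out.println("arg " + i + " = " + describe(args[i]));
        }
    }
}
```

Run it with the same arguments as cableFinder.jar, for example: java ArgEcho cables.csv "FM AMEMBASSY CAIRO" \\n\"\\n. In a bash-like shell the third argument should arrive as the five characters \n"\n (backslash, n, quote, backslash, n).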

cableFinder.jar takes 3 arguments:

1. The cables.csv file
2. A search string that cables need to contain. If you want only cables from Berlin you can write "FM AMEMBASSY BERLIN".
3. A delimiter that splits the file where you want it (\\n\"\\n works well to split cables). On Windows you have to change it to \n\"\n (thanks for the comment, gerheise).
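
To make the role of the second and third argument more concrete, here is a small in-memory sketch. It assumes the delimiter is treated as a regular expression by java.util.Scanner, as in the Java source posted in the comments below; the class name DelimiterDemo and the sample text are made up for illustration and are not taken from cables.csv.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Illustrative sketch only: split a text at a regex delimiter and keep
// the chunks that contain the search string.
class DelimiterDemo {

    static List<String> findMatches(String text, String search, String delimiterRegex) {
        List<String> matches = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            // Scanner interprets the delimiter as a regular expression.
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                String chunk = scanner.next();
                if (chunk.contains(search)) {
                    matches.add(chunk);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample: three "cables" separated by newline, quote, newline.
        String sample = "1,FM AMEMBASSY CAIRO\nbody one\n\"\n"
                + "2,FM AMEMBASSY BERLIN\nbody two\n\"\n"
                + "3,FM AMEMBASSY CAIRO\nbody three";
        // The regex \n"\n written as a Java string literal.
        String delimiter = "\\n\"\\n";
        for (String match : findMatches(sample, "FM AMEMBASSY CAIRO", delimiter)) {
            System.out.println("--- match ---");
            System.out.println(match);
        }
    }
}
```

Because the delimiter is a regular expression, characters such as . or | in a custom delimiter would need escaping.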

Example for Windows:

1. Preparations
1.1 Download http://cryptome.org/z/z.7z
1.2 If you don't have 7-Zip, download and install it from here: http://www.7-zip.org/download.html
1.3 Unzip the z.7z file with 7-Zip
1.4 Download http://www.cip.ifi.lmu.de/~ateia/cableFinder.jar
1.5 Put it in the same folder as the unzipped cables.csv file

2. Check if you have Java installed
2.1 If not download and install it from here: http://www.java.com/en/download/

3. Open a command line and run cableFinder.jar
3.1 Click on Start
3.2 Type cmd
3.3 Press Enter
3.4 Type cd pathToTheFolderWhereTheCables.csvFileIs (replace “pathToTheFolderWhereTheCables.csvFileIs” with the actual path)
3.5 Type "C:\Program Files (x86)\Java\jre6\bin\java.exe" -jar cableFinder.jar cables.csv "FM AMEMBASSY CAIRO" \n\"\n

4. If you want to delete all the “matchxxxxxx” files type del match*
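
If you prefer a cleanup that works the same way on Linux, OS X and Windows, the del command can be imitated in Java. This is only a rough sketch, assuming Java 7 or newer; MatchCleaner is a hypothetical name and not part of cableFinder.jar. Be careful: it deletes every regular file in the folder whose name starts with "match", not only files written by the tool.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of cableFinder.jar.
class MatchCleaner {

    // Deletes regular files named match* directly inside dir and
    // returns how many files were deleted.
    static int deleteMatches(Path dir) throws IOException {
        int deleted = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "match*")) {
            for (Path file : stream) {
                if (Files.isRegularFile(file)) {
                    Files.delete(file);
                    deleted++;
                }
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        // Uses the current folder unless a folder is given as first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Deleted " + deleteMatches(dir) + " files");
    }
}
```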

  1. fritz
    1. September 2011 at 23:15

    Actual JAR-File is 1,7 KB only. It does not work.

    • 1. September 2011 at 23:34

      Hi, it should work even though it's only 1,7 KB. The first jar contained some unnecessary classes.

  2. guest
    2. September 2011 at 20:12

    could you post the java code please?

    • 2. September 2011 at 21:07

      Hi, the jar should now include the source files, feel free to improve it and distribute your changes. I didn’t put much effort in it so you should be able to improve it a lot 😉

  3. 2. September 2011 at 21:18

    Works excellent on OS X Lion. Thanks a ton!

  4. CL
    2. September 2011 at 21:40

    Thanks! The little program has worked for me in Windows:

    java -Xmx512m -jar cableFinder.jar c.csv "FM AMEMBASSY LIMA" \n\"\n

    ( -Xmx512m is to increase Java Heap size. Maybe for Linux this is also necessary)

  5. 3. September 2011 at 16:36

    Nice 🙂

  6. twiga
    14. September 2011 at 20:42

    Unfortunately only available in German, but maybe still useful for you:
    You can use A-Z Finder (on Windows) very conveniently.
    This is a free, lightning-fast offline search engine based on full text indexing.
    Even though it has not been developed since 2006, it is by far the best tool to datamine the cables with ease. No problem to parse the 60 GB unpacked HTML files too (but they are partly censored). The internal index size limit of 128 terabytes is sufficiently large.
    It works on weak hardware like netbooks and the program offers sophisticated searches.
    The 1,6 GB csv can be viewed instantaneously inside.
    This tool is simply a gem.

  7. 22. September 2011 at 21:12

    I found it easier to import the csv into a MySQL database and query that using Sequel Pro or MySQL Workbench. I’ve written a guide to show how I did this: opening and searching the cables.csv.

  8. 22. September 2012 at 11:01

    Is anyone here good at English?
    A film critique of the anti-Assange documentary needs some help (use the Reply field for corrections, please).
    http://jasminrevolution.wordpress.com/2012/09/11/documentary-review-wikileaks-secrets-and-lies/

  9. 22. September 2012 at 11:03

    not THIS reply-field here, but this on the Blog there 😉

    http://jasminrevolution.wordpress.com/2012/09/11/documentary-review-wikileaks-secrets-and-lies/

  10. nwyok
    16. June 2013 at 00:47

    import java.io.BufferedInputStream;
    import java.io.BufferedOutputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.util.Scanner;

    public class fileSplitter {

        /**
         * Splits the input file at the given delimiter and writes every chunk
         * that contains the search string to its own "matchNNNNNN" file.
         *
         * @param args input file, search string, delimiter (a regular expression)
         * @throws IOException if reading or writing fails
         */
        public static void main(String[] args) throws IOException {
            if (args.length == 3) {
                File inputFile = new File(args[0]);
                String patternToFind = args[1];
                String valueSeperator = args[2];
                //System.out.println(valueSeperator);

                FileInputStream inputStream = new FileInputStream(inputFile);
                BufferedInputStream inBuffer = new BufferedInputStream(inputStream);
                Scanner fileScanner = new Scanner(inBuffer);
                // Scanner treats the delimiter as a regular expression
                fileScanner.useDelimiter(valueSeperator);

                int numberOfMatches = 0;
                while (fileScanner.hasNext()) {
                    String text = fileScanner.next();

                    if (text.contains(patternToFind)) {
                        numberOfMatches++;
                        // Pad the running number with zeros to six digits
                        String sequenznumber = String.valueOf(numberOfMatches);
                        for (int i = sequenznumber.length(); i < 6; i++) {
                            sequenznumber = "0" + sequenznumber;
                        }
                        File outputFile = new File("match" + sequenznumber);
                        FileOutputStream outPutStream = new FileOutputStream(
                                outputFile);
                        BufferedOutputStream outBuffer = new BufferedOutputStream(
                                outPutStream);
                        PrintWriter printer = new PrintWriter(outBuffer);
                        printer.write(text);
                        printer.close();
                        outBuffer.close();
                        outPutStream.close();
                    }
                }
                fileScanner.close();
                inBuffer.close();
                inputStream.close();

            } else {
                System.out.println("incorrect number of parameters");
            }
        }

    }

  11. nwyok
    16. June 2013 at 00:51

    Merci 😉
