Detailed look at using Circos for IT Investigation

As promised, I am posting a detailed overview of the steps I took to create my email investigation visualization using Circos, which I wrote about in a previous post.

To help set the scene, the evidence used to generate the data files for the Circos configuration came from a number of Exmerged (exported) Exchange mailboxes. These mailboxes were loaded into a copy of AccessData’s FTK v1.8 (sorry, but v2 really was bad at high-volume thumbnail analysis of graphics, and v3 wasn’t released when I started the investigation). Once the PSTs had been processed, I could use the Graphics tab to review the generated thumbnails of all images found, quickly identify inappropriate content, and assign the attachments and their parent emails to a bookmark. Once I had finished with the images, I went back to the Overview tab, reviewed all the movie attachments, and again bookmarked any identified material and its parent email. The benefit of the bookmark feature was that, after I had analyzed all the Exmerged mailboxes and other content, I could access all the identified evidence in one place.

To prepare the Circos data files, I used Microsoft Excel to store the key information. Why Excel? Because it was handy, and it's easy to use when you need to sort data in different ways while you work out what you require. The key worksheet I used was a summary table of all the emails that I had marked as evidence. A row was created for each person who sent or received an email, so most emails had multiple rows of To/From pairs. To create the Circos configuration files, the key columns of the summary table in Excel were:


MessageID

This allowed me to track a common email that went to numerous people. If the body of the email showed that the content had previously been forwarded to internal people, additional rows were generated for those details, using the same MessageID. This let me easily track the flow of email between people.


Sender

Email address of the sender of the email, or “External” as a group name if it was sent from outside the company to internal employees.


Recipient

Email address of the recipient of the email, or “External” as a group name if it was sent to someone on the Internet.


This was used to generate the histogram around the outside of the graph.
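Turning one email into summary-table rows can be sketched in Python. This is a hypothetical illustration, not the spreadsheet process I actually used: COMPANY_DOMAIN, SummaryRow, group_address and expand_email are made-up names, and the sample addresses are invented. It produces one To/From row per recipient, all sharing the email's MessageID, and maps any address outside the company domain to the “External” group.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical company domain: any address outside it is grouped as "External".
COMPANY_DOMAIN = "example.com"


@dataclass(frozen=True)
class SummaryRow:
    """One To/From pair for a single email, like a row in the summary worksheet."""
    message_id: str
    sender: str
    recipient: str


def group_address(address: str) -> str:
    """Return a normalized internal address, or "External" for outside addresses."""
    address = address.strip().lower()
    if address.endswith("@" + COMPANY_DOMAIN):
        return address
    return "External"


def expand_email(message_id: str, sender: str, recipients: List[str]) -> List[SummaryRow]:
    """Create one row per recipient, all sharing the email's MessageID."""
    return [
        SummaryRow(message_id, group_address(sender), group_address(r))
        for r in recipients
    ]


if __name__ == "__main__":
    for row in expand_email("msg-001", "alice@example.com",
                            ["bob@example.com", "someone@gmail.com"]):
        print(row.message_id, row.sender, row.recipient)
```

An earlier internal forward found in the body of an email would simply be another call to expand_email with the same MessageID and the forwarder as the sender.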

I also recorded other information that, while not useful for generating the visualization, helped me produce summary tables and raw evidence exports for HR. These columns were:

Content Type

This was categorized as PowerPoint, image or movie.

Src Mailbox

No matter how careful people are at cleaning their own mailboxes, it only takes one person in the network failing to delete emails for everyone involved to be implicated. This column identified which Exmerged mailbox the evidence was found in.


Email Subject line

Once I had all this data, I created a new worksheet called “karyotype”. I was lucky to have a very clear idea of what my final visualization would look like, so I could attack the problem of generating the graph in steps. The first step was to generate a graph with an arc for each user, divided into the right number of segments for the total number of emails that user sent and/or received. To determine all the “Users”, I sorted the first worksheet by Sender+MessageID and recorded, against each User in the new “karyotype” worksheet, the number of different emails they had sent. Then I did the same for Recipient+MessageID. Once I had a row for each individual User (plus one for “External” users), I created a third column for the total number of emails.
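The counting step can be sketched in Python. This is a hypothetical illustration rather than the worksheet I built: ROWS holds made-up (MessageID, Sender, Recipient) tuples, and karyotype_counts is an invented helper that counts distinct MessageIDs per user, mirroring the Sender+MessageID and Recipient+MessageID sorts. It assumes the Total column is simply the number sent plus the number received.

```python
from collections import defaultdict

# Hypothetical To/From rows: (MessageID, Sender, Recipient), one row per pair.
ROWS = [
    ("msg-001", "alice", "bob"),
    ("msg-001", "alice", "carol"),
    ("msg-002", "External", "alice"),
    ("msg-003", "bob", "carol"),
]


def karyotype_counts(rows):
    """Return {user: (sent, received, total)} counted over distinct MessageIDs."""
    sent = defaultdict(set)
    received = defaultdict(set)
    for message_id, sender, recipient in rows:
        sent[sender].add(message_id)
        received[recipient].add(message_id)

    counts = {}
    for user in sorted(set(sent) | set(received)):
        n_sent = len(sent[user])
        n_received = len(received[user])
        # Assumption: Total = emails sent + emails received.
        counts[user] = (n_sent, n_received, n_sent + n_received)
    return counts


if __name__ == "__main__":
    for user, (s, r, t) in karyotype_counts(ROWS).items():
        print(f"{user}\t{s}\t{r}\t{t}")
```

Counting distinct MessageIDs rather than rows matters: one email sent to several people produces several rows but only one sent email.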


[Table: the “karyotype” worksheet, one row per User, with columns for # Sent, emails received and Total]
This became the basis for creating my “karyotype” file. If you review the tutorials that Martin Krzywinski has created for Circos, you will note that each line of the “karyotype” file defines one ideogram in the form:

chr - ID LABEL START END COLOR
Therefore my karyotype definition became:

When you download the Circos tarball, it includes a subdirectory called “tutorials/” that contains the tutorial configuration files. When making my graphs, I copied the config files from one of the tutorial directories into a working subdirectory under “tutorials/” called “Investigation”. The benefit was that Martin Krzywinski had already defined a number of common configuration files for the tutorials, such as the color definitions. It is in this colors.conf file that the RGB values for the color labels shown in my karyotype file above are defined.
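Generating karyotype lines from the worksheet totals can be sketched in Python. This is a hypothetical illustration, not my actual karyotype file: TOTALS, PALETTE and karyotype_lines are made-up names, and the color names are assumed to be defined in the tutorials' colors.conf. Each arc runs from 0 to the user's total, so it has one unit segment per email.

```python
import re

# Hypothetical color names; they must be defined in the tutorials' colors.conf.
PALETTE = ["red", "orange", "green", "blue", "purple"]

# Hypothetical totals (sent + received) per user from the "karyotype" worksheet.
TOTALS = {"alice@example.com": 2, "bob@example.com": 2, "External": 1}


def karyotype_lines(totals):
    """Return one 'chr - ID LABEL START END COLOR' line per user."""
    lines = []
    for i, (user, total) in enumerate(totals.items()):
        # Assumption: restricting IDs to letters, digits and '_' avoids
        # parsing surprises with characters such as '@' and '.'.
        ideogram_id = re.sub(r"[^A-Za-z0-9]", "_", user)
        # Cycle through the palette so neighbouring arcs get different colors.
        color = PALETTE[i % len(PALETTE)]
        lines.append(f"chr - {ideogram_id} {ideogram_id} 0 {total} {color}")
    return lines


if __name__ == "__main__":
    print("\n".join(karyotype_lines(TOTALS)))
```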

At this point I tried to generate my first graph. First I located the circos.conf, ticks.conf and ideogram.conf files in one of the tutorials and copied them into my working directory. I then ran the circos command against the configuration files that I had created or copied into the tutorial subdirectory, as follows:

This produced the following visualization:

Once I got to this point, it was just a matter of following the tutorials on banding, linking and using histograms to generate the final visualization. I will walk through each of those steps in my next posting.
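As a rough, hypothetical preview of the linking step, each To/From row can become one Circos link. This sketch assumes the one-link-per-line format of recent Circos releases (ID1 START1 END1 ID2 START2 END2) and an invented layout in which each user's arc first holds the emails they sent, then the emails they received; ROWS and link_lines are made-up names and data.

```python
from collections import defaultdict

# Hypothetical To/From rows: (MessageID, Sender, Recipient).
ROWS = [
    ("msg-001", "alice", "bob"),
    ("msg-001", "alice", "carol"),
    ("msg-002", "bob", "alice"),
]


def link_lines(rows):
    """Emit one 'ID1 START1 END1 ID2 START2 END2' line per To/From row."""
    # Record each user's distinct sent and received MessageIDs in first-seen order.
    sent = defaultdict(list)
    received = defaultdict(list)
    for message_id, sender, recipient in rows:
        if message_id not in sent[sender]:
            sent[sender].append(message_id)
        if message_id not in received[recipient]:
            received[recipient].append(message_id)

    lines = []
    for message_id, sender, recipient in rows:
        # Sent emails occupy positions 0..sent-1 on the sender's arc.
        s = sent[sender].index(message_id)
        # Received emails follow the sent ones on the recipient's arc.
        r = len(sent[recipient]) + received[recipient].index(message_id)
        lines.append(f"{sender} {s} {s + 1} {recipient} {r} {r + 1}")
    return lines


if __name__ == "__main__":
    print("\n".join(link_lines(ROWS)))
```

The IDs in a links file must match the IDs used in the karyotype file, and the positions must fall within each arc's START and END.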

