This is the second and likely final post I will do on using Circos for an IT investigation. In the first part, I described how I gathered the evidence and then used it to generate the initial Circos visualization. In this post, we will add the extra details, step by step, that produced my final graph.
The first thing we need to do is add the bands to each of the “Users” arcs. Bands in Circos are defined in the karyotype file using the configuration line:-
BAND [Parent CHROMOSOME] BAND-NAME BAND-NAME START-POS END-POS COLOR
Unlike many other visualizations, my requirement was that a single band represents a single email, so each band had to be exactly 1 unit long. I appended the resulting band lines to my karyotype file.
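I built my band lines by hand from the spreadsheet, but the same idea can be sketched in a few lines of Python. This assumes each user's arc is defined in the karyotype file as running from 0 to that user's email count; the band naming scheme (`user1_email1`, ...) and the default color `grey` are hypothetical placeholders, so substitute any names and colors defined in your own Circos configuration.

```python
def band_lines(user_id, email_count, color="grey"):
    """Return one karyotype 'band' line per email, each exactly 1 unit wide.

    Format: band PARENT NAME LABEL START END COLOR
    The naming scheme and default color are hypothetical placeholders.
    """
    lines = []
    for i in range(email_count):
        name = f"{user_id}_email{i + 1}"  # hypothetical, unique per user
        # Email i occupies the unit interval [i, i + 1] on the user's arc.
        lines.append(f"band {user_id} {name} {name} {i} {i + 1} {color}")
    return lines


if __name__ == "__main__":
    # Three emails on the arc for a hypothetical user "user1".
    print("\n".join(band_lines("user1", 3)))
```

Generating the lines rather than typing them means a user with dozens of emails gets the same 1-unit spacing as everyone else, with no gaps or overlaps.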
If we now rerun the Circos command from part one with the new karyotype file, the visualization will include bands on each user's arc.
We are now ready to define the links that show which user sent which email, and to whom. To draw a line linking two User segments, you specify the co-ordinates of each end point under a common link ID, as follows:-
LINKID [Parent 1 CHROMOSOME] StartCo-Ord EndCo-Ord
LINKID [Parent 2 CHROMOSOME] StartCo-Ord EndCo-Ord
When creating my link file, I went back to my original Excel spreadsheet and created a dedicated “link” worksheet to track all the connections. I sorted my Message summary worksheet (the first worksheet, where I recorded the details of each message) by MessageID and then by Sender. From there, I worked my way down the rows by MessageID, adding the appropriate details to the “link” worksheet.
Making use of Circos’s flexibility, I then created a separate link file for each sender. This allowed me to use the link “Rules” option to define different link line formats (e.g. color), making it easier to tell who sent each email and who received it.
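As a rough sketch of the worksheet process above, the snippet below sorts hypothetical message rows by MessageID and then Sender, and emits one two-line link entry per sender/recipient pair in the `LINKID PARENT START END` format, grouped into one list per sender so each list can be written to its own link file. The row layout and the `linkNNNNN` ID scheme are my own assumptions; IDs are numbered across all rows so they stay unique even though the output is split into several files.

```python
from collections import defaultdict

# Hypothetical row layout, one row per sender/recipient pair:
# (message_id, sender, sender_start, sender_end,
#  recipient, recipient_start, recipient_end)


def link_files_by_sender(rows):
    """Group two-line Circos link entries into one list per sender."""
    files = defaultdict(list)
    # Sort by MessageID, then Sender, mirroring the worksheet sort.
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for n, (_, sender, s_start, s_end, recip, r_start, r_end) in enumerate(ordered):
        link_id = f"link{n:05d}"  # hypothetical ID scheme, unique across all files
        files[sender].append(f"{link_id} {sender} {s_start} {s_end}")
        files[sender].append(f"{link_id} {recip} {r_start} {r_end}")
    return dict(files)


if __name__ == "__main__":
    rows = [
        ("M1", "user1", 0, 1, "user2", 3, 4),
        ("M1", "user1", 0, 1, "user3", 0, 1),
        ("M2", "user2", 1, 2, "user1", 1, 2),
    ]
    for sender, lines in link_files_by_sender(rows).items():
        # e.g. write these to a per-sender file such as links_user1.txt
        print(f"# {sender}")
        print("\n".join(lines))
```

An email sent to several people keeps the same band on the sender's arc for every recipient, so all of its links fan out from a single point.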
The final part of the visualization was to add the outer histogram. In my previous post I mentioned that I also recorded the Date/Time of each MessageID in the summary Excel worksheet. I added a new column to the spreadsheet and used this formula to calculate each email's age in months, relative to 1 October 2009:-
=(YEAR(DATE(2009,10,1))-YEAR(H52))*12 + MONTH(DATE(2009,10,1)) - MONTH(H52)
Note: column H holds the Date/Time in the summary sheet.
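For anyone not working in Excel, here is a minimal Python equivalent of the formula, with the 1 October 2009 reference date as the default. Like the spreadsheet version, it counts calendar-month boundaries and ignores the day of the month, so an email sent on 30 September 2009 counts as 1 month old.

```python
from datetime import date, datetime


def months_since(sent, reference=date(2009, 10, 1)):
    """Calendar months between the send date and the reference date.

    Mirrors: =(YEAR(ref)-YEAR(H52))*12 + MONTH(ref) - MONTH(H52)
    The day of the month (and any time of day) is ignored.
    """
    return (reference.year - sent.year) * 12 + reference.month - sent.month


if __name__ == "__main__":
    # Works with plain dates or full date/times from the summary sheet.
    print(months_since(date(2008, 10, 15)))
    print(months_since(datetime(2009, 7, 31, 14, 5)))
```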
I then created an additional column to divide the total number of months by 3 to define the age in quarters. I did it this way instead of modifying the original formula because I was playing around with the scale on the histogram to determine the best visual representation while still conveying meaning. To plot the histogram I had to define a plot file. I grouped both the sender and recipient together to ensure the histogram values were correct on either side of the link. The plot elements were defined in the file using the syntax:-
[Parent 1 CHROMOSOME] StartCo-Ord EndCo-Ord HISTOGRAM-Value
[Parent 2 CHROMOSOME] StartCo-Ord EndCo-Ord HISTOGRAM-Value
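The sketch below combines the quarter calculation with the plot-file syntax above: for each message it writes one histogram line for the sender's band and one for the recipient's band, both carrying the same age-in-quarters value, so the histogram agrees on either side of the link. The row layout is hypothetical, and the value is the raw months / 3 without rounding, since how (or whether) to round is a scaling choice.

```python
from datetime import date


def age_in_quarters(sent, reference=date(2009, 10, 1)):
    """Age in months (same logic as the Excel formula) divided by 3."""
    months = (reference.year - sent.year) * 12 + reference.month - sent.month
    return months / 3


def histogram_lines(rows, reference=date(2009, 10, 1)):
    """rows: hypothetical tuples of
    (sender, s_start, s_end, recipient, r_start, r_end, sent_date).

    Emits 'PARENT START END VALUE' for both ends of each link.
    """
    lines = []
    for sender, s_start, s_end, recip, r_start, r_end, sent in rows:
        value = age_in_quarters(sent, reference)
        # Same value on both ends, so the histogram matches across the link.
        lines.append(f"{sender} {s_start} {s_end} {value:g}")
        lines.append(f"{recip} {r_start} {r_end} {value:g}")
    return lines


if __name__ == "__main__":
    rows = [("user1", 0, 1, "user2", 3, 4, date(2008, 10, 20))]
    print("\n".join(histogram_lines(rows)))
```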
Once again, when defining the histogram “plot” section in my Circos.conf file, I made use of Circos’s rules, this time using 3 rules based on the histogram value to change the fill color. This lets people quickly distinguish recent incidents from older ones. That matters because someone who received a lot of inappropriate material, but only more than 12 months ago, can be a different issue for HR than someone who has received a lot in the last 3 months.
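The rules themselves live in the Circos configuration, but the decision they encode is easy to show in Python. This is only an illustration: the thresholds (under 1 quarter, under 4 quarters, 4 quarters and older) and the color names are hypothetical choices of mine, not the values from my Circos.conf. The checks run in order and stop at the first match, which is also how I understand Circos rules to behave by default.

```python
def histogram_fill_color(quarters):
    """Map an age-in-quarters value to a fill color.

    Hypothetical thresholds and color names; evaluated in order,
    first match wins.
    """
    if quarters < 1:  # sent within roughly the last 3 months
        return "red"
    if quarters < 4:  # between 3 and 12 months old
        return "orange"
    return "grey"  # 12 months or older


if __name__ == "__main__":
    for value in (0, 0.67, 2, 5):
        print(value, histogram_fill_color(value))
```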