Inappropriate Content Visualization – Mark II

Some time ago I wrote a blog post explaining the visualization techniques I had developed to help non-technical HR personnel interpret the overall scope of a particular investigation. While the specific evidence was perfectly adequate to determine whether there had been a breach of policy, the extent of an end user's complicity can be hard to judge from an assortment of individual pieces of evidence. An end user who sends inappropriate content to one person a number of times may be viewed differently from an end user who forwards one specific piece of inappropriate content to many people.

When the latest investigation of this nature appeared on my radar, I had a quick browse to see if there was a better way to automatically generate linking diagrams similar to those I had previously created manually in Visio. While looking around, I noticed a post by Ben from the Honeynet Project in Australia that used the graphing tool Circos.

The Circos tool was created by Martin Krzywinski for visualizing links in genomes. While Circos seemed very flexible in the amount of information it could visualize, it is very industry-specific, and its configuration terminology comes from genetics. It took a bit of time to translate that terminology in my head and really understand how Circos works, but I believe this tool could be of great value in the IT space for all sorts of visualizations of large data sets where you want to show relationships. Because of this, I am planning to follow up this post with a more detailed explanation of the Circos configuration files I produced, in the hope that it helps others make use of this tool.

For my first attempt at using Circos I ended up mapping 26 different internal users, and grouping all external users as one additional entity. This produced the following graphic.

At first glance the graphic looks impressive and very complicated, but once you understand how to read it, it quickly becomes very useful for showing the overall relationships between who was sending and receiving inappropriate content, and how many “networks” of people were involved.
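To make the idea of “networks” concrete, here is a minimal Python sketch (my own illustration, not how the graphic above was produced) that treats every sender/recipient pair as a connection and groups people into connected components using a small union-find. The IDs such as `user1` and `external` are hypothetical placeholders.

```python
def find_networks(pairs, ignore=()):
    """Group people into networks: two people share a network if any chain of emails connects them.

    pairs  -- iterable of (sender_id, recipient_id)
    ignore -- IDs to leave out entirely, e.g. a grouped 'external' entity
    """
    parent = {}

    def find(x):
        # Walk up to the root, halving the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    for sender, recipient in pairs:
        if sender in ignore or recipient in ignore:
            continue
        union(sender, recipient)

    # Collect every person under the root of their component.
    groups = {}
    for person in list(parent):
        groups.setdefault(find(person), set()).add(person)
    return list(groups.values())
```

Because all external users are collapsed into a single entity, anyone who emailed outside the company would end up in one large network; passing `ignore={"external"}` counts only the internal chains.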

If we look at the inner band first, you will note that the circumference is broken up into 27 different colored parts. In this visualization each part represents a different internal end user or, in the case of the light blue part where “User 4” would be, all users external to the company. Each colored arc is then broken into smaller segments. Each segment represents a specific email (or emails of similar content, if the end user also forwarded it on after receipt) that was sent or received. The lines linking different users are colored the same as the arc of the end user who sent the email.
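To generate this kind of layout automatically, the email list has to be turned into Circos input files: a karyotype file defining one “chromosome” per user, and a links file with one line per email. Below is a minimal Python sketch, assuming each record is a single (from, to) address pair and that one unit of arc length is allocated per email on both the sender's and the recipient's arc. The internal domain `example.com`, the `user1`/`external` IDs and the color palette are hypothetical placeholders, and the line formats (`chr - ID LABEL START END COLOR` for the karyotype, `id1 start1 end1 id2 start2 end2 color=...` for links) reflect my understanding of Circos's plain-text data formats rather than the exact files behind this graphic.

```python
from collections import defaultdict

# Hypothetical values for illustration only.
INTERNAL_DOMAIN = "example.com"
EXTERNAL_ID = "external"
PALETTE = ["red", "orange", "yellow", "green", "blue", "purple", "grey"]


def entity_for(address, internal_domain=INTERNAL_DOMAIN):
    """Internal users keep their own identity; every external address collapses into one entity."""
    addr = address.strip().lower()
    domain = addr.rsplit("@", 1)[-1]
    return addr if domain == internal_domain else EXTERNAL_ID


def build_circos_data(emails, internal_domain=INTERNAL_DOMAIN):
    """emails: iterable of (from_address, to_address) pairs, one pair per email/recipient.

    Returns (karyotype_lines, link_lines) as lists of strings.
    """
    ids = {}                 # entity -> Circos ID (user1, user2, ..., external)
    color = {}               # Circos ID -> color name, in order of first appearance
    used = defaultdict(int)  # next free unit position on each entity's arc
    links = []
    next_user = 1

    def circos_id(address):
        nonlocal next_user
        entity = entity_for(address, internal_domain)
        if entity not in ids:
            if entity == EXTERNAL_ID:
                ids[entity] = EXTERNAL_ID
            else:
                ids[entity] = f"user{next_user}"
                next_user += 1
            color[ids[entity]] = PALETTE[len(color) % len(PALETTE)]
        return ids[entity]

    for sender_addr, recipient_addr in emails:
        s, r = circos_id(sender_addr), circos_id(recipient_addr)
        # Each email takes the next one-unit segment on both arcs.
        s_start = used[s]
        used[s] += 1
        r_start = used[r]
        used[r] += 1
        # The link takes the sender's color, as in the graphic described above.
        links.append(f"{s} {s_start} {s_start + 1} {r} {r_start} {r_start + 1} color={color[s]}")

    # Each arc's length is the number of emails that user sent or received.
    karyotype = [f"chr - {cid} {cid} 0 {used[cid]} {color[cid]}" for cid in color]
    return karyotype, links
```

The two lists can be written to text files (for example with `"\n".join(lines)`) and referenced from `circos.conf` through the `karyotype` setting and the `file` parameter of a `<link>` block.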

[Image: Circos graphic, arc and links section]

One of the benefits of the Circos tool is that you can add multiple bands of data to the visualization. Using this ability, I added a histogram around the outside of my final graphic that shows the age of the email for each segment. As explained in my original post on this topic, adding a time period can be important for HR to determine the appropriate discipline. In my graphic, the histogram's Y axis is broken into 6-month segments. For each email represented, a bar is drawn to show how long before the date of analysis the original email in question was received. To make trends over time easier to see, I also used red bars for emails within 3 months, orange for 3-9 months and green for emails 9-18 months old.
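The age banding described above can be expressed as a small helper. This Python sketch assumes ages are counted in whole calendar months before the analysis date, that the band boundaries fall at exactly 3, 9 and 18 months (the post doesn't say which side of each boundary an email falls on), and that emails older than 18 months are simply left out. It emits one histogram data line per email segment, using hypothetical `user1`-style IDs; the per-point `fill_color=` option is my assumption about how to color individual bars, and color rules inside the Circos `<plot>` block are an alternative.

```python
from datetime import date


def age_in_months(received: date, analysed: date) -> int:
    """Whole calendar months between receipt and the date of analysis."""
    months = (analysed.year - received.year) * 12 + (analysed.month - received.month)
    if analysed.day < received.day:
        months -= 1  # the final month is not yet complete
    return months


def age_color(months: int):
    """Red within 3 months, orange for 3-9 months, green for 9-18 months; None if older."""
    if months < 3:
        return "red"
    if months < 9:
        return "orange"
    if months <= 18:
        return "green"
    return None


def histogram_line(entity: str, start: int, received: date, analysed: date):
    """One histogram data line for the email segment [start, start + 1) on an entity's arc."""
    months = age_in_months(received, analysed)
    color = age_color(months)
    if color is None:
        return None  # outside the 18-month window shown in the graphic
    return f"{entity} {start} {start + 1} {months} fill_color={color}"
```

Since the bar height is the age in months, an 18-month maximum with axis lines every 6 months would line up with the 6-month segments used in the graphic.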

[Image: Circos graphic, histogram arc]

Circos is a wonderful tool, and I definitely plan to expand my use of it in the future. One of the next projects I want to use it for is visualizing internal WAN traffic (probably NetFlow data) to better understand internal traffic inter-relationships.

  1. That’s a seriously impressive visualisation! Are the data and configuration files for Circos available for download anywhere? I’d love to experiment.

    • 5thsentinel said:

      Thanks Alec,

      I will follow up with a step-by-step outline of how I created it soon, but if you are keen to play, I started by experimenting with the online version of Circos, which will generate a ribbon-based visualisation for a table of up to 15×15 cells. Not only will you get a copy of the graphic, but you are also provided with the automatically generated config files that were used to create it. This makes it a bit easier to reverse engineer the config options you are after.

  2. bob_the_web said:

    Hi, I am trying to start with a simple email chart with ‘To:’ and ‘From:’ pairings (lots of them). You have done similar charts in Visio, but have you tried anything as simple as this with Circos? I’m not sure how to start with the ‘table and column’ concept… I am also looking at trying Circos with NetFlow, so maybe we could share experiences with that one! Cheers

    • 5thsentinel said:


      Sorry, life has got in the way and I haven’t been able to publish my next post as quickly as I wanted to. I am about 3/4 of the way through the draft and hope to have it up in the next 24 hours. If you wait for the post it will give some background on how I started to pair things up for my visualization; however, it sounds like you are after a table similar to the one I started with when testing the online Circos ribbon utility (which also lets you download the auto-created config files used by Circos).

      What I did there was create a matrix grid with User1 through to User27 in the rows and columns. I think I used the rows as the “Sender”, so for each row I just added the total number of emails sent from User1 to User2, User3, User4 and so on. The raw data I used was based on the first worksheet I created, which is explained in my upcoming post. Using the output of something like the Sendmail parser by Raffey might also allow you to create a similar initial table.

      Just keep an eye out over the next couple of days for my next post and let me know if it helps out or not.


      • bob_the_web said:

        Thanks for the reply. I have realised my issue: I have around 2,000 unique email addresses in my dataset, so this is not going to be the right approach. I can see how this will work if I get a much smaller set. I am almost at the ‘data mining’ stage, not the forensic stage you are at. I still look forward to seeing the blog! Cheers
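The matrix approach described in the replies, a grid with senders as rows and recipients as columns holding email counts, can be sketched in Python. Since the online tool mentioned in the replies handles tables of up to 15×15 cells, and a dataset with around 2,000 unique addresses is far too large for that, this sketch (my own, not the process used for the graphic) keeps only the most active addresses and folds everything else into a single hypothetical `other` row and column. The `sender` header cell is a placeholder; check the online tool's instructions for the exact table layout it expects.

```python
from collections import Counter


def sender_recipient_matrix(pairs, top_n=15, other="other"):
    """Count emails per (sender, recipient); return (labels, rows) with one row per sender.

    Only the most active addresses get their own row/column; the rest share `other`.
    """
    pairs = list(pairs)
    activity = Counter()
    for sender, recipient in pairs:
        activity[sender] += 1
        activity[recipient] += 1

    if len(activity) > top_n:
        # Keep top_n - 1 addresses so the extra 'other' row/column still fits in top_n.
        labels = [a for a, _ in activity.most_common(top_n - 1)] + [other]
    else:
        labels = [a for a, _ in activity.most_common()]
    keep = set(labels)

    def label(address):
        return address if address in keep else other

    counts = Counter((label(s), label(r)) for s, r in pairs)
    rows = [[counts[(s, r)] for r in labels] for s in labels]
    return labels, rows


def to_tsv(labels, rows, corner="sender"):
    """Tab-separated table: a header row of recipients, then one row per sender."""
    lines = [corner + "\t" + "\t".join(labels)]
    for name, row in zip(labels, rows):
        lines.append(name + "\t" + "\t".join(str(v) for v in row))
    return "\n".join(lines)
```

Folding rarely seen addresses into one bucket keeps the table readable, at the cost of hiding who exactly is behind the `other` traffic.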
