In my last post I introduced Circos, the genome visualization tool created by Martin Krzywinski. In this post I will try to provide an overview of Circos so that you can concentrate on what the genome terminology in the configuration files represents, without needing to worry about its specific biological meaning.
On the Circos webpage you will find excellent tutorials that Martin has already created, and I have no intention of trying to reinvent the wheel. Instead, I hope to provide you with a kind of Rosetta stone that you can reference when reading the tutorials, so that you can more easily translate your requirements into the specific configuration changes you need.
At the core of Circos is the karyotype file. This file holds the complete data set that your visualization is based on. For genetics, the file normally contains all the data on the chromosomes. For my email investigation visualization, the karyotype file held the complete data for the 27 users being represented. Another data set for an IT-related visualization might be the results of a traffic capture on a Class C network. In that case, the karyotype would hold each individual IP address and its corresponding traffic types (e.g. Web, mail, P2P, FTP …). Do not worry about filtering the data down to only what you want to show while you are still designing the visualization. Circos gives you plenty of flexibility in its configuration files to draw all of the data in the karyotype file, or only part of it.
The karyotype configuration file holds the chromosome data:
- Chromosomes would be equivalent to “email Users” in my investigation visualization
- Chromosomes would be equivalent to “IP addresses” in my network traffic example.
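To make the mapping concrete, here is a minimal Python sketch that writes karyotype records for a handful of email users. It assumes the chromosome record layout described in the Circos tutorials (`chr - ID LABEL START END COLOR`). The user IDs, names, email counts and color names are invented for illustration, so check the colors against your own Circos color definitions.

```python
def chr_line(ident, label, size, color):
    """Format one karyotype chromosome record: chr - ID LABEL START END COLOR."""
    return f"chr - {ident} {label} 0 {size} {color}"


# Hypothetical data: (id, label, number of emails, color) for three users.
# The size of each "chromosome" is the number of emails attributed to that user.
users = [
    ("user1", "Alice", 120, "red"),
    ("user2", "Bob", 85, "blue"),
    ("user3", "Carol", 40, "green"),
]

karyotype = [chr_line(*user) for user in users]

# Print the records; redirect the output to a file to use it as a karyotype file.
print("\n".join(karyotype))
```

The same function would work for the network traffic example by passing an IP address as the label and the size of the port range as the chromosome size.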
The next central term to understand is the ideogram. For Circos, an ideogram is the graphical representation of a chromosome and, potentially, its sub-parts (bands). In my email investigation graph each “User” was represented by an ideogram. Each user's ideogram was a different color and was broken into segments/bands that represented individual emails of interest. In the network traffic example, the ideogram for an IP address (the chromosome of network traffic) might use a different color for different UDP/TCP ports, or might show all 65535 ports with a single line for each active port.
An ideogram is a graphical representation of a chromosome:
- The ideograms in my investigation visualization were colored differently and had bands for individual emails
- In a network traffic visualization the ideogram for an IP address may only represent active ports, or may show all ports with a line showing those with active traffic.
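The second option, showing all ports with a line for each active one, could be sketched in Python as follows. This assumes the band record layout described in the Circos tutorials (`band PARENT ID LABEL START END COLOR`). The IP address, the active ports and the color names are hypothetical, and you should confirm against the tutorials that bands covering only part of a chromosome render the way you want.

```python
def port_bands(chrom_id, active_ports, color="black"):
    """Return one band record per active port: band PARENT ID LABEL START END COLOR.

    Each port p (1..65535) is drawn as a one-unit-wide band from p - 1 to p.
    """
    lines = []
    for port in sorted(active_ports):
        name = f"port{port}"
        lines.append(f"band {chrom_id} {name} {name} {port - 1} {port} {color}")
    return lines


# Hypothetical host: one chromosome spanning the full port range,
# followed by a band for each port that carried traffic.
records = ["chr - ip1 192.168.1.10 0 65535 grey"]
records += port_bands("ip1", {80, 25, 443})

print("\n".join(records))
```

To color bands by traffic type instead, the `color` argument could be chosen per port, for example one color for mail ports and another for web ports.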
These are the key concepts you need to understand to get started and work through the tutorials that Martin has already provided on the Circos homepage. In my next post I will explain the configurations I used to generate the image I presented in my first post on Circos.