Wednesday, September 29, 2010

Omniture Products

These are some of Omniture's web analytics products:

* SiteCatalyst, Omniture's software as a service application, offers Web analytics (client-side analytics).
* SearchCenter+ assists with paid search and content network optimization in systems such as Google's AdWords, Yahoo! Search Marketing, Microsoft Ad Center, and Facebook Ads.
* DataWarehouse, data warehousing of SiteCatalyst data.
* Test&Target, A/B and MVT (multi-variate testing), derived in part from Offermatica and Touch Clarity.[6]
* Test&Target 1:1, Omniture's main behavioural targeting solution, drills down to the individual level of testing.
* Discover, an advanced segmentation tool.
* Insight, a multichannel segmentation tool (both client-side and server-side analytics). Formerly called Discover on Premise, it was derived from Omniture's Visual Sciences acquisition in 2007.
* Insight for Retail, an Insight offering geared toward multiple online and offline retail channels.
* Genesis, a third-party data integration tool (the majority of integrations work with SiteCatalyst).
* Recommendations offers automated product and content recommendations.
* SiteSearch, an on-demand enterprise search product.
* Merchandising, a search and navigation offering for online stores.
* Publish, for web content management.
* Survey, to gather visitor sentiment.
* DigitalPulse, a Web analytics code configuration monitoring tool.
* VISTA, server-side analytics.

Friday, September 17, 2010

Web Visitor Identification Methods !!

Urchin has five different methods for identifying visitors and sessions, depending on available information. Of these, the patent-pending Urchin Traffic Monitor (UTM) is a highly accurate system that was specifically designed to identify unique visitors, sessions, exact paths, and return frequency behavior. There are a number of visitor loyalty and client reports that are only available when using the UTM System. The UTM System is easy to install and is highly recommended for all businesses.

In addition to the UTM System, Urchin can use IP addresses, User-Agents, Usernames, and Session-IDs to identify sessions. The following table compares the abilities of each of the five identification techniques:

Data Model

The underlying model within Urchin for handling unique visitors is based on a hierarchical notion of a unique set of visitors interacting with the website through one or more sessions. Each session can contain one or more hits and pageviews. Pageviews are kept in order so that the path taken through the website during each session is understood. As shown in the diagram, the Visitor represents an individual's interaction with the website over time. Each unique visitor will have one or more sessions, and within each session are zero or more pageviews that make up the path the visitor took for that session.
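
To make this hierarchy concrete, here is a minimal TypeScript sketch of the visitor / session / pageview model. The type and field names are illustrative only and do not reflect Urchin's internal schema.

```typescript
// Illustrative types for the visitor -> session -> pageview hierarchy.
// Names are hypothetical; they do not mirror Urchin's internal data structures.

interface Pageview {
  url: string;       // page requested
  timestamp: Date;   // when the hit occurred
}

interface Session {
  startedAt: Date;
  pageviews: Pageview[];   // kept in order, so the session's path is preserved
}

interface Visitor {
  visitorId: string;   // stable identifier, e.g. a UTM cookie value
  sessions: Session[]; // one or more sessions over time
}

// Example: one visitor with a single two-page session.
const exampleVisitor: Visitor = {
  visitorId: "abc123",
  sessions: [
    {
      startedAt: new Date("2010-09-17T10:00:00Z"),
      pageviews: [
        { url: "/index.html", timestamp: new Date("2010-09-17T10:00:00Z") },
        { url: "/products.html", timestamp: new Date("2010-09-17T10:02:30Z") },
      ],
    },
  ],
};
```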

Proxying and Caching

In attempting to identify and track unique visitors and sessions, we are essentially working against the nature of the web, which is anonymous interaction. Particularly troublesome for tracking visitors are the increasingly common proxying and caching techniques used by service providers and by browsers themselves. Proxying hides the actual IP address of the visitor and can use one IP address to represent more than one web user. A user's IP address can change between sessions, and in some cases multiple IP addresses will be used to represent a cluster of users. Thus, it is possible that one visitor will have different IP addresses for each hit and/or different IP addresses between sessions.

Caching of pages can occur at several locations. Large providers look to decrease the load on their networks by caching, or remembering, commonly viewed pages and images. For example, if thousands of users from a particular provider are viewing the CNN website, the provider may benefit from caching the static pages and images of the website and delivering those pieces to the users from within the provider's network. The effect is that pages are delivered without the actual website ever seeing the request.

Browser caching adds to the problem. Most browsers are configured to check content only once per session. If a visitor lands on the home page of a particular website, clicks through to a subpage, and then uses the back button to return to the home page, the second request for the home page is most likely never sent to the website's server but is instead pulled from the browser's cache. Path analysis may therefore produce incomplete paths that are missing the cached pages.

In the above diagram, the actual path taken through the website by the client is shown at the top, while the apparent path from the server's point of view is shown at the bottom. In this case, before proceeding to Page-3 the user goes back to Page-1. The server never sees this request, and from its point of view it appears the user went directly from Page-2 to Page-3. There may not even be a link from Page-2 to Page-3.

Visitor Identification Methods

As mentioned previously, Urchin has five different methods for identifying visitors, sessions, and paths. The more sophisticated methods, which can address the issues above, may require special configuration of your website. The following sections describe the workings of each method in more detail.

1. IP-Only: The IP-Only method is provided for backward compatibility with Urchin 3 and for basic IT reporting where uniquely identifying sessions is not needed. This method uses only the IP address to identify visitor sessions. Thirty minutes of inactivity constitutes a new session. The only data requirements for using this method are a timestamp and the IP address of the visitor.

2. IP-Agent: The default method, which requires no additional configuration, uses the IP address and user-agent (browser) information to determine unique sessions. A configurable thirty-minute timeout is used to identify the beginning of a new session for a visitor. While this method is still susceptible to proxying and caching, the addition of the user-agent information can help detect multiple users behind one IP address. In addition, this method includes a special AOL filter, which attempts to reduce the impact of AOL's round-robin proxying. (A sessionization sketch follows these method descriptions.)

3. Usernames: This method is provided for secure sites that require logins such as Intranets and Extranets. Websites that are only partially protected should not use this method. The Username identification is taken directly from the username field in the log file. This information is generally logged if the website is configured to require authentication. This method uses a thirty-minute period of inactivity to separate sessions from the same username.

4. Session ID: The fourth visitor identification method available in Urchin is the Session ID method, which can use pre-existing unique session identifiers to uniquely identify each session. Many content delivery applications and web servers provide session IDs to manage user interaction with the web server. These session IDs are typically located in the URI query or stored in a cookie. As long as this information is available in the log data, Urchin can be configured to take advantage of these identifiers. Using session IDs provides a much more accurate measurement of unique sessions, but still does not identify returning unique visitors. This method is also susceptible to some forms of caching, including the example above.

In many cases, the ability to use session IDs may already be available, so the time required to configure this feature may be short. For dynamically generated sites, taking advantage of this feature should be straightforward. The result is more accurate visitor session and path analysis.

5. Urchin Traffic Monitor (UTM): The last method for visitor identification available in Urchin is the Urchin Traffic Monitor. This system was specifically designed to negate the effects of caching and proxying and to allow the server to see every unique click from every visitor without significantly increasing the load on the server. The UTM system tracks return visitor behavior, loyalty, and frequency of use. The client-side data collection also provides information on browser capabilities.

The UTM is installed by including a small amount of JavaScript code in each of your webpages. This can be done manually or automatically via server-side includes and other templating systems. Complete details on installing the UTM are covered in the articles later in this section.

Once installed, the Urchin Traffic Monitor is triggered each time someone views a page on the website. The UTM sensor uniquely identifies each visitor and sends one extra hit for each pageview. This additional hit is very lightweight, and most systems will not see any additional load. The Urchin engine identifies these extra hits in the normal log file and uses this additional data to create an exact picture of every step taken by the users. This method also identifies visitors and sessions uniquely so that return visitation behavior can be properly analyzed. While this method takes a little extra time to configure, it is highly recommended for comprehensive, detailed analytics.
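
As a rough illustration of how methods 1 through 4 group hits into sessions, here is a hedged TypeScript sketch of sessionization with a thirty-minute inactivity timeout. The hit record shape and function names are hypothetical; Urchin's actual engine works from its own log formats and configuration settings.

```typescript
// Hypothetical log record; real data would be parsed from the server's log format.
interface Hit {
  timestamp: number;   // epoch milliseconds
  ip: string;
  userAgent: string;
  sessionId?: string;  // e.g. pulled from the URI query or a cookie, if available
  url: string;
}

type Method = "ip-only" | "ip-agent" | "session-id";

const SESSION_TIMEOUT_MS = 30 * 60 * 1000; // 30 minutes of inactivity starts a new session

// Choose the visitor key according to the identification method in use.
function visitorKey(hit: Hit, method: Method): string {
  switch (method) {
    case "ip-only":    return hit.ip;
    case "ip-agent":   return `${hit.ip}|${hit.userAgent}`;
    case "session-id": return hit.sessionId ?? `${hit.ip}|${hit.userAgent}`; // fall back if absent
  }
}

// Group hits (assumed sorted by timestamp) into sessions per visitor key.
function sessionize(hits: Hit[], method: Method): Map<string, Hit[][]> {
  const sessions = new Map<string, Hit[][]>();
  const lastSeen = new Map<string, number>();

  for (const hit of hits) {
    const key = visitorKey(hit, method);
    const previous = lastSeen.get(key);
    const visitorSessions = sessions.get(key) ?? [];

    if (previous === undefined || hit.timestamp - previous > SESSION_TIMEOUT_MS) {
      visitorSessions.push([]); // start a new session
    }
    visitorSessions[visitorSessions.length - 1].push(hit);

    sessions.set(key, visitorSessions);
    lastSeen.set(key, hit.timestamp);
  }
  return sessions;
}
```

Running the same hits through the "ip-only" and "ip-agent" keys shows why the user-agent helps: two different browsers behind one proxy IP collapse into a single visitor under "ip-only" but remain separate under "ip-agent".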

Sunday, September 12, 2010

Measure What Matters: Defining Key Performance Indicators !!


The beauty of Web analytics—and the promise of the Internet—is the ability to capture nearly unlimited amounts of data about your Web site. Without a clear strategy to "measure what matters," your Web analytics initiatives will quickly drown in a sea of data. So how can you turn these incredible data resources into clear and actionable insights? A good place to start is by defining key performance indicators, or KPIs.

This paper is designed to help you understand KPIs and define metrics that support your organizational goals. What data should you focus on? Who needs this information? How different are the data needs and delivery mechanisms between various teams and job functions? And what is the most effective and impactful way to share key metrics across the enterprise?

Sunday, September 5, 2010

What is Hitbox ??



Hitbox is a popular web analytics tool and product created by WebSideStory (now part of Adobe), originally for adult entertainment websites. Some of its services have been declared spyware by several anti-spyware organizations, such as Lavasoft and the makers of Spybot - Search & Destroy. It is now widely used by commercial and other organizations across a variety of industry sectors as a complete and integrated metrics solution for monitoring web traffic and driving marketing.

If a hitbox program is unknowingly downloaded to your computer, you will generally not be aware of its presence; however, it could slow down your computer. An unseen "hitbox" program may also cause other anti-virus or anti-spyware programs to run or take defensive action, which could also slow down your computer. For instance, you may see "TeaTimer", Spybot's resident anti-spyware component, running in your task manager.

Many major corporations install hitbox components on your computer without your knowledge to track various aspects of your internet activity. For example, Lexmark remote technical support installs a tracker that reports to ehg-lexmark.hitbox.com when implementing a remote repair.

Use of these unseen "hitbox" tracking programs is considered unethical by many people in the web community. However, because most computer users are unaware of them (they run invisibly in the background), very little is done to stop this practice.

Running a reputable anti-spyware program is usually the only way to identify and remove a "hitbox" application from your computer.

Saturday, September 4, 2010

Social Media facts and figures !!


Over the past couple of weeks I have collected some interesting facts and figures on social media. As I love this kind of information and lists, I thought you might be interested too. My first conclusion based on this list: social media has taken over the internet in the last year. If you have interesting additions to this list, please feel free to add them; I am very interested in more details on social media and networking.
  • There are currently 350 million Facebook users.
  • 25% of all search results for the top 20 brands are links from social media related websites.
  • 34% of all online bloggers blog about their opinions and views on products or services they use.
  • Google is the number 1 search engine, followed by YouTube.
  • In 2008, 12.5% of all US married couples had initially met each other through a social network.
  • Total online marketing spend on social media was $350 million in 2006; it is expected to grow to over $2.5 billion in 2011.
  • 63% of all Twitter users are male.
  • Global users spend a total of 2,600,000,000 minutes on Facebook daily.
  • Flickr currently hosts over 3.6 billion user pictures and photos.
  • Wikipedia contains over 13 million articles in around 260 languages.
  • It is expected that YouTube will host approx. 75 billion video streams and receive 375 million unique visitors in 2009.
  • Within just 9 months, over 1 billion iPhone apps were downloaded or purchased.
  • If Facebook were a country, it would be the fourth largest country in the world (and this while Facebook is 'banned' in China).
  • An American study in 2009 found that, on average, online students learn more easily than students who gather information through face-to-face contact.
  • 1 in 6 college students has an online resume.
  • 110 million US citizens, or 60% of all internet users, use social networks. The average user visits social networks around five days a week and logs in around four times each day, with a total login time of 1 hour. A social network addict (approx. 9%) stays logged in the whole day and is "constantly checking out user generated content".
  • The fastest growing group on Facebook consists of women aged 55-65.
  • Generations Y and Z think that e-mail is outdated.
  • Wikipedia receives more than 60 million unique visitors each month, and it is said that content on Wikipedia is more reliable than any known printed encyclopaedia.
  • Social media has overtaken pornography as the number 1 activity on the internet.
  • Facebook users translated the entire Facebook website from English to Spanish within 4 weeks using a wiki. The cost to Facebook was $0.
  • More than 1.5 million pieces of web content (web links, news stories, blog posts, notes, photos, etc.) are shared on Facebook daily. Over 6.7 billion "tweets" have been sent into the world. Watch the live count at http://popacular.com/gigatweet.
  • Commonly used enterprise social media strategies: Discussion Boards 76%, RSS 57%, Ratings and Reviews of articles or site content 47%, Social Networking Profiles 45%, Photo Albums 39%, Chat 35%, Personal Blogs 33%, User-submitted Video 35%, Podcasts 33%, Social Bookmarking 29%, Video Blogs 29%, Widgets 22%, Mobile Video/Image Text Submission 16%, Wikis 16%, Citizen Journalism 12%, Micro-blogging 6%, Virtual Worlds 4%.
  • 52% of all social networkers have friended or become a fan of at least one brand through social networking.
  • 95% of business decision makers worldwide use social networks (source: Forrester Research).
  • 17% of all university and college students have found a suitable job with only their online resume.
  • Around 64% of marketers use social media for 5 hours or more each week during campaigns, with 39% using it for 10 or more hours per week.
  • The online bookmarking service Delicious has more than five million users and over 150 million unique bookmarked URLs.
  • The most-followed people on Twitter (Ashton Kutcher, Ellen DeGeneres and Britney Spears) have more combined followers than the entire population of Austria.
source: http://www.visitorintelligence.org/social-media-facts-and-figures/

Thursday, September 2, 2010

What is Clickstream ??


A clickstream is the recording of what a computer user clicks on while web browsing or using another software application. As the user clicks anywhere in the webpage or application, the action is logged on the client or inside the web server, as well as possibly by the web browser, routers, proxy servers, and ad servers. Clickstream analysis is useful for web activity analysis[1], software testing, market research, and for analyzing employee productivity.

A small observation on the evolution of clickstream tracking: initial clickstream or click path data had to be gleaned from server log files. Because human and machine traffic were not differentiated, the study of human clicks took substantial effort. Subsequently, JavaScript-based technologies were developed that use a tracking cookie to generate a series of signals from browsers. In other words, information was then collected only from "real humans" clicking on sites through browsers.

A clickstream is a series of page requests; every page requested generates a signal. These signals can be graphically represented for clickstream reporting. The main point of clickstream tracking is to give webmasters insight into what visitors on their site are doing. The data itself is "neutral" in the sense that any dataset is neutral. It can be used in various scenarios, one of which is marketing. Additionally, any webmaster, researcher, blogger or person with a website can use it to learn how to improve their site.
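
As a toy illustration of turning raw page requests into per-session paths and transition counts for reporting, here is a hedged TypeScript sketch. The record shape is illustrative only, not that of any particular product.

```typescript
// Illustrative click record; a real tool would populate this from logs or page tags.
interface Click {
  sessionId: string;
  url: string;
  timestamp: number; // epoch milliseconds
}

// Build the ordered path of pages for each session.
function buildPaths(clicks: Click[]): Map<string, string[]> {
  const paths = new Map<string, string[]>();
  const sorted = [...clicks].sort((a, b) => a.timestamp - b.timestamp);
  for (const click of sorted) {
    const path = paths.get(click.sessionId) ?? [];
    path.push(click.url);
    paths.set(click.sessionId, path);
  }
  return paths;
}

// Count page-to-page transitions across all sessions, e.g. for a "next page" report.
function countTransitions(paths: Map<string, string[]>): Map<string, number> {
  const counts = new Map<string, number>();
  for (const path of paths.values()) {
    for (let i = 0; i + 1 < path.length; i++) {
      const edge = `${path[i]} -> ${path[i + 1]}`;
      counts.set(edge, (counts.get(edge) ?? 0) + 1);
    }
  }
  return counts;
}
```

Feeding the transition counts into a table or graph gives the kind of path overview described above.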

Use of clickstream data can raise privacy concerns, especially since some Internet service providers have resorted to selling users' clickstream data as a way to enhance revenue. There are 10-12 companies that purchase this data, typically for about $0.40/month per user. While this practice may not directly identify individual users, it is often possible to indirectly identify specific users, an example being the AOL search data scandal. Most consumers are unaware of this practice, and its potential for compromising their privacy. In addition, few ISPs publicly admit to this practice.

Since the business world is quickly evolving into a state of e-commerce, analyzing the data of clients that visit a company website is becoming a necessity in order to remain competitive. This analysis can be used to generate two findings for the company. The first is an analysis of a user's clickstream while using a website to reveal usage patterns, which in turn gives a heightened understanding of customer behaviour. This use of the analysis creates a user profile that aids in understanding the types of people who visit a company's website.

The second, as discussed in Van den Poel & Buckinx (2005), is that clickstream analysis can be used to predict whether a customer is likely to purchase from an e-commerce website. Clickstream analysis can also be used to improve customer satisfaction with the website and with the company itself. Both of these uses generate a huge business advantage. It can also be used to assess the effectiveness of advertising on a web page or site.

With growing corporate awareness of the importance of clickstreams, the way they are monitored and used to build business intelligence is evolving. Data mining, column-oriented DBMSs, and integrated OLAP systems are being used in conjunction with clickstreams to better record and analyze this data.

Clickstreams can also be used to let users see where they have been and easily return to a page they have already visited, a function that is already incorporated in most browsers. Unauthorized clickstream data collection is considered to be spyware. Authorized clickstream data collection, by contrast, comes from organizations that use opt-in panels to generate market research, with panelists who agree to share their clickstream data with other companies by downloading and installing specialized clickstream collection agents.

Source: http://en.wikipedia.org/wiki/Clickstream

Wednesday, September 1, 2010

Logfile Analysis vs Page Tagging

Both logfile analysis programs and page tagging solutions are readily available to companies that wish to perform web analytics. In some cases, the same web analytics company will offer both approaches. The question then arises of which method a company should choose. There are advantages and disadvantages to each approach.

Advantages of logfile analysis

The main advantages of logfile analysis over page tagging are as follows:

• The web server normally already produces logfiles, so the raw data is already available; collecting data via page tagging requires changes to the website. (A minimal log-parsing sketch follows this list.)

• The data is on the company's own servers, and is in a standard, rather than a proprietary, format. This makes it easy for a company to switch programs later, use several different programs, and analyze historical data with a new program. Page tagging solutions involve vendor lock-in.

• Logfiles contain information on visits from search engine spiders. Although these should not be reported as part of the human activity, this is useful information for search engine optimization.

• Logfiles require no additional DNS lookups. Thus there are no external server calls, which can slow page load speeds or result in uncounted page views.

• The web server reliably records every transaction it makes. Page tagging may not be able to record all transactions. Reasons include:
o Page tagging relies on the visitors' browsers co-operating, which a certain proportion may not do (for example, if JavaScript is disabled, or a hosts file prohibits requests to certain servers).
o Tags may be omitted from pages, either by oversight or because pages were added after the tagging was set up.

o It may not be possible to include tags in all pages. Examples include static content such as PDFs or application-generated dynamic pages where re-engineering the application to include tags is not an option.
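
To illustrate how readily available this raw data is, here is a hedged TypeScript sketch that parses lines in the Apache/NCSA Combined Log Format. The regular expression covers only the common case and is not a complete parser.

```typescript
// One parsed entry from an Apache/NCSA Combined Log Format line.
interface LogEntry {
  ip: string;
  user: string;
  timestamp: string;
  method: string;
  path: string;
  status: number;
  bytes: number;
  referrer: string;
  userAgent: string;
}

// ip ident user [date] "METHOD path protocol" status bytes "referrer" "user-agent"
const COMBINED_LOG =
  /^(\S+) \S+ (\S+) \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"$/;

function parseLine(line: string): LogEntry | null {
  const m = COMBINED_LOG.exec(line);
  if (!m) return null; // malformed or uncommon variants are skipped in this sketch
  return {
    ip: m[1],
    user: m[2],
    timestamp: m[3],
    method: m[4],
    path: m[5],
    status: Number(m[6]),
    bytes: m[7] === "-" ? 0 : Number(m[7]),
    referrer: m[8],
    userAgent: m[9],
  };
}

// Example usage with a made-up log line:
const entry = parseLine(
  '203.0.113.7 - - [01/Sep/2010:12:34:56 +0000] "GET /index.html HTTP/1.1" 200 5120 "http://example.com/" "Mozilla/5.0"'
);
```

Any line the expression does not match is simply skipped here; a production parser would need to handle the server's actual configured log format.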


Advantages of page tagging

The main advantages of page tagging over logfile analysis are as follows.

• Counting is activated by opening the page, not requesting it from the server. If a page is cached, it will not be counted by the server. Cached pages can account for up to one-third of all pageviews. Not counting cached pages seriously skews many site metrics; it is for this reason that server-based log analysis is not considered suitable for analysis of human activity on websites.
• Data is gathered via a component ("tag") in the page, usually written in JavaScript, though Java or Flash can also be used. (A minimal sketch of such a tag follows this list.)

• It is easier to add additional information to the tag, which can then be collected by the remote server. For example, information about the visitors' screen sizes, or the price of the goods they purchased, can be added in this way. With logfile analysis, information not normally collected by the web server can only be recorded by modifying the URL.

• Page tagging can report on events which do not involve a request to the web server, such as interactions within Flash movies, partial form completion, and mouse events such as onClick, onMouseOver, onFocus, onBlur, etc.

• The page tagging service manages the process of assigning cookies to visitors; with logfile analysis, the server has to be configured to do this.

• Page tagging is available to companies who do not have access to their own web servers.

• Lately, page tagging has become a standard in web analytics.
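
To make the tagging mechanism concrete, here is a hedged browser-side TypeScript sketch of a page tag that reports pageviews and click events by requesting a tiny image from a collection server. The endpoint, query parameters, cookie name, and element id are invented for illustration; real products such as SiteCatalyst or Urchin's UTM define their own tag code.

```typescript
// Minimal illustrative page tag. Endpoint and parameter names are hypothetical.
const COLLECT_URL = "https://stats.example.com/collect"; // hypothetical collection server

// Read or create a first-party visitor cookie so return visits can be recognised.
function getVisitorId(): string {
  const match = document.cookie.match(/(?:^|; )vid=([^;]+)/);
  if (match) return match[1];
  const id = Math.random().toString(36).slice(2) + Date.now().toString(36);
  document.cookie = `vid=${id}; path=/; max-age=${60 * 60 * 24 * 365 * 2}`; // ~2 years
  return id;
}

// Send one lightweight hit by requesting a tiny image with the data in the query string.
function sendHit(params: Record<string, string>): void {
  const query = new URLSearchParams({
    vid: getVisitorId(),
    url: location.pathname,
    ref: document.referrer,
    sw: String(screen.width),   // example of extra client-side data a tag can collect
    sh: String(screen.height),
    ...params,
  });
  new Image().src = `${COLLECT_URL}?${query.toString()}`;
}

// Count the pageview as soon as the tag runs (works even if the HTML came from a cache).
sendHit({ type: "pageview" });

// Report an interaction that never generates a request to the web server on its own.
document.getElementById("buy-button")?.addEventListener("click", () => {
  sendHit({ type: "event", action: "click", label: "buy-button" });
});
```

Because the hit fires when the page renders in the browser, it is counted even when the HTML was served from a proxy or browser cache, which is exactly the blind spot of pure logfile analysis described above.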