Data Webhouse Toolkit w/WS: Building the Web-Enabled Data Warehouse - Softcover

Kimball, Ralph

 
9780471376804: Data Webhouse Toolkit w/WS: Building the Web-Enabled Data Warehouse

Synopsis

"Ralph's latest book ushers in the second wave of the Internet. . . . Bottom line, this book provides the insight to help companies combine Internet-based business intelligence with the bounty of customer data generated from the internet."--William Schmarzo, Director World Wide Solutions, Sales, and Marketing, IBM NUMA-Q.
 
Receiving over 100 million hits a day, the most popular commercial Websites have an excellent opportunity to collect valuable customer data that can help create better service and improve sales. Companies can use this information to determine buying habits, provide customers with recommendations on new products, and much more. Unfortunately, many companies fail to take full advantage of this deluge of information because they lack the necessary resources to effectively analyze it.
 
In this groundbreaking guide, data warehousing's bestselling author, Ralph Kimball, introduces readers to the Data Webhouse--the marriage of the data warehouse and the Web. If designed and deployed correctly, the Webhouse can become the linchpin of the modern, customer-focused company, providing competitive information essential to managers and strategic decision makers. In this book, Dr. Kimball explains the key elements of the Webhouse and provides detailed guidelines for designing, building, and managing the Webhouse. The result is a business better positioned to stay healthy and competitive.
 
In this book, you'll learn methods for:
- Tracking Website user actions
- Determining whether a customer is about to switch to a competitor
- Determining whether a particular Web ad is working
- Capturing data points about customer behavior
- Designing the Website to support Webhousing
- Building clickstream datamarts
- Designing the Webhouse user interface
- Managing and scaling the Webhouse
 
The companion Website at www.wiley.com/compbooks/kimball provides updates on Webhouse technologies and techniques, as well as links to related sites and resources.

The synopsis may refer to a different edition of this title.

About the Authors

RALPH KIMBALL, PhD, has been a leading visionary in the data warehouse industry since 1982 and is one of today's most well-known speakers, consultants, and teachers. He writes the "Webhouse Architect" column for Intelligent Enterprise magazine and is the author of the bestselling books The Data Warehouse Toolkit and The Data Warehouse Lifecycle Toolkit (both from Wiley).
RICHARD MERZ is Director of Engineering for the WebCom division of Verio, the world's largest Web hosting company. He has a strong hands-on background in data warehouse architecture and applications, and has been managing the development of Web and e-commerce software for the past five years.


From the Inside Flap

All the proven testing tools and techniques you'll need to ensure that your applications work exactly as they're supposed to!
 
Can you guarantee that the software your company develops works as intended? It's essential that you know the proper techniques for testing software; otherwise, you could face lost productivity, lost revenue, and customer dissatisfaction.
 
Leading software testing expert William Perry takes you through a comprehensive eleven-step testing process that contains all of the components you'll need to evaluate your software. This testing process includes numerous workpapers and checklists designed to lead you through all aspects of software testing, and it can be customized to meet the needs of your organization or of a specific test assignment.
 
Beyond establishing a test strategy and selecting and using testing tools, you'll find helpful guidelines on how to build an effective testing environment, including self-assessments designed to improve deficient capabilities in your software development process and deficient competencies among your software testers.
 
Detailed test programs featured in this second edition include:
* Internet/Intranet applications
* Off-the-shelf software
* Multiplatform environments
* System security
* Data warehouse applications
* Client/server systems
* Rapid application development

Excerpt. © Reprinted by permission. All rights reserved.

Chapter 4: Understanding the Clickstream as a Data Source

One of the sources of data that will feed our data Webhouse is the HTTP clickstream itself: the log records produced by the Web server each time a request is satisfied. In this chapter we'll discuss the content of the clickstream and ways of handling the enormous volume of data that a busy Website generates. We will introduce a clickstream postprocessor that receives raw log data from a Web server and normalizes it into a format in which it can be combined with application-derived data and piped into the data Webhouse. The database volumes required for log processing at an active Website are comparable, in both volume and complexity, to those of the billing system of a large telephone company. Part Two of this book presents detailed architectures for databases capable of event tracking and content delivery for high-activity Websites.
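The postprocessor is described here only in terms of its inputs and outputs. As a minimal sketch, assuming the Web server writes the widely used Apache/NCSA "combined" log format (an assumption; servers and configurations vary), normalizing one raw log line into a typed record might look like this. `ClickEvent` and `normalize` are hypothetical names used only for illustration.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Assumed input: Apache/NCSA combined format
# host ident user [time] "method path protocol" status bytes "referrer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


class ClickEvent(NamedTuple):
    """Hypothetical normalized clickstream record."""
    host: str
    timestamp: datetime
    method: str
    path: str
    status: int
    bytes_sent: int
    referrer: Optional[str]
    user_agent: str


def normalize(line: str) -> Optional[ClickEvent]:
    """Parse one raw log line; return None if it does not match the assumed format."""
    m = COMBINED.match(line)
    if m is None:
        return None  # a real postprocessor would route malformed lines to an error file
    return ClickEvent(
        host=m["host"],
        timestamp=datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        method=m["method"],
        path=m["path"],
        status=int(m["status"]),
        bytes_sent=0 if m["bytes"] == "-" else int(m["bytes"]),
        # "-" or an empty field means the browser sent no referrer
        referrer=None if m["referrer"] in ("", "-") else m["referrer"],
        user_agent=m["agent"],
    )
```

Keeping the timestamp time-zone aware and mapping "no referrer" to `None` are design choices that make the records easier to join with application-derived data later.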

In this chapter we describe how customers and Websites communicate with each other. We also show you how some important third parties, like banner ad providers and customer profilers, attach to your session and become part of the available data. We study in some detail how much information can be derived from a cookie and what the limitations of even a "good" cookie may be. We describe what is known as "referral" information, which is a potentially amazing source of insight into why the user arrived at your Website. From the referral information we should be able to separate the customers who arrived for the right reasons from those who arrived for the wrong reasons, and perhaps learn what all of these customers were thinking about when they entered your site. We conclude the chapter by proposing an architecture for processing all of this data in the back room before it becomes available in our databases for analysis. Before we describe the specific data elements in the clickstream, it might be useful to review how a Web browser and a Website interact.

WEB CLIENT/SERVER INTERACTIONS: A BRIEF TUTORIAL

Understanding the interactions between a Web client (browser) and a Web server (Website) is essential for understanding the source and meaning of the data in the clickstream. Please refer to Figure 4.1 in this discussion. In the illustration we have shown a browser, designated My Browser. We'll look at what happens in a typical interaction from the perspective of myself as a browser user. The browser and Website interact with each other across the Internet using the Web's communication protocol: HyperText Transfer Protocol (HTTP).

Basic Client/Server Interaction

First, I click a button or hypertext link (URL) to a particular Website, shown as action (1) in Figure 4.1. When this HTTP request reaches the Website, the server returns the requested item (2). In our illustration, this is a document in HyperText Markup Language (HTML) format: your-page.html. Once the document is entirely retrieved, my browser scans your-page.html and notices several references to other Web documents that it must fulfill before its work is completed; the browser must retrieve the other components of this document in separate requests. Note that the only human action taken here is the click on the original link. All of the actions that follow in this example are computer-to-computer interactions triggered by the click and managed, for the most part, by instructions carried in the initially downloaded HTML document, your-page.html. To speed up Web page responsiveness, most browsers execute these consequential actions in parallel, typically servicing several HTTP requests concurrently, sometimes ten or more. The browser finds a reference to an image (a logo, perhaps) which, from its URL, is located at your-site.com, the same place from which it retrieved the initial HTML document. The browser issues another request to the server (3), and the server responds by returning the specified image.
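The way one click fans out into several server requests can be sketched with Python's standard `html.parser`. The hypothetical `follow_up_requests` helper below scans a page for tags whose resources a browser fetches automatically (a simplified, assumed subset: `img`, `script`, `frame`, `iframe`, `embed`), resolves relative URLs against the page's own URL, and groups the resulting requests by the host that will receive them. It assumes static HTML; real browsers also fetch stylesheets, fonts, and resources added by scripts.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class EmbeddedRefCollector(HTMLParser):
    """Collect URLs a browser would fetch without any further human clicks."""

    # tag -> attribute naming an auto-fetched resource (simplified subset)
    FETCH_ATTRS = {"img": "src", "script": "src", "frame": "src",
                   "iframe": "src", "embed": "src"}

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        wanted = self.FETCH_ATTRS.get(tag)
        if wanted is None:
            return  # e.g. <a href=...> waits for a human click, so it is ignored
        for name, value in attrs:
            if name == wanted and value:
                # Relative references resolve against the page's own URL.
                self.urls.append(urljoin(self.base_url, value))


def follow_up_requests(base_url: str, html: str) -> dict[str, list[str]]:
    """Group the automatic follow-up requests by destination host."""
    collector = EmbeddedRefCollector(base_url)
    collector.feed(html)
    by_host: dict[str, list[str]] = {}
    for url in collector.urls:
        by_host.setdefault(urlsplit(url).netloc, []).append(url)
    return by_host
```

Each host in the result receives its own request, and so records its own log entry, which is why a single click can leave traces on several servers at once.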

Advertisements

The browser continues to the next reference in your-page.html and finds an instruction to retrieve another image from the Website banner-ad.com. The browser makes this request (4), and the server at banner-ad.com interprets a request for the image in a special way. Rather than immediately sending back an image, the banner-ad server first checks the request for the contents of any cookie that banner-ad.com might previously have placed on my PC. The ad Website retrieves this cookie, examines its contents, and uses them as a key to determine which banner ad I should receive. This decision is based on my interests or on ads that this particular ad server has previously sent me. Once the banner-ad server determines the optimum ad, it returns the selected image to me. The advertisement server then logs which ad it has placed, along with the date and the clickstream data from my request. Had the banner-ad server not found its own cookie, it would have sent a new persistent cookie to my browser for future reference, sent a random banner ad, and started a history in its database of interactions with my browser.
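The ad server's decision logic can be summarized in a short sketch. Everything here is hypothetical: `ToyAdServer` keeps its cookie-keyed history in memory, and "prefer an ad this browser has not yet seen" stands in for whatever targeting rules a real ad server applies. What it does mirror from the text is the branch structure: a known cookie drives the choice, while a missing or unrecognized cookie triggers a new persistent ID, a random ad, and a fresh history.

```python
import random
import secrets
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ToyAdServer:
    """Hypothetical in-memory ad server: cookie ID -> ads already shown to that browser."""
    inventory: list[str]
    history: dict[str, list[str]] = field(default_factory=dict)
    placements: list[tuple[date, str, str, str]] = field(default_factory=list)

    def serve(self, cookie_id: Optional[str], referrer: str,
              day: date) -> tuple[str, Optional[str]]:
        """Return (chosen ad, new cookie ID to set on the browser, or None)."""
        new_cookie: Optional[str] = None
        if cookie_id is None or cookie_id not in self.history:
            # No usable cookie: issue a new persistent ID, pick a random ad, start a history.
            cookie_id = new_cookie = secrets.token_hex(8)
            self.history[cookie_id] = []
            ad = random.choice(self.inventory)
        else:
            # Known browser: prefer an ad it has not seen yet (stand-in for real targeting).
            seen = set(self.history[cookie_id])
            unseen = [a for a in self.inventory if a not in seen]
            ad = unseen[0] if unseen else random.choice(self.inventory)
        self.history[cookie_id].append(ad)
        # Log the placement with the date and clickstream data from the request.
        self.placements.append((day, cookie_id, ad, referrer))
        return ad, new_cookie
```

A production ad server would keep this history in a database rather than in memory, but the same two branches appear in its logs as "new browser" and "returning browser" placements.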

The Referrer

The HTTP request from my browser to the banner-ad server carried with it a key piece of information known as the referrer. The referrer is the URL of the page that carried the link, in this case the reference to the ad image. In our example, the referrer is "your-site.com/your-page.html". The referrer is not a browser. Because banner-ad.com now knows who the referrer was, it can credit your-site.com for having placed an advertisement in a browser window. This is a single impression. The advertiser can be billed for this impression, with the revenue being shared by the referrer (your-site.com) and the advertising server (banner-ad.com). If you are sharing Web log information with the referring site, it will be valuable to share page attributes as well. In other words, not only do you want the URL of the referring page, you would also like to know the purpose of the page. Was it a navigation page, a partner's page, or a general search page?
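In HTTP, the referrer travels in a request header whose name is spelled `Referer` (a misspelling preserved in the specification). A minimal sketch of crediting impressions to referring sites, assuming the ad server has already extracted that header from each request and that the referring site shares a page-attribute table (the hypothetical `PAGE_TYPE` mapping below), could look like this:

```python
from collections import Counter
from typing import Iterable, Optional
from urllib.parse import urlsplit

# Hypothetical page-attribute table shared by the referring site.
PAGE_TYPE = {
    "your-site.com/your-page.html": "navigation",
}


def page_key(referrer: str) -> str:
    """Host plus path, ignoring any query string, to match the attribute table."""
    parts = urlsplit(referrer)
    return parts.netloc + parts.path


def credit_impressions(referrers: Iterable[Optional[str]]) -> Counter:
    """Count ad impressions per (referring host, referring page type)."""
    counts: Counter = Counter()
    for ref in referrers:
        if not ref:
            # The request arrived without a Referer header.
            counts[("(none)", "unknown")] += 1
            continue
        host = urlsplit(ref).netloc
        counts[(host, PAGE_TYPE.get(page_key(ref), "unknown"))] += 1
    return counts
```

Counting by host supports billing and revenue sharing, while the page-type dimension answers the question of what kind of page produced the impression.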

The Profiler

While the ad server deals primarily in placing appropriate content, the profiler deals in supplying demographic information about Website visitors. In our example, the original HTML document, your-page.html, had a hidden field that contained a request to retrieve a specific document from the Website profiler.com (5). When this request reached the profiler server, the profiler.com server immediately tried to find its cookie in my browser. This cookie contained a userID, placed previously by the profiler, that identifies me and serves as a key to personal information in the profiler's database. The profiler might either return its profile data to my browser to be sent back to the initial Website, or send a real-time notification to the referrer, your-site.com, via an alternative path, advising the referrer that I am currently on its site viewing a specific page (6). Alternatively, the profile information could be returned to the HTML document and sent back to the referrer as part of a query string the next time my browser sends an HTTP request to your-site.com.
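The query-string path can be sketched with the standard `urllib.parse` functions. This assumes a hypothetical profile store keyed by the userID held in the profiler's cookie, and a hypothetical `prof_` prefix for the added parameters; neither comes from the text. The helper appends the visitor's profile attributes to the URL of the next request to your-site.com, where they will appear in that site's own log.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical profiler database keyed by the userID in the profiler's cookie.
PROFILES = {"u-1001": {"segment": "frequent-traveler", "region": "west"}}


def tag_next_request(next_url: str, profiler_user_id: Optional[str]) -> str:
    """Append profile attributes to the query string of the next request."""
    profile = PROFILES.get(profiler_user_id or "")
    if not profile:
        return next_url  # unknown visitor: leave the URL unchanged
    parts = urlsplit(next_url)
    # Keep any existing parameters, then add the profile ones in a stable order.
    query = parse_qsl(parts.query) + [
        ("prof_" + key, value) for key, value in sorted(profile.items())
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `tag_next_request("http://your-site.com/next.html?item=42", "u-1001")` yields `http://your-site.com/next.html?item=42&prof_region=west&prof_segment=frequent-traveler`.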

Composite Sites

Although Figure 4.1 shows three different sites involved in serving the contents of one document, it is possible, indeed likely, that these functions will be combined into fewer servers. It is likely that advertising and profiling will be done within the same enterprise, so that a single request (and cookie) would suffice to retrieve personal information that more precisely targets the ads returned. It is equally possible that a Web page contains references to several different ad/profile services, providing revenue to the referrer from multiple sources.

PROXY SERVERS AND BROWSER CACHES

When a browser makes an HTTP request, that request is not always served from the server specified in a URL. Many Internet Service...

"About this title" may refer to a different edition of this title.