An Introduction to Statistical Modeling for Hotels


Do logic and hospitality mix? For an industry that prides itself on the emotional experience, left-brain enterprises such as statistical analysis can seem out of place. Yet such data-driven mathematics can yield tremendous results, first and foremost by narrowing your target market to maximize the efficacy of your marketing engine.
I’ve employed Virtual DBS, a company that specializes in these sorts of deductive analyses, to lend their voice to the discussion, both in terms of a thorough explanation of what they do as well as outlining some of the key benefits of statistical modeling. For this, I sought John Dodd, Executive Vice President, for an in-depth Q&A.
What is the purpose of analytical modeling? Can you describe the basics of it?
The purpose of analytical modeling is typically twofold:
  1. Better Understand Customers or Responders: By statistically profiling a group of customers, respondents and other segments using demographic information appended from third-party compiled data sets, we provide a highly illustrated but mathematically precise explanation of the most prominent characteristics of the customers and prospects with whom our clients communicate. This can enable more relevant copywriting, more efficient media purchasing, more targeted direct marketing and more persuasive creative.
  2. Boost Response Performance: Using modeled mailing lists, email lists, or telemarketing lists scored with predictive modeling algorithms, our clients achieve a statistically significant lift in response performance over an established baseline in their direct marketing efforts. This returns a higher volume of new customers from smaller target populations – essentially, saving money and making money at the same time.
What are you looking for in an ideal consumer? 
Typically, the best consumer is one who spends a great deal of money over a long period of time, with high purchase frequency and the most recent purchase having occurred just yesterday (i.e. very high Recency-Frequency-Monetary-Tenure or RFMT). However, every modeling initiative is customized for the specific business rules of each client. As such, the characteristics of an ‘ideal’ consumer vary widely according to the requirements of our clients. In many cases, the consumer isn’t even a consumer – it’s a business (i.e. B2B modeling).
How do you go about pinpointing these ‘cream of the crop’ customers? 
Typically, we start by segmenting our clients’ customers into four roughly equivalent quartiles (i.e. 25% ‘buckets’) or tiers – Platinum, Gold, Silver and Bronze – based on their specific buying behaviors using RFMT.
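To make the quartile idea concrete, here is a minimal Python sketch of RFMT tiering. It is an illustration only, not Virtual DBS's actual method: the customer records, the equal weighting of the four dimensions, and the helper names `rank_fraction` and `assign_tiers` are all assumptions made for this example.

```python
from datetime import date

TIERS = ["Platinum", "Gold", "Silver", "Bronze"]

# Hypothetical customers: (id, last_purchase, orders, total_spend, first_purchase)
CUSTOMERS = [
    ("A", date(2012, 8, 31), 40, 12000, date(2005, 1, 1)),
    ("B", date(2012, 8, 15), 30, 9000, date(2006, 1, 1)),
    ("C", date(2012, 7, 1), 20, 6000, date(2007, 1, 1)),
    ("D", date(2012, 5, 1), 15, 4000, date(2008, 1, 1)),
    ("E", date(2012, 1, 1), 10, 2500, date(2009, 1, 1)),
    ("F", date(2011, 6, 1), 6, 1200, date(2010, 1, 1)),
    ("G", date(2010, 12, 1), 3, 500, date(2010, 6, 1)),
    ("H", date(2010, 10, 1), 1, 100, date(2010, 9, 1)),
]

def rank_fraction(values):
    """Map each value to the fraction of values strictly below it."""
    n = len(values)
    return [sum(1 for other in values if other < v) / n for v in values]

def assign_tiers(customers, today):
    """Rank customers on R, F, M and T, then cut the ranking into four tiers."""
    # Recency is negated so that a more recent purchase ranks higher.
    recency = [-(today - c[1]).days for c in customers]
    frequency = [c[2] for c in customers]
    monetary = [c[3] for c in customers]
    tenure = [(today - c[4]).days for c in customers]

    columns = [rank_fraction(col) for col in (recency, frequency, monetary, tenure)]
    # Assumption: the four dimensions are weighted equally.
    composite = [sum(parts) for parts in zip(*columns)]

    order = sorted(range(len(customers)), key=lambda i: composite[i], reverse=True)
    tiers = {}
    for position, i in enumerate(order):
        bucket = position * 4 // len(customers)  # 0..3, roughly equal 25% buckets
        tiers[customers[i][0]] = TIERS[bucket]
    return tiers

tiers = assign_tiers(CUSTOMERS, today=date(2012, 9, 1))
for customer_id, tier in sorted(tiers.items()):
    print(customer_id, tier)
```

In practice the weighting of recency, frequency, monetary value and tenure would follow each client's business rules, as noted above, rather than the equal weights assumed here.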
We profile and index the Platinum, Gold, Silver, and Bronze groups against a statistically relevant prospect universe across a host of demographic (B2C) or firmographic (B2B) elements, and then develop a predictive scoring algorithm that ranks prospects based on their degree of resemblance to the Platinum customers.
Essentially, this type of model statistically compares and contrasts our clients’ B2B or B2C Platinum, Gold, Silver and Bronze customer groups against each other and against their prospect universe. It includes dozens of charts and graphs that detail the demographic or firmographic attributes most prominent in the ‘best’ customer group (i.e. the Platinum customers) and produces a multivariate logistic regression scoring equation that identifies high-probability prospects that look most like the Platinum customers. This equation can be applied to virtually any targeted marketing list.
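For readers who want to see what a logistic regression scoring equation looks like in practice, the following Python sketch fits one on a toy data set and uses it to rank prospects. Everything in it is assumed for illustration: the two predictors (household income in units of $100,000 and a home-ownership flag), the eight training records, the three prospects and the plain gradient-descent fitting routine. A production model would use many more attributes and a proper statistics package.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

def score(x, weights, bias):
    """The scoring equation: estimated probability of resembling a Platinum customer."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical training data: [income in $100k, owns_home]; label 1 = Platinum.
rows = [[1.2, 1], [1.5, 1], [0.9, 1], [1.1, 0],   # Platinum customers
        [0.4, 0], [0.5, 1], [0.3, 0], [0.6, 0]]   # sample of the prospect universe
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = fit_logistic(rows, labels)

# Hypothetical prospects to be scored and ranked.
prospects = {"P1": [1.3, 1], "P2": [0.35, 0], "P3": [0.8, 1]}
scores = {name: score(x, weights, bias) for name, x in prospects.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Once fitted, the weights and bias are the whole model, which is why a scoring equation of this kind can be applied to any list that carries the same attributes.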
How do your methods differ between B2C and B2B analyses?
B2C models are based on consumer characteristics like age, income, wealth and ethnicity among many others, while B2B models are based on their business counterparts, including industry type (SIC Code, NAICS Code), annual sales, number of employees and many other attributes.
Where do you get your data from?
Virtual DBS receives regularly scheduled updates from multiple U.S. data compilers, and thus has the ability to fill in the gaps found within individual data sources, delivering the most comprehensive coverage possible. Through our powerful merge-purge capabilities, we are able to ‘de-dupe’ these files against each other in order to provide unique records only. Our B2B files include:
  • Total Unique Business Locations De-duped: Approximately 20 Million
  • Total Unique Contacts De-duped: Approximately 40 Million
Because we provide such a high volume of data (maintaining upwards of 1 billion records on our servers), our supply and demand dynamics enable us to price our data at the most competitive rates available. Our B2C files include:
  • Total Unique Consumer Households: 150 Million
  • Hundreds of demographic, geography, lifestyle, and behavioral attributes
As a result of this multi-source approach, our clients receive the most complete market coverage available, as well as a distinct qualitative advantage through the cross-validation of our multiple sources against one another, thus optimizing the demographic accuracy, postal deliverability, and phone contact-ability of the records. We can use this data to append demographics in support of analytics, predictive modeling, market analysis, targeted marketing, and many other applications.
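The merge-purge and gap-filling idea described in this answer can be sketched in a few lines of Python. This is a deliberately simplified illustration: the two source files, the field names and the `match_key` rule (exact match on normalized name, street and five-digit ZIP) are assumptions, and real merge-purge systems typically rely on address standardization and fuzzy matching instead.

```python
import re

def match_key(record):
    """Build a crude match key from name, street address and 5-digit ZIP."""
    def norm(text):
        return re.sub(r"[^a-z0-9]", "", text.lower())
    return (norm(record["name"]), norm(record["address"]), record["zip"][:5])

def merge_purge(*sources):
    """De-dupe records across sources, filling empty fields from later sources."""
    merged = {}
    for source in sources:
        for record in source:
            kept = merged.setdefault(match_key(record), dict(record))
            for field, value in record.items():
                if not kept.get(field):
                    kept[field] = value  # fill a gap left by an earlier source
    return list(merged.values())

# Hypothetical files from two data compilers.
compiler_a = [
    {"name": "Acme Hotel Supply", "address": "12 Main St.", "zip": "02903", "phone": ""},
    {"name": "Bay Linen Co", "address": "4 Dock Rd", "zip": "02840", "phone": "401-555-0101"},
]
compiler_b = [
    {"name": "ACME Hotel Supply", "address": "12 Main St", "zip": "02903-1234", "phone": "401-555-0100"},
    {"name": "Coastal Catering", "address": "9 Pier Ave", "zip": "02882", "phone": ""},
]

unique_records = merge_purge(compiler_a, compiler_b)
for record in unique_records:
    print(record["name"], record["phone"])
```

Here the two Acme entries collapse into one record, and the phone number missing from the first source is filled in from the second.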
Do you enter an analysis with any pre-conceived hypotheses? Or, by letting the data speak for itself, how do you decipher significant relationships amongst the individual clusters?
We try to be as objective as possible upon initiating a new modeling effort. However, we always rely on the anecdotal or qualitative experiences of our clients in formulating the strategies and methodologies associated with a modeling project.
Typical methodologies employed in our modeling efforts include CHAID (Chi-Square Automatic Interaction Detector), ANOVA (Analysis of Variance), MANOVA (Multivariate Analysis of Variance), Correlation Analysis, Cluster Segmentation, Factor Analysis, Discriminant Function Analysis, and Logistic Regression Analysis. In addition, we incorporate a cross-validation approach to our modeling initiatives in that a random sample containing approximately 60% to 70% of a client’s customer or responder records is initially drawn to build the model. Findings are then validated on the remaining records.
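The build-and-validate split described above is straightforward to sketch in Python. The 65% build fraction (a midpoint of the stated 60% to 70% range), the fixed random seed and the function name `split_build_validate` are assumptions chosen for this example.

```python
import random

def split_build_validate(records, build_fraction=0.65, seed=42):
    """Randomly split records into a model-building sample and a validation sample."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(records)       # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * build_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical customer record IDs.
records = list(range(1000))
build, validate = split_build_validate(records)
print(len(build), len(validate))
```

The model is fitted only on `build`; its predictions are then checked against `validate`, which the model has never seen, to confirm that the findings hold up on fresh records.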
Obviously, a short Q&A does not provide for a full explanation of these methodologies. Hence, we typically engage our clients in a detailed and highly collaborative process of requirements gathering before embarking on any new modeling projects.
What distinguishes your company’s modeling techniques?
Virtual DBS provides a wide variety of predictive modeling and analytics services to our clients, delivering a number of key advantages over other analytics providers, including:
  1. Value Proposition: One of the tremendous advantages to working with Virtual DBS is that our custom modeling services are priced such that the full value of the modeling fee applies as a ‘credit’ when you commit to using our scoring services, data append services, or highly targeted marketing lists.
  2. Highly Collaborative Interaction: All Virtual DBS modeling initiatives are undertaken with the utmost collaboration, including ongoing facilitation by PhD statisticians from discovery, to commencement, through debriefing, and continuing on to scoring and ongoing ‘tweaking’ of the model for optimal performance.
  3. Full Transparency: Virtual DBS provides complete disclosure of the findings, methodologies, data attributes, and related elements to all of our models. The only exception to this ‘sunshine policy’ would be in migrating scoring algorithms outside of our data production environment. Because the appended data elements, software configurations and related scoring mechanics are so highly customized, it is impractical to transport scoring algorithms outside of our data center.
  4. New Testing Paradigm: Because direct marketing is an empirical advertising channel, driving response through numerous channels (web, mail, phone), the opportunities presented to our clients to implement systematic, mathematically precise testing in a conservative ‘crawl-walk-run’ approach are quite significant. The modeling approach employed by Virtual DBS presents an optimal platform by which our clients can initiate ongoing test efforts and gain new response intelligence with maximum statistical confidence while minimizing risk.
  5. Robust Scoring Reports: Each time a scoring job is completed, Virtual DBS provides detailed model validation summaries, gains charts depicting expected response lift to be enjoyed through the scoring, meticulous counts by decile, demi-decile and percentile, and full debriefing.
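As an illustration of the gains charts and decile counts mentioned in the last point, the Python sketch below ranks a scored list, cuts it into ten equal groups and reports the response lift of each group over the overall baseline. The scored records are synthetic and the function name `gains_by_decile` is hypothetical; the sketch also assumes at least ten records and at least one responder.

```python
def gains_by_decile(scored):
    """scored: list of (score, responded) pairs, responded being 0 or 1."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    total_responders = sum(responded for _, responded in ranked)
    base_rate = total_responders / n          # overall (baseline) response rate
    rows = []
    cumulative = 0
    for decile in range(10):
        chunk = ranked[decile * n // 10:(decile + 1) * n // 10]
        responders = sum(responded for _, responded in chunk)
        cumulative += responders
        rows.append({
            "decile": decile + 1,
            "count": len(chunk),
            "responders": responders,
            "lift": (responders / len(chunk)) / base_rate,
            "cum_capture": cumulative / total_responders,
        })
    return rows

# Synthetic example: 100 scored records, responders concentrated at the top.
responders_per_decile = [5, 3, 2, 0, 0, 0, 0, 0, 0, 0]
scored = []
for d in range(10):
    for k in range(10):
        i = d * 10 + k
        scored.append((1 - i / 100, 1 if k < responders_per_decile[d] else 0))

rows = gains_by_decile(scored)
for row in rows[:3]:
    print(row["decile"], row["count"], row["responders"],
          round(row["lift"], 2), round(row["cum_capture"], 2))
```

A lift well above 1 in the top deciles means that mailing only the highest-scoring names should capture most responders from a fraction of the list, which is the saving described earlier in the interview.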
Can you describe the ‘stop-pause’ method? What are its benefits?
As the analytics data production and model development process unfolds, Virtual DBS schedules regular ‘stop-pause’ points in the timeline to ensure consensus with our clients at each key milestone. These points include a thorough ‘data pre-briefing’ kick-off meeting upon commencement of the project; distribution of detailed waterfall reports and related summary charts upon completion of the analytics file build (i.e. data hygiene, match-and-append and field derivations); and a formal debriefing upon completion of the model, with a full review of the detailed profile and scoring algorithm prior to finalizing the composition of the score, thus allowing for ‘tweaking’ as necessary prior to executing live scoring jobs.
Can you describe the ‘inside/outside’ technique? 
Because Virtual DBS maintains a multi-source, comprehensive marketing database of virtually every consumer and every business in the US – including several hundred demographic, lifestyle and behavioral attributes – we have the distinct luxury of employing all of these ‘outside’ data elements in conjunction with the proprietary ‘inside’ elements supplied by our clients (product details, transaction values, purchase dates, customer tenure, etc.). This ‘inside/outside’ data approach brings the maximum breadth of intelligence into our modeling efforts, thus bringing maximum predictive power into our propensity scores.
What’s the V-Profiler predictive modeling, and how do you generate its findings?
V-Profiler is a ‘lock-and-load’ modeling tool that Virtual DBS uses to create a predictive model with a very short turnaround time (i.e. just a few days) and at a very low price point. Basically, we take our clients’ customer database (name and address elements), and ‘plug and play’ the data into the V-Profiler platform, which runs an automated demographic match-and-append process, statistically comparing and contrasting the customer records against the relevant reference universe of non-customer prospects in the same vertical markets as the customers. The findings are delivered as a PowerPoint deck output from the V-Profiler tool, including a logistic regression predictive scoring algorithm.
How do your analyses interpret social media platforms?
We consider our statistical methodologies to be ‘data agnostic’ in the sense that we can use any and all customer data in our models, provided that the data sets are relevant. Hence, if our clients’ customer or responder data sets include social media characteristics, we can certainly include those attributes in the model.
What are your hopes for the company’s future?
Virtual DBS and our antecedent organizations have been recognized among the fastest growing and most innovative companies in the database and analytics industry in general, and the state of Rhode Island in particular. It is our expectation (and our experience) that we will maintain double-digit growth year-in and year-out in order to keep this trajectory moving up.
Where is the industry headed? What about integration with social media?
While the database marketing and analytics industry continues to morph as new media channels are unveiled (including social media), the mathematics underpinning sound predictive modeling are based on tried-and-true objective statistical principles that have proven to be virtually inviolable for many decades (and, in some cases, even centuries). It is our expectation as a nimble and adaptive organization that we will continue to deliver outstanding results regardless of where new technologies take us.
(Article published in eHotelier on September 26, 2012)


Larry Mogelonsky