Shopzilla, Inc. recently announced that it has combined three marketing-centric business units, which will operate under the Connexity brand name. Connexity will uniquely combine consumer insights and media buying within the same programmatic platform, helping marketers learn more about their customers, discover valuable audiences, and activate new consumers at scale. Shopzilla’s global portfolio of retail websites connects more than 40 million shoppers with over 100 million products from tens of thousands of retailers, and the resulting data is now offered via Connexity. That data must be processed as rapidly and efficiently as possible to keep up with growing customer engagement. With its 500-terabyte enterprise data warehouse (EDW) growing by five terabytes a day, Shopzilla’s legacy data warehouse had outgrown its capacity, impairing the company’s ability to provide business analytics in a timely and effective manner.
“Our legacy system delivers great performance for analytics and reporting, but it didn’t have the bandwidth for the intensive data transformations we needed; it would take hours to process 100 million products per day,” said Paramjit Singh, director of data for Shopzilla. “We needed enormous processing capabilities, scalability, full redundancy, and extensive storage, at a cost-effective price. Our Cloudera platform provides all that and more, while complementing our current data warehouse system. We were able to reduce latency from days to hours and soon minutes.”
“Cloudera provides an exploration environment for our data scientists that reveals tremendous insights, which would be virtually impossible to obtain otherwise,” explained Singh. “We’re able to answer complex questions on multi-structured data, such as how a user is behaving on a particular site and what ads would be most effective, as well as execute other sophisticated data mining queries. It improves Shopzilla’s ability to provide relevant results to users – a core tenet of our business. Many of the things we do as a business would not be possible without this platform running alongside our Oracle data warehouse.”
This improved processing performance also benefits Shopzilla’s search engine marketing (SEM) activities, allowing the company to score and bid on ten million keywords each day. Reaching over 100 million users, Shopzilla is able to collect billions of data points to create some of the most targeted and rich shopping-intent data available.
“By 2017, US online retail sales will total $434.2 billion. In a data-driven industry such as online retail, which is experiencing such explosive growth, providing profound and timely insights to both shoppers and retailers is key in boosting marketing ROI,” said Alan Saldich, vice president of marketing at Cloudera. “Connecting social and transactional data provides businesses with a 360-degree view of customer behaviors, interactions, interests, and activities in a way that was just not possible before.”
Shopzilla augmented its Oracle EDW with a multi-tenant Cloudera Enterprise system to create a hybrid environment. Hadoop is the primary engine for data processing and analytics, while aggregated data is exported to the EDW with Apache Sqoop for back-end reporting. Users can access Cloudera Enterprise directly using Apache Pig and Apache Hive, and Shopzilla plans to upgrade to Cloudera Impala and Apache Spark in the near future.
A Cloudera-powered enterprise data hub delivers the most secure, managed, governed, and open data management platform, giving customers an alternative to legacy data management for storing, accessing, and analyzing any amount and any kind of data in one centralized repository. Cloudera has all of the key attributes necessary for customers to make data the true focal point of any business.