Missed the first part? Check it out here
1. Data Cleaning
Virtually all VOC data requires cleaning of some kind. This typically involves normalization of text data as well as imputation of missing quantitative data using machine learning techniques such as random forest imputation.
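As a rough illustration of those two cleaning steps, here is a minimal sketch using only the Python standard library. It normalizes free-text comments and fills missing numeric scores. Note that it uses simple median imputation as a stand-in for random forest imputation, which would require a machine learning library. The sample comments, scores, and function names are made up for the example.

```python
import re
import statistics
import unicodedata


def normalize_text(text):
    """Normalize Unicode forms, lowercase, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = text.lower().strip()
    return re.sub(r"\s+", " ", text)


def impute_median(values):
    """Replace None entries with the median of the observed values.

    This is a simple stand-in, not random forest imputation.
    """
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


# Hypothetical survey data: free-text comments and 0-10 scores with gaps.
comments = ["  Great   SERVICE!\n", "Slow\tshipping  "]
scores = [9, None, 7, 10, None, 4]

clean_comments = [normalize_text(c) for c in comments]
clean_scores = impute_median(scores)

print(clean_comments)
print(clean_scores)
```

Median imputation ignores relationships between columns, which is exactly what model-based approaches like random forest imputation try to exploit, so treat this as a baseline only.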
2. Exploratory Data Analysis (EDA)
VOC data – and any customer data, for that matter – needs substantial exploration to understand what’s in the data. EDA identifies anomalies, outliers, trends, distributions, statistical characteristics, and the overall shape and nature of the data. How much of it is there? How much of it is usable?
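A first pass at those questions might look like the sketch below. It counts how many values are usable, computes basic descriptive statistics, and flags outliers with the common 1.5 x IQR rule. The `response_times` values and the function names are invented for illustration, and the outlier rule is one convention among several.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics, counting missing entries separately."""
    usable = [v for v in values if v is not None]
    return {
        "total": len(values),
        "usable": len(usable),
        "mean": statistics.mean(usable),
        "median": statistics.median(usable),
        "stdev": statistics.stdev(usable),
    }


def iqr_outliers(values, k=1.5):
    """Return values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    usable = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(usable, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in usable if v < low or v > high]


# Hypothetical support response times in minutes; None marks a missing record.
response_times = [12, 15, 14, None, 13, 16, 15, 14, 95, None, 13]

summary = summarize(response_times)
outliers = iqr_outliers(response_times)

print(summary)
print(outliers)
```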
3. Text Mining/Natural Language Processing
VOC data is primarily in the form of unstructured text, such as interview transcripts, chat logs, transcribed audio, etc. Natural language processing techniques, from basic word frequency and n-gram extraction to more sophisticated techniques like vectorization and transformers, extract value from text data, effectively transforming it into quantitative data that can be analyzed mathematically. From that analysis, we extract information such as sentiment, emotional valence, semantically related topics, and even inferences.
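The most basic end of that spectrum, word frequency and n-gram extraction, can be sketched with the standard library alone. The transcripts, the tiny stopword list, and the helper names below are assumptions made for the example; a real project would likely use a dedicated NLP library and a fuller stopword list.

```python
import re
from collections import Counter

# Illustrative stopword list, far from exhaustive.
STOPWORDS = {"the", "a", "was", "and", "is", "to", "but", "i"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def ngrams(tokens, n):
    """Return consecutive n-token tuples from a token list."""
    return list(zip(*(tokens[i:] for i in range(n))))


# Hypothetical snippets of customer feedback.
transcripts = [
    "The checkout process was slow and confusing",
    "Checkout process is slow but support was helpful",
    "I love the support team",
]

word_counts = Counter()
bigram_counts = Counter()
for doc in transcripts:
    tokens = tokenize(doc)
    word_counts.update(tokens)
    # Bigrams are built after stopword removal, so some pairs were not
    # strictly adjacent in the original sentence.
    bigram_counts.update(ngrams(tokens, 2))

print(word_counts.most_common(4))
print(bigram_counts.most_common(2))
```

Counts like these turn unstructured text into numbers, which is the starting point for the more sophisticated techniques mentioned above.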
4. Forecasting and Predictive Analytics
VOC data by itself presents a historical snapshot of where the customer’s mind was at the point of data collection. Using the previously mentioned techniques, we examine the changes in data over time and incorporate other credible third-party data sources to forecast important VOC metrics, such as search intent for specific language or customer satisfaction numbers from survey data. Techniques used in forecasting range from very simple linear regression to ARIMA-based statistical models and deep learning/neural models at the highest levels of sophistication.
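At the simple end of that range, a linear trend forecast can be sketched as follows. It fits a straight line to a time series with ordinary least squares and extends it forward. The `monthly_csat` numbers are made up and deliberately tidy, and the function names are hypothetical; real VOC series are noisier and often seasonal, which is where ARIMA-style or neural models come in.

```python
def fit_line(ys):
    """Ordinary least squares fit of y = intercept + slope * t for t = 0, 1, 2, ..."""
    n = len(ys)
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    sxx = sum((t - t_mean) ** 2 for t in ts)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return slope, intercept


def forecast(ys, steps):
    """Extend the fitted line by the requested number of future periods."""
    slope, intercept = fit_line(ys)
    n = len(ys)
    return [intercept + slope * (n + i) for i in range(steps)]


# Hypothetical monthly customer satisfaction scores (made-up data).
monthly_csat = [70.0, 72.0, 74.0, 76.0, 78.0, 80.0]

next_quarter = forecast(monthly_csat, 3)
print(next_quarter)
```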
5. Regression Analysis
Some VOC metrics have predictive power, but identifying which ones matter requires regression analysis. Techniques such as linear regression, logistic regression, stepwise regression, elastic net regression, gradient boosting, multiple regression subset analysis, and neural networks (typically fit with optimization methods such as gradient descent) estimate how strongly each metric relates to an outcome, which helps reveal what matters most in a dataset.
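As a very rough first screen for "which metrics matter," the sketch below ranks candidate metrics by their standardized simple-regression slope against an outcome. For a single predictor, that slope equals Pearson's correlation coefficient. The metric names and values are invented, and this one-metric-at-a-time approach is an assumption of the example; it does not replace the multivariate techniques listed above, which account for overlap between metrics.

```python
import math


def standardized_slope(xs, ys):
    """Slope of a simple linear regression after z-scoring both variables.

    For one predictor this is the same value as Pearson's r.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-customer metrics and satisfaction scores (made-up data).
metrics = {
    "response_time_hours": [1, 2, 3, 4, 5, 6],
    "num_contacts": [2, 1, 3, 2, 1, 3],
}
satisfaction = [9, 8, 7, 6, 5, 4]

# Rank metrics by the absolute strength of their relationship to satisfaction.
ranked = sorted(
    ((name, standardized_slope(vals, satisfaction)) for name, vals in metrics.items()),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)

for name, slope in ranked:
    print(f"{name}: {slope:.3f}")
```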
Up next: The 11 components of Voice of Customer Data