Finnhub is an American company with people working in New York, Mumbai, Sydney, and Ho Chi Minh City to source, clean, and serve the right financial data to its customers. Finnhub uses state-of-the-art machine learning algorithms to collect, clean, and standardize data across global markets. With data centers around the globe and a diverse workforce, Finnhub provides high-quality, easily accessible data to the biggest clients in the industry, ranging from hedge funds and mutual funds to investment banks and S&P companies.
We will be using Finnhub's earnings call audio and transcripts in this analysis. You can sign up for a free Finnhub API account here.
Sample API call for data
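As a minimal sketch of what such a call could look like, the snippet below builds a request URL for one transcript. The base URL, the /stock/transcripts path, the id and token parameter names, and the transcript ID are all assumptions made for illustration; check Finnhub's API documentation for the exact endpoint before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL and endpoint path; verify against Finnhub's API docs.
BASE_URL = "https://finnhub.io/api/v1"


def transcript_url(transcript_id: str, api_key: str) -> str:
    """Build the request URL for a single earnings call transcript."""
    query = urlencode({"id": transcript_id, "token": api_key})
    return f"{BASE_URL}/stock/transcripts?{query}"


def fetch_transcript(transcript_id: str, api_key: str) -> dict:
    """Download one transcript and decode the JSON body (makes a network call)."""
    with urlopen(transcript_url(transcript_id, api_key), timeout=10) as resp:
        return json.load(resp)


# Placeholder ID and key, for illustration only.
url = transcript_url("AAPL_162777", "YOUR_API_KEY")
print(url)
```

Only the URL is built here; calling fetch_transcript with a real API key would perform the actual request.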
Our data set has 100,000 earnings call transcripts with audio, from 2005 onward, collected by Finnhub and transcribed automatically by their machine learning algorithm. 86% of the calls are quarterly earnings calls and the rest are semi-annual calls. The data is in JSON format and is separated by speaker, which makes it easy to run sentiment analysis on each executive.
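To illustrate why the speaker-separated layout is convenient, here is a small sketch that groups everything each speaker said into one string. The sample record and its field names ("transcript", "name", "speech") are assumptions for illustration and may not match the real schema exactly.

```python
import json
from collections import defaultdict

# Hypothetical record; the field names are assumed, not taken from the data set.
sample_call = json.loads("""
{
  "symbol": "XYZ",
  "transcript": [
    {"name": "Jane Doe", "speech": ["Revenue grew this quarter."]},
    {"name": "Analyst A", "speech": ["Can you comment on margins?"]},
    {"name": "Jane Doe", "speech": ["Margins improved.", "We expect that to continue."]}
  ]
}
""")


def text_by_speaker(call: dict) -> dict:
    """Concatenate all speech segments of each speaker, in call order."""
    grouped = defaultdict(list)
    for turn in call.get("transcript", []):
        grouped[turn["name"]].extend(turn.get("speech", []))
    return {name: " ".join(parts) for name, parts in grouped.items()}


by_speaker = text_by_speaker(sample_call)
for name, text in by_speaker.items():
    print(f"{name}: {text}")
```

With the text grouped this way, a sentiment score can be computed per executive rather than per call.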
We can measure the sentiment in the words used by the participants, especially the CEO. The classic way to measure this sentiment is to use the dictionary developed by Loughran and McDonald in 2011. What sets this dictionary apart is that it was built specifically for financial language, where many words that read as negative in everyday English (for example, "liability") are routine, and where CEOs and other executives choose their words carefully. There are two noteworthy results. First, aggregate conference call sentiment tends to track the MSCI World Index, which suggests a direct correlation between the tone used by participants on the conference calls and market sentiment. Second, the prepared remarks by company managers are consistently more positive than the analyst Q&A that follows, a result that will not come as a huge surprise to anyone who has listened to the carefully vetted opening monologues that companies prepare these days. More research with the data can be found here.
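A dictionary-based tone score can be sketched in a few lines. The word lists below are tiny illustrative stand-ins, not the actual Loughran-McDonald lists, and the net-tone formula (positive minus negative, divided by their sum) is one common choice assumed here, not necessarily the one used in the research cited above.

```python
import re

# Illustrative stand-ins only; the real Loughran-McDonald lists are far larger.
POSITIVE = {"strong", "improved", "gain", "gains", "growth", "record"}
NEGATIVE = {"loss", "losses", "decline", "weak", "impairment", "litigation"}


def tone(text: str) -> float:
    """Net tone in [-1, 1]: (positive - negative) / (positive + negative)."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    # No dictionary words at all is treated as neutral.
    return 0.0 if total == 0 else (pos - neg) / total


# Hypothetical snippets standing in for prepared remarks and analyst Q&A.
prepared = "We delivered record growth and strong margins."
qa = "Can you explain the decline and the impairment loss despite strong sales?"

print("prepared remarks:", tone(prepared))
print("analyst Q&A:", tone(qa))
```

Scoring the prepared remarks and the Q&A separately, as in this toy example, is what makes the comparison between the two parts of a call possible.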
Guerard, Markowitz, and Xu tested several optimization techniques: (1) total risk minimization with no reference to systematic risk, denoted MVM59, which uses mean-variance analysis with a fixed maximum weight (4%) under the same optimization conditions; (2) mean-variance analysis with tracking error at risk, denoted MVTaR; and (3) an equal active weighting tracking error at risk with a maximum deviation of two percent, denoted EAW2TaR. The EAW2TaR optimization technique weights systematic risk at three times the importance of specific risk, with x in constraint equation (7) set to 2. MVM59, MVTaR, and EAW2TaR have proved to be effective techniques in real-world portfolio construction and management. We seek to maximize the geometric mean of the portfolios, consistent with Latane and Markowitz 1. We refer to the creation of portfolios with a multifactor model and the generation of the efficient frontier as a Level II test of portfolio construction.
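For readers unfamiliar with the geometric mean criterion, the sketch below computes the per-period geometric mean of a series of simple returns. The function name and the sample returns are hypothetical and only illustrate the quantity being maximized, not the optimization itself.

```python
import math


def geometric_mean_return(returns):
    """Per-period geometric mean of simple returns (0.05 means +5%)."""
    if not returns:
        raise ValueError("need at least one return")
    # Compound the returns, then take the n-th root of the total growth.
    growth = math.prod(1.0 + r for r in returns)
    return growth ** (1.0 / len(returns)) - 1.0


# Hypothetical two-period example: +10% followed by -10%.
sample_returns = [0.10, -0.10]
arithmetic = sum(sample_returns) / len(sample_returns)
geometric = geometric_mean_return(sample_returns)

print(f"arithmetic mean: {arithmetic:.4%}")
print(f"geometric mean:  {geometric:.4%}")
```

The arithmetic mean of these two returns is zero while the geometric mean is slightly negative, because the portfolio ends below its starting value. That gap is why the geometric mean is the natural measure of compounded growth.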
We conducted attribution analysis on the optimized portfolios to assess the contributions made by stock-selection and factor exposures to active risk and returns. Exhibit 5 shows the results for the Analyst Q&A factor, based on a mean-variance optimization. The left-hand chart decomposes the active return into the specific portion, due to stock-selection ability, and the portion due to factor and style exposures. The specific return is positive and significant at the 5% level while all other contributions are insignificant, which strongly suggests the majority of observed alpha in the signal is being produced by picking the correct stocks, and not as the byproduct of embedded factor exposures. The right-hand chart shows the same decomposition for the active risk of the portfolio.
Check out Finnhub's GitHub page for more details, and see some of their efforts to raise awareness of Finnhub in the community here and here.