24

When dealing with trade data, for example from TAQ, a common problem is that of determining whether a trade was a buy or a sell. The most commonly used classifier is the Lee-Ready algorithm (Inferring trade direction from intraday data, 1991). Unfortunately, this method is known to be inaccurate: Lee and Radhakrishna (Inferring investor behaviour: evidence from TORQ data, 2000) report that Lee-Ready incorrectly classifies 24% of the trades inside the spread.

How can one improve on Lee-Ready's recipe? What are the best algorithms for trade classification?

2
  • On every trade there is both a buyer and a seller. When you talk of a buy or a sell, do you mean which side was the aggressor (i.e., removed liquidity)? In the case of intra-spread prints even that gets pretty blurry, since you have to consider internalizers and pay-for-order-flow models...
    – PabTorre
    Commented Sep 28, 2013 at 5:28
  • Have a look at the VPIN papers for their method.
    Commented Oct 2, 2013 at 14:12

4 Answers

18

There are a few approaches that use trade prices and quotes to classify the aggressor as a "buy" or a "sell." Many of these methods have also historically had to deal with unsynchronized data streams.

Classification of Approaches

We can largely break trade classification methods into four types (a minimal sketch of the first three follows the list):

  • Tick tests, advocated by Finucane (2000), compare a trade price to the previous differing trade price with upticks (downticks) being evidence of a buy (sell);
  • Midpoint tests, advocated by Lee and Ready (1991), compare a trade price to a contemporaneous midpoint (average of bid and ask prices) with trades above (below) the midpoint being classified as buys (sells) and trades at the midpoint being resolved with a tick test;
  • Bid/ask tests, advocated by Ellis, Michaely, and O’Hara (2000), compare a trade price to the contemporaneous bid and ask prices with trades at the ask (bid) being classified as buys (sells) and other trades being resolved by a tick test; and,
  • Modeled tests, advocated by Rosenthal (2012), incorporate all of the above tests in a linear model, while also accounting for autocorrelations and for uncertainty about what the contemporaneous quotes were at the time of trading.
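To make the first three concrete, here is a minimal sketch (prices in integer cents so the comparisons are exact; each trade is assumed to already carry the prevailing bid and ask, so the synchronization issues discussed below are ignored, and the function and variable names are mine rather than from the cited papers):

```python
def tick_test(price, prev_price):
    """Compare to the last *differing* trade price (supplied by the caller)."""
    if prev_price is None or price == prev_price:
        return None                          # no usable history: indeterminate
    return "buy" if price > prev_price else "sell"

def lee_ready(price, bid, ask, prev_price):
    """Midpoint test with a tick-test fallback, in the spirit of LR (1991)."""
    mid = 0.5 * (bid + ask)
    if price > mid:
        return "buy"
    if price < mid:
        return "sell"
    return tick_test(price, prev_price)      # at the midpoint: fall back

def emo(price, bid, ask, prev_price):
    """Bid/ask test with a tick-test fallback, in the spirit of EMO (2000)."""
    if price == ask:
        return "buy"
    if price == bid:
        return "sell"
    return tick_test(price, prev_price)      # not at the quotes: fall back

# Quoted 1000 x 1002 (cents), print at 1001 after a trade at 1000:
print(lee_ready(1001, 1000, 1002, 1000))     # midpoint -> tick test -> 'buy'
print(emo(1001, 1000, 1002, 1000))           # inside the spread -> tick test -> 'buy'
```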

The Fallback Methods

One of the issues with the LR and EMO approaches is that they may be indeterminate: a trade may occur that is not at the contemporaneous bid or ask, and may even occur at the midpoint. Both approaches then default to the tick test, since they are mandated to classify every trade.

Chakrabarty, Li, Nguyen, and Van Ness (2007) modify the fallback rule of EMO: they use the tick test for trades outside the spread or in the middle 40% of the spread; a trade price within the lowest 30% inside the spread is treated as trading at the bid (and so classified as a sell) and a trade price within the highest 30% is treated as trading at the ask (and classified as a buy).
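A rough sketch of that modified fallback, with the 30/40/30 boundaries written out explicitly (again in cents; the fallback rule, e.g. the tick test sketched above, is passed in as a callable, and boundary ties are handled arbitrarily here):

```python
def clnv(price, bid, ask, fallback=lambda: None):
    """CLNV-style classification (after Chakrabarty, Li, Nguyen, and Van Ness, 2007).

    Trades in the top 30% of the spread are treated as at the ask (buys),
    trades in the bottom 30% as at the bid (sells); trades outside the spread
    or in the middle 40% go to the fallback rule (a tick test in the paper).
    """
    spread = ask - bid
    if spread <= 0 or price < bid or price > ask:
        return fallback()                    # outside the quotes
    if price >= ask - 0.3 * spread:
        return "buy"                         # top 30% of the spread
    if price <= bid + 0.3 * spread:
        return "sell"                        # bottom 30% of the spread
    return fallback()                        # middle 40% of the spread

# Quoted 1000 x 1010 (cents): a print at 1008 is in the top 30% -> 'buy'
print(clnv(1008, 1000, 1010))
```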

Rosenthal (2012) takes evidence from the tick test, the midpoint test with no fallback, and the bid/ask test without fallback, as well as lagged versions of these tests. Furthermore, the bid/ask test term uses a metric for proximity to the bid or ask, similar to Chakrabarty et al. (2007), albeit not asymmetric in how it handles trades outside the spread.
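The following is only to give the flavor of a "modeled" classifier, not Rosenthal's actual specification: the individual test signals (and a lag of each) become regressors in a fitted model, trained on a sample where the true aggressor is known (e.g., exchange-flagged data). All names below are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_features(prices, bids, asks):
    """Stack per-trade test signals and one lag of each as regressors."""
    mids = 0.5 * (bids + asks)
    spreads = np.maximum(asks - bids, 1e-12)
    tick = np.sign(np.diff(prices, prepend=prices[:1]))        # tick-test signal
    mid_side = np.sign(prices - mids)                          # midpoint signal
    proximity = np.clip((prices - mids) / spreads, -1.0, 1.0)  # closeness to bid vs ask
    X = np.column_stack([tick, mid_side, proximity])
    lagged = np.vstack([np.zeros((1, X.shape[1])), X[:-1]])    # one-trade lag
    return np.hstack([X, lagged])

# With y = 1 for known buys and 0 for known sells:
# model = LogisticRegression().fit(build_features(prices, bids, asks), y)
```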

Contemporaneous Quotes?

Another problem mentioned in many of these methods is that trade and quote streams are not synchronous: quote updates may occur at different times than when trades are published. Typically, quotes in these situations are updated quickly while trades are published with some delay.

Some exchanges have justified this by saying the delay allows market makers time to hedge. However, regulation alone creates differing incentives: quotes must always be up to date and traders are held to trading at their quotes... while trades need only be published within a short timeframe (often on the order of seconds, a relative eternity in modern markets).

What Delay to Use?

The LR method assumes a 5-second delay from quote updates to trade publishing; thus, they look back 5 seconds from trade times. (Note that this is imprecise in the databases they used, since those records only had one-second resolution.) The EMO method uses no delay even though they admit the data have a delay. Vergote (2005) suggests using a 2-second delay, while Henker and Wang (2006) suggest a 1-second delay.
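If you want to apply a fixed look-back like these, one convenient sketch uses pandas, assuming trade and quote DataFrames that each have a 'ts' timestamp column plus 'bid'/'ask' on the quote side (the column names and the default 5-second delay are just illustrative):

```python
import pandas as pd

def attach_quotes(trades: pd.DataFrame, quotes: pd.DataFrame,
                  delay: str = "5s") -> pd.DataFrame:
    """Attach, to each trade, the last quote in effect at (trade time - delay)."""
    trades = trades.sort_values("ts").copy()
    quotes = quotes.sort_values("ts")
    trades["lookup_ts"] = trades["ts"] - pd.Timedelta(delay)
    merged = pd.merge_asof(
        trades, quotes[["ts", "bid", "ask"]],
        left_on="lookup_ts", right_on="ts",
        direction="backward", suffixes=("", "_quote"),
    )
    return merged.drop(columns=["ts_quote"])

# attach_quotes(trades, quotes, delay="1s")   # e.g., Henker and Wang's choice
```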

Using a Delay Distribution

The modeled method estimates a delay distribution and uses that to estimate the bid and ask (and hence the midpoint) contemporaneous with the time of trading. That distribution suggests mean delays of 5 seconds for Nasdaq stocks and 0.8 seconds for NYSE stocks, with standard deviations of 3.9 seconds and 1.0 seconds, respectively.
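A toy illustration of that idea (not the paper's estimator): weight the quotes in effect at a range of plausible delays by a delay distribution, and use the weighted averages as the estimated contemporaneous bid and ask. The normal distribution and all names here are stand-ins.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

def expected_quote(trade_ts, quotes, mean_delay=0.8, sd_delay=1.0):
    """Estimate the bid/ask at trade time by averaging over delay uncertainty.

    quotes: DataFrame indexed by timestamp with 'bid' and 'ask' columns;
    delays are in seconds (defaults echo the NYSE figures quoted above).
    """
    delays = np.arange(0.0, mean_delay + 4.0 * sd_delay, 0.1)
    weights = norm.pdf(delays, loc=mean_delay, scale=sd_delay)
    lookups = trade_ts - pd.to_timedelta(delays, unit="s")
    idx = quotes.index.searchsorted(lookups, side="right") - 1
    ok = idx >= 0                                    # ignore delays before any quote
    w = weights[ok] / weights[ok].sum()
    bid = float(np.dot(w, quotes["bid"].to_numpy()[idx[ok]]))
    ask = float(np.dot(w, quotes["ask"].to_numpy()[idx[ok]]))
    return bid, ask                                  # midpoint is 0.5 * (bid + ask)
```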

Searching for the Best Match Among Quotes

A novel approach that sidesteps delay times is taken by Jurkatis (2018, third essay). This approach modifies the EMO method to instead find prior quotes whose updates best match the trade, to determine whether the trade occurred at the bid or the ask. The method also allows for multiple possible matches.

Better Data

Some markets do not have asynchrony between trade and quote data. For example, data from the CME has trades and order book updates in the same stream. In these data, quotes are updated immediately after a trade and trades indicate whether the buyer or seller was the aggressor.

While equities data now has millisecond- or microsecond-resolution timestamps, many equity venues still have delays between trade and quote data streams. Delays have decreased in equity markets, but the number of trades has also increased, so the delay problem remains a concern.

Unusual Results for Odd-Lot Trades

Rosenthal notes that odd-lot trades (trades of a size below a "round lot," typically 100 shares) are harder to classify than round-lot trades. Odd-lot trades at the midpoint are misclassified by most methods to the extent that a "coin flip" would be more accurate. The fact that odd-lot orders were not protected by order handling and display rules is suggested as a reason for the difference in classification accuracy.

O'Hara, Yao, and Ye (2014) note that odd-lot orders are also often missing from data sources such as TAQ and that these orders tend to be associated with more trading based on information asymmetry.

While some venues have begun reflecting odd-lot orders in their data feeds (thus respecting order handling and display rules for odd-lot orders), that is still not universal. As of now, Nasdaq may aggregate odd-lot orders so long as the aggregate is at least a round lot.

Bulk Volume Classification

Easley, Lopez de Prado, and O'Hara (2016) propose to instead classify groups of trades, an approach they refer to as bulk volume classification (BVC). This approach (and others like it) faces a few problems.

First, they are solving a different problem than the one posed here. Many problems in market microstructure require classifying individual trades as buyer- or seller-initiated. Classifying a group of trades estimates the distribution of trade initiators (buyers vs. sellers), but it does not classify individual trades. That makes these methods unhelpful for many inferences.

Second, these methods are also not clearly superior to the individual classification approaches mentioned above. Andersen and Bondarenko (2015) and Chakrabarty, Pascual, and Shkilko (2015) show that the BVC method has lower classification accuracy than the tick test or LR methods.
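For contrast with the per-trade methods, a minimal sketch of the bulk volume idea: each bar's volume is split into buy and sell portions using the standardized bar-to-bar price change mapped through a CDF (a normal here; the paper also considers a Student-t). Note that the output is a volume split per bar, not a label per trade, which is exactly the limitation described above.

```python
import numpy as np
from scipy.stats import norm

def bulk_volume_split(bar_close, bar_volume):
    """Split each bar's volume into estimated buy and sell portions (BVC-style)."""
    bar_close = np.asarray(bar_close, dtype=float)
    bar_volume = np.asarray(bar_volume, dtype=float)
    dp = np.diff(bar_close, prepend=bar_close[:1])       # bar-to-bar price change
    sigma = dp.std(ddof=1)
    frac = np.full_like(dp, 0.5) if sigma == 0 else norm.cdf(dp / sigma)
    buy = bar_volume * frac                              # volume attributed to buyers
    return buy, bar_volume - buy
```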

Performance

Finally, how do these methods perform? Across many of the articles referenced, there is a consistent pattern of performance:

  • BVC tests do the worst,
  • tick tests do better than BVC tests,
  • midpoint (LR) tests do better than tick tests,
  • quote (EMO) tests do better than midpoint tests, and
  • the Rosenthal (modeled) and Jurkatis methods do the best of all.

Unfortunately, the modeled and Jurkatis methods have not been compared.

2
  • Great answer! By now my favourite is ... Get better data.
    – Ryogi
    Commented Aug 1, 2020 at 1:34
  • 1
    True, sometimes the data we have are important. Many venues do not reveal the aggressor. We also still need these methods for looking at data from before the aggressor was revealed. That's crucial if we want to, say, estimate price impact models historically to quantify how a new regulation or policy affects the cost of trading.
    – kurtosis
    Commented Aug 7, 2020 at 17:00
4

Don't use TAQ: the reported trade times can be a few seconds delayed. Use the exchange feeds instead; there you can see which order crossed the spread. The only issue you will run into is hidden orders, in which case you simply can't tell. (E.g., the market is 30.00 x 30.02 and then you see a trade print at 30.01. You have no way to tell whether there was a hidden offer or a hidden bid.)
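A tiny sketch of that logic against the displayed book (the field names are mine; it assumes you already have the best bid and ask in effect just before the print, and flags inside-spread prints as ambiguous because of possible hidden liquidity):

```python
def aggressor_from_book(trade_price, best_bid, best_ask):
    """Infer the aggressor from the displayed book just before the print."""
    if trade_price >= best_ask:
        return "buy"       # someone lifted the offer
    if trade_price <= best_bid:
        return "sell"      # someone hit the bid
    return "unknown"       # inside the spread: possibly hidden liquidity

# Market 30.00 x 30.02, print at 30.01 -> 'unknown' (hidden bid or offer)
print(aggressor_from_book(30.01, 30.00, 30.02))
```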

Trades from dark pools are essentially impossible to track, since they all go to the ADF with a potential delay and there is no published book.

4

Check out this job market paper "Inferring trade directions in fast markets" (3rd in this PhD thesis).

The author uses all available information on quote changes to match trades with the corresponding quotes. The algorithm is said to outperform the common Lee and Ready (1991), Ellis et al. (2000), and Chakrabarty et al. (2007) algorithms.

2

The mentioned methods (except for Rosenthal's) are implemented in Python here: https://github.com/jktis/Trade-Classification-Algorithms

1
  • Please disclose your affiliation.
    – Bob Jansen
    Commented Sep 22, 2020 at 7:25
