10/19/2021 | Press release | Archived content
This blog was prepared and written by Geoff Kraker, Technical Application Specialist - Software Platforms, Cytek Biosciences
When planning a long-term study or clinical trial that includes collecting and analyzing samples on a flow cytometer across weeks, months, or years, what are some steps you can take to mitigate the impact of batch effects on the resulting analysis? Here we describe batch effects and how to identify them in your data, tips to prevent them, and possible fixes.
A batch effect is a measurement that exhibits qualitatively different behavior across experimental conditions while being unrelated to the scientific variables in the study. Some real-life sources of batch effects include a change in antibody or reagent lot, instrument service or settings changes between acquisition sessions, different operators or collection sites, and variation in how long samples sit between collection and processing.
These are just a few examples, and while there are almost limitless sources of batch effects, it's possible to eliminate the most likely sources through diligent experimental planning.
Batch effects matter because they can blunt the findings of a study, confound the possible conclusions, or, even worse, supplant the presumed experimental source of change as the main conclusion of the study [1]. These sources often present themselves by proxy, either by experimental group or by processing date. This means the signal will change across time, and it will take some investigating to uncover the real source of the variation between batches, as "time between batches" alone is very often not the root cause of the issue.
One of the simplest and most effective ways to combat batch effects is to include a "bridge", "anchor", or "validation" sample in each batch. The goal is to have a consistent sample present in each batch so batches can be compared and any shift in the results can be visualized and quantified. How to accomplish this will be addressed later, but it bears emphasizing that this is a simple and effective measure that should be employed in most, if not all, longitudinal studies.
The first step is to determine if there are batch effects in the data set. There are several ways to do this - ranging from simple qualitative approaches to algorithmic-driven evaluation. Here we cover a range of choices.
This is data from three samples acquired over two months and run on the same instrument. Green and Orange were run just a few days apart, while Blue was run 7 weeks later. If most, or all, of the populations appear in a similar but not identical location in the plot, there's a chance that this shift is caused by a batch effect. This shift could also be due to biological variation between samples; however, biological variation may appear as a difference in only a few of the islands.
This next example contains four files overlaid from a different study with very little change in cluster positions. All four samples were stained and run on the same instrument on the same day. The arrow shows a good example of the difference in population abundance across samples - the middle of the island isn't shifting; the blue sample just has fewer events of that particular phenotype than the orange sample. As above, a batch effect can be especially obvious on a dimensionality reduction plot if samples from the same batch are concatenated (physically or virtually) and displayed overlaid on other batches. If the difference between batches needs to be quantified, it's possible to calculate the Jensen-Shannon divergence of the UMAP/tSNE parameters between samples/batches to get a quantitative comparison of each batch's island positions. The Jensen-Shannon (JS) divergence is an information theory-based, symmetric measure of the similarity between two probability distributions [3] [4] [5].
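As a minimal sketch of the quantification step, the JS divergence between two batches' island positions can be computed by binning each batch's UMAP/tSNE coordinates into a shared 2D histogram and comparing the resulting distributions. The bin count here is an illustrative choice, not a recommendation from any specific software package:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def js_divergence_2d(batch_a, batch_b, bins=64):
    """Jensen-Shannon divergence between two 2D embeddings.

    batch_a, batch_b: arrays of shape (n_events, 2) holding the
    UMAP/tSNE coordinates of each batch's events.
    """
    # Shared bin edges so both histograms cover the same region
    combined = np.vstack([batch_a, batch_b])
    edges_x = np.linspace(combined[:, 0].min(), combined[:, 0].max(), bins + 1)
    edges_y = np.linspace(combined[:, 1].min(), combined[:, 1].max(), bins + 1)

    hist_a, _, _ = np.histogram2d(batch_a[:, 0], batch_a[:, 1], bins=(edges_x, edges_y))
    hist_b, _, _ = np.histogram2d(batch_b[:, 0], batch_b[:, 1], bins=(edges_x, edges_y))

    # Flatten to probability distributions
    p = hist_a.ravel() / hist_a.sum()
    q = hist_b.ravel() / hist_b.sum()
    # scipy returns the JS *distance* (square root of the divergence);
    # base=2 bounds the divergence between 0 and 1
    return jensenshannon(p, q, base=2) ** 2
```

With base 2, a value near 0 means the islands occupy the same positions, while a value near 1 means the batches barely overlap.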
The best way to fix a batch effect is to stop it before it becomes a problem - an ounce of prevention is worth a pound of cure as the saying goes. It's not possible to completely eliminate all sources of batch variations but implementing a few measures at the start of a study can save time and trouble later when analyzing the data.
Start with experiment planning and control over study execution. Make sure everyone involved with the study (physicians, clinical coordinators, techs, shared resource facilities, etc.) is on the same page when it comes to sample timing and standard operating procedures. Although it sounds elementary, details like maintaining similar timing from bedside to bench and collecting blood in the same type of anticoagulant are vital to the downstream quality of the samples.
It is also incredibly important to ensure that all reagents are titrated correctly for the number and type of cells expected in the samples. If the antibodies are titrated on 100,000 cells per test, and a patient sample comes in with 5,000,000 cells, the sample will likely be understained. It may be worth performing a cell count and then normalizing the number of cells per sample to ensure even staining.
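The normalization arithmetic is straightforward; as a sketch (the titration count and sample concentration below are illustrative numbers, not recommendations):

```python
# Cells-per-test the antibody panel was titrated on (illustrative)
TITRATED_CELLS_PER_TEST = 100_000

def aliquot_volume_ul(counted_cells_per_ml, target_cells=TITRATED_CELLS_PER_TEST):
    """Volume of sample (in uL) to aliquot so each staining tube
    receives the same number of cells the panel was titrated on."""
    cells_per_ul = counted_cells_per_ml / 1000.0
    return target_cells / cells_per_ul

# A sample counted at 5,000,000 cells/mL needs only 20 uL per test
print(aliquot_volume_ul(5_000_000))  # → 20.0
```

Aliquoting by computed volume rather than a fixed volume keeps the antibody-to-cell ratio consistent with the titration conditions across every sample in the study.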
A relatively simple way to address batch effects is to ensure a specific peak in a bead control falls near (or in) the same channel on the cytometer before experimental sample acquisition. The overarching goal is to take a particle with a fixed fluorescence and make sure it is detected at the same level before each batch is acquired, leading to consistency across batches from a detection standpoint. Many cytometers, including the Cytek® Aurora and Cytek® Northern Lights™ systems, have built-in QC programs with this functionality. While this helps control for day-to-day instrument variation, it won't reduce variability in sample preparation or staining. Since changes can happen over the course of the day as well, it is best practice to run an MFI target value test before every sample in the study to verify the cytometer is detecting your channel(s) of interest in the correct range.
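Conceptually, the target value check reduces to comparing the measured bead peak against a stored target within a tolerance. A minimal sketch, where the target MFI and the 5% tolerance are illustrative assumptions rather than vendor specifications:

```python
import statistics

def mfi_in_range(bead_events, target_mfi, tolerance=0.05):
    """Return True if the median bead fluorescence is within
    +/- tolerance (as a fraction of target) of the stored target value.

    bead_events: fluorescence values of the singlet bead events
    in the channel of interest.
    """
    measured = statistics.median(bead_events)
    return abs(measured - target_mfi) <= tolerance * target_mfi

# e.g. target channel 50,000: a measured median of 50,800 passes,
# while a drift to 60,000 would fail and prompt instrument adjustment
```

If the check fails, detector settings would be adjusted (or QC rerun) before acquiring the batch, rather than correcting the data afterward.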
Another simple way to combat batch effects is to make sure experimental groups are mixed across acquisition sessions. Acquiring all control samples on one day, all group 1 samples the next day, and then all group 2 samples on the final day is a great way of introducing batch effects to an otherwise well controlled experiment. If samples are banked, randomizing which samples are included in which acquisition session is a good way to minimize batch effects. Other experimental design suggestions and best practices are available in Thomas Liechti's CYTO U Lecture from October 2020. [8]
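One simple way to implement this randomization for banked samples is to shuffle each experimental group and deal it round-robin across sessions, so every session contains a near-equal mix of groups. A sketch, with hypothetical group names and sample IDs:

```python
import random

def assign_sessions(samples_by_group, n_sessions, seed=None):
    """samples_by_group: dict mapping group name -> list of sample IDs.
    Returns a list of n_sessions lists, each mixing all groups."""
    rng = random.Random(seed)
    sessions = [[] for _ in range(n_sessions)]
    for group, ids in samples_by_group.items():
        ids = ids[:]          # copy so the caller's list isn't mutated
        rng.shuffle(ids)
        # deal the shuffled group round-robin so every session gets
        # a near-equal share of each experimental group
        for i, sample in enumerate(ids):
            sessions[i % n_sessions].append(sample)
    return sessions
```

This avoids the "all controls on day one, all group 1 on day two" pattern while still randomizing which specific samples land in which session.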
To eliminate batch effects from the staining and acquisition process, fluorescent cell barcoding can be employed. This technique involves uniquely labeling each sample with a set of fluorescent tags, mixing the samples together, staining them all in a single tube, washing, and then acquiring the single tube that contains all samples (or a batch of samples). After acquisition, the data is then de-barcoded by plotting the barcoding channels against each other and drawing gates around each "population" which equates to each original sample.
These papers by Peter Krutzik et al. [9] and David Earl et al. [10] offer some technical perspective and guidance on accomplishing this technically challenging task. If performed effectively, this allows a group of samples to be stained and run under the exact same conditions, eliminating batch effects. Differences in sample collection and storage may still be visible, but barcoding will go a long way towards reducing the effects of differential sample prep and acquisition.
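In the simplest binary-barcoding scheme, the de-barcoding step amounts to reading which barcode channels are positive for each event and interpreting that pattern as a sample ID. In practice the gates are drawn interactively on bivariate plots; the fixed threshold and channel layout below are simplifying assumptions for illustration:

```python
import numpy as np

def debarcode(events, barcode_cols, threshold):
    """events: (n_events, n_channels) array of fluorescence values.
    barcode_cols: column indices of the barcoding channels.

    Each event's sample ID is the binary code formed by which
    barcode channels exceed the positivity threshold."""
    codes = (events[:, barcode_cols] > threshold).astype(int)
    # interpret the per-channel positive/negative bits as an integer ID
    weights = 2 ** np.arange(len(barcode_cols))
    return codes @ weights

# Two barcode channels distinguish up to four samples (IDs 0-3);
# n channels distinguish up to 2**n samples
```

Events falling between the negative and positive populations would normally be excluded as ambiguous rather than force-assigned, which this sketch does not attempt.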
When undertaking a longitudinal study using spectral cytometry, some consideration should be given to the reference controls (whether beads or cells): will a new set be collected for each batch or sample, or will a 'gold-standard' set be collected at the beginning? Choosing correctly can also help prevent batch effects. A single set of initial reference controls might be indicated if the reagents are known to be stable, the samples in the study are well characterized and consistent, and there is an elevated risk of technical preparation errors during the study. Per-batch sets of reference controls may be indicated if the number of batches is low, reagent stability is in question, the samples (and their autofluorescence) are poorly characterized, or there is no doubt that the experiments will be executed with high technical proficiency.
As mentioned above, one commonly used method to both identify and (if necessary) fix batch effects is the inclusion of a "bridge", "anchor", or "validation" sample in each batch, so that a consistent sample is present in each batch and any shift in the results can be visualized and quantified. This can be achieved in several ways, but commonly investigators working with PBMCs will aliquot and freeze a leukopak (or a similar large single source of cells), and then for each batch of the study, remove a vial and prep the cells alongside the experimental samples. Even if the assay/trial in question only involves fresh samples, the bridge sample only needs to match itself across batches, so a frozen bridge sample may still be a suitable method for tracking changes over time, if not an ideal one. While generally effective for its stated purpose, this method isn't suitable for all situations. Sometimes a cell population of interest is rare enough that including enough cells for each batch from a single source is difficult, or the antigen of interest might disappear with freeze/thaw processing. In those cases, a lyophilized cell control product can be a solution. Regardless, choosing a control sample that allows each channel to be tracked in some capacity is essential for the success of this method.
Once these bridge samples are acquired, they can be examined across time in a Levey-Jennings plot to look for changes. They can also be used in algorithms like Harmony [6], CytoNorm [12], or iMUBAC [7] as a reference against which all the experimental samples can be normalized. These tools will work on any cytometry data, including that generated by the Cytek® Aurora or Northern Lights™ systems.
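As a minimal sketch of that trend check, the bridge sample's MFI per batch can be compared against control limits derived from the series itself, flagging any batch that drifts beyond ±2 SD. The MFI values below are made up for illustration:

```python
import statistics

def control_limit_flags(mfis, n_sd=2):
    """Return indices of batches whose bridge-sample MFI falls outside
    mean +/- n_sd standard deviations of the whole series."""
    mean = statistics.mean(mfis)
    sd = statistics.stdev(mfis)
    return [i for i, v in enumerate(mfis) if abs(v - mean) > n_sd * sd]

# Bridge-sample MFIs across six batches; batch index 4 has drifted
mfis = [1020, 1005, 998, 1012, 1350, 1008]
print(control_limit_flags(mfis))  # → [4]
```

A flagged batch is a cue to investigate (reagent lot, instrument QC, prep deviation) and, if warranted, to normalize that batch against the bridge sample before comparing it to the rest of the study.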
Looking forward, there will inevitably be newer and more innovative algorithms and strategies for addressing batch effects in longitudinal experiments, but in the meantime, with a little planning and the techniques listed above, you should be on your way towards cleaner and more reliable longitudinal data.