Examining Linguistic Differences in Electronic Health Records for Diverse Patients With Diabetes: Natural Language Processing Analysis

Background: Individuals from minoritized racial and ethnic backgrounds experience pernicious and pervasive health disparities that have emerged, in part, from clinician bias. Objective: We used a natural language processing approach to examine whether linguistic markers in electronic health record (EHR) notes differ based on the race and ethnicity of the patient. To validate this methodological approach, we also assessed the extent to which clinicians perceive linguistic markers to be indicative of bias. Methods: In this cross-sectional study, we extracted EHR notes for patients who were aged 18 years or older; had more than 5 years of diabetes diagnosis codes; and received care between



INTRODUCTION
Language and communication play a significant, if not primary, role in social relations across different cultures [1]. Language has increasingly been recognized as a relevant form of data that describes relations and behavior [2]. One of the most intimate communications between individuals occurs between clinicians and patients during clinical visits. However, these encounters may be undermined by different forms of bias directed toward patients from certain racial and ethnic minority groups [3]. Generally, bias refers to an evaluation, decision, perception, or action in favor of or against a person or group compared with another. Bias can be blatant, wherein it is characterized by deliberate actions (e.g., racist comments) that are intentionally and overtly discriminatory [4]. Bias can also be subtle, including "actions that are ambiguous in intent to harm, difficult to detect, low in intensity, and often unintentional but are nevertheless deleterious" to targets [4]. Subtle bias by healthcare clinicians is linked to negative outcomes for racial/ethnic minority patients, particularly Black Non-Hispanic and Hispanic/Latino patients [5].

Race and Racial Bias in Medical Interactions:
Health disparities between racial/ethnic groups have historically been attributed to varying levels of socioeconomic status, as well as genetic and biological factors that were thought to predispose groups to different medical conditions.
Research has emerged over the past few decades demonstrating that, in fact, there is no biologic basis for racial/ethnic differences. Humans share 99.9% of their genome, and the 0.1% variation cannot be explained or elucidated by race [6]. Race describes physical traits considered socially significant, and ethnicity denotes a shared cultural heritage, such as language, practices, and beliefs [7]. As such, race and ethnicity are social constructs, and since the landmark 2002 report Unequal Treatment detailed the impact of racial and ethnic discrimination in patient-clinician interactions, research interest in this area has burgeoned [8]. Relative to White Non-Hispanic patients, Black Non-Hispanic and Hispanic/Latino patients are less likely to "engender empathic responses from clinicians, establish rapport with clinicians, receive sufficient information, and be encouraged to participate in medical decision making" [9]. A lack of relationship building [10], reduced positive patient and clinician affect [11], decreased patient trust [12], and fewer patient questions [13] are all more likely outcomes for Black Non-Hispanic and Hispanic/Latino patients compared to White Non-Hispanic patients during medical interactions. Indeed, the 2018 National Healthcare Disparities Report revealed that, compared to White Non-Hispanic patients, Black Non-Hispanic patients receive inferior care on 40% of quality measures, and Hispanic/Latino patients receive worse care on 35% of quality measures, many of which indicate biased and discriminatory behaviors by clinicians [14]. For example, indicators were worse for Black Non-Hispanic and Hispanic/Latino patients than for White Non-Hispanic patients on measures such as "physicians sometimes or never showed respect for what they had to say" and "physicians sometimes or never spent enough time with them" [14]. Black Non-Hispanic and Hispanic/Latino patients are more likely than White Non-Hispanic patients to report racial/ethnic bias and discrimination during medical encounters [15]. Yet less is known about the manifestations and details of such experiences during the clinician-patient interaction [16], and whether racial/ethnic discrepancies in care can be observed in the content of electronic health records (EHRs). Similar to the thesis described in Unequal Treatment, we hypothesize that the mitigation of bias at the clinician level is needed to improve patient outcomes for diverse racial/ethnic populations and narrow the disparities gap. To address bias, researchers need to understand how to measure its existence, and clinicians need to be informed of its manifestations.
Research Contributions: Bias can take many forms (blatant, subtle, malevolent, and/or benevolent), all of which can be indicated by language. With increasing access to EHR documentation and advances in natural language processing, we may be better equipped to identify differences in clinician encounters with patients of diverse racial and ethnic backgrounds. This study searched for linguistic discrepancies in EHRs using a natural language processing approach followed by linear mixed effects model analyses. EHRs are digital summaries of the clinician-patient encounter and include the clinician's assessment of the interaction, as well as the patient's health history. Because the clinician is responsible for entering information, as well as reviewing the information entered by other care clinicians in the EHR for each patient encounter, the contents of the EHR may be particularly useful in illuminating biases that clinicians hold toward patients of different racial/ethnic backgrounds.
Although several studies have indicated that clinician bias occurs, particularly in racially/ethnically discordant interactions (i.e., when the patient and clinician are of different racial/ethnic backgrounds), relatively little research has examined the ways in which the clinician may be thinking about the patient and how the clinician's sentiment and cognitions are reflected in the language of the EHR [8,17]. EHRs can include many years of patient-clinician interactions, with multiple clinicians having access to them, allowing for biases to be passed on and to potentially impact future medical decisions.
Our dataset contains EHR notes for a large sample of White Non-Hispanic, Black Non-Hispanic, and Hispanic/Latino patients with diabetes in the Southern United States. The natural language processing tool Sentiment Analysis and Social Cognition Engine (SEANCE) was applied to assess multiple linguistic markers in the EHR text [18,19]. We then explored whether 8 of the 20 SEANCE components (see Table 1) differed for patients of different races/ethnicities.
We hypothesized that the SEANCE components for negative adjectives, positive adjectives, joy, fear and disgust, politics, respect, trust verbs, and well-being, as well as the mean word count in the note, would be indicators of bias, as these concepts have been linked to bias in nonmedical contexts. Ng's review of linguistic racial bias offers the rationale for our choice of fear and disgust, politics, respect, and trust verbs as indicators of bias [20], while Li's work examining gender differences in standardized writing assessment provides further support for our use of SEANCE as a tool for examining biases in language [21]. We selected the positive and negative adjectives, well-being, politics, and word count indicators because prior research demonstrates that clinicians may be less likely to establish rapport and provide appropriate medications, and more inclined to show negative attitudes and be dismissive, toward Black Non-Hispanic and Hispanic/Latino patients as a result of their unconscious racial/ethnic biases [22][23][24][25].
Specifically, we investigate which aspects of communication differ and whether differences are indicative of biased interactions. Any systematic variation in language can convey differential perceptions, attitudes, and expectations. For example, words like "resistant" or "noncompliant" could reflect bias if (all else being equal) they tend to be used more often to describe people from some racial/ethnic backgrounds than others. This work aims to elucidate for clinicians and researchers where discrepancies in communication emerge in the EHR and whether these differences are indicative of racial/ethnic bias. We also assessed the extent to which clinicians perceive linguistic markers to be indicative of bias.

Note. Indices refer to the number of dictionary lists from which the component was developed.
The key indices come from the following dictionary lists: NRC refers to the NRC Emotion Lexicon [18,26], GI refers to the Harvard-IV dictionary list used by the General Inquirer [27], Hu-Liu refers to the Hu-Liu polarity word lists [26,27], Lasswell refers to the Lasswell dictionary lists [30,31], and GALC refers to the Geneva Affect Label Coder database [32]. For a thorough review of the SEANCE indices and corresponding dictionaries, see [18].

Patients received care from clinicians practicing in an urban, academic network of clinics. We chose this disease because of its high prevalence (11.3% in the US) and chose to examine outpatient visits because of the large annual volume of outpatient visits (1 billion) relative to hospital admissions (32 million) [33][34][35].

METHODS
The demographic variables collected were patient race/ethnicity, sex, and age. Race and ethnicity of patients were defined as 'White Non-Hispanic,' 'Black Non-Hispanic,' or 'Hispanic/Latino' (see Table 2 for a summary of patient demographics).
SEANCE: SEANCE is a lexical scoring algorithm that includes over 200 word vectors (also referred to as indices or features) designed to assess sentiment, cognition, and social order, which were developed from preexisting and widely used databases such as EmoLex and SenticNet [26,36]. In addition to the core indices, SEANCE allows for several customized indices, including filtering for particular parts of speech and controlling for instances of negation [18]. We selected 8 of its components a priori (see Table 1 for a description of the selected components).
We chose SEANCE instead of other natural language processing tools, such as Linguistic Inquiry and Word Count (LIWC), because it contains a larger number of core indices taken from multiple lexicons, as well as 20 components, and is based on the most recent improvements in sentiment analysis [18]. In their validation of SEANCE, Crossley and colleagues found that SEANCE components demonstrated significantly greater accuracy than LIWC indices (p < 0.001) for three of the four review types examined [18]. In addition to the core indices, SEANCE allows for several customized indices, including filtering for parts of speech (also known as "POS tagging") and controlling for instances of negation, which LIWC does not offer. We analyzed all words in the EHR (i.e., not single parts of speech), but we did control for negation. This means that a phrase like "not good" would be recognized as not being positive, unlike in LIWC. We ran cross-classified linear mixed effects models examining the relationship between the linguistic outcomes (the SEANCE components and word count) and patient race, controlling for patient age [37,38]. We ran an identical analysis treating each of the 8 SEANCE components and the mean word count in the EHR as the dependent variable, while leaving all other variables consistent across the models. The same steps of entering fixed and random effects were applied across all cross-classified linear mixed effects models with different dependent variables (i.e., negative adjectives, positive adjectives, well-being words, trust verbs, fear and disgust words, joy words, politics words, respect words, and mean word count).
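To illustrate the negation handling described above, the following is a minimal sketch of lexicon-based scoring with a negation check. The word lists and function name are hypothetical stand-ins, not the actual SEANCE dictionaries or implementation.

```python
# Hypothetical stand-in word lists (NOT the actual SEANCE dictionaries).
POSITIVE = {"good", "pleasant", "cooperative"}
NEGATORS = {"not", "no", "never"}

def positive_score(text: str) -> float:
    """Proportion of tokens scored positive, skipping words preceded by a negator."""
    tokens = text.lower().split()
    hits = 0
    for i, tok in enumerate(tokens):
        if tok in POSITIVE:
            # A positive word preceded by a negator is not counted as positive.
            negated = i > 0 and tokens[i - 1] in NEGATORS
            if not negated:
                hits += 1
    return hits / len(tokens) if tokens else 0.0
```

Under this scheme, "not good" contributes no positive score, whereas a tool without negation control would count "good" as positive.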
We first ran a null model with only the random intercepts. We then added random effects and applied a crossed design (versus a traditional nested structure), leading us to have intercepts for physicians and patients. Then, we ran a model with the random intercepts as well as the fixed effects. As fixed effects, we entered race and age (without an interaction term) into the model.
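The model-building steps above can be sketched with statsmodels, which fits crossed random intercepts by treating the full sample as a single group and entering physician and patient as variance components. The column names and synthetic data here are illustrative assumptions, not the study's actual variables.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the EHR sample (all names are illustrative).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "physician": rng.integers(0, 20, n),
    "patient": rng.integers(0, 60, n),
    "race": rng.choice(["White_NH", "Black_NH", "Hispanic_Latino"], n),
    "age": rng.integers(18, 90, n),
})
phys_fx = rng.normal(0, 0.3, 20)   # physician random intercepts
pat_fx = rng.normal(0, 0.1, 60)    # patient random intercepts
df["neg_adj"] = (0.5 + phys_fx[df["physician"]] + pat_fx[df["patient"]]
                 + rng.normal(0, 0.4, n))

# Crossed (non-nested) random intercepts: treat the whole sample as one
# group and specify physician and patient as variance components.
model = smf.mixedlm(
    "neg_adj ~ C(race, Treatment('White_NH')) + age",
    data=df,
    groups=np.ones(n),
    re_formula="0",
    vc_formula={"physician": "0 + C(physician)", "patient": "0 + C(patient)"},
)
result = model.fit()
print(result.vcomp)  # estimated physician and patient variance components
```

The `Treatment('White_NH')` contrast makes the White Non-Hispanic group the referent, matching the paper's comparisons.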
For all models examined, the intercept variation can be attributed primarily to different physicians rather than patients. We used a 95% CI to determine statistical significance. To be more conservative, given that we ran multiple tests, we also computed an additional set of CIs at the 99% level. We obtained approval from the University of Texas Health Science Center's Committee for the Protection of Human Subjects (HSC-MS-18-0431) and the Rice University Institutional Review Board (IRB-FY2021-325).

Descriptives and Justification for Cross-Classified Analyses: An initial inspection of the data revealed that two physicians were extreme outliers, accounting for 16.53% of the notes in our sample. To ensure that the overrepresentation of these physicians would not bias the results, we removed their notes from the dataset (taking us from our initial sample of 15,460 visits with n=283 physicians and n=1,647 patients to n=281 physicians, n=1,562 patients (Table 2), and n=12,905 visits). The distribution of visits indicates an average of 8 visits (M = 8.27) per patient.
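The outlier-removal step can be sketched as follows, using a hypothetical visit-level table; the 10% note-share threshold is illustrative only (the study removed two specific outlier physicians who together held 16.53% of notes).

```python
import pandas as pd

# Hypothetical visit-level table; 'physician' identifies the note author.
visits = pd.DataFrame({
    "physician": ["A"] * 30 + [f"P{i}" for i in range(10) for _ in range(7)],
    "note_id": range(100),
})

# Share of all notes attributable to each physician.
share = visits["physician"].value_counts(normalize=True)

# Drop physicians whose share of notes exceeds a chosen cutoff
# (10% here is illustrative, not the criterion used in the study).
outliers = share[share > 0.10].index
trimmed = visits[~visits["physician"].isin(outliers)]
```

After trimming, the remaining notes are more evenly distributed across physicians, reducing the risk that a few prolific note-writers dominate the random-effect estimates.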

RESULTS
Patients had a minimum of 1 visit, a median of 5, and a maximum of 97. Physicians saw 12 (M = 11.72) patients on average, with a median of 2 and a maximum of 143, suggesting a skewed distribution. Despite the relatively large number of patients seen by some physicians, these physicians accounted for substantially fewer patient notes than the two physicians that were previously removed. Patients saw 2 (M = 2.11) physicians on average, with a minimum of 1 and a maximum of 12; the distribution suggests that 6.6% of patients (n=109) saw 5 or more physicians, while most patients saw between 1 (n=742) and 4 (n=119) physicians. In our dataset, patients can have multiple visits to a variety of physicians, indicating that patient visits are not nested within physicians. Further, physicians may see different patients with no consistent overlap of patients between physicians, indicating that physicians are not nested within patients.
Thus, there is no clear hierarchical nesting of patients within physicians (or vice versa), which suggests that a cross-classified design is more appropriate than a traditional hierarchical multilevel model structure.

Note. Random effects are presented as estimate and SE. For the fixed effect estimates, cell entries are parameter (beta) estimates, standard errors (SE), and 95% confidence intervals.
White Non-Hispanic was the referent group for race. Significant effects based on the 95% CIs are denoted by *
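The non-nested (crossed) structure described above can be checked directly from the visit records. This sketch uses a hypothetical toy table: nesting would require each patient to map to exactly one physician (or vice versa), which the counts below rule out.

```python
import pandas as pd

# Toy visit records (hypothetical): each row is one visit.
visits = pd.DataFrame({
    "patient":   [1, 1, 2, 2, 3, 3],
    "physician": ["A", "B", "A", "C", "B", "C"],
})

# Under strict nesting, one of these counts would never exceed 1.
phys_per_pat = visits.groupby("patient")["physician"].nunique()
pats_per_phys = visits.groupby("physician")["patient"].nunique()

# Crossed design is warranted when both directions exceed 1.
crossed = bool((phys_per_pat > 1).any() and (pats_per_phys > 1).any())
```

In the study's data, 6.6% of patients saw 5 or more physicians, so this check would flag the design as crossed.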

Cross-Classified Linear Mixed Effects Model Results:
For example, in the negative adjective component model (Table 3), the random effects of patient (σ² = 0.02) and physician (σ² = 0.11) indicated that intercept variation in the use of negative adjectives is mainly a function of the physician rather than the patient. The physician random effect is over five times as large as the random effect for the patient; the intra-class correlation (ICC) for physicians is 0.41 and the ICC for patients is 0.07 (total ICC = 0.481). This pattern of results in random effects and ICC values for patients and physicians was consistent across the other 8 models. In the remaining discussion of the results, we focus on the fixed effects. Two of the five relationships that were significant based on the 95% CIs (the difference in positive adjectives between Hispanic/Latino and White Non-Hispanic patient notes, and the difference in trust verbs between Hispanic/Latino and White Non-Hispanic patient notes) had CIs that included zero at the 99% level. For three of the SEANCE components (well-being, politics, and respect) and for overall word count, there was not a statistically significant difference between the three races/ethnicities. In contrast, for all the remaining SEANCE components, there was a statistically significant race/ethnicity effect for either Black Non-Hispanic or Hispanic/Latino patients relative to White Non-Hispanic patients. Specifically, notes for Black Non-Hispanic patients contained significantly more negative adjectives and significantly more fear and disgust words compared to the notes for White Non-Hispanic patients. Notes for Hispanic/Latino patients included significantly fewer positive adjectives, trust verbs, and joy words compared to the notes for White Non-Hispanic patients. As such, across most of the SEANCE components, we observed favoritism of White Non-Hispanic patients in terms of note content.
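The reported ICCs follow directly from the variance components: each ICC is that source's variance divided by the total (physician + patient + residual) variance. The residual variance is not reported in the text; 0.138 is an assumed value chosen so the calculation reproduces the reported 0.41 and 0.07.

```python
# Variance components from the negative adjective model (Table 3).
var_physician = 0.11
var_patient = 0.02
var_residual = 0.138  # assumed residual variance, not reported in the paper

total = var_physician + var_patient + var_residual
icc_physician = var_physician / total  # share of variance due to physicians
icc_patient = var_patient / total      # share of variance due to patients

print(round(icc_physician, 2), round(icc_patient, 2))  # → 0.41 0.07
```

The physician ICC being roughly six times the patient ICC is what justifies the statement that note language varies chiefly by who wrote the note rather than about whom it was written.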
Sentiment Analysis Validation: Twenty-seven participants completed the surveys (see Supplemental Table 1 for the demographics of the participants). On a scale of 1 to 10, with 10 being extremely indicative of bias, participants rated negative adjectives as 8.6, fear and disgust words as 8.1, positive adjectives as 7.9, trust verbs as 7.6, and joy words as 6.8. The means and standard deviations for each of the components are reported in Table 4. The results of this preliminary analysis provide support for the validity of the linguistic components as indicators of bias in EHRs, as our sample of clinicians regard them as highly suggestive of bias if used differently for patients of diverse racial/ethnic backgrounds.

DISCUSSION
We found that the words physicians use in EHR notes differ based on the racial and/or ethnic backgrounds of patients. Specifically, notes for Black Non-Hispanic patients contain more words that convey negativity, fear, and disgust. When seeing Hispanic/Latino patients, physicians use fewer positive words and are less likely to use words that communicate trust and joy. Our findings are consistent with others who have documented that physicians communicate in the EHR differently (more negatively) when caring for patients from some minority groups [9,17], which may ultimately result in adverse and inequitable health outcomes for patients. Our results also align with other papers that found that stigmatizing language is more commonly used in EHRs for minorities [39][40][41][42][43]. Those papers used language guidelines [39] and experts [40] to identify stigmatizing language. We came to a similar conclusion by using established language dictionaries and contend that our approach allows for a more comprehensive assessment of language. For example, one prior paper used fifteen descriptors [43]. In contrast, our approach encompasses tens of thousands of words, including multiple word lists, positive and negative sentiments, and emotions. Thus, this method does not merely capture the presence or absence of stigmatizing language but rather offers a broader glimpse of the clinician-patient relationship.
Furthermore, the validation survey confirms that subject matter experts perceive the types of words included in this study to be indicative of bias when used differentially for patients of diverse racial/ethnic backgrounds. Taken together, these findings indicate that the language used differs for patients based on racial/ethnic backgrounds and that those differences are suggestive of bias. To our knowledge, our paper is the first to use this particular method to examine outpatient diabetes notes. Because diabetes quality measures already exist, our analysis allows researchers to link bias, in future studies, to differences in quality [44].
EHR notes are important, though imperfect, assessments of physician attitudes toward their patients. With more and more time now being devoted to EHR documentation, physicians are increasingly burned out, which has led to the adoption of more efficient data entry strategies such as using templates, copy-pasting previous text, and inserting preset language [45,46].
Consequently, notes can be standardized, limiting our ability to assess physician attitudes and subconscious biases toward patients. Despite these caveats, notes remain the definitive and often sole account of what happened in the exam room, and based on these data, Black Non-Hispanic and Hispanic/Latino patients are written about differently than White Non-Hispanic patients.
The method described in this paper offers a scalable blueprint that provides clinicians with data about their interactions with patients and overcomes limitations of other traditional measures of bias. Existing measures require primary data collection through surveys, videotaped encounters, and confederate observations. Surveys assess perceptions of interactions and are prone to retrospective bias and socially desirable responding, while the time-consuming nature of encounters and observations lacks scalability and limits the number of clinicians that can receive feedback at any given time. The relevance of alternative measures has also been questioned. For example, critics of the implicit association test have asked whether performance on the test is applicable to real-world contexts [47], which may explain why some change their behavior when confronted with their own biases while others do not [5,48]. In contrast, our method uses data that are automatically and universally collected through the course of delivering care and generated by physicians in actual encounters.
Limitations: When interpreting our results, several limitations should be considered. First, due to limitations in our data, we are unable to determine which additional team members, including scribes, medical assistants, and residents, contributed to the notes. However, attending physicians are ultimately responsible for the content and have the authority and responsibility to modify language that is inconsistent with their values. Second, we lack information about physicians in this sample and do not have access to physician demographic characteristics (e.g., their racial/ethnic backgrounds), though this would be an important next step. We attempted to account for this limitation by comparing language within rather than across physicians. Third, we included all language within notes, including physical exams, medications, and past medical histories. These sections can be guided by templates or not actively entered by physicians. We retained these parts in case the language within these sections contributed to variation. An alternative approach could assess only the history of present illness, assessment, and plan sections of the note and could yield different results. Additional work is needed to determine whether differential word choices reflect attitudes and behaviors toward patients. EHR notes serve a wide range of purposes. They convey medical information to others, remind physicians of their impressions, communicate plans to patients, provide justification for billing codes, and serve as legal evidence [45]. Thus, specific phrases (e.g., worsening, uncontrolled, or adherence) may be required for billing, compliance, and legal purposes and may not reflect bias toward patients. Finally, these results may not be generalizable to other conditions. Our findings may be unique to the language used for diabetes care and by clinicians who manage diabetes.
Determining whether these results persist for different diseases (e.g., cancer, heart disease, and acute injuries) is an important next step.
Directions for Future Research: Additional research is needed to interpret and provide context for this exploratory work. To determine whether these measures are associated with bias, subject matter experts could label notes, using known patterns of bias (e.g., the ratio of collective to personal pronouns, amount and level of abstraction of speech, and passive versus active voice) [49]. Further research is needed to understand whether biased language in notes reflects biased behaviors during encounters as well as inequitable health outcomes for some racial/ethnic minorities. Conducting further experiments (for example, with research actors as patients in a mock medical visit) would help determine whether biased language in notes reflects manifestations of bias during encounters (e.g., less eye contact, hostile language, or less time spent on education and counseling). If bias is confirmed, we need to determine whether clinicians who use differential language provide worse care and quality for minorities. Ultimately, this tool may be used to identify and mitigate bias. Future studies should assess whether receiving feedback using this method leads to behavior change and whether changing the language used in EHR notes leads to changes in patient interactions. While many strategies for reducing bias exist (such as affirming egalitarian goals, seeking common-group identities, perspective-taking, and individuation), it is unclear which approach best complements our proposed method [5].

Conclusion:
In this novel, exploratory work, we used natural language processing and found that, compared to encounters with White Non-Hispanic patients, physicians use language conveying more negativity, fear, and disgust in their encounters with some racial and ethnic minorities. If confirmed in future studies, these features could be used to make clinicians aware of their biases, with the goal of reducing racial/ethnic discrimination and resulting health inequities.

Sample:
This is a cross-sectional study using EHR-derived physician notes of outpatient clinical encounters. We extracted EHR encounters (n = 15,460) for patients (n = 1,647) aged 18 years or older, with more than five years of diabetes diagnosis codes, who received care between 2006 and 2014 from family physicians, general internists, or endocrinologists.

Because SEANCE computes such a large quantity of indices, Crossley et al. (2017) developed 20 components from all the indices using principal component analysis (PCA) [18]. These components are essentially clusters of related indices in SEANCE and allow users to interpret the SEANCE output at a more macro level. This process enabled them to summarize the SEANCE indices into a smaller and more interpretable set of variables. In their PCA, Crossley and colleagues retained even the smallest components, setting a conservative cutoff point for inclusion (i.e., 1% of variance explained by each component) [18]. The analyses for the current research were run on a subset of 8 of the 20 components that Crossley et al. (2017) developed.
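A PCA reduction with a 1% variance-explained cutoff, as described above, can be sketched with NumPy via the singular value decomposition. The random matrix here is a synthetic stand-in for note-by-index SEANCE scores, not real data.

```python
import numpy as np

# Synthetic stand-in: 500 notes x 30 lexical indices (illustrative only).
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 30))
Xc = X - X.mean(axis=0)  # center each index before PCA

# PCA via SVD: squared singular values give variance per component.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var_ratio = s**2 / np.sum(s**2)

# Retain any component explaining at least 1% of total variance
# (the conservative cutoff Crossley et al. used).
keep = var_ratio >= 0.01
scores = Xc @ Vt[keep].T  # component scores for the retained components
```

Each retained component aggregates many correlated indices into a single score per note, which is what makes the macro-level interpretation described above possible.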

Table 4 . Subject Matter Expert Assessment of Bias Based on Specific Linguistic Markers
Note: scale ranges from 1 (Not at all indicative of bias) to 10 (Extremely indicative of bias).