Health information is massive and rich in valuable numeric and textual data, and because data moves so quickly in today’s health care environment, a health plan’s data is replaced by new data within three years. But industry experts say 80 percent of health plan data remains trapped in unstructured text, such as clinical notes, electronic medical records and call center logs.

Text mining transforms unstructured data into a format that opens it to analysis. Medical reports, e-mails and research articles can then be used just like other health care data, such as age, gender, blood pressure and cholesterol levels, to support data mining, clustering, neural networks, decision trees and regression analysis on rich data. Imagine how robust an investigation could become if a patient’s symptoms in clinical notes differed from the billing submitted to the payer. Imagine the integrity issues uncovered if the patient appeals a non-covered service, but after the appeal is denied, the doctor re-files the claim using the same date of service but different treatments.

Text mining processes data in four phases: cleansing, parsing, transformation and document clustering. Cleansing and parsing create a term-by-document frequency matrix, which is then mathematically transformed to create structured information for data mining. Finally, the text collection is clustered and can be explored, reported on and visualized.
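As a rough illustration of the first two phases, the sketch below builds a small term-by-document frequency matrix in plain Python. The stop-word list, the function names and the sample notes are assumptions made for this example, not part of any particular product.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a production list would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "for", "and", "to", "was", "is", "with"}

def cleanse_and_parse(text):
    """Lowercase the text, keep alphabetic tokens and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def term_document_matrix(documents):
    """Return (vocabulary, rows); rows[i][j] counts term i in document j."""
    counts = [Counter(cleanse_and_parse(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, rows

# Made-up sample notes, for illustration only.
notes = [
    "Patient reports back pain; treated for back pain.",
    "Unspecified soft tissue injury of the back.",
]
vocabulary, matrix = term_document_matrix(notes)
```

Each row of `matrix` corresponds to one term in `vocabulary` and each column to one note, which is the structure the transformation phase then works on.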

Consider a chiropractor’s office calling a health insurer to determine remaining benefit levels on an entire family within the same month. The insurer then is billed for the maximum remaining benefit amount on each family member for treatment of identical back pain. With supporting technology, rules and models can be created from this information, flagging providers who call using similar terms each time and generating leads for investigations.

Terms change over time. The use of identical terms across multiple providers may indicate that they are sharing information; a decrease in the use of some terms may indicate that a bad actor has deciphered the rule. For example, the phrase “unspecified soft tissue injury” increasingly used among connected providers could be a good indication that they are colluding. If a rule is implemented flagging claims with the phrase “unspecified soft tissue injury,” followed by a decrease in the same activity, it could indicate they have discovered the rule and thus changed their behavior.
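One way to watch for this kind of shift is to track how often a phrase appears in each period. The sketch below is a minimal version, assuming claims arrive as (period, note text) pairs; the function name and the sample data are invented for illustration.

```python
from collections import defaultdict

PHRASE = "unspecified soft tissue injury"

def phrase_rate_by_period(claims, phrase=PHRASE):
    """Map each period to the fraction of claim notes containing the phrase."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for period, note in claims:
        totals[period] += 1
        if phrase in note.lower():
            hits[period] += 1
    return {p: hits[p] / totals[p] for p in sorted(totals)}

# Made-up claims for a group of connected providers.
claims = [
    ("2012-01", "Unspecified soft tissue injury, lower back"),
    ("2012-01", "Unspecified soft tissue injury, neck"),
    ("2012-02", "Unspecified soft tissue injury, shoulder"),
    ("2012-02", "Lumbar strain"),
    ("2012-03", "Lumbar strain"),
    ("2012-03", "Cervical sprain"),
]
rates = phrase_rate_by_period(claims)
```

A steady rate followed by a sharp drop soon after a rule goes live is the pattern worth a closer look; the drop alone does not prove that the rule was discovered.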

Transforming the term/phrase/topic matrix can use various methods. These range from simple weighting schemes, which give high weight to rare terms and low weight to frequent terms, to dimension reduction techniques, which reduce massive amounts of sparse data, such as a term-by-document or phrase-by-document matrix, to just those dimensions that differentiate the documents. The result is a matrix suitable for data mining.
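A common example of such a weighting scheme is TF-IDF, sketched below in plain Python. It is shown only as one familiar option, not necessarily the scheme the authors have in mind, and the counts are made up. Dimension reduction is not shown here.

```python
import math

def tfidf(matrix):
    """Weight a term-by-document count matrix (rows are terms, columns are documents)."""
    n_docs = len(matrix[0])
    weighted = []
    for row in matrix:
        doc_freq = sum(1 for count in row if count > 0)
        # Terms found in few documents get a high weight;
        # terms found in every document get a weight of zero.
        idf = math.log(n_docs / doc_freq) if doc_freq else 0.0
        weighted.append([count * idf for count in row])
    return weighted

# Made-up counts for two terms across three documents.
counts = [
    [2, 1, 3],  # "back": appears in all three documents
    [0, 1, 0],  # "phantom": appears in only one document
]
weights = tfidf(counts)
```

Here the frequent term is weighted down to zero in every document, while the rare term keeps a positive weight in the one document that uses it.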

Topics are compact representations summarizing the main ideas. Topics common to a cluster can be used to identify bad actors who reuse the same text for different problems, or who may be colluding, as evidenced by frequent appearance in the same cluster.
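To make "frequent appearance in the same cluster" concrete, the sketch below counts how many clusters each pair of providers shares. It assumes the documents have already been clustered by some other step; the provider labels, cluster IDs and function name are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def shared_cluster_counts(assignments):
    """Count, for each pair of providers, the clusters in which both appear."""
    providers_by_cluster = defaultdict(set)
    for provider, cluster in assignments:
        providers_by_cluster[cluster].add(provider)
    pair_counts = Counter()
    for providers in providers_by_cluster.values():
        for pair in combinations(sorted(providers), 2):
            pair_counts[pair] += 1
    return pair_counts

# Made-up (provider, cluster id) pairs, one per document.
assignments = [("A", 0), ("B", 0), ("A", 1), ("B", 1), ("C", 1), ("C", 2)]
pairs = shared_cluster_counts(assignments)
```

Pairs with unusually high counts are candidates for review, not proof of collusion; providers in the same specialty will naturally share some clusters.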

Auto Claims Fraud

Certain words are associated with an increased likelihood of insurance fraud, such as “back injuries” or terms like “phantom vehicle,” while others, such as “burns” or “fractures,” are associated with decreased likelihood of fraud.

In one case, the current flagging mechanism for investigated claims had a false positive rate of 95 percent, meaning only 5 percent of the leads turned out to be fraud. Using predictive modeling without text mining, the false positive rate was 73 percent. By adding text mining for specific terms, we reduced the false positive rate to 46 percent; thus, more than half of the leads now turned out to be actual fraud. Adding textual information to the predictive model doubled its effectiveness.
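The arithmetic behind "doubled" can be checked in a few lines. The sketch below reads "false positive rate" the way the column uses it, as the share of flagged leads that were not fraud; the variable names are chosen for this example.

```python
# False positive rates in percent, as quoted above.
false_positive_pct = {
    "current flags": 95,
    "model without text": 73,
    "model with text": 46,
}

# The share of leads that turn out to be fraud is the complement.
fraud_pct = {name: 100 - fp for name, fp in false_positive_pct.items()}

# Improvement from adding text mining to the predictive model.
lift = fraud_pct["model with text"] / fraud_pct["model without text"]
```

The share of leads that were actual fraud rises from 27 percent to 54 percent, a ratio of 2.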

Creative Billing in Health Care Claims

Medical notes from multiple points in treatment can reveal strange behavior. For example, nurse pre-certification notes can be mined for diseases and compared automatically with patient history and with the claims filed afterward. Say a doctor’s office called the health plan to pre-certify a surgical hospital stay, telling the pre-certification nurse the patient was overweight, diabetic and hypertensive. Because of those complications, a five-day stay was authorized. After the claim was paid, a retrospective audit found no complications on the electronically filed claim, so the stay should have been a one-day overnight surgery.
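A comparison like this can be sketched as a simple set difference between the conditions mentioned in the pre-certification note and those present on the claim. The keyword list, function names and sample note below are assumptions for illustration; plain substring matching of this kind would miss negations such as "not diabetic," so a real system would need more careful parsing.

```python
# Hypothetical keyword list mapping a condition to words that signal it in a note.
CONDITION_KEYWORDS = {
    "obesity": ("overweight", "obese"),
    "diabetes": ("diabetic", "diabetes"),
    "hypertension": ("hypertensive", "hypertension"),
}

def conditions_in_note(note):
    """Return the conditions whose keywords appear in the note text."""
    text = note.lower()
    return {
        condition
        for condition, words in CONDITION_KEYWORDS.items()
        if any(word in text for word in words)
    }

def unsupported_conditions(precert_note, claim_conditions):
    """Conditions cited at pre-certification but absent from the filed claim."""
    return conditions_in_note(precert_note) - set(claim_conditions)

# Made-up note and an empty list of claim conditions, mirroring the example above.
note = "Patient is overweight, diabetic and hypertensive; requesting surgical stay."
missing = unsupported_conditions(note, claim_conditions=[])
```

A non-empty result could route the claim to review before payment instead of waiting for a retrospective audit.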

In any of the preceding examples, text mining would have significantly increased the robustness and speed of claims review and could have stopped an improper payment before it was made.

Health plans should embrace text mining to unlock the power of data and improve integrity and quality of care, as well as contain costs. Payers are best positioned to avoid letting the words get in the way.

Jay King is director in the advanced analytics lab and Julie Malida is principal in the security intelligence practice at the SAS Institute. 

This column originally appeared in the Vantage Point section of Insurance Networking News. Published with permission.
