Pearson's new readability formula

The Reading Maturity Metric

On 24 June 2013, Pearson Education released a beta version of its accurate and easy-to-use readability formula, the Reading Maturity Metric. You can get a free subscription here: www.readingmaturity.com

Existing formulas use surface features such as word frequency and the average length of words and sentences. That is not to say that such features cannot usefully predict text difficulty. They can. The new formula, however, builds on the latest insights into how we use language, which results in greater accuracy.

With a correlation coefficient (r=) of .94 with comprehension as measured by reading tests, and with a Standard Error of 1.67, the new Pearson formula clocks in as more reliable than today's popular formulas such as the Dale-Chall, Flesch, Fog, and SMOG (See following article).

How it works

Photo: Dr. Landauer
Setting new standards for readability assessment: Thomas Landauer, leader of the team that developed the Reading Maturity Metric.

Working at Bell Labs in the 1970s, Thomas Landauer and his associates were grappling with the problem of cataloging and retrieving information from computerized databases. Digital word searches were too inefficient, often returning the wrong records.

Many words have more than one meaning, and different words can be used to describe the same concept. It seems that individual words are often poorly related to the concepts they are used to describe.

Landauer's team turned their attention to the very old problem of how we create meaning for words.

We have known for a long time that we can infer the meaning of a word from its use and context in a sentence. Vocabulary tests have shown that a 7th-grader, for example, can identify the meaning of 10 to 15 new words a day, many without previously having seen them. How does the brain do that?

Dr. Landauer and his team surmised that every text has a "latent semantic structure" that enables the brain to recognize the meaning of new words. They attempted to simulate the brain's ability to analyze that structure with a method they call "Latent Semantic Analysis" (LSA).

Using LSA, they can catalog a word or text of any size by reducing it to a single point (vector) in an imaginary 3-D space. The position of its vector in this 3-D space determines its "meaning" and its relationship to other texts.

The meaning of a paragraph in this system can be said to be the average of all the words it contains. The meaning of a word is the average of all the paragraphs it appears in.
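The averaging idea above can be sketched in a few lines of Python. This is only an illustration, not Pearson's implementation: the word list and the 3-D coordinates in WORD_VECTORS are invented, and the helper names are hypothetical. Real LSA derives its vectors from a large body of text rather than from a hand-made table.

```python
from statistics import fmean

# Hypothetical 3-D word vectors, invented for illustration only.
WORD_VECTORS = {
    "cat": (0.9, 0.1, 0.0),
    "dog": (0.8, 0.2, 0.0),
    "bank": (0.1, 0.9, 0.3),
    "loan": (0.0, 0.8, 0.4),
}


def average_vector(vectors):
    """Return the component-wise mean of equal-length vectors."""
    return tuple(fmean(axis) for axis in zip(*vectors))


def paragraph_vector(words):
    """Average the vectors of the known words; unknown words are skipped."""
    known = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not known:
        raise ValueError("no known words in paragraph")
    return average_vector(known)


# A "paragraph" about pets lands near the pet words,
# and one about money lands near the money words.
print(paragraph_vector(["the", "cat", "saw", "the", "dog"]))
print(paragraph_vector(["bank", "loan"]))
```

The same average_vector helper could be run the other way, averaging the vectors of every paragraph a word appears in to place the word itself.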

To test LSA, they applied it to a large body of texts. They found that it did very well and was just as effective as a 7th-grade student in learning new words.

Visualization of LSA space
Visualization of an imaginary 3-D catalog used in Latent Semantic Analysis. The position of a vector determines its "meaning" and its relationship to other texts.

The first application of this method, Latent Semantic Indexing, is now widely used in online search engines such as Google.

LSA and its progeny are also widely used in other fields such as education, natural-language processing, identity protection, and human-motion detection.

In applying LSA to readability, Dr. Landauer and his team first used LSA to catalog hundreds of graded reading tests, with their respective vectors now located in an imaginary 3-D catalog.

When you submit your text to the online Reading Maturity Metric, it catalogs your text using the same method. Finally, by measuring its proximity to the graded reading-test vectors in the imaginary 3-D space, it reports a grade level.
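As a rough sketch of that last step, the Python below assigns a grade by looking at the closest graded vectors. Everything specific here is an assumption made for illustration: the reference vectors and grades in GRADED_TEXTS are invented, and the use of cosine similarity and of averaging the k nearest grades is a guess at one plausible way to measure "proximity," not a description of how the Reading Maturity Metric actually computes its result.

```python
import math

# Hypothetical graded reference texts: (3-D vector, grade level).
GRADED_TEXTS = [
    ((0.9, 0.1, 0.1), 2),
    ((0.7, 0.4, 0.2), 5),
    ((0.4, 0.7, 0.4), 8),
    ((0.1, 0.6, 0.8), 12),
]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def estimate_grade(text_vector, k=2):
    """Average the grades of the k reference texts most similar to the input."""
    ranked = sorted(
        GRADED_TEXTS,
        key=lambda item: cosine_similarity(text_vector, item[0]),
        reverse=True,
    )
    nearest = ranked[:k]
    return sum(grade for _, grade in nearest) / len(nearest)


print(estimate_grade((0.5, 0.6, 0.4)))
```

With hundreds of graded tests instead of four, a larger k would smooth out the effect of any single unusual reference text.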

To read Dr. Landauer's white paper on the new formula, go here:

Computerized and online

Reliability of readability formulas

The results of a computerized readability formula can often differ from those of the same formula applied by hand. Different computerized versions often use different methods to determine variables such as word length, sentence length, and the number of syllables.
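To see how much the counting method matters, here is a minimal Python sketch that applies the Flesch-Kincaid grade-level formula (0.39 x words per sentence + 11.8 x syllables per word - 15.59) to the same text with two different syllable counters. Both counters are crude heuristics invented for this example; they are not the methods used by any of the products listed below, and the sample sentence is made up.

```python
import re


def syllables_vowel_groups(word):
    """Count each run of vowels as one syllable."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def syllables_silent_e(word):
    """Like the vowel-group count, but drop a likely silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(1, count)


def flesch_kincaid_grade(text, count_syllables):
    """Flesch-Kincaid grade level using the supplied syllable counter."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


sample = "We made a simple cake. Everyone came home late."
print(round(flesch_kincaid_grade(sample, syllables_vowel_groups), 2))
print(round(flesch_kincaid_grade(sample, syllables_silent_e), 2))
```

The two printed grades differ even though the formula and the text are identical, which is the kind of discrepancy that separates one computerized version from another.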

The following tables report the reliability of various popular computerized formulas. The reliability of a formula is often assessed by how well its results correlate with a range of normed reading tests, the difficulty levels of which are already known.

The formulas below were assessed by applying each one to the 51 normed texts in Qualitative Assessment of Text Difficulty: A Practical Guide for Teachers and Writers by Jeanne Chall and colleagues, published in 1996, and then computing the Pearson product-moment correlation coefficient (r=) on the results.

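Correlation coefficients indicate how well a readability formula "tracks" the difference between one text and another. A coefficient can range from -1.0 to 1.0, with zero indicating no correlation and 1.0 (or -1.0) indicating perfect correlation. The negative values shown for Flesch Reading Ease arise because that scale runs in the opposite direction: higher scores mean easier text. A correlation must be at least .50 in absolute value to be considered significant.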
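For readers who want to check a coefficient themselves, the Pearson product-moment correlation can be computed as in the Python sketch below. The sample numbers are invented for illustration and are not drawn from the 51 normed texts; the second call uses an ease-style scale, where higher scores mean easier text, to show why Flesch Reading Ease appears with a negative sign in the tables.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: known grade levels of five texts, plus the scores
# that a grade-level formula and an ease-scale formula gave them.
known_grades = [2, 4, 6, 8, 10]
grade_formula = [3, 4, 7, 8, 11]
ease_formula = [92, 80, 67, 55, 41]

print(round(pearson_r(known_grades, grade_formula), 2))
print(round(pearson_r(known_grades, ease_formula), 2))
```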

The Standard Error indicates the range within which the formula results are accurate. For example, a Standard Error of 2.0 indicates that the formula's results typically fall within two grades of the known level.
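The article does not say exactly how the Standard Error was calculated. One common choice, assumed here purely for illustration, is the standard error of estimate from a straight-line fit of the known grade levels on the formula's scores. The Python sketch below computes it; the function name and the sample data are hypothetical.

```python
import math


def standard_error_of_estimate(formula_scores, known_grades):
    """Standard error of estimate for a least-squares line
    predicting known_grades from formula_scores (assumed definition)."""
    n = len(formula_scores)
    if n != len(known_grades) or n < 3:
        raise ValueError("need two lists of equal length, at least 3 items")
    mean_x = sum(formula_scores) / n
    mean_y = sum(known_grades) / n
    sxx = sum((x - mean_x) ** 2 for x in formula_scores)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(formula_scores, known_grades))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared differences between known and fitted grades.
    sse = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(formula_scores, known_grades))
    return math.sqrt(sse / (n - 2))


# Hypothetical scores from a formula and the known grades of five texts.
print(round(standard_error_of_estimate([3, 4, 7, 8, 11], [2, 4, 6, 8, 10]), 2))
```

A smaller value means the formula's scores sit closer to the fitted line, which is why the tables treat a lower Standard Error as a sign of a more reliable formula.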

Reliability Correlations

  1. Pearson Education

    Free with registration at: http://www.readingmaturity.com

    Formula                            r=     Std. Err.
    Reading Maturity Metric            .94    1.67

  2. Readability Calculations

    Available on CD from http://www.micropowerandlight.com

    Formula                            r=     Std. Err.
    New Dale-Chall                     .93    1.75
    Flesch-Kincaid                     .91    1.90
    Gunning Fog                        .90    2.00
    SMOG                               .88    2.82
    Flesch Reading Ease               -.86    2.86
    Fry Graph                          .85    2.31

  3. Microsoft Office 2003

    Formula                            r=     Std. Err.
    Flesch-Kincaid                     .90    2.07
    Flesch Reading Ease               -.84    2.54

  4. Lexile Framework

    Available free with registration at: https://www.lexile.com/analyzer/

    Formula                            r=     Std. Err.
    Lexile Analyzer                    .90    2.00

  5. Okapi Readability Statistics

    Available online: http://tinyurl.com/o2c86jd

    Formula                            r=     Std. Err.
    Original Dale-Chall                .90    2.11

  6. Readability Formulas

    Available free online at: http://www.readabilityformulas.com

    Formula                            r=     Std. Err.
    Spache                             .91    2.12
    Gunning Fog                        .90    1.92
    Flesch-Kincaid                     .89    2.02
    SMOG                               .88    2.17
    Automated Readability Index        .88    2.11
    Readability Text Consensus Tool    .88    2.14
    Dale-Chall                         .88    2.27
    Linsear Write Formula              .87    2.24
    Flesch Reading Ease               -.86    2.30
    Fry Graph                          .84    2.14
    Raygor                             .82    2.58
    Coleman-Liau Index                 .72    3.17

Plain language in the news

New Pearson readability formula: http://tinyurl.com/o86r694

Google using readability formulas to rate Web sites: http://tinyurl.com/m9ra7nz

How to write content for a business audience: http://tinyurl.com/ndlacbq

Arbitration clauses more difficult to understand: http://tinyurl.com/k53kplm

Australian website privacy statements fail in readability: http://tinyurl.com/mwroaje

British town of Pendle gets Plain English award: http://tinyurl.com/oom5m5e

Plain English saves time and money: http://tinyurl.com/lx5vfjp

Third-circuit court wants plain-language agreements: http://tinyurl.com/mko7bdj

Most federal agencies failing to communicate with the public: http://tinyurl.com/pdawd7g

Why nobody can read Obamacare: http://tinyurl.com/k64yttr

President Xi Jinping promotes plain Chinese: http://tinyurl.com/q65eqkj

Why blogging is good for lawyers: http://tinyurl.com/l27e4bp

GAO wants plain language for pension reporting: http://tinyurl.com/n9ucmwk

Canadian wireless law requires plain-language contracts: http://tinyurl.com/lkrpy5h

"Content" is bad for business: http://tinyurl.com/ntno2sj

Nigerian bank blames financial illiteracy for crisis: http://tinyurl.com/mduxzv2