Missing Document Language: Addressing the issues
Learn how to set the document language to ensure all content can be correctly interpreted
Time estimate: 4 to 6 mins
Lesson contents:
Part 1 | Missing Document Language: Understanding the issues
Part 2 | Missing Document Language: Addressing the issues
This training helps:
Anyone who creates and manages webpages and content.
- Developers
- Content management system managers
- People who manage a website
- … and more
Review of “Understanding missing document language”
About document language
Document language is simple code that identifies the language of the web page.
This is used by:
- Browsers
- Assistive technology
- Third-party APIs like media players with captions or grammar checkers
When the document language is available, these tools can present the information correctly.
For example, a website set to “US English” will be correctly interpreted by grammar checkers and assistive technology.
How missing document language can impact assistive technology users
If language is not set, a device will read content based on its default settings. For example, if your computer is set to United States English, screen readers, voice assistants, and other assistive technologies will default to speaking with U.S. English pronunciation.
Play the audio below for an example. In this scenario, a screen reader announces Spanish in a U.S. English accent that’s hard to understand.
[Screen reader pronunciation of Spanish in the intended Spanish accent] “La selección masculina de fútbol de México es el equipo formado por jugadores de nacionalidad mexicana que representa a la Federación Mexicana de Fútbol (FMF).”
[Screen reader pronunciation of Spanish in an English accent] “La seleck-eon masculine-a de foot-bowl de Mexico es el equip-o for-mad-o por juga-dores de national-y-dad mexican-a que represent-a a la Federation Mexican-a de foot-bowl (FMF).”
The format for document languages
In all cases, you will set the document language in the <html> element.
It typically consists of two or three parts:
Attribute: lang
Language: An ISO language code. For example, “en” for English and “es” for Spanish.
Region (optional): A region code, connected to the language code with a hyphen. Examples are “US” for American English or “MX” for Mexican Spanish.
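As a rough illustration of how these parts combine, the sketch below assembles the attribute value from a language code and an optional region code. The function names `build_lang_value` and `html_open_tag` are made up for this example, and the code does not check that the codes themselves are real.

```python
def build_lang_value(language, region=None):
    """Join a language code and an optional region code with a hyphen."""
    if region:
        return f"{language}-{region}"
    return language


def html_open_tag(language, region=None):
    """Render the opening <html> tag with its lang attribute."""
    return f'<html lang="{build_lang_value(language, region)}">'


print(html_open_tag("en"))        # <html lang="en">
print(html_open_tag("es", "MX"))  # <html lang="es-MX">
```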
Setting up document language
Language code only
<html lang="en">
The most common way to assign language is to use the language value alone like “en.”
Language and region code
<html lang="en-US">
The same language can be written or spoken differently from place to place, so the content may benefit from a region code like “US.”
Note: The region code is optional, but we encourage using it wherever possible. It helps ensure the content is presented in the correct regional dialect.
Tips to fix missing document language
Recall the scenario with the audio recording at the start of this lesson. It is clear that the Spanish version of the site doesn’t have the correct document language. The next step is to figure out why and address it.
Steps:
Check the format
Validate the code
Add the correct language attribute
1. Check the format
A simple mistake like omitting the language can cause issues. Do a manual inspection since automated tools can’t detect all issues.
a. Is the document language missing?
Automated tools can detect if the document language code is missing. This happens when there is no “lang” listed.

Code is not present
<html>

Code is present
<html lang="ar">
b. Is the element correct?
Sometimes, the document language is placed in the wrong location like the body element, instead of the html element. Not all automated tools can flag this error, so a manual review is recommended.

Wrong element
<body lang="es">

Correct element
<html lang="es">
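The two checks above (is the language missing, and is it on the right element) could be scripted as a first pass before a manual review. The sketch below uses Python's standard `html.parser` module; the class name `LangFinder`, the function `check_document_language`, and the status strings are assumptions made for this example, not part of any accessibility tool.

```python
from html.parser import HTMLParser


class LangFinder(HTMLParser):
    """Record every element that carries a lang attribute."""

    def __init__(self):
        super().__init__()
        self.found = {}  # tag name -> lang value (first occurrence only)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "lang" and tag not in self.found:
                self.found[tag] = value


def check_document_language(source):
    """Return a short status string describing where lang was found."""
    finder = LangFinder()
    finder.feed(source)
    html_lang = finder.found.get("html")
    if html_lang:
        return f"ok: html lang={html_lang}"
    # lang on other elements does not set the language of the whole document
    others = sorted(tag for tag in finder.found if tag != "html")
    if others:
        return f"wrong element: lang is on {', '.join(others)}, not on html"
    return "missing: html has no lang value"


print(check_document_language("<html><body><p>Hola</p></body></html>"))
print(check_document_language('<html><body lang="es"><p>Hola</p></body></html>'))
print(check_document_language('<html lang="es"><body><p>Hola</p></body></html>'))
```

A script like this only reports where the attribute sits. It cannot tell whether the value matches the language the page is actually written in.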
c. Is there a hyphen for regional codes?
If you’re using a regional code, it needs to be connected to the language code with a hyphen.

Missing hyphen
<html lang="esMX">

With hyphen
<html lang="es-MX">
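The hyphen rule can be approximated with a pattern check. The sketch below is deliberately simplified: it accepts only a two- or three-letter language code, optionally followed by a hyphen and a two-letter region code. Real language tags allow more forms (for example, zh-yue), so treat `LANG_SHAPE` and `has_valid_shape` as illustrative names, not a full validator.

```python
import re

# Simplified shape: 2-3 letter language, optional hyphen + 2-letter region.
# Other subtags (scripts, variants, extended languages) are outside this sketch.
LANG_SHAPE = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z]{2})?")


def has_valid_shape(value):
    """Return True when the value looks like 'es' or 'es-MX'."""
    return LANG_SHAPE.fullmatch(value) is not None


print(has_valid_shape("es-MX"))       # True
print(has_valid_shape("esMX"))        # False: hyphen is missing
print(has_valid_shape("en\u2013US"))  # False: en dash instead of a hyphen
```

A value can pass this shape check and still be an invalid code, such as "ed", which is why the validation step that follows is still needed.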
2. Validate the code
A simple mistake like entering the code incorrectly can cause issues. Do a manual inspection since automated tools can’t detect all issues.
a. Identify the language code
For this activity, pretend the language code was set to “ed” instead of “es.”

Code as entered (invalid)
<html lang="ed">

Intended code
<html lang="es">
b. Confirm the correct language code
In a separate window, open a language code database. Two tools are compared below; choose whichever you prefer.

Pro: This app’s functions narrow and match results. It’s better at finding and verifying codes.
Con: It shows the language and region code separately.

Pro: The language and region codes appear in proper format. The page lets you view all the region codes in one place.
Con: The only way to navigate is to scroll or use Ctrl + F to find a code, which can lead to unrelated results.
c. Use the tool to find and verify the code
1. Look up the language code
Use the “Look up” function and type in the language or region code, like “ed.”
If it is an invalid code, the tool shows no matching results; the code may be misspelled. Proceed to step 2.
If it is a valid code, the tool shows matching results for both language codes and region codes. For example, “es” matches the language code for Spanish and the region code for Spain.
Note: “Look up” only searches codes like “ed,” not full names like “Spanish.”

Invalid code


Valid code

2. Find the language code
Use “Find” to search for codes by the name of the language, like “Spanish.”

3. Optional: Find the specific region
Use “Find” to search for codes by the name of the region, like “Mexico.”
To confirm whether a result is a region or a language code, select it to see its type: “region.” Now you know the region subtag to use is “MX.”
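The look-up and find steps above can be pictured as two searches over the same table: one by code, one by name. The sketch below uses a tiny hand-written table, `CODES`, that is only an illustrative subset; the function names `look_up` and `find` mirror the steps in this lesson and are not the API of any real tool.

```python
# Tiny illustrative subset of codes; a real registry holds thousands.
CODES = [
    {"code": "es", "type": "language", "name": "Spanish"},
    {"code": "en", "type": "language", "name": "English"},
    {"code": "ES", "type": "region", "name": "Spain"},
    {"code": "MX", "type": "region", "name": "Mexico"},
    {"code": "US", "type": "region", "name": "United States"},
]


def look_up(code):
    """Match entries by code, ignoring case (codes only, not names)."""
    return [entry for entry in CODES if entry["code"].lower() == code.lower()]


def find(name):
    """Match entries whose name contains the search text, ignoring case."""
    return [entry for entry in CODES if name.lower() in entry["name"].lower()]


print(look_up("ed"))                                     # []
print([e["type"] for e in look_up("es")])                # ['language', 'region']
print([e["code"] for e in find("Spanish")])              # ['es']
print([(e["code"], e["type"]) for e in find("Mexico")])  # [('MX', 'region')]
```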

3. Add the correct language attribute
After you identify the issue with your code, add the correct language code.
Correct Examples
Continuing the previous example, use “es” as the language code for Spanish and, optionally, “MX” as the region code for Mexico.

Language code only
<html lang="es">

Optional: Language and region code
<html lang="es-MX">
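If the page source is available as text, the fix can be applied programmatically. The sketch below is one possible approach using regular expressions; `set_html_lang` is a hypothetical helper, and in practice the change is more often made in the page template or the content management system's language setting.

```python
import re

# Opening <html> tag with any attributes it already has.
HTML_TAG = re.compile(r"<html\b([^>]*)>", re.IGNORECASE)
# An existing lang attribute, quoted or unquoted.
LANG_ATTR = re.compile(r"\s+lang\s*=\s*(\"[^\"]*\"|'[^']*'|[^\s>]+)", re.IGNORECASE)


def set_html_lang(source, lang):
    """Set lang on the first <html> tag, replacing any existing value."""

    def fix(match):
        other_attrs = LANG_ATTR.sub("", match.group(1))
        return f'<html lang="{lang}"{other_attrs}>'

    return HTML_TAG.sub(fix, source, count=1)


print(set_html_lang("<html><body></body></html>", "es-MX"))
# <html lang="es-MX"><body></body></html>
print(set_html_lang('<html lang="ed" dir="ltr">', "es"))
# <html lang="es" dir="ltr">
```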
Frequently asked questions
Is any part of the code case-sensitive?
The code is not case-sensitive. Many people use lowercase for the language and uppercase for the region. (Example: es-MX)
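Although case does not matter, a consistent style makes codes easier to compare by eye. The sketch below applies the common convention of a lowercase language and an uppercase two-letter region; `normalize_case` is an illustrative name, and the function leaves any other subtags untouched.

```python
def normalize_case(value):
    """Lowercase the language part and uppercase a two-letter region part."""
    parts = value.split("-")
    parts[0] = parts[0].lower()
    if len(parts) > 1 and len(parts[1]) == 2:
        parts[1] = parts[1].upper()
    return "-".join(parts)


print(normalize_case("ES-mx"))  # es-MX
print(normalize_case("EN"))     # en
```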
Does the language or region code always have to be two letters?
No, but screen readers do not provide broad support for all language code formats. For instance, yue and zh-yue both mean Cantonese; however, zh has broader language support than yue.
Do website builder platforms like WordPress automatically set the document language?
Some platforms check code structure to detect if the language code is correct. It’s good practice to manually check the databases to make sure.
Takeaways
A simple mistake like omitting the document language or using the wrong one can cause formatting issues. Here are some questions to help you ensure the document language is correct.
Questions to ask yourself:
- What is the primary language of the content?
- Does the code reference the correct language?
- Does the code use the right format?
Practice exercises
1. Provide the following region code for a website in Mandarin Chinese (only one region code):
Hong Kong: <html lang="zh-____">
Scenario:
An engineer wants to create speech synthesizers that are localized to specific regions to be more inclusive to dialects worldwide.
They achieve this by using the region subtag. Use the “Validate the code” steps with the Language Code Table tool to provide the region code for Hong Kong screen reader localization.
The correct answer is HK. This is the value assigned to this region. The region code is optional, but it is highly recommended to add it where possible.
2. Find and apply the correct language code for this text, written for a blockquote in a language other than U.S. English:
“Um passo à frente e você não está mais no mesmo lugar”
Scenario:
You are tasked with fixing missing document language on a website. No one on the team can identify the language. Use a language detection tool to determine the correct language for the blockquote, and enter the correct language code.
The text is written in Portuguese and the language code is “pt.” Applying this code helps other technologies interpret the text and translate it if needed.
Resources
Explore our other training
Alternative text
Describe images accessibly and make them available to everyone
Ambiguous links
Make link destinations clear for everyone