
How Resume Parsing Works: The Technology Behind ATS Screening

Resume parsing is the technology that converts your formatted resume document into structured data that an ATS can search, filter, and score. It combines Natural Language Processing (NLP), pattern recognition, and machine learning to extract meaning from unstructured text. Understanding this technology helps you create resumes that parse accurately every time.

The Three Stages of Resume Parsing

Resume parsing follows three sequential stages: text extraction, section identification, and entity recognition. Each stage builds on the previous one, and errors at any stage cascade through the entire process.

Text extraction is the foundation. The parser opens your file and converts it into plain text, stripping all formatting, images, and layout information. For DOCX files, this means parsing the XML structure. For PDFs, the system extracts the text layer, which can be incomplete or out of order if the PDF was exported from a design tool rather than a word processor.
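To make the DOCX case concrete, here is a minimal sketch of text extraction using only the Python standard library. It assumes the file is a valid DOCX whose main part is word/document.xml. The function names extract_docx_text and build_sample_docx are illustrative, not taken from any real parsing engine, and a production parser would also handle headers, footers, text boxes, and tables.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a DOCX file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(docx_bytes: bytes) -> list[str]:
    """Return one plain-text string per paragraph, dropping all formatting."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # A paragraph (w:p) holds runs (w:r); each run holds text nodes (w:t).
        text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs


def build_sample_docx() -> bytes:
    """Build a tiny in-memory archive (hypothetical sample) so the sketch runs without a file."""
    document_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        "<w:body>"
        "<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Work Experience</w:t></w:r></w:p>"
        '<w:p><w:r><w:t xml:space="preserve">Senior Software </w:t></w:r>'
        "<w:r><w:t>Engineer</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


if __name__ == "__main__":
    for line in extract_docx_text(build_sample_docx()):
        print(line)
```

Note that the bold marker on the heading and the split between the two runs both disappear: only the text survives, which is why visual emphasis alone cannot tell a parser what a line means.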

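Section identification comes next. The parser scans the extracted text for headings and structural patterns that indicate different resume sections: contact information, summary, work experience, education, skills, and certifications. Standard headings matter here because content under a heading the parser does not recognize gets filed under whichever section came before it. Entity recognition, the third stage, relies on the NLP techniques described in the next section.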
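The heading scan can be sketched as a lookup against a list of known heading names. This is a simplified illustration, not how any particular ATS works: the SECTION_HEADINGS vocabulary and both function names are assumptions made for the example, and real parsers use far larger synonym lists plus layout cues.

```python
import re

# Hypothetical heading vocabulary; real parsers use much larger synonym lists.
SECTION_HEADINGS = {
    "summary": ["summary", "professional summary", "profile"],
    "experience": ["work experience", "experience", "employment history"],
    "education": ["education"],
    "skills": ["skills", "technical skills"],
    "certifications": ["certifications"],
}


def classify_heading(line):
    """Return the section name if the line is a known heading, else None."""
    normalized = re.sub(r"[^a-z ]", "", line.lower())
    normalized = re.sub(r"\s+", " ", normalized).strip()
    for section, names in SECTION_HEADINGS.items():
        if normalized in names:
            return section
    return None


def split_sections(lines):
    """Group lines under the most recent recognized heading."""
    sections = {"header": []}  # text that appears before the first heading
    current = "header"
    for line in lines:
        section = classify_heading(line)
        if section is not None:
            current = section
            sections.setdefault(current, [])
        elif line.strip():
            sections[current].append(line.strip())
    return sections


if __name__ == "__main__":
    resume = [
        "Jane Doe",
        "WORK EXPERIENCE",
        "Senior Software Engineer at Google",
        "My Journey",  # creative heading: not recognized
        "Built internal tools",
        "Education:",
        "BSc Computer Science",
    ]
    print(split_sections(resume))
```

In the sample, the creative heading "My Journey" is not in the vocabulary, so it and the line under it are absorbed into the experience section instead of starting a section of their own.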

Natural Language Processing in Parsing

NLP is the core technology that enables parsers to understand resume content. Named Entity Recognition (NER) identifies specific types of information: person names, organization names, locations, dates, email addresses, and phone numbers.

Part-of-speech tagging helps the parser understand the grammatical structure of sentences, distinguishing job titles from company names and skills from general descriptions. For example, in 'Senior Software Engineer at Google, Mountain View,' the parser must identify 'Senior Software Engineer' as a title, 'Google' as an organization, and 'Mountain View' as a location.
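The labeling task in that example can be illustrated with a deliberately simple rule. The sketch below is a rule-based stand-in, not real NER or part-of-speech tagging: it assumes the entry follows the exact layout "title at organization, location", and the ENTRY_RE pattern and label_entry function are hypothetical names for this example. Statistical models exist precisely because real resumes rarely follow one fixed layout.

```python
import re

# Hypothetical pattern for one layout only: "<title> at <organization>, <location>".
ENTRY_RE = re.compile(
    r"^(?P<title>.+?)\s+at\s+(?P<organization>[^,]+),\s*(?P<location>.+)$"
)


def label_entry(line):
    """Return title/organization/location labels, or None if the layout differs."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return {name: value.strip() for name, value in match.groupdict().items()}


if __name__ == "__main__":
    print(label_entry("Senior Software Engineer at Google, Mountain View"))
    print(label_entry("Google | Senior Software Engineer"))  # different layout
```

The second call returns None because the entry uses a different layout, which is the kind of variation a fixed rule cannot absorb.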

More advanced parsers use machine learning models trained on millions of resumes to improve accuracy. These models learn patterns specific to resume writing, such as the typical structure of experience entries and the format of date ranges.

Named Entity Recognition (NER): identifies people, organizations, locations, dates
Part-of-speech tagging: distinguishes titles from companies from skills
Pattern matching: recognizes phone numbers, emails, URLs, date formats
Contextual analysis: determines meaning based on surrounding text
Machine learning models: improve accuracy based on training data
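Of the techniques listed above, pattern matching is the easiest to show in code. The following is a minimal sketch using regular expressions; the patterns are simplified assumptions (US-style phone numbers, English month abbreviations) and the extract_fields name is illustrative. Commercial engines use far more permissive rules.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Simplified US-style phone pattern, e.g. (555) 123-4567 or 555.123.4567.
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
URL_RE = re.compile(r"https?://[^\s|]+")

MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*"
DATE_POINT = rf"(?:{MONTH}\s+\d{{4}}|\d{{2}}/\d{{4}})"
# Accepts an en dash or a hyphen between the start and end of the range.
DATE_RANGE_RE = re.compile(rf"({DATE_POINT})\s*[–-]\s*({DATE_POINT}|Present)")


def extract_fields(text):
    """Collect every email, phone number, URL, and date range found in the text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
        "urls": URL_RE.findall(text),
        "date_ranges": DATE_RANGE_RE.findall(text),
    }


if __name__ == "__main__":
    sample = (
        "Jane Doe | jane.doe@example.com | (555) 123-4567 | https://example.com/janedoe\n"
        "Jan 2020 – Present\n"
        "03/2018 - 12/2019\n"
        "Spring 2020 to now\n"
    )
    print(extract_fields(sample))
```

The last line of the sample, "Spring 2020 to now", matches none of the date patterns, so that date range is simply lost. This is why consistent, conventional date formats parse more reliably.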

Common Parsing Engines and Their Accuracy

Different ATS platforms use different parsing engines, each with varying levels of accuracy. Some build proprietary parsers, while others license third-party engines like Sovren (now Textkernel), Daxtra, or HireAbility.

Sovren/Textkernel is one of the most widely used parsing engines, known for high accuracy across multiple languages and formats. Daxtra is popular in staffing and recruiting firms. HireAbility focuses on high-volume parsing with fast processing speeds.

Parsing accuracy typically ranges from 60-95% depending on the resume format and complexity. Simple, single-column resumes with standard headings parse at 90-95% accuracy. Complex, multi-column resumes with graphics may parse at only 60-70% accuracy, with significant data misclassification.

Parsing Engine	Used By	Accuracy Range
Textkernel (Sovren)	iCIMS, SmartRecruiters	85-95%
Daxtra	Bullhorn, JobAdder	80-92%
HireAbility	Various ATS platforms	78-90%
Proprietary (Greenhouse)	Greenhouse	82-93%
Proprietary (Workday)	Workday	80-90%

What Breaks Parsing: Technical Deep Dive

Understanding why certain formats break parsing requires knowing how text extraction works at a technical level. In a DOCX file, text is stored in XML tags that indicate paragraphs, runs of text, and formatting. The parser reads these tags sequentially, so the text order matches the visual order in most cases.

PDFs are more problematic because text positioning is specified by coordinates rather than document flow. A PDF stores text as characters or short runs of characters placed at specific x,y coordinates, so in a two-column layout nothing in the file says which column should be read first. The parser must reconstruct the reading order, which can fail for complex layouts.
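The reading-order problem can be reproduced with a few positioned words. This sketch makes several simplifying assumptions: y is measured from the top of the page (actual PDF coordinates start at the bottom-left), each word is already a separate item, and the gutter position between columns is given rather than inferred. The Word type and both function names are hypothetical.

```python
from typing import NamedTuple


class Word(NamedTuple):
    x: float  # left edge of the word
    y: float  # distance from the top of the page (assumed top-down)
    text: str


def naive_order(words):
    """Sort by line (y), then left to right (x). Interleaves columns."""
    return [w.text for w in sorted(words, key=lambda w: (w.y, w.x))]


def column_aware_order(words, gutter_x):
    """Read everything left of the gutter first, then everything right of it."""
    left = [w for w in words if w.x < gutter_x]
    right = [w for w in words if w.x >= gutter_x]
    return naive_order(left) + naive_order(right)


if __name__ == "__main__":
    page = [
        Word(50, 100, "Experience"), Word(350, 100, "Skills"),
        Word(50, 120, "Google"), Word(350, 120, "Python"),
        Word(50, 140, "2020-2024"), Word(350, 140, "SQL"),
    ]
    print(naive_order(page))
    print(column_aware_order(page, gutter_x=300))
```

The naive sort mixes the two columns line by line, producing "Experience, Skills, Google, Python, ...", while the column-aware version keeps each column together. A real parser has to guess where the gutter is, and a wrong guess produces the interleaved result.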

Tables in any format create parsing challenges because the parser must determine how to linearize a two-dimensional structure. Some parsers read left-to-right row by row, while others read column by column. If your resume uses tables for layout, the parser may merge unrelated information.
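The two linearization strategies can be compared on a small layout table. This is an illustrative sketch that assumes a rectangular table held as a list of rows; the sample data and function names are made up for the example.

```python
def row_major(table):
    """Read left to right, one row at a time."""
    return [cell for row in table for cell in row]


def column_major(table):
    """Read top to bottom, one column at a time (assumes equal-length rows)."""
    return [cell for column in zip(*table) for cell in column]


if __name__ == "__main__":
    # Hypothetical layout table: employer in column 1, dates in column 2.
    layout = [
        ["Google", "2020-2024"],
        ["Acme Corp", "2016-2020"],
    ]
    print(row_major(layout))
    print(column_major(layout))
```

Read row by row, each employer stays next to its dates. Read column by column, both employers come first and both date ranges follow, so the parser has no reliable way to pair them back up.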

Pro Tips

Use a single-column layout to ensure the parser reads your content in the correct order

Stick to standard fonts (Arial, Calibri, Times New Roman) that are universally supported by text extraction engines

Use standard bullet points (•) rather than custom symbols or dashes, as some parsers don't recognize non-standard list markers

Include dates in a consistent, recognizable format (e.g., 'Jan 2020 – Present' or '01/2020 – Present')

Test your resume by uploading it to a free ATS checker to see how well it parses

Common Mistakes to Avoid

Using tables for resume layout—even 'invisible' tables with no borders still confuse parsers

Saving your resume from Google Docs as PDF, which can create PDFs with inconsistent text layers

Using special characters or Unicode symbols that the parser replaces with garbled text

Placing your name as an image or WordArt rather than plain text

Frequently Asked Questions

How accurate is resume parsing?

Parsing accuracy ranges from 60-95% depending on the resume format. Simple, single-column resumes with standard headings achieve 90-95% accuracy. Complex layouts with tables, columns, or graphics may only parse at 60-70% accuracy.

Can the ATS parse resumes in languages other than English?

Most enterprise ATS platforms support multiple languages, but parsing accuracy varies. English has the most training data and highest accuracy. For non-English resumes, use the simplest possible formatting to maximize parsing accuracy.

Does the parser understand context or just extract text?

Modern parsers use NLP to understand context to some degree—they can distinguish job titles from company names and identify date ranges. However, they don't understand career narratives or achievement significance the way humans do.