How Resume Parsing Works: The Technology Behind ATS Screening
Resume parsing is the technology that converts your formatted resume document into structured data that an ATS can search, filter, and score. It combines Natural Language Processing (NLP), pattern recognition, and machine learning to extract meaning from unstructured text. Understanding this technology helps you create resumes that parse as accurately as possible.
The Three Stages of Resume Parsing
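Resume parsing follows three sequential stages: text extraction, section identification, and entity recognition. Each stage builds on the previous one, so an error at any stage cascades through the rest of the process. The first two stages are described here; entity recognition is covered in the NLP section that follows.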
Text extraction is the foundation. The parser opens your file and converts it into plain text, stripping all formatting, images, and layout information. For DOCX files, this means parsing the underlying XML structure. For PDFs, the system extracts the text layer, which can be unreliable if the PDF was exported from a design tool: such exports may store text as outlines or images, leaving little or no text layer to extract.
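As a rough illustration of the DOCX case, the sketch below pulls paragraph text out of the XML inside a DOCX file using only the Python standard library. The function name `extract_docx_text` is made up for this example, and the approach is deliberately simplified: it reads only `word/document.xml`, ignores headers and footers, and does not treat text boxes specially. Commercial parsers are certainly more thorough.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the WordprocessingML elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(data: bytes) -> str:
    """Return the plain text of a DOCX file, one line per paragraph.

    Hypothetical helper for illustration; not taken from any real parser.
    """
    # A DOCX file is a ZIP archive; the main body lives in word/document.xml.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    lines = []
    # <w:p> is a paragraph; each <w:t> inside it holds the text of one run.
    for paragraph in root.iter(W_NS + "p"):
        text = "".join(node.text or "" for node in paragraph.iter(W_NS + "t"))
        if text:
            lines.append(text)
    return "\n".join(lines)
```

Because the paragraphs are visited in document order, the output follows the order in which the text was written, with all fonts, colors, and spacing discarded.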
Section identification comes next. The parser scans the extracted text for headings and structural patterns that indicate different resume sections—contact information, summary, work experience, education, skills, and certifications. This is where standard headings become critical.
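Section identification can be sketched as a lookup of each line against a list of known headings. The heading vocabulary and the function names below are assumptions made for this example; real parsers use far larger synonym lists and additional layout cues.

```python
import re

# Hypothetical heading vocabulary, invented for this sketch.
SECTION_HEADINGS = {
    "summary": {"summary", "professional summary", "profile", "objective"},
    "experience": {"experience", "work experience", "professional experience"},
    "education": {"education", "academic background"},
    "skills": {"skills", "technical skills", "core competencies"},
    "certifications": {"certifications", "licenses and certifications"},
}


def normalize(line: str) -> str:
    """Lowercase a line and drop punctuation so 'SKILLS:' matches 'skills'."""
    return " ".join(re.sub(r"[^a-z ]", " ", line.lower()).split())


def split_sections(text: str) -> dict[str, list[str]]:
    """Group each line under the most recently recognized heading."""
    sections = {"contact": []}  # lines before the first heading
    current = "contact"
    for line in text.splitlines():
        if not line.strip():
            continue
        key = normalize(line)
        label = next(
            (name for name, names in SECTION_HEADINGS.items() if key in names),
            None,
        )
        if label:
            current = label
            sections.setdefault(current, [])
        else:
            sections[current].append(line.strip())
    return sections
```

In this sketch a creative heading such as "My Journey" is not recognized, so it and everything under it are filed under whichever section came before. That is the kind of misclassification standard headings help avoid.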
Natural Language Processing in Parsing
NLP is the core technology that enables parsers to understand resume content. Named Entity Recognition (NER) identifies specific types of information: person names, organization names, locations, dates, email addresses, and phone numbers.
Part-of-speech tagging helps the parser understand the grammatical structure of sentences. Combined with NER, this helps it distinguish job titles from company names and skills from general descriptions. For example, in 'Senior Software Engineer at Google, Mountain View', the parser must identify 'Senior Software Engineer' as a title, 'Google' as an organization, and 'Mountain View' as a location.
More advanced parsers use machine learning models trained on millions of resumes to improve accuracy. These models learn patterns specific to resume writing, such as the typical structure of experience entries and the format of date ranges.
- Named Entity Recognition (NER): identifies people, organizations, locations, dates
- Part-of-speech tagging: helps separate titles, companies, and skills
- Pattern matching: recognizes phone numbers, emails, URLs, date formats
- Contextual analysis: determines meaning based on surrounding text
- Machine learning models: improve accuracy based on training data
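The pattern-matching item in the list above is the easiest to illustrate. The sketch below uses regular expressions to pick out emails, phone numbers, and date ranges. The patterns are simplified assumptions for this example (the phone pattern only covers 10-digit US-style numbers, and the month pattern only English abbreviations), not the rules any particular engine uses.

```python
import re

# Simplified, illustrative patterns; production parsers handle many more variants.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")

MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*"
# Either 'Jan 2020' style or '01/2020' style.
DATE = rf"(?:{MONTH}\s+\d{{4}}|\d{{1,2}}/\d{{4}})"
# Two dates (or a date and 'Present') joined by a hyphen or en dash.
DATE_RANGE_RE = re.compile(rf"({DATE})\s*[–-]\s*({DATE}|Present)")


def extract_entities(text: str) -> dict:
    """Return the emails, phone numbers, and (start, end) date ranges found."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
        "date_ranges": DATE_RANGE_RE.findall(text),
    }


sample = (
    "Jane Doe | jane.doe@example.com | (555) 123-4567\n"
    "Acme Corp, Jan 2020 – Present\n"
    "Initech, 03/2017 - 12/2019"
)
print(extract_entities(sample))
```

A date written in a format the patterns do not anticipate, such as "Winter '20", would simply not be found, which is why consistent, conventional date formats matter.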
Common Parsing Engines and Their Accuracy
Different ATS platforms use different parsing engines, each with varying levels of accuracy. Some build proprietary parsers, while others license third-party engines like Sovren (now Textkernel), Daxtra, or HireAbility.
Sovren/Textkernel is one of the most widely used parsing engines, known for high accuracy across multiple languages and formats. Daxtra is popular in staffing and recruiting firms. HireAbility focuses on high-volume parsing with fast processing speeds.
Parsing accuracy typically ranges from 70-95% depending on the resume format and complexity. Simple, single-column resumes with standard headings parse at 90-95% accuracy. Complex, multi-column resumes with graphics can fall below that typical range, parsing at only 60-70% accuracy with significant data misclassification.
| Parsing Engine | Used By | Accuracy Range |
|---|---|---|
| Textkernel (Sovren) | iCIMS, SmartRecruiters | 85-95% |
| Daxtra | Bullhorn, JobAdder | 80-92% |
| HireAbility | Various ATS platforms | 78-90% |
| Proprietary (Greenhouse) | Greenhouse | 82-93% |
| Proprietary (Workday) | Workday | 80-90% |
What Breaks Parsing: Technical Deep Dive
Understanding why certain formats break parsing requires knowing how text extraction works at a technical level. In a DOCX file, text is stored in XML tags that indicate paragraphs, runs of text, and formatting. The parser reads these tags sequentially, so the text order matches the visual order in most cases.
PDFs are more problematic because text is positioned by coordinates rather than by document flow. A PDF stores text as glyphs or short runs placed at specific x,y coordinates, so in a two-column layout nothing in the file says which column comes first. The parser must reconstruct the reading order itself, which can fail for complex layouts.
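A small sketch makes the reading-order problem concrete. It assumes the text has already been extracted as words with coordinates, with `y` measured from the top of the page, and that the position of the gap between columns (`gutter_x`) is known. All names here are hypothetical. Real parsers have to infer the column boundary, which is where things go wrong.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x: float  # left edge of the word
    y: float  # distance from the top of the page (an assumption of this sketch)


def naive_order(words):
    """Sort top-to-bottom, then left-to-right, ignoring columns."""
    return [w.text for w in sorted(words, key=lambda w: (w.y, w.x))]


def column_aware_order(words, gutter_x):
    """Read everything left of gutter_x first, then everything right of it."""
    by_position = lambda w: (w.y, w.x)
    left = sorted((w for w in words if w.x < gutter_x), key=by_position)
    right = sorted((w for w in words if w.x >= gutter_x), key=by_position)
    return [w.text for w in left + right]


# A two-column page: experience on the left, skills on the right.
words = [
    Word("Experience", 50, 100), Word("Skills", 350, 100),
    Word("Acme Corp", 50, 120), Word("Python", 350, 120),
    Word("2020 - Present", 50, 140), Word("SQL", 350, 140),
]

print(naive_order(words))
# ['Experience', 'Skills', 'Acme Corp', 'Python', '2020 - Present', 'SQL']
print(column_aware_order(words, gutter_x=300))
# ['Experience', 'Acme Corp', '2020 - Present', 'Skills', 'Python', 'SQL']
```

The naive ordering interleaves the two columns, so "Python" ends up between an employer and its dates. A single-column layout leaves no column boundary to guess.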
Tables in any format create parsing challenges because the parser must determine how to linearize a two-dimensional structure. Some parsers read left-to-right row by row, while others read column by column. If your resume uses tables for layout, the parser may merge unrelated information.
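The two linearization strategies can be shown with a tiny layout table. This is only an illustration with invented data and function names; it assumes a rectangular table with no merged cells.

```python
def row_major(table):
    """Read left-to-right, one row at a time."""
    return [cell for row in table for cell in row if cell]


def column_major(table):
    """Read top-to-bottom, one column at a time (assumes equal-length rows)."""
    return [cell for column in zip(*table) for cell in column if cell]


# A borderless table used for layout: employer and title on the left,
# dates and location on the right.
layout_table = [
    ["Acme Corp", "Jan 2020 – Present"],
    ["Senior Engineer", "Mountain View"],
]

print(row_major(layout_table))
# ['Acme Corp', 'Jan 2020 – Present', 'Senior Engineer', 'Mountain View']
print(column_major(layout_table))
# ['Acme Corp', 'Senior Engineer', 'Jan 2020 – Present', 'Mountain View']
```

The same four cells come out in two different orders, and the resume author has no control over which one a given parser picks.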
Pro Tips
- Use a single-column layout to ensure the parser reads your content in the correct order
- Stick to standard fonts (Arial, Calibri, Times New Roman) that are universally supported by text extraction engines
- Use standard bullet points (•) rather than custom symbols or dashes, as some parsers don't recognize non-standard list markers
- Include dates in a consistent, recognizable format (e.g., 'Jan 2020 – Present' or '01/2020 – Present')
- Test your resume by uploading it to a free ATS checker to see how well it parses
Common Mistakes to Avoid
- Using tables for resume layout: even 'invisible' tables with no borders still confuse parsers
- Saving your resume from Google Docs as PDF, which can create PDFs with inconsistent text layers
- Using special characters or Unicode symbols that the parser replaces with garbled text
- Placing your name as an image or WordArt rather than plain text

