Sitefinity Text Extraction from Documents

This post explains how to use Sitefinity’s native API to extract text from files stored in document libraries. Common file types with text content include .docx, .xls, .pdf, and .csv. This is particularly useful if you are implementing a custom search module. Another use is extracting text to update the summary and other metadata of the files for SEO. We previously tried using ‘Adobe iFilter’ to extract text from PDF documents, but it only handles PDFs. Adobe iFilter also has to be installed as a plugin on the server and carries additional overhead to get working.
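To make the idea concrete, here is a minimal, language-neutral sketch in Python of what "extract text by file type, then derive a summary" can look like. It is not Sitefinity's API (which is .NET); the names `EXTRACTORS`, `extract_text`, and `summarize` are hypothetical, and only .csv and .docx are handled because they can be read with the standard library alone. It also assumes the .docx main part sits at the conventional path `word/document.xml`.

```python
import csv
import io
import os
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by paragraph and text elements in a .docx
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_csv(data: bytes) -> str:
    """Join every non-empty cell of a CSV file into one space-separated string."""
    reader = csv.reader(io.StringIO(data.decode("utf-8-sig")))
    return " ".join(cell.strip() for row in reader for cell in row if cell.strip())


def extract_docx(data: bytes) -> str:
    """Read word/document.xml from the .docx zip and collect paragraph text."""
    # Assumption: the main document part lives at the conventional path.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)


# Hypothetical registry: file extension -> extractor function
EXTRACTORS = {".csv": extract_csv, ".docx": extract_docx}


def extract_text(filename: str, data: bytes) -> str:
    """Pick an extractor by extension; return "" for unsupported types."""
    extension = os.path.splitext(filename)[1].lower()
    extractor = EXTRACTORS.get(extension)
    return extractor(data) if extractor else ""


def summarize(text: str, max_chars: int = 160) -> str:
    """Collapse whitespace and cut at a word boundary for a summary field."""
    flat = " ".join(text.split())
    if len(flat) <= max_chars:
        return flat
    return flat[:max_chars].rsplit(" ", 1)[0]
```

In a custom search module, the output of `extract_text` would feed the index, while `summarize` would produce a candidate value for a summary or description field.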

Progress provides a few related Knowledge Base articles on this. It makes sense to use Sitefinity’s native API, because it is the same one Sitefinity uses internally to index these documents for its search indexes. Note the disclaimer in the KB articles: this code relies on libraries shipped with Sitefinity, and those libraries can change without notice. If you want API stability, I recommend using a third-party PDF library.
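One way to limit the impact of that disclaimer is to keep the extraction call behind a small seam of your own, so that swapping the bundled library for a third-party one touches a single place. The Python sketch below only illustrates the pattern; `extract_with_fallback` and the extractor functions passed to it are hypothetical stand-ins, not Sitefinity or third-party APIs.

```python
from typing import Callable, Optional, Sequence

# An extractor takes raw file bytes and returns the extracted text.
Extractor = Callable[[bytes], str]


def extract_with_fallback(data: bytes, extractors: Sequence[Extractor]) -> Optional[str]:
    """Try each extractor in order and return the first non-empty result."""
    for extractor in extractors:
        try:
            text = extractor(data)
        except Exception:
            # A changed or missing library surfaces here; try the next extractor.
            continue
        if text:
            return text
    # No extractor produced text.
    return None
```

Callers would pass the bundled extractor first and a third-party one second, so a breaking change in the first degrades to the second instead of failing the whole indexing run.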

Note: Additional libraries are sometimes required to get this working, for example when processing Excel files:

Telerik.Windows.Documents.Spreadsheet.dll
Telerik.Windows.Documents.Spreadsheet.FormatProviders.OpenXml.dll
Telerik.Windows.Documents.Core.dll

A full source code example is available here:

https://docs.sitefinity.com/tutorial-index-the-contents-of-excel-files

Here are the related KB articles:

https://knowledgebase.progress.com/articles/Article/How-to-programmatically-parse-PDF-documents

https://knowledgebase.progress.com/articles/Article/search-index-csv-file

Sitefinity’s default DocumentService text extraction settings already include PDF text extraction libraries, so PDF extraction is available out of the box.

About the Author

Staff
