Mistral Adds a New API That Turns Any PDF Document Into an AI-Ready Markdown File
(Friday March 07, 2025 @05:00AM (BeauHD)
from the next-big-leap dept.)
- Reference: 0176639913
- News link: https://slashdot.org/story/25/03/07/0426243/mistral-adds-a-new-api-that-turns-any-pdf-document-into-an-ai-ready-markdown-file
- Source link:
Mistral has launched a new multimodal [1]OCR API that [2]converts complex PDF documents into AI-friendly Markdown files . The API is designed for efficiency, handles visual elements like illustrations, supports complex formatting such as mathematical expressions, and reportedly outperforms similar offerings from major competitors. TechCrunch reports:
> Unlike most OCR APIs, Mistral OCR is a multimodal API, meaning that it can detect when there are illustrations and photos intertwined with blocks of text. The OCR API creates bounding boxes around these graphical elements and includes them in the output. Mistral OCR also doesn't just output a big wall of text; the output is formatted in Markdown, a formatting syntax that developers use to add links, headers, and other formatting elements to a plain text file.
>
> Mistral OCR is available on Mistral's own API platform or through its cloud partners (AWS, Azure, Google Cloud Vertex, etc.). And for companies working with classified or sensitive data, Mistral offers on-premise deployment. According to the Paris-based AI company, Mistral OCR performs better than APIs from Google, Microsoft, and OpenAI. The company has tested its OCR model with complex documents that include mathematical expressions (LaTeX formatting), advanced layouts, or tables. It is also supposed to perform better with non-English documents. [...]
>
> Mistral is also using Mistral OCR for its own AI assistant Le Chat. When a user uploads a PDF file, the company uses Mistral OCR in the background to understand what's in the document before processing the text. Companies and developers will most likely use Mistral OCR with a RAG (aka Retrieval-Augmented Generation) system to use multimodal documents as input in an LLM. And there are many potential use cases. For instance, we could envisage law firms using it to help them swiftly plough through huge volumes of documents.
"Over the years, organizations have accumulated numerous documents, often in PDF or slide formats, which are inaccessible to LLMs, particularly RAG systems. With Mistral OCR, our customers can now convert rich and complex documents into readable content in all languages," said Mistral co-founder and chief science officer Guillaume Lample.
"This is a crucial step toward the widespread adoption of AI assistants in companies that need to simplify access to their vast internal documentation," he added.
[1] https://mistral.ai/news/mistral-ocr
[2] https://techcrunch.com/2025/03/06/mistrals-new-ocr-api-turns-any-pdf-document-into-an-ai-ready-markdown-file/
> Unlike most OCR APIs, Mistral OCR is a multimodal API, meaning that it can detect when there are illustrations and photos intertwined with blocks of text. The OCR API creates bounding boxes around these graphical elements and includes them in the output. Mistral OCR also doesn't just output a big wall of text; the output is formatted in Markdown, a formatting syntax that developers use to add links, headers, and other formatting elements to a plain text file.
>
> Mistral OCR is available on Mistral's own API platform or through its cloud partners (AWS, Azure, Google Cloud Vertex, etc.). And for companies working with classified or sensitive data, Mistral offers on-premise deployment. According to the Paris-based AI company, Mistral OCR performs better than APIs from Google, Microsoft, and OpenAI. The company has tested its OCR model with complex documents that include mathematical expressions (LaTeX formatting), advanced layouts, or tables. It is also supposed to perform better with non-English documents. [...]
>
> Mistral is also using Mistral OCR for its own AI assistant Le Chat. When a user uploads a PDF file, the company uses Mistral OCR in the background to understand what's in the document before processing the text. Companies and developers will most likely use Mistral OCR with a RAG (aka Retrieval-Augmented Generation) system to use multimodal documents as input in an LLM. And there are many potential use cases. For instance, we could envisage law firms using it to help them swiftly plough through huge volumes of documents.
"Over the years, organizations have accumulated numerous documents, often in PDF or slide formats, which are inaccessible to LLMs, particularly RAG systems. With Mistral OCR, our customers can now convert rich and complex documents into readable content in all languages," said Mistral co-founder and chief science officer Guillaume Lample.
"This is a crucial step toward the widespread adoption of AI assistants in companies that need to simplify access to their vast internal documentation," he added.
[1] https://mistral.ai/news/mistral-ocr
[2] https://techcrunch.com/2025/03/06/mistrals-new-ocr-api-turns-any-pdf-document-into-an-ai-ready-markdown-file/
since when is everything API ? (Score:2)
I get that AI is costly, hard to run on the average desktop, and hence APIs. However, it seems we've shifted as a community to fully supporting proprietary web APIs where we not only don't have source code but even lack the binaries. Couldn't slashdot moderators help by promoting truly open source tools, especially AI, that can be run locally, even if it requires a 10k hardware? It seems very reasonable if a company sells API as cost-sharing model. Even so, at least we could support entities that fully ope
Re: (Score:2)
"Couldn't slashdot moderators help by promoting truly open source tools"
Hmm. We can assume that moderator help would be marking any comment not mentioning Open Source as "-1 Troll" then ?
Re: (Score:2)
[ Sorry, second comment, different point. ]
I agree partly. I will not use this proprietary API but will use open source libraries to do the same job.
BUT:
As a developer, had I written, say, something to do this and wanted to earn money to fund myself to do further development, the simple, regrettable fact is that I think I could make more by releasing it as a proprietary API than by releasing it as an open source library.
To help Open Source, that is what we have to change.