
Recently, DeepSeek released a new multimodal model—DeepSeek-OCR—on the AI open-source platform Hugging Face, sparking widespread discussion within the industry.
On Huawei’s academic platform “Huang Danian Tea House,” technical experts pointed out that the highly efficient visual encoder at the core of the model provides a clear technical pathway for bringing optical computing and quantum computing into the field of large language models (LLMs).
On October 29, a representative from Turing Quantum told the 21st Century Business Herald that DeepSeek-OCR can more effectively leverage the high parallelism and low power consumption advantages of optical computing. They believe that applications combining optical computing chips and large models will emerge soon.
A Breakthrough in Optical Compression
Context length has long been a major bottleneck affecting the performance of large models. If the context window is too small, the model cannot fully process user inputs (such as articles), which impacts reasoning accuracy.
To address this issue, various techniques such as sparse attention and retrieval-augmented generation have been proposed. This time, DeepSeek introduced the “Contexts Optical Compression” technique, which treats text as images to achieve efficient information compression—theoretically enabling infinite context.
Qiao Nan, a technical expert on Huang Danian Tea House, believes that the new model from DeepSeek essentially simulates the human brain’s forgetting mechanism.
By processing text as images, it achieves 7–20x token compression. For example, one page of text typically requires 2,000–5,000 text tokens, but after conversion to an image it needs only 200–400 visual tokens. At 10x compression, 97% decoding accuracy is maintained, while at 20x compression, 60% accuracy is retained. This is the key to implementing a forgetting mechanism for LLM memory.
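As a rough sanity check on those per-page figures, the arithmetic works out as follows (the 100-page document below is an illustrative assumption, not a DeepSeek benchmark):

```python
# Back-of-the-envelope check of the per-page figures quoted above.
text_tokens_per_page = (2_000, 5_000)   # typical text-token cost of one page
vision_tokens_per_page = (200, 400)     # visual-token cost after rendering the page as an image

# The extreme ratios bracket the 7-20x compression band cited above.
low_ratio = text_tokens_per_page[0] / vision_tokens_per_page[1]    # 2000 / 400 = 5x
high_ratio = text_tokens_per_page[1] / vision_tokens_per_page[0]   # 5000 / 200 = 25x
print(f"compression ratio: {low_ratio:.0f}x to {high_ratio:.0f}x")

# Token budget for a hypothetical 100-page context.
pages = 100
print("as text:  ", pages * text_tokens_per_page[0], "to", pages * text_tokens_per_page[1], "tokens")
print("as images:", pages * vision_tokens_per_page[0], "to", pages * vision_tokens_per_page[1], "tokens")
```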
According to Qiao Nan, by rendering the historical context of multi-turn conversations into images, an LLM can forget the way humans do. Recent conversation (within the last k rounds) remains in high-resolution text form, while older history (beyond k rounds) is compressed into images. Over time, these “memory images” can be progressively downsampled to lower resolutions (becoming blurrier), occupying fewer and fewer tokens and simulating the forgetting curve of human memory, in which recent information stays clear while distant information naturally fades.
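That tiered scheme can be sketched in a few lines. The code below is a hypothetical illustration of the idea as described, not DeepSeek’s implementation: the rendering and encoding steps are stubbed out, and the token estimates simply reuse the roughly 10x compression figure quoted earlier.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    """One conversation round; older turns may survive only as a 'memory image'."""
    text: str
    as_image: bool = False
    resolution: float = 1.0   # 1.0 = sharp; halved each time the memory is aged

def estimate_tokens(turn: Turn) -> int:
    # Crude assumptions: ~1 text token per 4 characters; rendering to an image
    # cuts the cost ~10x; each later downsampling step halves it again.
    text_tokens = max(1, len(turn.text) // 4)
    if not turn.as_image:
        return text_tokens
    return max(1, int(text_tokens / 10 * turn.resolution))

def age_history(history: list[Turn], keep_recent: int = 3) -> None:
    """Keep the last `keep_recent` rounds (keep_recent >= 1) as text; compress older
    rounds to images, and blur images that were already compressed."""
    for turn in history[:-keep_recent]:
        if turn.as_image:
            turn.resolution *= 0.5      # older memories get blurrier
        else:
            turn.as_image = True        # first compression: text -> "memory image"

history = [Turn(f"round {i}: " + "some fairly long user and assistant text " * 8) for i in range(6)]
age_history(history)
print(sum(estimate_tokens(t) for t in history), "tokens after one aging pass")
```

Calling age_history after every new round pushes older turns further down the forgetting curve, so the total token cost of the history grows far more slowly than a plain text buffer would.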
The Turing Quantum representative also stated, “DeepSeek-OCR technology renders text as images and processes them as visual information, significantly reducing the number of data segmentation and assembly operations, thereby lowering the overall computational load. This data encoding mechanism reduces the direct pressure on backend computing hardware (whether electronic or optical chips) in terms of scale and precision.”
Furthermore, regarding optical computing, the representative added, “(This model) can also reduce the number of optical-electrical conversions, allowing the high parallelism and low power consumption advantages of optical computing to be more effectively utilized. We believe that applications combining optical computing chips and large models will soon emerge.”
Possibly Triggering a Hardware Revolution
By transforming text problems into image problems, DeepSeek’s OCR technology may pave the way for optical computing chips to enter the large language model domain.
Optical computing chips are regarded as a promising technology in the “post-Moore’s Law era,” leveraging light-speed transmission, high parallelism, and low power consumption to offer new possibilities for compute-intensive tasks like AI.
Qiao Nan believes that one of the core advantages of optical computing is its ability to perform specific computations, such as Fourier transforms and large-scale parallel processing, at extremely high speeds and with minimal power consumption. Previously, the main obstacle to introducing optical computing was the excessively long sequential context, which optical chips couldn’t handle efficiently. Now, DeepSeek-OCR’s native optical encoding mechanism solves this fundamental issue.
In Qiao Nan’s vision, the DeepEncoder (visual encoder) part of DeepSeek-OCR could become a module ideally executed by an optical coprocessor, while text decoding (the Decoder part) would still be handled by electronic chips (GPU/NPU). Additionally, memory compression (text → image → visual token) could be entirely assigned to optical computing chips, achieving optimal task division.
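Purely as an illustration of that proposed task division (the device names and stage list below are assumptions; no such optical runtime exists today), the split could be written down as a simple routing table:

```python
# Hypothetical routing table for the heterogeneous split described above.
# "optical_coprocessor" does not exist as a product today; all three stages
# currently run on electronic accelerators (GPU/NPU).
PIPELINE = {
    "visual_encoding":    "optical_coprocessor",  # DeepEncoder: page image -> visual tokens
    "memory_compression": "optical_coprocessor",  # text -> image -> visual tokens
    "text_decoding":      "gpu_or_npu",           # autoregressive decoder stays electronic
}

for stage, device in PIPELINE.items():
    print(f"{stage:<20} -> {device}")
```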
However, due to limitations in technology, manufacturing, and ecosystem development, optical computing chips are still in the early stages of industrialization.
The Turing Quantum representative mentioned that optical chips currently face two main challenges. First, advanced optoelectronic integration and packaging is needed to efficiently combine light sources, chips, and detectors on a single chip while ensuring stable coordination with the electronic control units. Second, the software ecosystem for optical computing is not yet mature, making it difficult to develop and optimize optical computing applications at scale.
It is understood that major domestic players include Xizhi Technology, Turing Quantum, and Guangbenwei, while international companies like Lightmatter, Lumai, and Cerebras Systems are also active in this field.
Turing Quantum has centered its research on thin-film lithium niobate (TFLN), with full-process capabilities spanning design, layout, tape-out, testing, and packaging. After years of technological iteration and optimization, the company has achieved large-scale production of TFLN products.
The Turing Quantum representative candidly stated, “Optical computing chips have entered the early lane of industrialization, but it may take another 3–5 years to overcome engineering, cost, and ecosystem challenges before they can compete with GPUs in data centers.”
