Traditional Screenshot Translation
- Capture, upload, then switch windows
- Text is detached from original context
- Repeated actions slow down reading flow
Translate text with masks and understand visuals with VLM. Privacy-first and on-device.
ドカン!! → Boom!!
このプロトタイプ、来週までに仕上げる。 → This prototype must be finished by next week.
Select only what matters. Translation appears in place without breaking visual continuity.
Beyond OCR, mask understands visual context in comics, charts, and handwritten notes.
OCR and key visual processing run locally by default, so your images stay on your device.
Switch among OpenAI, Gemini, Qwen, DeepSeek, and Ollama to match your speed and quality needs.
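A minimal sketch of how that provider switch might be modeled. All names, fields, and the selection rule here are illustrative assumptions, not mask's actual API; the only detail taken from the text is the set of backends and that Ollama runs on-device.

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    local: bool  # runs on-device (e.g. Ollama) rather than via a remote API
    fast: bool   # favors latency over output quality

# Hypothetical registry of the backends named on this page.
PROVIDERS = {
    "openai":   Provider("openai",   local=False, fast=False),
    "gemini":   Provider("gemini",   local=False, fast=True),
    "qwen":     Provider("qwen",     local=False, fast=True),
    "deepseek": Provider("deepseek", local=False, fast=False),
    "ollama":   Provider("ollama",   local=True,  fast=True),
}

def pick_provider(need_privacy: bool, need_speed: bool) -> Provider:
    """Pick a backend by constraint: privacy forces on-device; speed narrows to fast backends."""
    candidates = [p for p in PROVIDERS.values() if p.local] if need_privacy else list(PROVIDERS.values())
    if need_speed:
        fast = [p for p in candidates if p.fast]
        if fast:
            candidates = fast
    return candidates[0]
```

For example, `pick_provider(need_privacy=True, need_speed=False)` would resolve to the on-device Ollama backend, since privacy rules out every remote provider.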
Traditional OCR reads text only. mask VLM reads text with scene context.
| Scenario | Traditional Tool | mask (VLM) |
|---|---|---|
| Handwritten notes + sketch | Fragmented words, awkward output | Understands intent and generates coherent translation |
| Comic sound effects | Misses text or outputs garbled content | Interprets “ドカン” and translates to “Boom!” |
| Data charts | Translates title only | Summarizes chart trends in readable language |
Local-first processing pipeline with no image upload by default.
Translate dialog areas instantly while preserving original style.
Translate text and capture chart meaning with context.
Understand UI and story text in real time without context switching.
Interpret rough handwriting and sketches into readable output.
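The privacy-first flow described above (local OCR by default, images uploaded only by explicit choice) could be sketched as follows. The function names and return values are stand-ins, not mask's real implementation.

```python
def local_ocr(image_bytes: bytes) -> str:
    # Placeholder for an on-device OCR engine; the image never leaves the machine.
    return "recognized text"

def cloud_vlm(image_bytes: bytes) -> str:
    # Placeholder for a remote vision-language model call with scene context.
    return "scene-aware translation"

def translate_region(image_bytes: bytes, allow_upload: bool = False) -> str:
    """Privacy-first dispatch: stay local unless upload is explicitly allowed."""
    if allow_upload:
        return cloud_vlm(image_bytes)  # richer context, but the image is uploaded
    return local_ocr(image_bytes)      # default path: everything stays on-device
```

The key design point is that the default parameter keeps the local path unless the caller opts in, mirroring the "no image upload by default" promise.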
“mask changed how I read foreign comics. It really understands sound effects.” - Early tester