• JackbyDev
      27 months ago

      Oh nice, thanks for sharing that project. I haven’t heard of it before!

    • @thevoidzero@lemmy.world
      17 months ago

      Not just semantics. PDFs don’t even have segmentation like spaces, lines, or paragraphs. It’s just text drawn at whatever positions the word processor (or any other software) placed it. Many PDF editors simply look at how close characters are to each other and group them together.
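      To make that grouping idea concrete, here is a rough sketch of proximity-based word detection. It is not how any particular PDF editor does it; the `Glyph` record, the `group_into_words` function, and both tolerance values are made up for illustration, and it assumes PDF-style coordinates where y grows upward.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical record for one drawn character (not a real PDF library type)."""
    char: str
    x: float      # left edge of the glyph on the page
    y: float      # baseline position (PDF-style: larger y is higher on the page)
    width: float  # horizontal advance of the glyph


def group_into_words(glyphs, gap_tolerance=1.5, line_tolerance=2.0):
    """Rebuild lines and words from positioned glyphs using only proximity.

    Both tolerances are assumed values in page units; real tools tune them
    per font size.
    """
    # Step 1: bucket glyphs into lines by baseline, top of the page first.
    lines = []  # each entry is [baseline_y, [glyphs on that line]]
    for g in sorted(glyphs, key=lambda g: (-g.y, g.x)):
        if lines and abs(lines[-1][0] - g.y) <= line_tolerance:
            lines[-1][1].append(g)
        else:
            lines.append([g.y, [g]])

    # Step 2: within each line, start a new word wherever the horizontal gap
    # between neighbouring glyphs exceeds the tolerance.
    result = []
    for _, line in lines:
        line.sort(key=lambda g: g.x)
        words = [line[0].char]
        prev = line[0]
        for g in line[1:]:
            gap = g.x - (prev.x + prev.width)
            if gap > gap_tolerance:
                words.append(g.char)
            else:
                words[-1] += g.char
            prev = g
        result.append(" ".join(words))
    return result


# Glyphs arrive in arbitrary drawing order, with no spaces or newlines stored.
page = [
    Glyph("k", 5, 80, 5), Glyph("y", 15, 100, 5), Glyph("H", 0, 100, 5),
    Glyph("o", 0, 80, 5), Glyph("i", 5, 100, 5), Glyph("o", 20, 100, 5),
]
print(group_into_words(page))  # ['Hi yo', 'ok']
```

      Note that the space in "Hi yo" never existed in the input; it is inferred purely from the 5-unit gap between "i" and "y", which is why tight or unusual spacing can make copied PDF text come out merged or split oddly.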

      Going one step further, you can convert text to paths, which strips out the glyph (character) and font info entirely, so every character is just a geometric shape. In that case you can’t even copy the text. OCR is your only choice.

      PDF is for finalizing something and then printing or sharing it, not for editing it afterwards.