Using wkhtmltopdf on FreeBSD to Capture Web Pages

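Exporting a web page to PDF, or capturing it as an image, under program control used to be a major undertaking.
This post shows how to do it easily from the FreeBSD command line, without the heavyweight X Window graphical system, which makes it a good fit for servers.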

Below is the software's own description.

wkhtmltopdf

Convert html to pdf using webkit (qtwebkit)

Description

Simple shell utility to convert html to pdf using the webkit rendering engine, and qt.

Introduction

Searching the web, I have found several command line tools that allow you to convert a HTML-document to a PDF-document, however they all seem to use their own, and rather incomplete rendering engine, resulting in poor quality. Recently QT 4.4 was released with a WebKit widget (WebKit is the engine of Apples Safari, which is a fork of the KDE KHtml), and making a good tool became very easy.
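
Because the tool is built on WebKit, everything except Flash renders correctly, JavaScript included.
Before installing, make sure the linux-base package is installed and working on your FreeBSD system, and that the ports tree is up to date.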
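
A quick pre-flight check can confirm that the Linux compatibility base is in place before building anything. The sketch below assumes linux-base has installed /compat/linux/etc/fedora-release, which is the file the ports in the following steps test for. The function name and its optional root argument are illustrative and not part of FreeBSD.

```shell
#!/bin/sh
# Pre-flight check (illustrative): is the Linux compat base installed?
# Assumption: linux-base provides etc/fedora-release under the compat root.

linux_base_ready() {
    compat_root=${1:-/compat/linux}   # optional override, mainly for testing
    [ -f "$compat_root/etc/fedora-release" ]
}

# Usage:
#   linux_base_ready || echo "install linux-base first" >&2
```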

1. Install linux-expat

# cd /usr/ports/textproc/linux-f10-expat && make install clean
===>  License check disabled, port has not defined LICENSE
=> expat-2.0.1-5.i386.rpm doesn't seem to exist in /usr/ports/distfiles/rpm/i386/fedora/10.
=> Attempting to fetch from http://ftp.tw.freebsd.org/pub/FreeBSD/distfiles/rpm/i386/fedora/10/.
expat-2.0.1-5.i386.rpm                        100% of   82 kB  244 kBps
===>  Extracting for linux-f10-expat-2.0.1
=> MD5 Checksum OK for rpm/i386/fedora/10/expat-2.0.1-5.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/expat-2.0.1-5.i386.rpm.
===>   linux-f10-expat-2.0.1 depends on file: /usr/local/bin/rpm2cpio - found
===>  Patching for linux-f10-expat-2.0.1
===>  Configuring for linux-f10-expat-2.0.1
===>  Installing for linux-f10-expat-2.0.1
===>   linux-f10-expat-2.0.1 depends on file: /compat/linux/etc/fedora-release - found
===>   Generating temporary packing list
===>  Checking if textproc/linux-f10-expat already installed
cd /usr/ports/textproc/linux-f10-expat/work && /usr/bin/find * -type d -exec /bin/mkdir -p "/compat/linux/{}" \;
cd /usr/ports/textproc/linux-f10-expat/work && /usr/bin/find * ! -type d | /usr/bin/cpio -pm -R root:wheel /compat/linux
367 blocks
===>   Running linux ldconfig
/compat/linux/sbin/ldconfig -r /compat/linux
===>   Registering installation for linux-f10-expat-2.0.1
===>  Cleaning for linux-f10-expat-2.0.1

2. Install linux-fontconfig

# cd /usr/ports/x11-fonts/linux-f10-fontconfig && make install clean
===>  License check disabled, port has not defined LICENSE
=> fontconfig-2.6.0-3.fc10.i386.rpm doesn't seem to exist in /usr/ports/distfiles/rpm/i386/fedora/10.
=> Attempting to fetch from http://ftp.tw.freebsd.org/pub/FreeBSD/distfiles/rpm/i386/fedora/10/.
fontconfig-2.6.0-3.fc10.i386.rpm              100% of  182 kB  241 kBps
===>  Extracting for linux-f10-fontconfig-2.6.0
=> MD5 Checksum OK for rpm/i386/fedora/10/fontconfig-2.6.0-3.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/fontconfig-2.6.0-3.fc10.i386.rpm.
===>   linux-f10-fontconfig-2.6.0 depends on file: /usr/local/bin/rpm2cpio - found
===>  Patching for linux-f10-fontconfig-2.6.0
===>  Configuring for linux-f10-fontconfig-2.6.0
===>  Installing for linux-f10-fontconfig-2.6.0
===>   linux-f10-fontconfig-2.6.0 depends on file: /compat/linux/etc/fedora-release - found
===>   linux-f10-fontconfig-2.6.0 depends on file: /compat/linux/lib/libexpat.so.1 - found
===>   Generating temporary packing list
===>  Checking if x11-fonts/linux-f10-fontconfig already installed
cd /usr/ports/x11-fonts/linux-f10-fontconfig/work && /usr/bin/find * -type d -exec /bin/mkdir -p "/compat/linux/{}" \;
cd /usr/ports/x11-fonts/linux-f10-fontconfig/work && /usr/bin/find * ! -type d | /usr/bin/cpio -pm -R root:wheel /compat/linux
617 blocks
===>   Running linux ldconfig
/compat/linux/sbin/ldconfig -r /compat/linux
===>   Registering installation for linux-f10-fontconfig-2.6.0
===>  Cleaning for linux-f10-fontconfig-2.6.0

3. Install linux-xorg-libs

# cd /usr/ports/x11/linux-f10-xorg-libs && make install clean
===>  Extracting for linux-f10-xorg-libs-7.4_1
=> MD5 Checksum OK for rpm/i386/fedora/10/libICE-1.0.4-4.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libICE-1.0.4-4.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libFS-1.0.1-2.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libFS-1.0.1-2.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libSM-1.1.0-2.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libSM-1.1.0-2.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libX11-1.1.5-4.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libX11-1.1.5-4.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXScrnSaver-1.1.3-1.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXScrnSaver-1.1.3-1.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXTrap-1.0.0-6.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXTrap-1.0.0-6.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXau-1.0.4-1.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXau-1.0.4-1.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXaw-1.0.4-3.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXaw-1.0.4-3.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXcomposite-0.4.0-5.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXcomposite-0.4.0-5.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXcursor-1.1.9-3.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXcursor-1.1.9-3.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXdamage-1.1.1-4.fc9.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXdamage-1.1.1-4.fc9.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXdmcp-1.0.2-6.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXdmcp-1.0.2-6.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXevie-1.0.2-4.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXevie-1.0.2-4.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXext-1.0.4-1.fc9.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXext-1.0.4-1.fc9.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXfixes-4.0.3-4.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXfixes-4.0.3-4.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXfont-1.3.3-1.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXfont-1.3.3-1.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXft-2.1.13-1.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXft-2.1.13-1.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXi-1.1.3-4.fc9.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXi-1.1.3-4.fc9.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXinerama-1.0.3-2.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXinerama-1.0.3-2.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXmu-1.0.4-1.fc9.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXmu-1.0.4-1.fc9.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXp-1.0.0-11.fc9.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXp-1.0.0-11.fc9.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXpm-3.5.7-4.fc9.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXpm-3.5.7-4.fc9.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXrandr-1.2.3-1.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXrandr-1.2.3-1.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXrender-0.9.4-3.fc9.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXrender-0.9.4-3.fc9.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXres-1.0.3-5.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXres-1.0.3-5.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXt-1.0.5-1.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXt-1.0.5-1.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXtst-1.0.3-3.fc9.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXtst-1.0.3-3.fc9.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXv-1.0.4-1.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXv-1.0.4-1.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXvMC-1.0.4-5.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXvMC-1.0.4-5.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXxf86dga-1.0.2-3.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXxf86dga-1.0.2-3.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXxf86misc-1.0.1-6.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXxf86misc-1.0.1-6.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXxf86vm-1.0.2-1.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXxf86vm-1.0.2-1.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libfontenc-1.0.4-6.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libfontenc-1.0.4-6.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libxcb-1.1.91-5.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libxcb-1.1.91-5.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libxkbfile-1.0.4-5.fc9.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libxkbfile-1.0.4-5.fc9.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/mesa-libGLw-6.5.1-5.fc9.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/mesa-libGLw-6.5.1-5.fc9.i386.rpm.
===>   linux-f10-xorg-libs-7.4_1 depends on file: /usr/local/bin/rpm2cpio - found
===>  Patching for linux-f10-xorg-libs-7.4_1
===>  Configuring for linux-f10-xorg-libs-7.4_1
===>  Installing for linux-f10-xorg-libs-7.4_1
===>   linux-f10-xorg-libs-7.4_1 depends on file: /compat/linux/etc/fedora-release - found
===>   linux-f10-xorg-libs-7.4_1 depends on file: /compat/linux/lib/libexpat.so.1 - found
===>   linux-f10-xorg-libs-7.4_1 depends on file: /compat/linux/usr/lib/libfontconfig.so.1.3.0 - found
===>   Generating temporary packing list
===>  Checking if x11/linux-f10-xorg-libs already installed
cd /usr/ports/x11/linux-f10-xorg-libs/work && /usr/bin/find * -type d -exec /bin/mkdir -p "/compat/linux/{}" \;
cd /usr/ports/x11/linux-f10-xorg-libs/work && /usr/bin/find * ! -type d | /usr/bin/cpio -pm -R root:wheel /compat/linux
12139 blocks
===>   Running linux ldconfig
/compat/linux/sbin/ldconfig -r /compat/linux
===>   Registering installation for linux-f10-xorg-libs-7.4_1
===> SECURITY REPORT:
      This port has installed the following files which may act as network
      servers and may therefore pose a remote security risk to the system.
/compat/linux/usr/lib/libICE.so.6.3.0
/compat/linux/usr/lib/libXdmcp.so.6.0.0

      If there are vulnerabilities in these programs there may be a security
      risk to the system. FreeBSD makes no guarantee about the security of
      ports included in the Ports Collection. Please type 'make deinstall'
      to deinstall the port if this is a concern.

      For more information, and contact details about the security
      status of this software, see the following webpage:
http://x.org
===>  Cleaning for linux-f10-xorg-libs-7.4_1
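
Steps 1 to 3 repeat the same cd-and-make pattern, so they can be scripted. This is a minimal sketch that uses the three port directories shown above. The DRY_RUN switch and the function name are hypothetical conveniences added for illustration.

```shell
#!/bin/sh
# Install the three Linux-compat ports from steps 1-3, in dependency order.
# DRY_RUN=1 (an illustrative switch) prints the commands instead of running them.

PORTS="textproc/linux-f10-expat x11-fonts/linux-f10-fontconfig x11/linux-f10-xorg-libs"

install_linux_ports() {
    for port in $PORTS; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "cd /usr/ports/$port && make install clean"
        else
            # The subshell keeps the caller's working directory unchanged.
            (cd "/usr/ports/$port" && make install clean) || return 1
        fi
    done
}

# Usage (as root):
#   install_linux_ports
```

Stopping at the first failure matters here because fontconfig depends on expat, and xorg-libs depends on both.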

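4. Install the cwttf Chinese fonts

# wget http://cle.linux.org.tw/fonts/cwttf/cwttf-v1.0.tar.gz
# tar xzf cwttf-v1.0.tar.gz

Run the copy from the directory that holds the extracted .ttf files:

# cp *.ttf /usr/local/lib/X11/fonts/TTF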
# fc-cache -f -v
/usr/local/lib/X11/fonts: caching, new cache contents: 0 fonts, 12 dirs
/usr/local/lib/X11/fonts/100dpi: caching, new cache contents: 398 fonts, 0 dirs
/usr/local/lib/X11/fonts/75dpi: caching, new cache contents: 398 fonts, 0 dirs
/usr/local/lib/X11/fonts/OTF: caching, new cache contents: 23 fonts, 0 dirs
/usr/local/lib/X11/fonts/TTF: caching, new cache contents: 31 fonts, 0 dirs
/usr/local/lib/X11/fonts/Type1: caching, new cache contents: 29 fonts, 0 dirs
/usr/local/lib/X11/fonts/bitstream-vera: caching, new cache contents: 10 fonts, 0 dirs
/usr/local/lib/X11/fonts/cyrillic: caching, new cache contents: 0 fonts, 0 dirs
/usr/local/lib/X11/fonts/encodings: caching, new cache contents: 0 fonts, 1 dirs
/usr/local/lib/X11/fonts/encodings/large: caching, new cache contents: 0 fonts, 0 dirs
/usr/local/lib/X11/fonts/lfpfonts-fix: caching, new cache contents: 71 fonts, 0 dirs
/usr/local/lib/X11/fonts/local: caching, new cache contents: 2 fonts, 0 dirs
/usr/local/lib/X11/fonts/misc: caching, new cache contents: 59 fonts, 0 dirs
/usr/local/lib/X11/fonts/util: caching, new cache contents: 0 fonts, 0 dirs
/root/.fonts: skipping, no such directory
/var/db/fontconfig: cleaning cache directory
/root/.fontconfig: not cleaning non-existent cache directory
fc-cache: succeeded
# fc-list :lang=zh-tw
文鼎PL中楷,AR PL KaitiM Big5:style=Regular
AR PL UMing TW:style=Light
AR PL UMing HK:style=Light
cwTeX 粗黑體,cwTeXHeiBold:style=Medium
AR PL UMing CN:style=Light
文鼎PL新宋,AR PL New Sung:style=Regular
AR PL UKai TW MBE:style=Book
cwTeX 仿宋體,cwTeXFangSong:style=Medium
cwTeX 明體,cwTeXMing:style=Medium
AR PL UKai CN:style=Book
AR PL UKai HK:style=Book
cwTeX 楷書,cwTeXKai:style=Medium
AR PL UKai TW:style=Book
文鼎PL細上海宋,AR PL Mingti2L Big5:style=Regular,Reguler
AR PL UMing TW MBE:style=Light
cwTeX 圓體,cwTeXYen:style=Medium
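
The fc-list check above can be turned into a scripted test that fails when no Traditional Chinese fonts are registered. A minimal sketch follows. FC_LIST and zh_tw_font_count are illustrative names, and the only fontconfig call used is the same fc-list :lang=zh-tw query shown in the transcript.

```shell
#!/bin/sh
# Count the fonts that fontconfig reports for Traditional Chinese (zh-tw).
# FC_LIST is an illustrative override so the check can target another binary.

FC_LIST=${FC_LIST:-fc-list}

zh_tw_font_count() {
    # grep -c . counts non-empty lines, one per font reported by fc-list.
    "$FC_LIST" :lang=zh-tw | grep -c .
}

# Usage:
#   [ "$(zh_tw_font_count)" -gt 0 ] || echo "no zh-tw fonts; rerun fc-cache -f -v" >&2
```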

5. Download the wkhtmltopdf Linux static binary (i386)

# wget http://wkhtmltopdf.googlecode.com/files/wkhtmltopdf-0.10.0_beta4-static-i386.tar.bz2
--2010-08-03 20:13:15--  http://wkhtmltopdf.googlecode.com/files/wkhtmltopdf-0.10.0_beta4-static-i386.tar.bz2
Resolving wkhtmltopdf.googlecode.com... 64.233.183.82
Connecting to wkhtmltopdf.googlecode.com|64.233.183.82|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11712708 (11M) [application/x-bzip2]
Saving to: `wkhtmltopdf-0.10.0_beta4-static-i386.tar.bz2'

100%[=================================================================================================================================================================================>] 11,712,708  1.08M/s   in 13s

2010-08-03 20:13:29 (881 KB/s) - `wkhtmltopdf-0.10.0_beta4-static-i386.tar.bz2' saved [11712708/11712708]
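
Before unpacking, it can be worth confirming that the download is complete. The sketch below compares the file size with the 11712708 bytes that wget reported above. The function name is hypothetical, and a size match only rules out truncation; it is not an integrity check.

```shell
#!/bin/sh
# Compare a file's size in bytes with an expected value.
# 11712708 is the length wget reported in the transcript above.

size_matches() {
    file=$1
    expected=$2
    # Arithmetic expansion strips the leading blanks that BSD wc prints.
    actual=$(( $(wc -c < "$file") ))
    [ "$actual" -eq "$expected" ]
}

# Usage:
#   size_matches wkhtmltopdf-0.10.0_beta4-static-i386.tar.bz2 11712708 \
#       || echo "download looks incomplete" >&2
```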

6. Run it

# tar xjf wkhtmltopdf-0.10.0_beta4-static-i386.tar.bz2
# ./wkhtmltopdf-i386
You need to specify atleast one input file, and exactly one output file
Use - for stdin or stdout

Name:
  wkhtmltopdf 0.10.0 beta4

Synopsis:
  wkhtmltopdf [GLOBAL OPTION]... [OBJECT]... 

Document objects:
  wkhtmltopdf is able to put several objecs into the output file, an object is
  either a single webpage, a cover webpage or a table of content.  The objects
  are put into the output document in the order they are specified on the
  commandline, options can be specified on a per object basis or in the global
  options area. Options from the Global Options section can only be placed in
  the global options area

  A page objects puts the content of a singe webpage into the output document.

  (page)? <  input url/file name  > [PAGE OPTION]...
  Options for the page object can be placed in the global options and the page
  options areas. The applicable options can be found in the Page Options and
  Headers And Footer Options sections.

  A cover objects puts the content of a singe webpage into the output document,
  the page does not appear in the table of content, and does not have headers
  and footers.

  cover <  input url/file name > [PAGE OPTION]...
  All options that can be specified for a page object can also be specified for
  a cover.

  A table of content object inserts a table of content into the output document.

  toc [TOC OPTION]...
  All options that can be specified for a page object can also be specified for
  a toc, further more the options from the TOC Options section can also be
  applied. The table of content is generated via xslt which means that it can be
  styled to look however you want it to look. To get an idear of how to do this
  you can dump the default xslt document by supplying the
  --dump-default-toc-xsl, and the outline it works on by supplying
  --dump-outline, see the Outline Options section.

Description:
  Converts one or more HTML pages into a PDF document, using wkhtmltopdf patched
  qt.

Global Options:
      --collate                       Collate when printing multiple copies
                                      (default)
      --no-collate                    Do not collate when printing multiple
                                      copies
      --copies <number>               Number of copies to print into the pdf
                                      file (default 1)
  -H, --extended-help                 Display more extensive help, detailing
                                      less common command switches
  -g, --grayscale                     PDF will be generated in grayscale
  -h, --help                          Display help
  -l, --lowquality                    Generates lower quality pdf/ps. Useful to
                                      shrink the result document space
  -O, --orientation <orientation>     Set orientation to Landscape or Portrait
                                      (default Portrait)
  -s, --page-size <Size>              Set paper size to: A4, Letter, etc.
                                      (default A4)
  -q, --quiet                         Be less verbose
      --read-args-from-stdin          Read command line arguments from stdin
      --title <text>                  The title of the generated pdf file (The
                                      title of the first document is used if not
                                      specified)
  -V, --version                       Output version information an exit

Contact:
  If you experience bugs or want to request new features please visit
  , if you have any problems
  or comments please feel free to contact me: see
  

Example:

# ./wkhtmltopdf-i386 http://tw.yahoo.com/ test.pdf
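
For more than one page, the single command above can be wrapped in a loop. This is a sketch under stated assumptions: WKHTMLTOPDF, url_to_pdf_name and convert_urls are illustrative names, and the only options used (-q, -s, -O) are the global options listed in the help output.

```shell
#!/bin/sh
# Batch-convert URLs to PDF with the static binary from step 5.
# WKHTMLTOPDF, url_to_pdf_name and convert_urls are illustrative names.

WKHTMLTOPDF=${WKHTMLTOPDF:-./wkhtmltopdf-i386}

# Derive an output file name from a URL: drop the scheme and trailing
# slashes, replace anything outside [A-Za-z0-9._-] with "_", append .pdf.
url_to_pdf_name() {
    printf '%s\n' "$1" | sed -e 's|^[A-Za-z]*://||' -e 's|/*$||' \
        -e 's|[^A-Za-z0-9._-]|_|g' -e 's|$|.pdf|'
}

# Convert each URL in turn; a failure is reported and the loop continues.
convert_urls() {
    for url in "$@"; do
        "$WKHTMLTOPDF" -q -s A4 -O Portrait "$url" "$(url_to_pdf_name "$url")" \
            || echo "conversion failed: $url" >&2
    done
}

# Usage:
#   convert_urls http://tw.yahoo.com/ http://www.freebsd.org/
```

Global options come first, then the input URL, then the output file, matching the synopsis printed by the tool.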


Update 2011-03-04:
To produce a fax-format TIFF, combine wkhtmltoimage with ImageMagick:

wkhtmltoimage-i386 test.html test.png; convert test.png -colorspace HWB -monochrome -compress Fax test.tif
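
The two-stage fax conversion can be wrapped in a small function so the intermediate PNG name is derived automatically. This sketch reuses the exact convert options from the command above. WKHTMLTOIMAGE, CONVERT and html_to_fax_tiff are illustrative names.

```shell
#!/bin/sh
# Render an HTML file to PNG, then convert it to a fax-compressed TIFF.
# WKHTMLTOIMAGE, CONVERT and html_to_fax_tiff are illustrative names.

WKHTMLTOIMAGE=${WKHTMLTOIMAGE:-wkhtmltoimage-i386}
CONVERT=${CONVERT:-convert}

html_to_fax_tiff() {
    html=$1
    tif=$2
    png="${tif%.tif}.png"    # intermediate image next to the output file
    "$WKHTMLTOIMAGE" "$html" "$png" &&
        "$CONVERT" "$png" -colorspace HWB -monochrome -compress Fax "$tif"
}

# Usage:
#   html_to_fax_tiff test.html test.tif
```

Chaining with && instead of ; means convert only runs when the rendering step succeeded, so a failed render does not leave a stale or empty TIFF behind.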

2 thoughts on "Using wkhtmltopdf on FreeBSD to Capture Web Pages"

  1. Hi,
    First of all, thanks for this great install tutorial.
    Installing the same ports in PCBSD8.1 (PCBSD8.1 = FreeBSD8.1 + KDE) doesn’t allow me to run wkhtmltopdf.

    I got the following error when running the Linux statically compiled version "wkhtmltopdf-0.10.0-beta5" (i386):

    # file ./wkhtmltopdf-i386
    ./wkhtmltopdf-i386: ELF 32-bit LSB executable, Intel 80386, version 1 (GNU/Linux), statically linked, stripped

    This is how it fails:
    # brandelf -t Linux wkhtmltopdf-i386

    # ./wkhtmltopdf-i386
    PROT_EXEC|PROT_WRITE failed.

    # truss ./wkhtmltopdf-i386
    linux_mmap(0xbfbfed28,0x1000,0xc01000,0x16b6bc6,0x0,0x6) ERR#22 'Invalid argument'
    write(2,"PROT_EXEC|PROT_WRITE failed.\n",29) ERR#9 'Bad file descriptor'
    process exit, rval = 127

    # kdump
    29785 ktrace RET ktrace 0
    29785 ktrace CALL execve(0xbfbfee1b,0xbfbfed04,0xbfbfed0c)
    29785 ktrace NAMI "./wkhtmltopdf-i386"
    29785 wkhtmltopdf-i386 RET execve 0
    29785 wkhtmltopdf-i386 CALL dup2(0xbfbfed3c)
    29785 wkhtmltopdf-i386 RET dup2 -1 errno 22 Invalid argument
    29785 wkhtmltopdf-i386 CALL write(0x2,0x16b6b4e,0x1d)
    29785 wkhtmltopdf-i386 GIO fd 2 wrote 29 bytes
    "PROT_EXEC|PROT_WRITE failed. "
    29785 wkhtmltopdf-i386 RET write 29/0x1d
    29785 wkhtmltopdf-i386 CALL exit(0x7f)

    Help is appreciated!

    Thanks in advance
    Zabby
