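Capturing web pages with wkhtmltopdf on FreeBSD
Exporting a web page to PDF, or capturing it, under program control used to be a major undertaking.
This post shows how to do it easily from the FreeBSD command line, without the bulky X Window graphical system, which makes it a good fit for servers.
The software's own introduction follows.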
wkhtmltopdf
Convert html to pdf using webkit (qtwebkit)
Description
Simple shell utility to convert html to pdf using the webkit rendering engine, and qt.
Introduction
Searching the web, I have found several command line tools that allow you to convert a HTML-document to a PDF-document, however they all seem to use their own, and rather incomplete rendering engine, resulting in poor quality. Recently QT 4.4 was released with a WebKit widget (WebKit is the engine of Apples Safari, which is a fork of the KDE KHtml), and making a good tool became very easy.
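The tool is built on WebKit, so everything except Flash renders correctly, JavaScript included.
Before installing, make sure the linux-base package is installed and working on your FreeBSD system, and update your ports tree.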
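A quick way to check the first prerequisite is to look for the marker file that the ports in the steps below test for, /compat/linux/etc/fedora-release. This is only a rough sketch: the function name check_linux_base is made up for illustration, and finding the file does not prove the Linux compatibility layer is actually working.

```shell
#!/bin/sh
# Hypothetical helper: report whether a Fedora-based Linux userland seems
# to be installed. The default root /compat/linux and the marker file are
# taken from the "depends on file" lines in the port logs in this post.
check_linux_base() {
    root="${1:-/compat/linux}"
    if [ -f "$root/etc/fedora-release" ]; then
        echo "linux-base found under $root"
        return 0
    fi
    echo "linux-base not found under $root" >&2
    return 1
}
```

Calling check_linux_base with no argument inspects /compat/linux; an alternative root can be passed as the first argument.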
1. Install linux-expat
# cd /usr/ports/textproc/linux-f10-expat;make install clean;
===> License check disabled, port has not defined LICENSE
=> expat-2.0.1-5.i386.rpm doesn't seem to exist in /usr/ports/distfiles/rpm/i386/fedora/10.
=> Attempting to fetch from http://ftp.tw.freebsd.org/pub/FreeBSD/distfiles/rpm/i386/fedora/10/.
expat-2.0.1-5.i386.rpm 100% of 82 kB 244 kBps
===> Extracting for linux-f10-expat-2.0.1
=> MD5 Checksum OK for rpm/i386/fedora/10/expat-2.0.1-5.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/expat-2.0.1-5.i386.rpm.
===> linux-f10-expat-2.0.1 depends on file: /usr/local/bin/rpm2cpio - found
===> Patching for linux-f10-expat-2.0.1
===> Configuring for linux-f10-expat-2.0.1
===> Installing for linux-f10-expat-2.0.1
===> linux-f10-expat-2.0.1 depends on file: /compat/linux/etc/fedora-release - found
===> Generating temporary packing list
===> Checking if textproc/linux-f10-expat already installed
cd /usr/ports/textproc/linux-f10-expat/work && /usr/bin/find * -type d -exec /bin/mkdir -p "/compat/linux/{}" \;
cd /usr/ports/textproc/linux-f10-expat/work && /usr/bin/find * ! -type d | /usr/bin/cpio -pm -R root:wheel /compat/linux
367 blocks
===> Running linux ldconfig
/compat/linux/sbin/ldconfig -r /compat/linux
===> Registering installation for linux-f10-expat-2.0.1
===> Cleaning for linux-f10-expat-2.0.1
2. Install linux-fontconfig
# cd /usr/ports/x11-fonts/linux-f10-fontconfig; make install clean;
===> License check disabled, port has not defined LICENSE
=> fontconfig-2.6.0-3.fc10.i386.rpm doesn't seem to exist in /usr/ports/distfiles/rpm/i386/fedora/10.
=> Attempting to fetch from http://ftp.tw.freebsd.org/pub/FreeBSD/distfiles/rpm/i386/fedora/10/.
fontconfig-2.6.0-3.fc10.i386.rpm 100% of 182 kB 241 kBps
===> Extracting for linux-f10-fontconfig-2.6.0
=> MD5 Checksum OK for rpm/i386/fedora/10/fontconfig-2.6.0-3.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/fontconfig-2.6.0-3.fc10.i386.rpm.
===> linux-f10-fontconfig-2.6.0 depends on file: /usr/local/bin/rpm2cpio - found
===> Patching for linux-f10-fontconfig-2.6.0
===> Configuring for linux-f10-fontconfig-2.6.0
===> Installing for linux-f10-fontconfig-2.6.0
===> linux-f10-fontconfig-2.6.0 depends on file: /compat/linux/etc/fedora-release - found
===> linux-f10-fontconfig-2.6.0 depends on file: /compat/linux/lib/libexpat.so.1 - found
===> Generating temporary packing list
===> Checking if x11-fonts/linux-f10-fontconfig already installed
cd /usr/ports/x11-fonts/linux-f10-fontconfig/work && /usr/bin/find * -type d -exec /bin/mkdir -p "/compat/linux/{}" \;
cd /usr/ports/x11-fonts/linux-f10-fontconfig/work && /usr/bin/find * ! -type d | /usr/bin/cpio -pm -R root:wheel /compat/linux
617 blocks
===> Running linux ldconfig
/compat/linux/sbin/ldconfig -r /compat/linux
===> Registering installation for linux-f10-fontconfig-2.6.0
===> Cleaning for linux-f10-fontconfig-2.6.0
3. Install linux-xorg-libs
# cd /usr/ports/x11/linux-f10-xorg-libs; make install clean;
===> Extracting for linux-f10-xorg-libs-7.4_1
=> MD5 Checksum OK for rpm/i386/fedora/10/libICE-1.0.4-4.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libICE-1.0.4-4.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libFS-1.0.1-2.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libFS-1.0.1-2.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libSM-1.1.0-2.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libSM-1.1.0-2.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libX11-1.1.5-4.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libX11-1.1.5-4.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXScrnSaver-1.1.3-1.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXScrnSaver-1.1.3-1.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXTrap-1.0.0-6.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXTrap-1.0.0-6.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXau-1.0.4-1.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXau-1.0.4-1.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXaw-1.0.4-3.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXaw-1.0.4-3.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXcomposite-0.4.0-5.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXcomposite-0.4.0-5.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXcursor-1.1.9-3.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXcursor-1.1.9-3.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXdamage-1.1.1-4.fc9.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXdamage-1.1.1-4.fc9.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXdmcp-1.0.2-6.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXdmcp-1.0.2-6.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXevie-1.0.2-4.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXevie-1.0.2-4.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXext-1.0.4-1.fc9.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXext-1.0.4-1.fc9.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXfixes-4.0.3-4.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXfixes-4.0.3-4.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXfont-1.3.3-1.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXfont-1.3.3-1.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXft-2.1.13-1.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXft-2.1.13-1.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXi-1.1.3-4.fc9.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXi-1.1.3-4.fc9.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXinerama-1.0.3-2.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXinerama-1.0.3-2.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXmu-1.0.4-1.fc9.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXmu-1.0.4-1.fc9.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXp-1.0.0-11.fc9.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXp-1.0.0-11.fc9.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXpm-3.5.7-4.fc9.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXpm-3.5.7-4.fc9.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXrandr-1.2.3-1.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXrandr-1.2.3-1.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXrender-0.9.4-3.fc9.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXrender-0.9.4-3.fc9.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXres-1.0.3-5.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXres-1.0.3-5.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXt-1.0.5-1.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXt-1.0.5-1.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXtst-1.0.3-3.fc9.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXtst-1.0.3-3.fc9.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXv-1.0.4-1.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXv-1.0.4-1.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXvMC-1.0.4-5.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXvMC-1.0.4-5.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXxf86dga-1.0.2-3.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXxf86dga-1.0.2-3.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXxf86misc-1.0.1-6.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXxf86misc-1.0.1-6.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libXxf86vm-1.0.2-1.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libXxf86vm-1.0.2-1.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libfontenc-1.0.4-6.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libfontenc-1.0.4-6.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libxcb-1.1.91-5.fc10.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libxcb-1.1.91-5.fc10.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/libxkbfile-1.0.4-5.fc9.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/libxkbfile-1.0.4-5.fc9.i386.rpm.
=> MD5 Checksum OK for rpm/i386/fedora/10/mesa-libGLw-6.5.1-5.fc9.i386.rpm.
=> SHA256 Checksum OK for rpm/i386/fedora/10/mesa-libGLw-6.5.1-5.fc9.i386.rpm.
===> linux-f10-xorg-libs-7.4_1 depends on file: /usr/local/bin/rpm2cpio - found
===> Patching for linux-f10-xorg-libs-7.4_1
===> Configuring for linux-f10-xorg-libs-7.4_1
===> Installing for linux-f10-xorg-libs-7.4_1
===> linux-f10-xorg-libs-7.4_1 depends on file: /compat/linux/etc/fedora-release - found
===> linux-f10-xorg-libs-7.4_1 depends on file: /compat/linux/lib/libexpat.so.1 - found
===> linux-f10-xorg-libs-7.4_1 depends on file: /compat/linux/usr/lib/libfontconfig.so.1.3.0 - found
===> Generating temporary packing list
===> Checking if x11/linux-f10-xorg-libs already installed
cd /usr/ports/x11/linux-f10-xorg-libs/work && /usr/bin/find * -type d -exec /bin/mkdir -p "/compat/linux/{}" \;
cd /usr/ports/x11/linux-f10-xorg-libs/work && /usr/bin/find * ! -type d | /usr/bin/cpio -pm -R root:wheel /compat/linux
12139 blocks
===> Running linux ldconfig
/compat/linux/sbin/ldconfig -r /compat/linux
===> Registering installation for linux-f10-xorg-libs-7.4_1
===> SECURITY REPORT:
      This port has installed the following files which may act as network
      servers and may therefore pose a remote security risk to the system.
/compat/linux/usr/lib/libICE.so.6.3.0
/compat/linux/usr/lib/libXdmcp.so.6.0.0
      If there are vulnerabilities in these programs there may be a security
      risk to the system. FreeBSD makes no guarantee about the security of
      ports included in the Ports Collection. Please type 'make deinstall'
      to deinstall the port if this is a concern.
      For more information, and contact details about the security
      status of this software, see the following webpage:
http://x.org
===> Cleaning for linux-f10-xorg-libs-7.4_1
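Steps 1 to 3 repeat the same cd and make install clean pattern, and the logs show each port depending on the previous one, so the order matters. If you prefer to run them in one go, something along these lines should work. It is only a sketch: the function name install_ports and the PORTSDIR and MAKE_CMD override variables are my own additions for illustration, not part of the ports system.

```shell
#!/bin/sh
# Hypothetical wrapper around the "cd <port>; make install clean" pattern.
# PORTSDIR and MAKE_CMD are illustrative overrides; the defaults match the
# commands used in steps 1 to 3.
PORTSDIR="${PORTSDIR:-/usr/ports}"
MAKE_CMD="${MAKE_CMD:-make}"

install_ports() {
    for port in "$@"; do
        echo "==> $port"
        # Run in a subshell so the caller's working directory is untouched,
        # and stop at the first port that fails to build or install.
        ( cd "$PORTSDIR/$port" && "$MAKE_CMD" install clean ) || return 1
    done
}

# Usage (as root), in dependency order:
#   install_ports textproc/linux-f10-expat \
#                 x11-fonts/linux-f10-fontconfig \
#                 x11/linux-f10-xorg-libs
```

Stopping at the first failure is deliberate: fontconfig needs libexpat and xorg-libs needs libfontconfig, so continuing past an error would only produce more errors.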
4. Install the cwttf Chinese fonts
# wget http://cle.linux.org.tw/fonts/cwttf/cwttf-v1.0.tar.gz
# cp * /usr/local/lib/X11/fonts/TTF
# fc-cache -f -v
/usr/local/lib/X11/fonts: caching, new cache contents: 0 fonts, 12 dirs
/usr/local/lib/X11/fonts/100dpi: caching, new cache contents: 398 fonts, 0 dirs
/usr/local/lib/X11/fonts/75dpi: caching, new cache contents: 398 fonts, 0 dirs
/usr/local/lib/X11/fonts/OTF: caching, new cache contents: 23 fonts, 0 dirs
/usr/local/lib/X11/fonts/TTF: caching, new cache contents: 31 fonts, 0 dirs
/usr/local/lib/X11/fonts/Type1: caching, new cache contents: 29 fonts, 0 dirs
/usr/local/lib/X11/fonts/bitstream-vera: caching, new cache contents: 10 fonts, 0 dirs
/usr/local/lib/X11/fonts/cyrillic: caching, new cache contents: 0 fonts, 0 dirs
/usr/local/lib/X11/fonts/encodings: caching, new cache contents: 0 fonts, 1 dirs
/usr/local/lib/X11/fonts/encodings/large: caching, new cache contents: 0 fonts, 0 dirs
/usr/local/lib/X11/fonts/lfpfonts-fix: caching, new cache contents: 71 fonts, 0 dirs
/usr/local/lib/X11/fonts/local: caching, new cache contents: 2 fonts, 0 dirs
/usr/local/lib/X11/fonts/misc: caching, new cache contents: 59 fonts, 0 dirs
/usr/local/lib/X11/fonts/util: caching, new cache contents: 0 fonts, 0 dirs
/root/.fonts: skipping, no such directory
/var/db/fontconfig: cleaning cache directory
/root/.fontconfig: not cleaning non-existent cache directory
fc-cache: succeeded
# fc-list :lang=zh-tw
文鼎PL中楷,AR PL KaitiM Big5:style=Regular
AR PL UMing TW:style=Light
AR PL UMing HK:style=Light
cwTeX 粗黑體,cwTeXHeiBold:style=Medium
AR PL UMing CN:style=Light
文鼎PL新宋,AR PL New Sung:style=Regular
AR PL UKai TW MBE:style=Book
cwTeX 仿宋體,cwTeXFangSong:style=Medium
cwTeX 明體,cwTeXMing:style=Medium
AR PL UKai CN:style=Book
AR PL UKai HK:style=Book
cwTeX 楷書,cwTeXKai:style=Medium
AR PL UKai TW:style=Book
文鼎PL細上海宋,AR PL Mingti2L Big5:style=Regular,Reguler
AR PL UMing TW MBE:style=Light
cwTeX 圓體,cwTeXYen:style=Medium
5. Download the wkhtmltopdf Linux static binary (i386)
# wget http://wkhtmltopdf.googlecode.com/files/wkhtmltopdf-0.10.0_beta4-static-i386.tar.bz2
--2010-08-03 20:13:15-- http://wkhtmltopdf.googlecode.com/files/wkhtmltopdf-0.10.0_beta4-static-i386.tar.bz2
Resolving wkhtmltopdf.googlecode.com... 64.233.183.82
Connecting to wkhtmltopdf.googlecode.com|64.233.183.82|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11712708 (11M) [application/x-bzip2]
Saving to: `wkhtmltopdf-0.10.0_beta4-static-i386.tar.bz2'
100%[======================================>] 11,712,708 1.08M/s in 13s
2010-08-03 20:13:29 (881 KB/s) - ‘wkhtmltopdf-0.10.0_beta4-static-i386.tar.bz2’ saved [11712708/11712708]
6. Run it (extract the downloaded .tar.bz2 archive first; the binary used below is wkhtmltopdf-i386)
# ./wkhtmltopdf-i386
You need to specify atleast one input file, and exactly one output file
Use - for stdin or stdout

Name:
  wkhtmltopdf 0.10.0 beta4

Synopsis:
  wkhtmltopdf [GLOBAL OPTION]... [OBJECT]... <output file>

Document objects:
  wkhtmltopdf is able to put several objecs into the output file, an object is
  either a single webpage, a cover webpage or a table of content. The objects
  are put into the output document in the order they are specified on the
  commandline, options can be specified on a per object basis or in the global
  options area. Options from the Global Options section can only be placed in
  the global options area

  A page objects puts the content of a singe webpage into the output document.

  (page)? < input url/file name > [PAGE OPTION]...
  Options for the page object can be placed in the global options and the page
  options areas. The applicable options can be found in the Page Options and
  Headers And Footer Options sections.

  A cover objects puts the content of a singe webpage into the output document,
  the page does not appear in the table of content, and does not have headers
  and footers.

  cover < input url/file name > [PAGE OPTION]...
  All options that can be specified for a page object can also be specified for
  a cover.

  A table of content object inserts a table of content into the output document.

  toc [TOC OPTION]...
  All options that can be specified for a page object can also be specified for
  a toc, further more the options from the TOC Options section can also be
  applied. The table of content is generated via xslt which means that it can be
  styled to look however you want it to look. To get an idear of how to do this
  you can dump the default xslt document by supplying the
  --dump-default-toc-xsl, and the outline it works on by supplying
  --dump-outline, see the Outline Options section.

Description:
  Converts one or more HTML pages into a PDF document, using wkhtmltopdf patched
  qt.

Global Options:
      --collate                       Collate when printing multiple copies (default)
      --no-collate                    Do not collate when printing multiple copies
      --copies <number>               Number of copies to print into the pdf file (default 1)
  -H, --extended-help                 Display more extensive help, detailing less common command switches
  -g, --grayscale                     PDF will be generated in grayscale
  -h, --help                          Display help
  -l, --lowquality                    Generates lower quality pdf/ps. Useful to shrink the result document space
  -O, --orientation <orientation>     Set orientation to Landscape or Portrait (default Portrait)
  -s, --page-size <size>              Set paper size to: A4, Letter, etc. (default A4)
  -q, --quiet                         Be less verbose
      --read-args-from-stdin          Read command line arguments from stdin
      --title <text>                  The title of the generated pdf file (The title of the first document is used if not specified)
  -V, --version                       Output version information an exit

Contact:
  If you experience bugs or want to request new features please visit
  <http://code.google.com/p/wkhtmltopdf/issues/list>, if you have any problems
  or comments please feel free to contact me: see
  <http://www.madalgo.au.dk/~jakobt/#about>
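Going by the synopsis above, each run writes exactly one output file, so turning a list of URLs into separate PDFs takes a small loop. Here is a minimal sketch, assuming the binary sits in the current directory as wkhtmltopdf-i386. The function name urls_to_pdf, the page-N.pdf naming scheme, and the WKHTMLTOPDF override variable are invented for illustration; the only option used is -q from the help text.

```shell
#!/bin/sh
# Hypothetical batch helper: one PDF per URL listed in a text file.
# WKHTMLTOPDF is an illustrative override; the default is the binary
# name used in this post.
WKHTMLTOPDF="${WKHTMLTOPDF:-./wkhtmltopdf-i386}"

urls_to_pdf() {
    list="$1"            # text file with one URL per line
    outdir="${2:-.}"     # where page-1.pdf, page-2.pdf, ... are written
    n=0
    failed=0
    while IFS= read -r url; do
        # Skip blank lines and lines starting with '#'.
        case "$url" in
            ''|'#'*) continue ;;
        esac
        n=$((n + 1))
        out="$outdir/page-$n.pdf"
        # stdin is redirected so the tool cannot swallow the rest of the list.
        if "$WKHTMLTOPDF" -q "$url" "$out" < /dev/null; then
            echo "ok $url -> $out"
        else
            echo "FAILED $url" >&2
            failed=$((failed + 1))
        fi
    done < "$list"
    # Succeed only if every conversion succeeded.
    [ "$failed" -eq 0 ]
}

# Usage:
#   urls_to_pdf urls.txt /tmp/pdfs
```

A failed page is reported and skipped rather than aborting the run, since one unreachable site should not block the rest of the list.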
Example:
# ./wkhtmltopdf-i386 http://tw.yahoo.com/ test.pdf
Update 2011-03-04: if you need a TIF for faxing, combine it with ImageMagick:
wkhtmltoimage-i386 test.html test.png; convert test.png -colorspace HWB -monochrome -compress Fax test.tif
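The two-step fax conversion can be wrapped in a small function so the intermediate PNG is cleaned up afterwards. This is a rough sketch rather than a tested recipe: the function name html_to_fax_tif and the WKHTMLTOIMAGE and CONVERT override variables are made up for illustration, and the convert options are copied unchanged from the command above.

```shell
#!/bin/sh
# Hypothetical wrapper around the wkhtmltoimage + convert pipeline.
# WKHTMLTOIMAGE and CONVERT are illustrative overrides.
WKHTMLTOIMAGE="${WKHTMLTOIMAGE:-./wkhtmltoimage-i386}"
CONVERT="${CONVERT:-convert}"

html_to_fax_tif() {
    src="$1"                  # input HTML file or URL
    dst="$2"                  # output file, e.g. test.tif
    png="${dst%.tif}.png"     # intermediate image next to the output

    # Step 1: render the page to PNG.
    "$WKHTMLTOIMAGE" "$src" "$png" || return 1

    # Step 2: reduce to monochrome and write a fax-compressed TIF.
    "$CONVERT" "$png" -colorspace HWB -monochrome -compress Fax "$dst"
    status=$?

    # Remove the intermediate PNG whether or not the conversion worked.
    rm -f "$png"
    return "$status"
}

# Usage:
#   html_to_fax_tif test.html test.tif
```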