
An HTML parsing DLL you can easily use in VC++

네오류이 · January 11, 2021, 21:07



http://htmlcxx.sourceforge.net/

htmlcxx - html and css APIs for C++

Description

htmlcxx is a simple non-validating css1 and html parser for C++. Although there are several other html parsers available, htmlcxx has some characteristics that make it unique:

STL-like navigation of the DOM tree, using the excellent tree.hh library from Kasper Peeters
It is possible to reproduce exactly, character by character, the original document from the parse tree
Bundled css parser
Optional parsing of attributes
C++ code that looks like C++ (not so true anymore)
Offsets of tags/elements in the original document are stored in the nodes of the DOM tree

The parsing policies of htmlcxx were created to mimic the behavior of Mozilla Firefox (http://www.mozilla.org), so you should expect parse trees similar to those created by Firefox. However, unlike Firefox, htmlcxx does not insert anything that is not actually in your HTML. Therefore, serializing the DOM tree gives exactly the same bytes contained in the original HTML document.
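To make the "reproduce exactly" and "offsets stored in the nodes" points concrete, here is a minimal standalone sketch in plain C++. It is not htmlcxx code: `Chunk`, `splitHtml` and `serialize` are hypothetical names, and the splitter is far cruder than htmlcxx's Firefox-like rules (it ignores `<script>` contents, a `>` inside quoted attribute values, and so on). What it does show is the design idea: every piece keeps its raw bytes plus its offset and length, nothing is inserted or repaired, so concatenating the pieces in order gives back the original document byte for byte.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy type, NOT htmlcxx's HTML::Node: one piece of the source.
struct Chunk {
    enum Kind { Tag, Comment, Text } kind;
    std::size_t offset;  // where the chunk starts in the original document
    std::size_t length;  // how many bytes it spans
    std::string raw;     // the exact source bytes, never normalized
};

// Split html into tags, comments and text runs. Nothing is validated,
// repaired or inserted: a missing </p> simply stays missing.
std::vector<Chunk> splitHtml(const std::string& html) {
    std::vector<Chunk> chunks;
    std::size_t pos = 0;
    while (pos < html.size()) {
        Chunk::Kind kind;
        std::size_t end;
        if (html.compare(pos, 4, "<!--") == 0) {
            std::size_t close = html.find("-->", pos + 4);
            kind = Chunk::Comment;
            end = (close == std::string::npos) ? html.size() : close + 3;
        } else if (html[pos] == '<') {
            std::size_t close = html.find('>', pos + 1);
            kind = Chunk::Tag;
            end = (close == std::string::npos) ? html.size() : close + 1;
        } else {
            std::size_t next = html.find('<', pos);
            kind = Chunk::Text;
            end = (next == std::string::npos) ? html.size() : next;
        }
        chunks.push_back({kind, pos, end - pos, html.substr(pos, end - pos)});
        pos = end;
    }
    return chunks;
}

// Concatenating the raw chunks in order reproduces the input byte for byte.
std::string serialize(const std::vector<Chunk>& chunks) {
    std::string out;
    for (const Chunk& c : chunks) {
        assert(c.offset == out.size());  // chunks are contiguous and ordered
        out += c.raw;
    }
    return out;
}
```

For example, `splitHtml("<p>unclosed")` yields a `<p>` tag chunk and a text chunk; no `</p>` is invented, so `serialize` returns the input unchanged.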

News for version 0.85

Fixed gcc 4.3 compiler errors, several minor bug fixes, improved distribution of the css library.

News for version 0.7.3

Added utility code to escape/decode URLs as defined by RFC 2396. Added a new SAX interface; the API changed slightly (breaking compatibility) to support it :-(. Added Visual Studio 2003 projects for the WIN32 port.
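The URL escaping mentioned for 0.7.3 follows RFC 2396, where "unreserved" characters (letters, digits and `- _ . ! ~ * ' ( )`) may appear as-is and every other byte is written as `%` followed by two hex digits. The sketch below implements that rule in standard C++. `uriEscape`, `uriUnescape` and their helpers are hypothetical names, not htmlcxx's own functions, whose exact signatures are not shown in this post.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// RFC 2396, section 2.3: unreserved = alphanum | mark,
// mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")".
bool isUnreserved(unsigned char c) {
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))
        return true;
    return std::string("-_.!~*'()").find(static_cast<char>(c)) != std::string::npos;
}

// Value of one hex digit, or -1 if c is not a hex digit.
int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: turn every "%XX" escape back into its byte.
// A '%' not followed by two hex digits is copied through unchanged.
std::string uriUnescape(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hexValue(in[i + 1]);
            int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits
                continue;
            }
        }
        out += in[i];
    }
    return out;
}

// Hypothetical helper: keep unreserved characters, escape every other byte as "%XX".
std::string uriEscape(const std::string& in) {
    static const char hexDigits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        if (isUnreserved(c)) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hexDigits[c >> 4];
            out += hexDigits[c & 0x0F];
        }
    }
    assert(uriUnescape(out) == in);  // debug-build check: escaping is reversible
    return out;
}
```

Design note: this escapes reserved characters such as `/` and `?` too, which suits a single path segment or query value but not a whole URI, and a malformed escape such as a trailing `%` is passed through unchanged rather than rejected.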

Examples

Using htmlcxx is quite simple. Take a look at this example.

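```cpp
#include <htmlcxx/html/ParserDom.h>
#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>

using namespace htmlcxx;

// tagName() keeps the case used in the source, so compare case-insensitively
// to catch both <a> and <A>.
static bool iequals(const std::string& a, const std::string& b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
               return std::tolower(static_cast<unsigned char>(x)) ==
                      std::tolower(static_cast<unsigned char>(y));
           });
}

int main() {
    // Parse some html code
    std::string html = "<html><body>hey</body></html>";
    HTML::ParserDom parser;
    tree<HTML::Node> dom = parser.parseTree(html);

    // Print the whole DOM tree
    std::cout << dom << std::endl;

    // Dump all links in the tree
    for (tree<HTML::Node>::iterator it = dom.begin(); it != dom.end(); ++it) {
        if (iequals(it->tagName(), "A")) {
            it->parseAttributes();  // attributes are only parsed on request
            // attribute() returns std::pair<bool, std::string>; .second is the value
            std::cout << it->attribute("href").second << std::endl;
        }
    }

    // Dump all text of the document (text nodes only: no tags, no comments)
    for (tree<HTML::Node>::iterator it = dom.begin(); it != dom.end(); ++it) {
        if (!it->isTag() && !it->isComment()) {
            std::cout << it->text();
        }
    }
    std::cout << std::endl;
    return 0;
}
```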
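Both loops in the example visit every node because tree.hh's default `tree<T>::iterator` is a pre-order (depth-first) iterator: a parent comes before its children, so nodes appear in document order. The standalone sketch below mimics that walk on a hypothetical `ToyNode` type (not `HTML::Node`, and not built by htmlcxx) to show the visiting order and how the text-only dump falls out of it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed node; NOT htmlcxx's HTML::Node.
struct ToyNode {
    std::string tagName;            // empty for text nodes
    std::string text;               // raw text, used by text nodes
    std::vector<ToyNode> children;  // child nodes in document order
    bool isTag() const { return !tagName.empty(); }
};

// Depth-first pre-order walk: each node is visited before its children,
// the same order in which tree.hh's default iterator yields nodes.
template <typename Visit>
void preOrder(const ToyNode& node, Visit visit) {
    visit(node);
    for (const ToyNode& child : node.children) {
        preOrder(child, visit);
    }
}

// Concatenate the text of every non-tag node, as the "dump all text" loop does.
std::string allText(const ToyNode& root) {
    std::string out;
    preOrder(root, [&out](const ToyNode& n) {
        assert(n.isTag() || n.children.empty());  // text nodes are leaves
        if (!n.isTag()) out += n.text;
    });
    return out;
}
```

For `<html><body>hey <b>you</b></body></html>` modelled as a `ToyNode` tree, the visit order is `html`, `body`, `hey `, `b`, `you`, and `allText` returns `"hey you"`.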

The htmlcxx application

htmlcxx is the name of both the library and the utility application that comes with this package. Although htmlcxx (the application) is mostly useless for programming, you can use it to easily see how htmlcxx (the library) would parse your HTML code. Just install it and try htmlcxx -h.

Downloads

Use the project page at sourceforge: http://sf.net/projects/htmlcxx

License Stuff

Code is now under the LGPL. This was our initial intention, and is now possible thanks to the author of tree.hh, who allowed us to use it under LGPL only for HTML::Node template instances. Check http://www.fsf.org or the COPYING file in the distribution for details about the LGPL license. The uri parsing code is a derivative work of Apache web server uri parsing routines. Check www.apache.org/licenses/LICENSE-2.0 or the ASF-2.0 file in the distribution for details.

Enjoy!

Davi de Castro Reis - <davi (a) users sf net>

Robson Braga Araújo - <braga (a) users sf net>

Last Updated: Thu Mar 24 00:56:09 2005
