
If you've ever wanted to read web pages into your C or C++ program, then this code is for you. It's the smallest possible code that will let you read arbitrary web URLs into your own program. It does NOT have any kind of interesting features, and is not coded to be fully standards compliant, but:
Literally, you should be able to drop it into your own program in less than 10 minutes, assuming you already have a working devstudio project going.
HTTP GET uses the sock_port sockets portability layer to be easy to move to Linux. I'm releasing the source under the MIT license, which means you can include it in your own programs, at your own risk, without needing to release your own source or pay anyone anything for the right to include it. However, if it eats your computer, you accept the risk of that -- not me.
HTTP GET does not handle URL re-directs, refresh tags, or any other fanciness like that; it lets you, the user, decide how to handle the data that the server returns, in true minimalist fashion.
Get the source at the bottom of this article (log-in required).
For documentation, look at the unit test function, read the comments in the mysteriously named header , and look at the nettest.cpp implementation file, which implements a very bare-bones wget-like application that snarfs web URLs to disk.
To add to your own project, add mynetwork.cpp and sock_port.cpp to the project. Make sure that sock_port.h and mynetwork.h can be found in your include path. Build. That's it!
To start a new request, call NewHttpRequest(url), passing a full "http://host/path" URL (:port is optional). Then keep polling the request by checking whether it's complete(); when it is, you can read() data out of it until there is no more to read. Call dispose() when you're done.
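The poll-then-read sequence above can be sketched as follows. This is a minimal sketch, not the library's code: `FakeRequest` is a hypothetical in-memory stand-in for the object that NewHttpRequest() returns, the method names come from this article, and the exact signatures (in particular `read(void*, size_t)` returning a byte count) are assumptions. `fetchAll` is a made-up helper name.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-in for the request object; signatures are assumed,
// only the method names are taken from the article.
class FakeRequest {
public:
    explicit FakeRequest(std::string body) : body_(std::move(body)) {}

    // Simulates the network: four more bytes "arrive" on every poll.
    bool complete() {
        arrived_ = std::min(body_.size(), arrived_ + 4);
        return arrived_ == body_.size();
    }

    // Copies up to `size` bytes that have arrived; returns 0 when none are left.
    std::size_t read(void* dst, std::size_t size) {
        std::size_t n = std::min(size, arrived_ - pos_);
        std::memcpy(dst, body_.data() + pos_, n);
        pos_ += n;
        return n;
    }

    void dispose() { delete this; }

private:
    std::string body_;
    std::size_t arrived_ = 0;
    std::size_t pos_ = 0;
};

// Poll until complete, drain everything, then dispose of the request.
template <typename Request>
std::string fetchAll(Request* req) {
    std::string out;
    if (req == nullptr) return out;      // request could not be created
    while (!req->complete()) {
        // A real program would do other work or sleep briefly here.
    }
    char buf[256];
    std::size_t n;
    while ((n = req->read(buf, sizeof buf)) > 0) {
        out.append(buf, n);
    }
    req->dispose();
    return out;
}
```

With the real library, the pointer passed to `fetchAll` would presumably come from NewHttpRequest(url) instead of `new FakeRequest(...)`.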
Reading data while it's streaming (not yet complete()) is also supported, as well as calling rewind() to start reading from the beginning again. That, however, concludes the feature list. I told you it was small!
You will know something went wrong when NewHttpRequest() returns NULL, or when the request is complete() but read() returns 0 bytes. You don't know WHAT went wrong -- but, hey, that's the web for you!
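To illustrate streaming reads, rewind(), and the failure rule together, here is a rough sketch under the same caveats as any example not taken from the source: `FakeStream` is a hypothetical in-memory stand-in for the request object, the method signatures are assumed, and `FetchResult` and `streamThenReplay` are made-up names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-in for the request object; signatures are assumed.
class FakeStream {
public:
    FakeStream(std::string body, std::size_t chunk)
        : body_(std::move(body)), chunk_(chunk) {}

    // Simulates the network: `chunk` more bytes "arrive" on every poll.
    bool complete() {
        arrived_ = std::min(body_.size(), arrived_ + chunk_);
        return arrived_ == body_.size();
    }
    std::size_t read(void* dst, std::size_t size) {
        std::size_t n = std::min(size, arrived_ - pos_);
        std::memcpy(dst, body_.data() + pos_, n);
        pos_ += n;
        return n;
    }
    void rewind() { pos_ = 0; }          // start reading from the beginning again
    void dispose() { delete this; }

private:
    std::string body_;
    std::size_t chunk_;
    std::size_t arrived_ = 0;
    std::size_t pos_ = 0;
};

struct FetchResult {
    bool ok;                // false when the request failed
    std::string streamed;   // bytes read while the data was still arriving
    std::string replayed;   // the same bytes, read again after rewind()
};

template <typename Request>
FetchResult streamThenReplay(Request* req) {
    FetchResult r{false, "", ""};
    if (req == nullptr) return r;        // failure case 1: no request object
    char buf[64];
    std::size_t n;
    bool done = false;
    while (!done) {
        done = req->complete();
        // Read whatever has arrived so far; 0 here only means "nothing yet".
        while ((n = req->read(buf, sizeof buf)) > 0) r.streamed.append(buf, n);
    }
    // Failure case 2: the request is complete but produced no bytes at all.
    r.ok = !r.streamed.empty();
    req->rewind();
    while ((n = req->read(buf, sizeof buf)) > 0) r.replayed.append(buf, n);
    req->dispose();
    return r;
}
```

Note that a zero-byte read is only treated as an error once the request is complete; before that it simply means no new data has arrived.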
| Attachment | Size |
|---|---|
| http-get-20081119.zip | 13.59 KB |
Comments
maybe double post
To fix that behaviour (it's due to missing content size information):
remove the lines if (toRead_ <= 0) complete = true in step()
at SOCKET_WOULDBLOCK_ERROR, replace the line if (r < 0) with if (r <= 0)
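One way to read the fix above: when the server sends no Content-Length, the body ends only when the server closes the connection, so "nothing left to read right now" must not count as completion. The sketch below states that rule as a small pure function. It is not taken from the HTTP GET source: `RecvState`, `RecvEvent`, and `isComplete` are hypothetical names, and chunked transfer encoding is ignored.

```cpp
#include <cstddef>

// Hypothetical receive state, not the library's actual members.
struct RecvState {
    bool haveContentLength;   // did the response headers include Content-Length?
    std::size_t remaining;    // body bytes still expected (only meaningful if known)
};

// What the last receive attempt reported.
enum class RecvEvent { Data, WouldBlock, PeerClosed };

// Decide whether the response body is finished.
bool isComplete(const RecvState& s, RecvEvent e) {
    if (e == RecvEvent::PeerClosed) return true;           // server closed: nothing more can come
    if (s.haveContentLength) return s.remaining == 0;      // known length: done when all bytes are in
    return false;                                          // unknown length: wait for the close
}
```

Under this rule, a first packet of about one MTU with no Content-Length header would leave the request incomplete until the server closes the socket.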
Great but has a bug
hi Hplus,
I found your code and tried it, but the problem is that it does not work with larger files. Here is an example:
http://search.yahooapis.com/WebSearchService/V1/webSearch?appid=YahooDem...
the request says it's complete too early.
I also worked around the filename problem using 'isalnum'.
Please let me know if you have an updated version,
sanx@fxteam.net.
I don't understand what the
I don't understand what the problem is. That URL returns a result set of a total of 2444 bytes (including header), and works fine when I try it.
Also, I don't understand what "it's complete too early" means, because there is no error message in the sample code that mentions "early."
First of all, thank you very
First of all, thank you very much for the reply.
What happens is that the response I get is truncated to 1458 bytes, about the size of an MTU for a DSL connection (which I use).
the response is : http://pastebin.com/m29b307c7
When I put a data breakpoint at the address of HTTPQuery::complete_ to see what changes the value, I break at line 153.
Maybe the result was too small to be truncated on your machine. Now I use 10 results, and you can change the number of results to see the difference.
http://search.yahooapis.com/WebSearchService/V1/webSearch?appid=YahooDem...
Right now I'm using libcurl and it works great, but I think it could be great to have a lightweight library too.
Let me know what you find out: sanx@fxteam.net
Thank you.
OK, now I get the same
OK, now I get the same result. I'll take a look.
Although I'm sure you'll have better results with libcurl in general, once you've gotten it installed; it's a heavy-weight library that is very capable.