digitalmars.D.learn - std.net.curl get webpage asia font issue

reply "Sam Hu" <samhudotsamhu gmail.com> writes:
Greeting!

The documentation on this website provides an example of how to get 
webpage content with std.net.curl. It is quite straightforward:

[code]
import std.net.curl, std.stdio;

void main(){

// Return a string containing the content specified by an URL
string content = get("dlang.org");

writefln("%s\n",content);

readln;
}
[/code]

When I change get("dlang.org") to get("yahoo.com"), everything 
works fine; but when I change it to get("yahoo.com.cn"), I get a 
runtime error complaining about a bad GBK encoding.

So my very simple question is: how do I retrieve content from a 
webpage that may contain Asian characters (such as Chinese 
text)?

Thanks for your help in advance.

Regards,
Sam
Jun 06 2012
next sibling parent Kevin <kevincox.ca gmail.com> writes:
On 07/06/12 02:57, Sam Hu wrote:
 string content = get("dlang.org");
 writefln("%s\n",content);

 So my very simple question is: how do I retrieve content from a
 webpage that may contain Asian characters (such as Chinese text)?

I'm not really sure, but try:

wstring content = get("dlang.org");

Also make sure your terminal is set up for Unicode.
Jun 07 2012
prev sibling next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 07.06.2012 10:57, Sam Hu wrote:
 Greeting!

 The documentation on this website provides an example of how to get
 webpage content with std.net.curl. It is quite straightforward:

 [code]
 import std.net.curl, std.stdio;

 void main(){

 // Return a string containing the content specified by an URL
 string content = get("dlang.org");

It's simple: in this line you "convert" whatever the site content was to Unicode. The problem is that the "convert" is either broken or simply a cast, whereas it should re-encode the source as Unicode. So the workaround is to get the content as an array of bytes and decode it yourself.
 writefln("%s\n",content);

 readln;
 }
 [/code]

 When I change get("dlang.org") to get("yahoo.com"), everything works
 fine; but when I change it to get("yahoo.com.cn"), I get a runtime error
 complaining about a bad GBK encoding.

 So my very simple question is: how do I retrieve content from a webpage
 that may contain Asian characters (such as Chinese text)?

 Thanks for your help in advance.

 Regards,
 Sam

-- Dmitry Olshansky
Jun 07 2012
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 08.06.2012 5:03, Sam Hu wrote:
 On Thursday, 7 June 2012 at 10:43:32 UTC, Dmitry Olshansky wrote:
 string content = get("dlang.org");

It's simple: in this line you "convert" whatever the site content was to Unicode. The problem is that the "convert" is either broken or simply a cast, whereas it should re-encode the source as Unicode. So the workaround is to get the content as an array of bytes and decode it yourself.

Thanks. May I know how? A code snippet would be appreciated.

Seems like

ubyte[] data = get!(AutoProtocol, ubyte)("your-site.cn");

should work (sorry, I'm on Windows and curl doesn't work here for me). Then you work with your data, decode it and whatever. At least this:

writeln(data); // will not throw, but will print bytes

-- Dmitry Olshansky
Jun 08 2012
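
The byte-array approach suggested above can be sketched roughly as follows. This is only an illustration under a few assumptions: isValidUtf8 is a hypothetical helper (it is not part of Phobos), the URL is taken from the command line, and the actual GBK-to-UTF-8 conversion is left out because it would likely need an external library such as iconv or a platform API.

```d
import std.net.curl : AutoProtocol, get;
import std.stdio : writeln;
import std.utf : UTFException, validate;

/// Hypothetical helper (not part of Phobos): true if `data` is well-formed UTF-8.
bool isValidUtf8(const(ubyte)[] data)
{
    try
    {
        // validate throws UTFException on the first malformed sequence.
        validate(cast(const(char)[]) data);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main(string[] args)
{
    if (args.length < 2)
    {
        writeln("usage: fetch <url>");
        return;
    }

    // Request ubyte instead of the default char so that get returns
    // the raw response body without trying to decode it as text.
    ubyte[] data = get!(AutoProtocol, ubyte)(args[1]);

    if (isValidUtf8(data))
        writeln(cast(const(char)[]) data); // already UTF-8, print as text
    else
        writeln(data.length, " bytes in a non-UTF-8 encoding; convert them before printing");
}
```

For a GBK page the second branch would be taken, and the bytes would still need to be converted to UTF-8 before they can be stored in a string or shown in a text control.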
prev sibling next sibling parent "Sam Hu" <samhudotsamhu gmail.com> writes:
On Thursday, 7 June 2012 at 10:43:32 UTC, Dmitry Olshansky wrote:
 string content = get("dlang.org");

It's simple: in this line you "convert" whatever the site content was to Unicode. The problem is that the "convert" is either broken or simply a cast, whereas it should re-encode the source as Unicode. So the workaround is to get the content as an array of bytes and decode it yourself.

Thanks. May I know how? A code snippet would be appreciated.
Jun 07 2012
prev sibling parent "Sam Hu" <samhudotsamhu gmail.com> writes:
On Thursday, 7 June 2012 at 10:38:53 UTC, Kevin wrote:
 On 07/06/12 02:57, Sam Hu wrote:
 string content = get("dlang.org");
 writefln("%s\n",content);

 So my very simple question is: how do I retrieve content from a
 webpage that may contain Asian characters (such as Chinese text)?

I'm not really sure, but try:

wstring content = get("dlang.org");

Also make sure your terminal is set up for Unicode.

Sorry, no, it does not work. I tried printing the content to a DFL TextBox control, but the issue is still the same.
Jun 07 2012