digitalmars.D.learn - UTF-8 strings and endianness

"denizzzka" <4denizzz gmail.com> writes:

Hi!

How to convert D's string to big endian?
How to convert to D's string from big endian?

Oct 29 2012

"Adam D. Ruppe" <destructionator gmail.com> writes:

UTF-8 isn't affected by endianness.

Oct 29 2012

"denizzzka" <4denizzz gmail.com> writes:

On Monday, 29 October 2012 at 15:22:39 UTC, Adam D. Ruppe wrote:
 UTF-8 isn't affected by endianness.

Ok, thanks!

Oct 29 2012

"Jesse Phillips" <Jessekphillips+D gmail.com> writes:

On Monday, 29 October 2012 at 15:22:39 UTC, Adam D. Ruppe wrote:
 UTF-8 isn't affected by endianness.

If this is true why does the BOM have marks for big and little 
endian?

http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding

Oct 30 2012

"Tobias Pankrath" <tobias pankrath.net> writes:

On Tuesday, 30 October 2012 at 17:12:41 UTC, Jesse Phillips wrote:
 On Monday, 29 October 2012 at 15:22:39 UTC, Adam D. Ruppe wrote:
 UTF-8 isn't affected by endianness.

 If this is true why does the BOM have marks for big and little 
 endian?

 http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding

UTF8 has only one?

Oct 30 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 10/30/2012 5:17 PM, Tobias Pankrath wrote:
 On Tuesday, 30 October 2012 at 17:12:41 UTC, Jesse Phillips wrote:
 On Monday, 29 October 2012 at 15:22:39 UTC, Adam D. Ruppe wrote:
 UTF-8 isn't affected by endianness.

 If this is true why does the BOM have marks for big and little endian?

 http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding

 UTF8 has only one?

Even Wiki knows the simple truth:
 Byte order has no meaning in UTF-8, so its only use in UTF-8 is 
 to signal at the start that the text stream is encoded in UTF-8.
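
To make the "signature" role concrete, here is a minimal sketch in D of checking for and dropping the UTF-8 BOM (U+FEFF, which encodes to the bytes EF BB BF). The names `utf8Bom` and `stripUtf8Bom` are made up for this illustration and are not Phobos functions.

```d
/// The UTF-8 signature: U+FEFF encoded in UTF-8 is always EF BB BF.
/// There is only one form because UTF-8 has no byte-order variants.
immutable ubyte[3] utf8Bom = [0xEF, 0xBB, 0xBF];

/// Hypothetical helper: returns `data` without a leading UTF-8 signature.
/// If no signature is present, the input slice is returned unchanged.
const(ubyte)[] stripUtf8Bom(const(ubyte)[] data)
{
    if (data.length >= utf8Bom.length && data[0 .. utf8Bom.length] == utf8Bom[])
        return data[utf8Bom.length .. $];
    return data;
}
```

Note that nothing here swaps bytes: the check only tells you the stream claims to be UTF-8.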

-- 
Dmitry Olshansky

Oct 30 2012

"Jesse Phillips" <Jessekphillips+D gmail.com> writes:

On Tuesday, 30 October 2012 at 17:17:36 UTC, Tobias Pankrath 
wrote:
 On Tuesday, 30 October 2012 at 17:12:41 UTC, Jesse Phillips 
 wrote:
 On Monday, 29 October 2012 at 15:22:39 UTC, Adam D. Ruppe 
 wrote:
 UTF-8 isn't affected by endianness.

 If this is true why does the BOM have marks for big and little 
 endian?

 http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding

 UTF8 has only one?

oops, mixed up and thought he just said "UTF isn't ..."

Oct 30 2012

Jordi Sayol <g.sayol yahoo.es> writes:

On 29/10/12 16:17, denizzzka wrote:
 Hi!
 
 How to convert D's string to big endian?
 How to convert to D's string from big endian?
 
 

UTF-8 is always big endian.
-- 
Jordi Sayol

Oct 29 2012

"denizzzka" <4denizzz gmail.com> writes:

On Monday, 29 October 2012 at 15:46:43 UTC, Jordi Sayol wrote:
 On 29/10/12 16:17, denizzzka wrote:
 Hi!
 
 How to convert D's string to big endian?
 How to convert to D's string from big endian?
 
 

 UTF-8 is always big endian.

Yes.

(I thought the problem was here, but it turned out to be 
somewhere else.)

Oct 29 2012

"denizzzka" <4denizzz gmail.com> writes:

On Monday, 29 October 2012 at 15:46:43 UTC, Jordi Sayol wrote:
 On 29/10/12 16:17, denizzzka wrote:
 Hi!
 
 How to convert D's string to big endian?
 How to convert to D's string from big endian?
 
 

 UTF-8 is always big endian.

oops, what?

Q: Is the UTF-8 encoding scheme the same irrespective of whether 
the underlying processor is little endian or big endian?

A: Yes. Since UTF-8 is interpreted as a sequence of bytes, there 
is no endian problem as there is for encoding forms that use 
16-bit or 32-bit code units. Where a BOM is used with UTF-8, it 
is only used as an encoding signature to distinguish UTF-8 from 
other encodings — it has nothing to do with byte order.
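
As a rough illustration of that answer, here is a small D sketch. It assumes that "big endian" in the original question could only mean something for 16-bit code units (UTF-16BE), since a D `string` is UTF-8 and its bytes have a single possible order. The function names `utf8Bytes`, `utf16beBytes` and `fromUtf16be` are invented for this example; `representation`, `toUTF16`, `toUTF8`, `nativeToBigEndian` and `bigEndianToNative` are from Phobos.

```d
import std.bitmanip : bigEndianToNative, nativeToBigEndian;
import std.string : representation;
import std.utf : toUTF16, toUTF8;

/// UTF-8: code units are single bytes, so the bytes of a D `string`
/// are the same on little-endian and big-endian machines.
immutable(ubyte)[] utf8Bytes(string s)
{
    return s.representation;
}

/// UTF-16BE: each 16-bit code unit is written most significant byte first.
ubyte[] utf16beBytes(string s)
{
    ubyte[] result;
    foreach (wchar unit; s.toUTF16)
    {
        ubyte[2] be = nativeToBigEndian(unit);
        result ~= be[];
    }
    return result;
}

/// Inverse of `utf16beBytes`. A trailing odd byte, if any, is ignored
/// in this sketch; real code would probably want to report it.
string fromUtf16be(const(ubyte)[] bytes)
{
    wchar[] units;
    for (size_t i = 0; i + 1 < bytes.length; i += 2)
    {
        ubyte[2] pair = bytes[i .. i + 2];
        units ~= bigEndianToNative!wchar(pair);
    }
    return units.toUTF8;
}
```

For the string "a\u00E9" (an "a" followed by "é"), `utf8Bytes` gives 61 C3 A9 on any machine, while `utf16beBytes` gives 00 61 00 E9; only the second form would change if little endian were requested instead.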

Oct 29 2012
