digitalmars.D.learn - Incomplete words read from file

pascal111 <judas.the.messiah.111 gmail.com> writes:
I wrote a small program that shows the contents of text files. It 
displays non-English (non-ASCII) text successfully, but on many 
lines some words are incomplete and the rest of the word appears 
on the next line. How can I fix this?

"https://i.postimg.cc/rpP7dQYH/Screenshot-from-2021-11-18-01-40-43.png"

```d
// D programming language

import std.stdio;
import std.string;

int main()
{
    string s;
    char[] f;

    try {
        write("Enter file name and path: ");
        readln(f);
        f = strip(f);
    }
    catch (Exception err) {
        stderr.writefln!"Warning! %s"(err.msg);
    }

    File file = File(f, "r");

    while (!file.eof()) {
        s = chomp(file.readln());
        writeln(s);
    }

    file.close();

    return 0;
}
```
Nov 17 2021
jfondren <julian.fondren gmail.com> writes:
On Wednesday, 17 November 2021 at 23:46:15 UTC, pascal111 wrote:
 I made small program that shows the content of textual files, 
 and it succeeded to show non-English (Ascii code) language, but 
 in many lines some words are not complete and their rests are 
 in next lines, how can fix it?
There's nothing in your program that breaks lines differently from the input. If you have a Unicode-aware terminal, it should work as it is. If 'cat Jekyll1' doesn't produce the same output as this program... then there must be some right-to-left work that needs to happen that I'm not aware of.

If what you want is to *reshape* the text so that it prints with proper word breaks across lines according to the current size of the terminal, then you have to do this work yourself. On Unix, a simple shortcut might be to print through fmt(1) instead:

```d
void main() {
    import std.process : pipeShell, Redirect, wait;

    auto fmt = pipeShell("fmt", Redirect.stdin);
    scope (exit) {
        fmt.stdin.close;
        wait(fmt.pid);
    }
    char[15] longword = 'x';
    foreach (i; 1 .. 10) {
        fmt.stdin.writeln(longword);
    }
}
```

which outputs:

```
xxxxxxxxxxxxxxx xxxxxxxxxxxxxxx xxxxxxxxxxxxxxx xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxx xxxxxxxxxxxxxxx xxxxxxxxxxxxxxx xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxx
```

or... oh, there's std.string.wrap. With the same output:

```d
void main() {
    import std.string : wrap;
    import std.stdio : write;

    enum longtext = {
        char[15] longword = 'x';
        string result;
        foreach (i; 1 .. 10)
            result ~= ' ' ~ longword;
        return result;
    }();
    write(longtext.wrap(72));
}
```

These tools might not do what you want with that language, though.
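As a rough sketch of the do-it-yourself route, the width handed to `wrap` could come from the `COLUMNS` environment variable rather than a hard-coded 72. `columnsOr` is a hypothetical helper, and relying on `COLUMNS` is an assumption: many shells set it without exporting it, so the fallback of 80 may be what actually gets used.

```d
import std.conv : ConvException, to;
import std.process : environment;
import std.stdio : write;
import std.string : wrap;

/// Hypothetical helper: parses a COLUMNS-style value and returns
/// `fallback` when the value is empty, malformed, or zero.
size_t columnsOr(string value, size_t fallback)
{
    try
    {
        immutable n = value.to!size_t;
        return n > 0 ? n : fallback;
    }
    catch (ConvException)
    {
        return fallback;
    }
}

void main()
{
    // Assumption: COLUMNS is exported to this process; otherwise use 80.
    immutable width = columnsOr(environment.get("COLUMNS", ""), 80);

    // wrap breaks at whitespace so that no line exceeds `width` columns.
    write("a fairly long paragraph that should be broken only between words".wrap(width));
}
```

Running it as `COLUMNS=40 ./app` is one way to see the effect of a narrower width.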
Nov 17 2021
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Nov 18, 2021 at 12:39:12AM +0000, jfondren via Digitalmars-d-learn
wrote:
[...]
 If what you're wanting to do is to *reshape* text so that it prints
 with proper word-breaks across lines according to the current size of
 the terminal, then you've got to do this work yourself.
[...]

Just to chip in: line-breaking in Unicode is, in general, non-trivial, because it changes depending on language, left-to-right / right-to-left settings, font properties, and display environment. If this is what you want to do, the `linebreak` dub package may be a good starting point (it implements the Unicode line-breaking algorithm in Annex 14):

https://code.dlang.org/packages/linebreak/1.1.2

Note that this algorithm only gives you linebreak opportunities; you still have to figure out yourself where among these opportunities to actually insert a linebreak. For this you will need to measure how long each text segment is. In general, this also depends on your font, font size, and font properties.

If you're outputting to the terminal, this is somewhat simpler (most graphemes are 1 column wide) but you still have to take into account double-width and zero-width characters (and also how your terminal actually displays such characters -- not all terminals will display double-width characters as double width, though most will).

T

-- 
MAS = Mana Ada Sistem?
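To make the measuring step concrete, here is a minimal sketch that estimates how many terminal columns a string occupies. It counts graphemes with `std.uni.byGrapheme`, so combining marks add no width, and it treats a few East Asian ranges as double width. `isWide` and `displayWidth` are hypothetical helpers, and the range table is a rough assumption rather than the full Unicode East Asian Width data; what a given terminal really draws may differ.

```d
import std.stdio : writeln;
import std.uni : byGrapheme;

/// Hypothetical, deliberately coarse check for double-width code points.
bool isWide(dchar c)
{
    return (c >= 0x1100 && c <= 0x115F)   // Hangul Jamo
        || (c >= 0x2E80 && c <= 0xA4CF)   // CJK radicals through Yi
        || (c >= 0xAC00 && c <= 0xD7A3)   // Hangul syllables
        || (c >= 0xF900 && c <= 0xFAFF)   // CJK compatibility ideographs
        || (c >= 0xFE30 && c <= 0xFE4F)   // CJK compatibility forms
        || (c >= 0xFF00 && c <= 0xFF60)   // fullwidth forms
        || (c >= 0xFFE0 && c <= 0xFFE6);  // fullwidth signs
}

/// Estimated column count: one or two columns per grapheme,
/// decided by the grapheme's first code point.
size_t displayWidth(string text)
{
    size_t cols = 0;
    foreach (g; text.byGrapheme)
        cols += isWide(g[0]) ? 2 : 1;
    return cols;
}

void main()
{
    writeln(displayWidth("abc"));
    writeln(displayWidth("e\u0301")); // 'e' followed by a combining acute accent
    writeln(displayWidth("日本語"));
}
```

A width estimate like this would be combined with the break opportunities from a line-breaking library to decide where each line actually ends.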
Nov 17 2021
Ali Çehreli <acehreli yahoo.com> writes:
On 11/17/21 3:46 PM, pascal111 wrote:
 I made small program that shows the content of textual files, and it
 succeeded to show non-English (Ascii code) language, but in many lines
 some words are not complete and their rests are in next lines, how can
 fix it?
D assumes UTF-8 encoding by default. If the file is not UTF-8, std.encoding.transcode may be useful: https://dlang.org/library/std/encoding/transcode.html Of course, the encoding of the file must be known. However, I think your file is already UTF-8.
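For illustration only, and assuming the input were ISO-8859-1 (Latin-1) rather than UTF-8, a conversion with `std.encoding.transcode` might look like the sketch below. `latin1ToUtf8` is a hypothetical helper name; the source encoding is an assumption that has to be replaced by whatever the file really uses.

```d
import std.encoding : Latin1String, transcode;
import std.stdio : writeln;

/// Hypothetical helper: reinterprets raw bytes as Latin-1 text
/// and transcodes them into a UTF-8 string.
string latin1ToUtf8(immutable(ubyte)[] raw)
{
    string utf8;
    transcode(cast(Latin1String) raw, utf8);
    return utf8;
}

void main()
{
    // The bytes of "café" in Latin-1; 0xE9 is 'é'.
    immutable(ubyte)[] raw = [0x63, 0x61, 0x66, 0xE9];
    writeln(latin1ToUtf8(raw));
}
```

For a real file, the bytes could come from `std.file.read` before being passed to the helper.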
 "https://i.postimg.cc/rpP7dQYH/Screenshot-from-2021-11-18-01-40-43.png"
The characters indeed look correct. I wonder whether the lines don't fit your terminal's width and the terminal is wrapping them? If you want to wrap the lines programmatically, there is std.string.wrap:

https://dlang.org/phobos/std_string.html#.wrap

Because we've talked about parts of your program earlier, I will take the liberty of commenting on it. :) Then I will show an alternative version below.
 '''d

 // D programming language

 import std.stdio;
 import std.string;

 int main()
 {

 string s;
It is a general guideline that variables should be defined as close to their first use as possible. This allows for more readable, maintainable, and refactorable code.
 char[] f;
I was able to make this a string in the program below by calling the no-parameter version of readln().
 try{
 write("Enter file name and path: ");
 readln(f);
 f=strip(f);}
Because errors can occur in other parts of the program as well, you can wrap the whole code in a try block.
 catch(Exception err){
 stderr.writefln!"Warning! %s"(err.msg);}
It is better to either return with a non-zero error code here or do something about the error. For example:

    writeln("Using the default file.");
    f = "my_default_file".dup;

But I think it is better to return 1 there.
 File file = File(f, "r");

 while (!file.eof()) {
There are byLine and byLineCopy, which iterate over a file line by line; you can use one of them here as well.
        s = chomp(file.readln());
        writeln(s);
     }

 file.close();
Although harmless, you don't need to call File.close because it is already called by the destructor of File.
 return 0;

 }


 '''
Here is an alternative:

import std.stdio;
import std.string;

int main() {
  try {
    printFileLines();

  } catch(Exception err){
    stderr.writefln!"Warning! %s"(err.msg);
    return 1;
  }

  return 0;
}

void printFileLines() {
  write("Enter file name and path: ");
  string f = strip(readln());
  File file = File(f, "r");

  foreach (line; file.byLine) {
    const s = chomp(line);
    writeln(s);
  }
}

Ali
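A short sketch of when byLineCopy matters: byLine reuses its internal buffer between iterations, which is fine when each line is printed immediately as above, but lines that are stored for later need their own copy. `readAllLines` is a hypothetical helper, and "example.txt" is a placeholder path.

```d
import std.array : array;
import std.stdio : File, writeln;

/// Hypothetical helper: collects every line of a file into an array.
/// byLineCopy allocates a fresh string per line, so the stored lines
/// stay valid; storing byLine's slices would alias its reused buffer.
string[] readAllLines(string path)
{
    return File(path, "r").byLineCopy.array;
}

void main(string[] args)
{
    // Assumption: the path is given on the command line, else a placeholder.
    immutable path = args.length > 1 ? args[1] : "example.txt";
    try
    {
        foreach (line; readAllLines(path))
            writeln(line);
    }
    catch (Exception err)
    {
        writeln("Could not read ", path, ": ", err.msg);
    }
}
```

Both ranges strip the line terminator by default, so no extra chomp is needed in this version.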
Nov 17 2021
pascal111 <judas.the.messiah.111 gmail.com> writes:
I changed the code like this and it now works without breaking words, but this time each line is printed on its own, as if the text were a poem. Can we fix this, or will the terminal force its own line wrapping on us?

"https://i.postimg.cc/FHQFPgm8/Screenshot-from-2021-11-18-03-16-41.png"

import std.stdio;
import std.string;
import std.process : pipeShell, Redirect, wait;
import std.format;

int main() {
  try {
    printFileLines();

  } catch(Exception err){
    stderr.writefln!"Warning! %s"(err.msg);
    return 1;
  }

  return 0;
}

void printFileLines() {
  auto fmt = pipeShell("fmt", Redirect.stdin);
  scope (exit) {
    fmt.stdin.close;
    wait(fmt.pid);
  }

  write("Enter file name and path: ");
  string f = strip(readln());
  File file = File(f, "r");

  foreach (line; file.byLine) {
    const s = chomp(line);
    fmt.stdin.writeln(s);
  }
}
Nov 17 2021
jfondren <julian.fondren gmail.com> writes:
On Thursday, 18 November 2021 at 01:21:00 UTC, pascal111 wrote:
 I fixed the code like this and it worked without breaking 
 words, but this time it shows single lines as if the normal 
 context is a poem. Can we fix this or the terminal will force 
 us and make wrapping for lines?

 "https://i.postimg.cc/FHQFPgm8/Screenshot-from-2021-11-18-03-16-41.png"
...
   auto fmt = pipeShell("fmt", Redirect.stdin);
Try "fmt --width=120". 'man fmt' will tell you about its other arguments.
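One possible way to make the width adjustable is to build the fmt argument list in a small function and start the process with `pipeProcess`, which takes the arguments as an array and so avoids shell quoting. `fmtArgs` and `printReflowed` are hypothetical names, and "example.txt" is a placeholder; the sketch assumes a fmt that accepts `--width`, as suggested above.

```d
import std.conv : to;
import std.process : pipeProcess, Redirect, wait;
import std.stdio : File;

/// Hypothetical helper: argument list for fmt(1) with the given width.
string[] fmtArgs(size_t width)
{
    return ["fmt", "--width=" ~ width.to!string];
}

/// Sends every line of `path` to fmt, which re-wraps the text
/// and prints it on this program's standard output.
void printReflowed(string path, size_t width)
{
    auto fmt = pipeProcess(fmtArgs(width), Redirect.stdin);
    scope (exit)
    {
        fmt.stdin.close();
        wait(fmt.pid);
    }

    foreach (line; File(path, "r").byLine)
        fmt.stdin.writeln(line);
}

void main(string[] args)
{
    // Assumption: the path is given on the command line, else a placeholder.
    if (args.length > 1)
        printReflowed(args[1], 120);
}
```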
Nov 17 2021
pascal111 <judas.the.messiah.111 gmail.com> writes:
It works! "https://i.postimg.cc/dtDnWpwN/Screenshot-from-2021-11-18-03-59-01.png"
Nov 17 2021