I have a text file that contains about 100,000 articles.
The structure of the file is:
.Document ID 42944-YEAR:5
.Date 03\08\11
.Cat political
Article Content 1
.Document ID 42945-YEAR:5
.Date 03\08\11
.Cat political
Article Content 2
I want to open this file in C# and process it line by line.
I tried this code:
String[] FileLines = File.ReadAllText(
TB_SourceFile.Text).Split(Environment.NewLine.ToCharArray());
But it says:
Exception of type 'System.OutOfMemoryException' was thrown.
The question is: how can I open this file and read it line by line?
- File Size: 564 MB (591,886,626 bytes)
- File Encoding: UTF-8
- File contains Unicode characters.
Eric J. :
You can open the file and read it as a stream rather than loading everything into memory all at once.

From MSDN:

    using System;
    using System.IO;

    class Test
    {
        public static void Main()
        {
            try
            {
                // Create an instance of StreamReader to read from a file.
                // The using statement also closes the StreamReader.
                using (StreamReader sr = new StreamReader("TestFile.txt"))
                {
                    String line;
                    // Read and display lines from the file until the end of
                    // the file is reached.
                    while ((line = sr.ReadLine()) != null)
                    {
                        Console.WriteLine(line);
                    }
                }
            }
            catch (Exception e)
            {
                // Let the user know what went wrong.
                Console.WriteLine("The file could not be read:");
                Console.WriteLine(e.Message);
            }
        }
    }
2010-04-25T19:33:13
Michael Petrotta :
Your file is too large to be read into memory in one go, as File.ReadAllText is trying to do. You should instead read the file line by line.

Adapted from MSDN:

    string line;
    // Read the file and display it line by line.
    using (StreamReader file = new StreamReader(@"c:\yourfile.txt"))
    {
        while ((line = file.ReadLine()) != null)
        {
            Console.WriteLine(line);
            // do your processing on each line here
        }
    }

In this way, no more than a single line of the file is in memory at any one time.
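For the file layout shown in the question, the per-line processing could group lines into articles as they stream past. The following is only a sketch under stated assumptions: the `Article` and `ArticleReader` names are hypothetical, every record is assumed to start with a `.Document ID ` line, and article text is assumed never to begin with `.Date ` or `.Cat `.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical container for one record; the fields mirror the
// sample layout in the question and are not part of any library.
class Article
{
    public string Id = "";
    public string Date = "";
    public string Category = "";
    public StringBuilder Content = new StringBuilder();
}

static class ArticleReader
{
    const string IdPrefix = ".Document ID ";
    const string DatePrefix = ".Date ";
    const string CatPrefix = ".Cat ";

    // Lazily yields one article at a time, so only the record
    // currently being assembled is held in memory.
    public static IEnumerable<Article> Read(TextReader reader)
    {
        Article current = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.StartsWith(IdPrefix, StringComparison.Ordinal))
            {
                // A new header begins: hand back the previous article, if any.
                if (current != null) yield return current;
                current = new Article { Id = line.Substring(IdPrefix.Length) };
            }
            else if (current == null)
            {
                continue; // ignore anything before the first header
            }
            else if (line.StartsWith(DatePrefix, StringComparison.Ordinal))
            {
                current.Date = line.Substring(DatePrefix.Length);
            }
            else if (line.StartsWith(CatPrefix, StringComparison.Ordinal))
            {
                current.Category = line.Substring(CatPrefix.Length);
            }
            else
            {
                current.Content.AppendLine(line);
            }
        }
        // The last article has no following header to flush it.
        if (current != null) yield return current;
    }
}

class Program
{
    static void Main(string[] args)
    {
        // args[0] is assumed to be the path to the articles file.
        using (var reader = new StreamReader(args[0], Encoding.UTF8))
        {
            int count = 0;
            foreach (Article article in ArticleReader.Read(reader))
            {
                count++; // process each article here
            }
            Console.WriteLine("Articles read: " + count);
        }
    }
}
```

Because `Read` is an iterator, nothing is read until the `foreach` asks for the next article, and each article becomes eligible for garbage collection once the loop moves on.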
2010-04-25T19:35:48
Dan Terry :
If you are using .NET Framework 4, there is a new static method on System.IO.File called ReadLines that returns an IEnumerable<string>. I believe it was added to the framework for this exact scenario; however, I have yet to use it myself.

MSDN Documentation - File.ReadLines Method (String)

Related Stack Overflow Question - Bug in the File.ReadLines(..) method of the .net framework 4.0
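A minimal sketch of how File.ReadLines might be used here, assuming .NET Framework 4 or later. The `HeaderCounter` class and `CountArticles` method are hypothetical names for illustration, and the `.Document ID ` prefix is taken from the sample in the question. Unlike File.ReadAllLines, which returns a string[] holding the whole file, ReadLines yields lines lazily as the loop advances.

```csharp
using System;
using System.IO;
using System.Text;

static class HeaderCounter
{
    // Counts article headers by enumerating the file lazily;
    // the whole file is never held in memory at once.
    public static int CountArticles(string path)
    {
        int count = 0;
        foreach (string line in File.ReadLines(path, Encoding.UTF8))
        {
            if (line.StartsWith(".Document ID ", StringComparison.Ordinal))
            {
                count++;
            }
        }
        return count;
    }

    static void Main(string[] args)
    {
        // args[0] is assumed to be the path to the articles file.
        Console.WriteLine(CountArticles(args[0]));
    }
}
```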
2010-05-19T03:25:33