Fortran gurus:
I'm looking for the fastest, safest, and most portable way to read the entire contents of a text file into a Fortran allocatable string. Here's what I've come up with:
subroutine read_file(filename, str)

    implicit none

    character(len=*),intent(in) :: filename
    character(len=:),allocatable,intent(out) :: str

    !parameters:
    integer,parameter :: n_chunk = 256 !chunk size for reading file [arbitrary]
    character(len=*),parameter :: nfmt = '(A256)' !corresponding format statement
    character(len=1),parameter :: newline = new_line('')

    integer :: iunit, istat, isize
    character(len=n_chunk) :: chunk
    integer :: filesize,ipos
    character(len=:),allocatable :: tmp

    !how many characters are in the file:
    inquire(file=filename, size=filesize) !is this portable?

    !initialize:
    ipos = 1 !where to put the next chunk

    !preallocate the str array to speed up the process for large files:
    !str = ''
    allocate( character(len=filesize) :: str )

    !open the file:
    open(newunit=iunit, file=trim(filename), status='OLD', iostat=istat)

    if (istat==0) then

        !read all the characters from the file:
        do
            read(iunit,fmt=nfmt,advance='NO',size=isize,iostat=istat) chunk
            if (istat==0) then
                !str = str//chunk
                str(ipos:ipos+isize-1) = chunk
                ipos = ipos+isize
            elseif (IS_IOSTAT_EOR(istat)) then
                if (isize>0) then
                    !str = str//chunk(1:isize)//newline
                    str(ipos:ipos+isize) = chunk(1:isize)//newline
                    ipos = ipos+isize+1
                else
                    !str = str//newline
                    str(ipos:ipos) = newline
                    ipos = ipos + 1
                end if
            elseif (IS_IOSTAT_END(istat)) then
                if (isize>0) then
                    !str = str//chunk(1:isize)
                    str(ipos:ipos+isize) = chunk(1:isize)//newline
                    ipos = ipos+isize+1
                end if
                exit
            else
                stop 'Error'
            end if
        end do

        !resize the string
        if (ipos<filesize+1) str = str(1:ipos-1)

        close(iunit, iostat=istat)

    else
        write(*,*) 'Error opening file: '//trim(filename)
    end if

end subroutine read_file
Some notes/questions about this:
1. This routine reads the 100 MB file at https://github.com/seductiveapps/largeJSON in about 1 sec on my PC.
2. Is it really portable to use the SIZE argument of INQUIRE to get the number of characters? I notice that the string I end up with is somewhat smaller than this value, but that could be due to #3. What is the portable way to get the file size in number of characters? I'd like it to work with compilers other than ifort, as well as on other platforms.
3. I don't think this approach preserves the Windows line breaks (if present), since it essentially reads the file line by line and then inserts the newline character. The string I end up with is smaller than the reported file size, which is why I'm trimming it at the end. Is there a way to read the file so that the line breaks are included as is?
4. The original (naive) version of the routine (see the commented-out bits, e.g., str = str//chunk) is extremely slow and also causes stack overflows for very large strings. The slowness makes sense to me because of all the reallocations, but I didn't expect it to cause stack overflows. Is that to be expected?
5. Any other improvements that anyone can see?
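
For the file-size and line-break questions above, one alternative that might be worth comparing against is unformatted stream access (Fortran 2003), which reads the bytes as they are stored instead of going record by record. The following is only a minimal sketch, not a tested replacement. The names read_file_stream_mod and read_file_stream are made up for illustration, and it assumes that the file storage unit is one byte and that one default character occupies one byte; both are common in practice but neither is guaranteed by the standard.

```fortran
module read_file_stream_mod
    use, intrinsic :: iso_fortran_env, only: int64
    implicit none
    private
    public :: read_file_stream
contains

    ! Hypothetical helper: read a whole file into str in a single
    ! unformatted stream read, so line terminators (LF or CR LF)
    ! end up in the string exactly as they are stored on disk.
    ! On any failure istat is nonzero and str is left unallocated.
    subroutine read_file_stream(filename, str, istat)
        character(len=*), intent(in) :: filename
        character(len=:), allocatable, intent(out) :: str
        integer, intent(out) :: istat

        integer :: iunit
        integer(int64) :: filesize  ! 64-bit so files over 2 GB do not overflow

        open(newunit=iunit, file=filename, status='old', action='read', &
             access='stream', form='unformatted', iostat=istat)
        if (istat /= 0) return

        ! SIZE= is in file storage units (assumed here to be bytes);
        ! the processor returns -1 if it cannot determine the size.
        inquire(unit=iunit, size=filesize)
        if (filesize < 0) then
            istat = -1
            close(iunit)
            return
        end if

        allocate(character(len=filesize) :: str, stat=istat)
        if (istat /= 0) then
            close(iunit)
            return
        end if

        ! One read transfers len(str) bytes; skipped for an empty file.
        if (filesize > 0) read(iunit, iostat=istat) str

        close(iunit)
    end subroutine read_file_stream

end module read_file_stream_mod
```

It would be called as call read_file_stream('file.json', str, istat) after a use read_file_stream_mod statement. Since the string is allocated once at the reported size and filled in a single read, there is no trimming step at the end, but whether the size is reported in bytes on a given compiler and platform is exactly the assumption that would need checking.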