python - File read performance test yields interesting results. Possible explanations?
I'm stress-testing a system to determine how much punishment the filesystem can take. One test involves repeated reads on a single small (and thus presumably heavily cached) file to determine the overhead.
The following Python 3.6.0 script generates two lists of results:
import random, string, time

# 100,000 random lowercase letters, written to a scratch file
stri = bytes(''.join(random.choice(string.ascii_lowercase) for i in range(100000)), 'latin-1')
inf = open('bench.txt', 'w+b')
inf.write(stri)

# Test 1: fixed read length (200 bytes), variable number of repetitions
for t in range(0,700,5):
    readl = b''
    start = time.perf_counter()
    for i in range(t*10):
        inf.seek(0)
        readl += inf.read(200)
    print(t/10.0, time.perf_counter()-start)

print()

# Test 2: fixed number of repetitions (3000), variable read length
for t in range(0,700,5):
    readl = b''
    start = time.perf_counter()
    for i in range(3000):
        inf.seek(0)
        readl += inf.read(t)
    print(t/10.0, time.perf_counter()-start)

inf.close()
When plotted, the results give the following graph:
I find these results weird. The second test (blue in the picture, variable read length parameter) starts off increasing linearly, as expected, but after some point it decides to climb more quickly. More surprisingly, the first test (pink, variable repetition count, fixed read length) also shows a wild departure, which is interesting because the size of each read remains fixed there. It is also irregular, which is head-scratching at best. The system was idle when running the tests.
What plausible reason is there that causes such a major performance degradation after some number of repetitions?
Edit:
The fact that readl is a byte string (a bytes object) is apparently a major performance hog. Switching it to a string drastically improves everything. Yet when working with strings, calling the read and seek functions is only a minor factor by comparison. Here are some more variants of test 1 (variable repetitions). Test 2 is left out because its results turn out to be entirely explained by the bytes performance difference alone:
import random, string, time

strs = ''.join(random.choice(string.ascii_lowercase) for i in range(100000))
strb = bytes(strs, 'latin-1')
inf = open('bench.txt', 'w+b')
inf.write(strb)

# bytes and read
for t in range(0,700,5):
    readl = b''
    start = time.perf_counter()
    for i in range(t*10):
        inf.seek(0)
        readl += inf.read(200)
    print(t/10.0, '%f' % (time.perf_counter()-start))

print()

# bytes, no read
for t in range(0,700,5):
    readl = b''
    start = time.perf_counter()
    for i in range(t*10):
        readl += strb[0:200]
    print(t/10.0, '%f' % (time.perf_counter()-start))

print()

# string and read
for t in range(0,700,5):
    readl = ''
    start = time.perf_counter()
    for i in range(t*10):
        inf.seek(0)
        readl += inf.read(200).decode('latin-1')
    print(t/10.0, '%f' % (time.perf_counter()-start))

print()

# string, no read
for t in range(0,700,5):
    readl = ''
    start = time.perf_counter()
    for i in range(t*10):
        readl += strs[0:200]
    print(t/10.0, '%f' % (time.perf_counter()-start))

print()

inf.close()
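One way to check whether the accumulation itself is the bottleneck is to take the file out of the picture and compare different ways of building the result. The sketch below is only an illustration under stated assumptions: the helper names (accumulate_bytes_concat, accumulate_bytearray, accumulate_join, timed) and the constant CHUNK are made up for this example, and CHUNK merely stands in for the 200 bytes that inf.read(200) would return. The working assumption is that, because bytes objects are immutable, each += may copy everything accumulated so far, while a bytearray or a single b''.join() avoids the repeated copying. Actual timings depend on the interpreter version and the machine.

```python
import time

# Stand-in for the 200 bytes returned by inf.read(200) (assumption for this sketch)
CHUNK = b"x" * 200

def accumulate_bytes_concat(n):
    """Grow an immutable bytes object with +=, as the benchmark does."""
    out = b""
    for _ in range(n):
        out += CHUNK  # may copy the whole accumulated buffer each time
    return out

def accumulate_bytearray(n):
    """Grow a mutable bytearray in place, then convert once at the end."""
    out = bytearray()
    for _ in range(n):
        out += CHUNK  # extends the existing buffer
    return bytes(out)

def accumulate_join(n):
    """Collect the chunks in a list and join them in a single pass."""
    parts = []
    for _ in range(n):
        parts.append(CHUNK)
    return b"".join(parts)

def timed(func, n):
    """Return (elapsed seconds, result length) for one call of func(n)."""
    start = time.perf_counter()
    result = func(n)
    return time.perf_counter() - start, len(result)

if __name__ == "__main__":
    # Same repetition range as test 1 (up to roughly 7000 appends)
    for n in (1000, 2000, 4000, 7000):
        for func in (accumulate_bytes_concat, accumulate_bytearray, accumulate_join):
            elapsed, size = timed(func, n)
            print(n, func.__name__, size, "%f" % elapsed)
```

If the bytes += variant grows much faster than linearly with n while the other two stay roughly linear, that would point at the concatenation rather than at seek and read.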