python - File read performance test yields interesting results. Possible explanations? -

January 15, 2011

I'm stress-testing a system to determine how much punishment its filesystem can take. One test involves repeated reads on a single small (and thus presumably heavily cached) file to determine the overhead.

The following Python 3.6.0 script generates two lists of results:

import random, string, time

# 100,000 random lowercase letters, encoded as bytes
stri = bytes(''.join(random.choice(string.ascii_lowercase) for _ in range(100000)), 'latin-1')

inf = open('bench.txt', 'w+b')
inf.write(stri)

# Test 1: variable repetition count, fixed read length (200 bytes)
for t in range(0, 700, 5):
    readl = b''
    start = time.perf_counter()
    for _ in range(t*10):
        inf.seek(0)
        readl += inf.read(200)
    print(t/10.0, time.perf_counter()-start)

print()

# Test 2: fixed repetition count (3000), variable read length
for t in range(0, 700, 5):
    readl = b''
    start = time.perf_counter()
    for _ in range(3000):
        inf.seek(0)
        readl += inf.read(t)
    print(t/10.0, time.perf_counter()-start)

inf.close()

When plotted, the results give the following graph:

results graph

I find these results weird. The second test (blue in the picture, variable read length parameter) starts off increasing linearly as expected, but after a certain point it climbs more quickly. More surprisingly, the first test (pink, variable repetition count, fixed read length) also shows a wild departure, which is interesting because the size passed to the read function remains fixed there. It is also irregular, which is head-scratching at best. My system was idle when running the tests.

What plausible reason is there for such a major performance degradation after a certain number of repetitions?

Edit:

The fact that readl is a bytes object is apparently a major performance hog. Switching to a string drastically improves everything. And when working with strings, calling the read and seek functions is only a minor factor in comparison. Here are more variants of test 1 (variable repetitions). Test 2 is left out because its results turn out to be entirely explained by the bytes performance difference alone:

import random, string, time

strs = ''.join(random.choice(string.ascii_lowercase) for _ in range(100000))
strb = bytes(strs, 'latin-1')

inf = open('bench.txt', 'w+b')
inf.write(strb)

# bytes and read
for t in range(0, 700, 5):
    readl = b''
    start = time.perf_counter()
    for _ in range(t*10):
        inf.seek(0)
        readl += inf.read(200)
    print(t/10.0, '%f' % (time.perf_counter()-start))
print()

# bytes, no read
for t in range(0, 700, 5):
    readl = b''
    start = time.perf_counter()
    for _ in range(t*10):
        readl += strb[0:200]
    print(t/10.0, '%f' % (time.perf_counter()-start))
print()

# string and read
for t in range(0, 700, 5):
    readl = ''
    start = time.perf_counter()
    for _ in range(t*10):
        inf.seek(0)
        readl += inf.read(200).decode('latin-1')
    print(t/10.0, '%f' % (time.perf_counter()-start))
print()

# string, no read
for t in range(0, 700, 5):
    readl = ''
    start = time.perf_counter()
    for _ in range(t*10):
        readl += strs[0:200]
    print(t/10.0, '%f' % (time.perf_counter()-start))
print()

inf.close()

results graph
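The bytes observation above can be isolated from the file entirely. The sketch below is not part of the original benchmark: it assumes the slowdown comes from `readl += ...` building a new, ever-larger bytes object on each iteration, and compares that against two common alternatives (a mutable `bytearray` and a list joined once at the end). `CHUNK` and the function names are hypothetical stand-ins, with `CHUNK` playing the role of `inf.read(200)`. As far as I know, CPython can sometimes extend a str in place when nothing else references it, which may be why the string variants behave better, but that is an implementation detail rather than a guarantee.

```python
import time

# Hypothetical stand-in for the 200 bytes returned by inf.read(200).
CHUNK = b"x" * 200


def concat_bytes(n):
    """Accumulate n chunks with += on an immutable bytes object."""
    out = b""
    for _ in range(n):
        out += CHUNK  # creates a new bytes object each time
    return out


def concat_bytearray(n):
    """Accumulate n chunks in a mutable bytearray, convert once at the end."""
    out = bytearray()
    for _ in range(n):
        out += CHUNK  # extends the existing buffer in place
    return bytes(out)


def concat_join(n):
    """Collect n chunks in a list and join them in a single pass."""
    parts = []
    for _ in range(n):
        parts.append(CHUNK)
    return b"".join(parts)


def timed(fn, n):
    """Return (elapsed seconds, result) for one call of fn(n)."""
    start = time.perf_counter()
    result = fn(n)
    return time.perf_counter() - start, result


if __name__ == "__main__":
    # Doubling n makes any worse-than-linear growth easy to spot.
    for n in (1000, 2000, 4000):
        for fn in (concat_bytes, concat_bytearray, concat_join):
            elapsed, result = timed(fn, n)
            print(n, fn.__name__, len(result), '%f' % elapsed)
```

If the assumption holds, the `concat_bytes` timings should grow faster than linearly as n doubles, while the other two stay roughly proportional to n. The actual numbers depend on the interpreter version and the memory allocator, so treat them as indicative only.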
