python - How can I check whether every second value from a dictionary is in a specific range?
I have a dictionary that reads in a file called peaks_ee.xpk.
Sample of peaks_ee.xpk:
    label dataset sw sf
    1h 1h_2
    noesy_f1ef2e.nv
    4807.69238281 4803.07373047
    600.402832031 600.402832031
    1h.l 1h.p 1h.w 1h.b 1h.e 1h.j 1h.u 1h_2.l 1h_2.p 1h_2.w 1h_2.b 1h_2.e 1h_2.j 1h_2.u vol int stat comment flag0 flag8 flag9
    0 {1.h1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    1 {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.h1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    2 {1.h8} 8.13712 0.05000 0.10000 ++ {0.0} {} {1.h1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    3 {1.h1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {1.h8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    4 {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} {2.h1'} 5.90291 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    5 {2.h1'} 5.90291 0.05000 0.10000 ++ {0.0} {} {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    6 {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.h1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    7 {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.h8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    8 {1.h1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    9 {1.h8} 8.13712 0.05000 0.10000 ++ {0.0} {} {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    10 {3.h6} 7.53261 0.05000 0.10000 ++ {0.0} {} {4.h1'} 5.74125 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    11 {4.h1'} 5.74125 0.05000 0.10000 ++ {0.0} {} {3.h6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    12 {3.h1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {4.h8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    13 {4.h8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.h1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    14 {3.h6} 7.53261 0.05000 0.10000 ++ {0.0} {} {3.h1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    15 {3.h1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {3.h6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    16 {3.h6} 7.53261 0.05000 0.10000 ++ {0.0} {} {2.h1'} 5.90291 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    17 {3.h6} 7.53261 0.05000 0.10000 ++ {0.0} {} {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    18 {2.h1'} 5.90291 0.05000 0.10000 ++ {0.0} {} {3.h6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    19 {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} {3.h6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    20 {4.h8} 7.49932 0.05000 0.10000 ++ {0.0} {} {4.h1'} 5.74125 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    21 {4.h1'} 5.74125 0.05000 0.10000 ++ {0.0} {} {4.h8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    22 {4.h8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.h1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    23 {4.h8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.h6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    24 {3.h1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {4.h8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
In line 0 of the peaks_ee.xpk example, the atom name is 1.h1' and its chemical shift is 5.82020. In the same line, in the 8th column (1h_2.l), there is the atom name 2.h8, and its chemical shift is 7.61004. I want to check if the first chemical shift in the row (5.82020) is in a range, and if the second chemical shift (7.61004) is in a range. If it is, I want to write out the atom names (1.h1', 2.h8) to a file called tclust.txt.
This is my code so far. I posted a question before, and @wwii helped me with the code.
    import re
    pattern = '''{(\d\.h\d'?)}\s(\d\.\d+)\s'''
    rex = re.compile(pattern)
    j = 0;
    contents_atom = []
    atom_lines=[]
    result = {}
    with open("peaks_ee.xpk","r") as atom_name:
        for line in atom_name:
            for match in rex.finditer(line):
                name, shift = match.groups()
                if name not in result:
                    result[name] = float(shift)
                print (name,shift)
                # filename is not assigned anywhere in this snippet
                if filename == 'ee_pinkh1.xpk':
                    if result[name]<=8.5:
                        float_str = re.findall("\d\.h\d'?",name)
                        if (len(float_str))>1:
                            j=j+1
                            value1 = ('atom ' + str(j) + ' ' + str(float_str[0])+ ' ' + str(float_str[1])+ '\n')
                            atom_lines.append(value1)
    tclust_atom = open("tclust.txt","a")
    for value1 in atom_lines:
        tclust_atom.write(value1)
    tclust_atom.close()
This is a picture of the list of atom names and chemical shifts printed out by the line print (name,shift).
From the picture, the first 2 lines are:
    "1.h1'","5.82020"
    "2.h8","7.61004"
These first 2 lines come from the first line of peaks_ee.xpk. I want to see if "5.82020" is between 5.1 and 6, and if 7.61004 is between 7 and 8.25. Is there a way I can do this using the values of the dictionary? Notice that every second value is one I want to check against the range 5.1 to 6, and the alternating values are the ones I want to check against 7 to 8.25.
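One way to check every second value is sketched below. It assumes the (name, shift) matches are kept in a list in file order instead of the result dict: the dict in the question stores each atom name only once (and does not keep file order on Python 2), so it cannot preserve the alternating pattern. The function name check_alternating and the default ranges are illustrative; the regex is the question's pattern with the braces escaped.

```python
import re

# The question's pattern with the braces escaped: "{atom} shift "
PATTERN = re.compile(r"\{(\d\.h\d'?)\}\s(\d\.\d+)\s")

def check_alternating(lines, range1=(5.1, 6.0), range2=(7.0, 8.25)):
    """Return (atom1, atom2) pairs whose shifts fall inside range1 / range2."""
    # Keep every (name, shift) match in file order, duplicates included.
    matches = [(name, float(shift))
               for line in lines
               for name, shift in PATTERN.findall(line)]
    # Assumes each data row yields exactly two matches, so even positions
    # hold the first atom of a row and odd positions the second atom.
    firsts, seconds = matches[0::2], matches[1::2]
    hits = []
    for (name1, shift1), (name2, shift2) in zip(firsts, seconds):
        if range1[0] < shift1 < range1[1] and range2[0] < shift2 < range2[1]:
            hits.append((name1, name2))
    return hits
```

Passing an open file handle for peaks_ee.xpk as lines should give the atom-name pairs to write to tclust.txt. Header lines produce no matches, so they do not shift the even/odd positions.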
Edit: complete code:
    import pandas as pd
    import os
    import sys
    import re
    i=0;
    contents_peak=[]
    peak_lines=[]
    with open ("ee_pinkh1.xpk","r") as peakppm:
        for ppm in peakppm.readlines():
            float_num = re.findall("[\s][1-9]{1}\.[0-9]+",ppm)
            if (len(float_num)>1):
                i=i+1
                value = ('peak '+ str(i) + ' ' + str(float_num[0]) + ' 0.05 ' + str(float_num[1]) + ' 0.05 ' + '\n')
                peak_lines.append(value)

    tclust_peak = open("tclust.txt","w+")
    tclust_peak.write("rbclust \n")
    for value in peak_lines:
        tclust_peak.write(value)
    tclust_peak.close()

    pattern = '''{\d\.h\d'?)}\s(\d\.\d+)\s'''
    rex = re.compile(pattern)
    j=0;
    contents_atom=[]
    atom_lines=[]
    result = {}
    with open("peaks_ee.xpk","r") as atomname:
        for line in atomname:
            for match in rex.finditer(line):
                name,shift = match.groups()
                print (name,shift)
                if name not in result:
                    result[name]=float(shift)
                float_str = re.findall("\d\.h\d'?",name)
                if (len(float_str)>1):
                    j=j+1
                    value1 = ('atom ' +str(j)+ ' ' + str(float_str[0])+ ' ' + str(float_str[1]) + '\n')
                    atom_lines.append(value1)

    df = pd.read_csv("d:/tmp/peaks_ee.xpk", sep= " ", skiprows=5)
    shift1= df["1h.p"]
    shift2= df["1h_2.p"]
    mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))
    result = df[mask]
    result = result[["1h.l","1h.p","1h_2.l","1h_2.p"]]
    print result
    tclust_atom = open("tclust.txt","a")
    for value1 in atom_lines:
        tclust_atom.write(value1)
    tclust_atom.close()
This is the error I'm getting:
    Traceback (most recent call last):
      File "pandas.py", line 1, in <module>
        import pandas as pd
      File "/users/malaikaiyer/downloads/nmrfxstructure/nmrfxstructure/target/structure-10.1.1-bin/structure-10.1.1/pandas.py", line 23, in <module>
        rex = re.compile(pattern)
      File "/users/malaikaiyer/downloads/nmrfxstructure/nmrfxstructure/target/structure-10.1.1-bin/structure-10.1.1/lib/jython-standalone-2.7.0.jar/lib/re.py", line 190, in compile
      File "/users/malaikaiyer/downloads/nmrfxstructure/nmrfxstructure/target/structure-10.1.1-bin/structure-10.1.1/lib/jython-standalone-2.7.0.jar/lib/re.py", line 242, in _compile
    sre_constants.error: unbalanced parenthesis
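The traceback points at re.compile(pattern). In the complete code, the pattern '''{\d\.h\d'?)}\s(\d\.\d+)\s''' has a ")" after '? but no matching "(" before \d, whereas the pattern in the first snippet has it. The minimal check below compiles both spellings; raw strings and escaped braces in the restored version are a readability choice, not the original spelling. The traceback also suggests the script is saved as pandas.py, so line 1 (import pandas as pd) re-imports the script itself rather than the pandas library; renaming the file is probably needed as well.

```python
import re

# Pattern from the "complete code" edit: the "(" before \d is missing,
# so the ")" after '? has nothing to close.
broken = r"{\d\.h\d'?)}\s(\d\.\d+)\s"
# Pattern from the first snippet, capture group restored, braces escaped.
fixed = r"\{(\d\.h\d'?)\}\s(\d\.\d+)\s"

def compiles(pattern):
    """Return True if the pattern compiles, False on a regex syntax error."""
    try:
        re.compile(pattern)
        return True
    except re.error:  # sre_constants.error is re.error
        return False

print(compiles(broken))  # False
print(compiles(fixed))   # True
```

With the group restored, searching a data row such as 0 {1.h1'} 5.82020 ... yields the groups ("1.h1'", "5.82020").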
Edit: newest code (7/26):
    import pandas as pd
    import os
    import sys
    import re
    import csv

    i=0;
    contents_peak=[]
    peak_lines=[]
    with open ("ee_pinkh1.xpk","r") as peakppm:
        for ppm in peakppm.readlines():
            float_num = re.findall("[\s][1-9]{1}\.[0-9]+",ppm)
            if (len(float_num)>1):
                i=i+1
                value = ('peak '+ str(i) + ' ' + str(float_num[0]) + ' 0.05 ' + str(float_num[1]) + ' 0.05 ' + '\n')
                peak_lines.append(value)

    tclust_peak = open("tclust.txt","w+")
    tclust_peak.write("rbclust \n")
    for value in peak_lines:
        tclust_peak.write(value)
    tclust_peak.close()

    pattern = '''{(\d\.h\d'?)}\s(\d\.\d+)\s'''
    rex = re.compile(pattern)
    j=0;
    contents_atom=[]
    atom_lines=[]
    result = {}

    # filename is compared below but never assigned in this snippet
    text = 'ee'
    if text == 'ee':
        df = pd.read_csv('peaks_ee.xpk', sep=" ",skiprows=5)
        shift1= df["1h.p"]
        shift2= df["1h_2.p"]
        if filename == 'ee_pinkh1.xpk':
            mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))
        elif filename == 'ee_pinkh2.xpk':
            mask = ((shift1>3.25)&(shift1<5))&((shift2>7)&(shift2<8.5))
        result = df[mask]
        result = result[["1h.l","1h.p","1h_2.l","1h_2.p"]]
        result.to_csv("result.csv")
    if text == 'ef':
        df = pd.read_csv('peaks_ef.xpk', sep=" ",skiprows=5)
        shift1= df["1h.p"]
        shift2= df["1h_2.p"]
        if filename == 'ef_blue.xpk':
            mask = ((shift1>5) & (shift1<6)) & ((shift2>7.25) & (shift2<8.25))
        elif filename == 'ef_green.xpk':
            mask = ((shift1>7) & (shift1<9)) & ((shift2>5.25) & (shift2<6.2))
        elif filename == 'ef_orange':
            mask = ((shift1>3) & (shift1<5)) & ((shift2>5.2) & (shift2<6.25))
        result = df[mask]
        result = result[["1h.l","1h.p","1h_2.l","1h_2.p"]]
        result.to_csv("result.csv")
    if text == 'fe':
        df = pd.read_csv('peaks_fe.xpk', sep=" ",skiprows=5)
        shift1= df["atom1"]
        shift2= df["atom2"]
        if filename == 'fe_yellow':
            mask = ((shift1>3) & (shift1<5)) & ((shift2>5) & (shift2<6))
        elif filename == 'fe_green':
            mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))
        result = df[mask]
        result = result[["1h.l","1h.p","1h_2.l","1h_2.p"]]
        result.to_csv("result.csv")

    tclust_peak = open("tclust.txt","a")
    tclust_peak.write(str(result))
    tclust_peak.close()
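The if/elif chains in the 7/26 code repeat the same two-range test with different bounds per filename. One way to condense them is a lookup table; RANGES and in_ranges are hypothetical names, not part of the question, and the bounds are copied from the 7/26 code (filename still has to be set somewhere). For a single pair of shifts the check looks like this:

```python
# Hypothetical lookup table: keys are the filename strings compared in the
# 7/26 code, values are ((low1, high1), (low2, high2)) open intervals.
RANGES = {
    "ee_pinkh1.xpk": ((5.1, 6.0), (7.0, 8.25)),
    "ee_pinkh2.xpk": ((3.25, 5.0), (7.0, 8.5)),
    "ef_blue.xpk": ((5.0, 6.0), (7.25, 8.25)),
    "ef_green.xpk": ((7.0, 9.0), (5.25, 6.2)),
    "ef_orange": ((3.0, 5.0), (5.2, 6.25)),
    "fe_yellow": ((3.0, 5.0), (5.0, 6.0)),
    "fe_green": ((5.1, 6.0), (7.0, 8.25)),
}

def in_ranges(filename, shift1, shift2):
    """True if shift1 and shift2 lie strictly inside the bounds for filename."""
    # An unknown filename raises KeyError instead of leaving mask undefined.
    (lo1, hi1), (lo2, hi2) = RANGES[filename]
    return lo1 < shift1 < hi1 and lo2 < shift2 < hi2
```

With pandas Series, the chained form lo1 < shift1 < hi1 raises an "ambiguous truth value" error, so the same bounds would be unpacked and combined with & as in the question: `mask = (shift1 > lo1) & (shift1 < hi1) & (shift2 > lo2) & (shift2 < hi2)`.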
You can try out the pandas package.
The following code loads the file and skips the first 5 rows, so that the line of column names is read as the header. It then builds a mask with bitwise checks on the two shift columns and selects the columns you want.
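    import pandas as pd
    # skip the 5 preamble lines; the extra leading field (peak number) becomes the index
    df = pd.read_csv("peaks_ee.xpk", sep=" ", skiprows=5)
    shift1 = df["1h.p"]
    shift2 = df["1h_2.p"]
    # element-wise AND; the parentheses matter because & binds tighter than < and >
    mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))
    result = df[mask]
    result = result[["1h.l","1h.p","1h_2.l","1h_2.p"]]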
The result is as follows:
    >>> result
           1h.l     1h.p  1h_2.l   1h_2.p
    0   {1.h1'}  5.82020  {2.h8}  7.61004
    3   {1.h1'}  5.82020  {1.h8}  8.13712
    5   {2.h1'}  5.90291  {2.h8}  7.61004
    8   {1.h1'}  5.82020  {2.h8}  7.61004
    11  {4.h1'}  5.74125  {3.h6}  7.53261
    12  {3.h1'}  5.54935  {4.h8}  7.49932
    15  {3.h1'}  5.54935  {3.h6}  7.53261
    18  {2.h1'}  5.90291  {3.h6}  7.53261
    21  {4.h1'}  5.74125  {4.h8}  7.49932
    24  {3.h1'}  5.54935  {4.h8}  7.49932
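The traceback in the question shows the script running under Jython 2.7 (jython-standalone-2.7.0.jar). pandas relies on CPython C extensions and most likely cannot be imported there. A plain-Python sketch of the same filter follows. It assumes whitespace-separated fields with no spaces inside any field (for example the comment column), and the same 5-line preamble that skiprows=5 skips. filter_peaks is a hypothetical name, and the code is written to run on Python 2.7 and 3.

```python
def filter_peaks(lines, range1=(5.1, 6.0), range2=(7.0, 8.25), skiprows=5):
    """Return (peak, label1, shift1, label2, shift2) for rows inside both ranges."""
    lines = iter(lines)
    for _ in range(skiprows):
        next(lines)  # skip the xpk preamble, like skiprows=5 in read_csv
    header = next(lines).split()
    # Data rows start with a peak number that has no header entry,
    # hence the +1 offset when mapping column names to field positions.
    col = dict((name, i + 1) for i, name in enumerate(header))
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        s1 = float(fields[col["1h.p"]])
        s2 = float(fields[col["1h_2.p"]])
        if range1[0] < s1 < range1[1] and range2[0] < s2 < range2[1]:
            rows.append((int(fields[0]), fields[col["1h.l"]], s1,
                         fields[col["1h_2.l"]], s2))
    return rows
```

Used as `with open("peaks_ee.xpk") as f: rows = filter_peaks(f)`, it should select the same peaks as the mask above, returned as plain tuples instead of a DataFrame.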
Then, if you want, you can export the result to a CSV file as follows:
    result.to_csv("result.csv")
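The question's end goal is tclust.txt rather than a CSV file. The sketch below assumes the 'atom j name1 name2' line format that the first snippet builds in value1, and uses a hypothetical helper name, tclust_atom_lines. It turns label pairs into those lines and strips the braces around the xpk labels:

```python
def tclust_atom_lines(label_pairs):
    """Format (label1, label2) pairs as 'atom N label1 label2' lines."""
    out = []
    for j, (label1, label2) in enumerate(label_pairs, 1):
        # xpk labels are wrapped in braces, e.g. "{1.h1'}" -> "1.h1'"
        out.append("atom %d %s %s\n" % (j, label1.strip("{}"), label2.strip("{}")))
    return out
```

With the pandas result, the pairs could come from `zip(result["1h.l"], result["1h_2.l"])`, and the lines could then be written with `open("tclust.txt", "a")` as in the question.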
I'm not sure if this is the code you need, but it may be a start on how to use pandas.