Python - How can I check whether every second value from a dictionary is in a specific range?

I have a dictionary that is populated by reading in a file called peaks_ee.xpk.

Sample peaks_ee.xpk:

    label dataset sw sf
    1h 1h_2
    noesy_f1ef2e.nv
    4807.69238281 4803.07373047
    600.402832031 600.402832031
    1h.l 1h.p 1h.w 1h.b 1h.e 1h.j 1h.u 1h_2.l 1h_2.p 1h_2.w 1h_2.b 1h_2.e 1h_2.j 1h_2.u vol int stat comment flag0 flag8 flag9
    0 {1.h1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    1 {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.h1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    2 {1.h8} 8.13712 0.05000 0.10000 ++ {0.0} {} {1.h1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    3 {1.h1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {1.h8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    4 {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} {2.h1'} 5.90291 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    5 {2.h1'} 5.90291 0.05000 0.10000 ++ {0.0} {} {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    6 {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.h1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    7 {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.h8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    8 {1.h1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    9 {1.h8} 8.13712 0.05000 0.10000 ++ {0.0} {} {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    10 {3.h6} 7.53261 0.05000 0.10000 ++ {0.0} {} {4.h1'} 5.74125 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    11 {4.h1'} 5.74125 0.05000 0.10000 ++ {0.0} {} {3.h6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    12 {3.h1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {4.h8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    13 {4.h8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.h1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    14 {3.h6} 7.53261 0.05000 0.10000 ++ {0.0} {} {3.h1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    15 {3.h1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {3.h6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    16 {3.h6} 7.53261 0.05000 0.10000 ++ {0.0} {} {2.h1'} 5.90291 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    17 {3.h6} 7.53261 0.05000 0.10000 ++ {0.0} {} {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    18 {2.h1'} 5.90291 0.05000 0.10000 ++ {0.0} {} {3.h6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    19 {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} {3.h6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    20 {4.h8} 7.49932 0.05000 0.10000 ++ {0.0} {} {4.h1'} 5.74125 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    21 {4.h1'} 5.74125 0.05000 0.10000 ++ {0.0} {} {4.h8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    22 {4.h8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.h1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    23 {4.h8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.h6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    24 {3.h1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {4.h8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0

In line 0 of the peaks_ee.xpk example, the atom name is 1.h1' and its chemical shift is 5.82020. In the same line, in the 8th column, there is another atom name, 2.h8, and its chemical shift is 7.61004. I want to check if the first chemical shift in the row (5.82020) is in one range, and if the second chemical shift (7.61004) is in another range. If they are, I want to write out the atom names (1.h1' and 2.h8) to a file called tclust.txt.

This is my code so far. I posted a question before, and @wwii helped me with the code.

    import re

    pattern = '''{(\d\.h\d'?)}\s(\d\.\d+)\s'''
    rex = re.compile(pattern)

    j = 0;
    contents_atom = []
    atom_lines=[]

    result = {}
    with open("peaks_ee.xpk","r") as atom_name:
        for line in atom_name:
            for match in rex.finditer(line):
                name, shift = match.groups()
                if name not in result:
                    result[name] = float(shift)
                    print (name,shift)
                    if filename == 'ee_pinkh1.xpk':
                        if result[name]<=8.5:
                            float_str = re.findall("\d\.h\d'?",name)
                            if (len(float_str))>1:
                                j=j+1
                                value1 = ('atom ' + str(j) + ' ' + str(float_str[0])+ ' ' + str(float_str[1])+ '\n')
                                atom_lines.insert(-1,value1)

    tclust_atom = open("tclust.txt","a")
    for value1 in atom_lines:
        tclust_atom.write(value1)
    tclust_atom.close()
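Because result is keyed by atom name, it stores each shift only once and no longer knows which two atoms shared a row, so "every second value" cannot be tied back to a row reliably. Here is a minimal sketch of one alternative, assuming every data row contains exactly two "{name} shift" pairs: collect both matches of a line and test them together. The function name in_range_pairs and the widened \d+ in the pattern are illustrative choices, not part of the code above. The ranges are the ones named in the question.

```python
import re

# A braced atom name followed by its chemical shift, e.g. "{1.h1'} 5.82020".
PAIR = re.compile(r"\{(\d+\.h\d'?)\}\s+(\d+\.\d+)")


def in_range_pairs(lines, first_range, second_range):
    """Return (name1, name2) for each row whose two shifts lie strictly inside the ranges."""
    lo1, hi1 = first_range
    lo2, hi2 = second_range
    kept = []
    for line in lines:
        found = PAIR.findall(line)
        if len(found) != 2:  # header lines and malformed rows are skipped
            continue
        (name1, shift1), (name2, shift2) = found
        if lo1 < float(shift1) < hi1 and lo2 < float(shift2) < hi2:
            kept.append((name1, name2))
    return kept


# Two rows copied from the sample file.
rows = [
    "0 {1.h1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0",
    "1 {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.h1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0",
]
pairs = in_range_pairs(rows, (5.1, 6), (7, 8.25))
# Same "atom <n> <name1> <name2>" layout that the question writes to tclust.txt.
atom_lines = ["atom %d %s %s\n" % (j, a, b) for j, (a, b) in enumerate(pairs, start=1)]
print(atom_lines)
```

Only row 0 passes here, because row 1 has the two shifts in the opposite order. Passing an open file object instead of the rows list would process a whole file the same way.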

This is a picture of the list of atom names and chemical shifts printed out by the line print (name,shift):

[Image: atom names and chemical shifts]

From the picture, the first 2 lines are:

    "1.h1'","5.82020"
    "2.h8","7.61004"

These first 2 lines come from the first line of peaks_ee.xpk. I want to see if "5.82020" is between 5.1 and 6, and if 7.61004 is between 7 and 8.25. Is there a way I can do this using the values of the dictionary? Notice that every second line holds the values I want to check against 5.1 and 6, and the alternating values are the ones I want to check against 7 and 8.25.

Edit: complete code:

    import pandas as pd
    import os
    import sys
    import re

    i=0;
    contents_peak=[]
    peak_lines=[]
    with open ("ee_pinkh1.xpk","r") as peakppm:
        for ppm in peakppm.readlines():
            float_num = re.findall("[\s][1-9]{1}\.[0-9]+",ppm)
            if (len(float_num)>1):
                i=i+1
                value = ('peak '+ str(i) + ' ' + str(float_num[0]) + ' 0.05 ' + str(float_num[1]) + ' 0.05 ' + '\n')
                peak_lines.insert(-1,value)
    tclust_peak = open("tclust.txt","w+")
    tclust_peak.write("rbclust \n")
    for value in peak_lines:
        tclust_peak.write(value)
    tclust_peak.close()

    pattern = '''{\d\.h\d'?)}\s(\d\.\d+)\s'''
    rex = re.compile(pattern)

    j=0;
    contents_atom=[]
    atom_lines=[]
    result = {}
    with open("peaks_ee.xpk","r") as atomname:
        for line in atomname:
            for match in rex.finditer(line):
                name,shift = match.groups()
                print (name,shift)
                if name not in result:
                    result[name]=float(shift)
                    float_str = re.findall("\d\.h\d'?",name)
                    if (len(float_str)>1):
                        j=j+1
                        value1 = ('atom ' +str(j)+ ' ' + str(float_str[0])+ ' ' + str(float_str[1]) + '\n')
                        atom_lines.insert(-1,value1)

    df = pd.read_csv("d:/tmp/peaks_ee.xpk", sep= " ", skiprows=5)

    shift1= df["1h.p"]
    shift2= df["1h_2.p"]

    mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))

    result = df[mask]
    result = result[["1h.l","1h.p","1h_2.l","1h_2.p"]]
    print(result)

    tclust_atom = open("tclust.txt","a")
    for value1 in atom_lines:
        tclust_atom.write(value1)
    tclust_atom.close()

This is the error I am getting:

    Traceback (most recent call last):
      File "pandas.py", line 1, in <module>
        import pandas as pd
      File "/users/malaikaiyer/downloads/nmrfxstructure/nmrfxstructure/target/structure-10.1.1-bin/structure-10.1.1/pandas.py", line 23, in <module>
        rex = re.compile(pattern)
      File "/users/malaikaiyer/downloads/nmrfxstructure/nmrfxstructure/target/structure-10.1.1-bin/structure-10.1.1/lib/jython-standalone-2.7.0.jar/lib/re.py", line 190, in compile
      File "/users/malaikaiyer/downloads/nmrfxstructure/nmrfxstructure/target/structure-10.1.1-bin/structure-10.1.1/lib/jython-standalone-2.7.0.jar/lib/re.py", line 242, in _compile
    sre_constants.error: unbalanced parenthesis
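The message points at line 23, where the pattern is compiled: the pattern in the complete code has a closing ")" after the atom name but no opening "(" after the "{". The small check below is only a sketch to show the difference between the two spellings. The helper name compiles is made up for illustration. Note also that the traceback shows the script itself is saved as pandas.py, so "import pandas" loads that script rather than the library. Renaming the script is probably needed as well.

```python
import re

# Pattern as written in the complete code: ")" has no matching "(".
broken = r"{\d\.h\d'?)}\s(\d\.\d+)\s"
# Same pattern with the opening parenthesis restored after "{".
fixed = r"{(\d\.h\d'?)}\s(\d\.\d+)\s"


def compiles(pattern):
    """Return True if the regular expression compiles, False if re rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles(broken), compiles(fixed))

# With the fix, each match yields the atom name and its shift as two groups.
match = re.search(fixed, "0 {1.h1'} 5.82020 0.05000 ")
print(match.groups())
```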

Edit: newest code 7/26:

    import pandas as pd
    import os
    import sys
    import re
    import csv


    i=0;
    contents_peak=[]
    peak_lines=[]
    with open ("ee_pinkh1.xpk","r") as peakppm:
        for ppm in peakppm.readlines():
            float_num = re.findall("[\s][1-9]{1}\.[0-9]+",ppm)
            if (len(float_num)>1):
                i=i+1
                value = ('peak '+ str(i) + ' ' + str(float_num[0]) + ' 0.05 ' + str(float_num[1]) + ' 0.05 ' + '\n')
                peak_lines.insert(-1,value)
    tclust_peak = open("tclust.txt","w+")
    tclust_peak.write("rbclust \n")
    for value in peak_lines:
        tclust_peak.write(value)
    tclust_peak.close()

    pattern = '''{(\d\.h\d'?)}\s(\d\.\d+)\s'''
    rex = re.compile(pattern)

    j=0;
    contents_atom=[]
    atom_lines=[]
    result = {}
    text = 'ee'

    if text == 'ee':
        df = pd.read_csv('peaks_ee.xpk', sep=" ",skiprows=5)

        shift1= df["1h.p"]
        shift2= df["1h_2.p"]
        if filename == 'ee_pinkh1.xpk':
            mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))
        elif filename == 'ee_pinkh2.xpk':
            mask = ((shift1>3.25)&(shift1<5))&((shift2>7)&(shift2<8.5))
        result = df[mask]
        result = result[["1h.l","1h.p","1h_2.l","1h_2.p"]]
        result.to_csv("result.csv")

    if text == 'ef':
        df = pd.read_csv('peaks_ef.xpk', sep=" ",skiprows=5)

        shift1= df["1h.p"]
        shift2= df["1h_2.p"]
        if filename == 'ef_blue.xpk':
            mask = ((shift1>5) & (shift1<6)) & ((shift2>7.25) & (shift2<8.25))
        elif filename == 'ef_green.xpk':
            mask = ((shift1>7) & (shift1<9)) & ((shift2>5.25) & (shift2<6.2))
        elif filename == 'ef_orange':
            mask = ((shift1>3) & (shift1<5)) & ((shift2>5.2) & (shift2<6.25))
        result = df[mask]
        result = result[["1h.l","1h.p","1h_2.l","1h_2.p"]]
        result.to_csv("result.csv")

    if text == 'fe':
        df = pd.read_csv('peaks_fe.xpk', sep=" ",skiprows=5)

        shift1= df["atom1"]
        shift2= df["atom2"]
        if filename == 'fe_yellow':
            mask = ((shift1>3) & (shift1<5)) & ((shift2>5) & (shift2<6))
        elif filename == 'fe_green':
            mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))
        result = df[mask]
        result = result[["1h.l","1h.p","1h_2.l","1h_2.p"]]
        result.to_csv("result.csv")

    tclust_peak = open("tclust.txt","a")
    tclust_peak.write(str(result))
    tclust_peak.close()

You can try out the pandas package.

The following code loads the file, skipping the first 5 rows so that only the data you want is loaded. It then does a bitwise check between the columns by creating a mask, and selects the columns you want.

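    import pandas as pd

    # Skip the 5 preamble lines; the 6th line supplies the column names.
    df = pd.read_csv("peaks_ee.xpk", sep=" ", skiprows=5)

    shift1 = df["1h.p"]
    shift2 = df["1h_2.p"]

    # True only for rows where both shifts fall inside their ranges.
    mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))

    result = df[mask]
    result = result[["1h.l","1h.p","1h_2.l","1h_2.p"]]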

The result is as follows:

    >>> result
           1h.l     1h.p  1h_2.l   1h_2.p
    0   {1.h1'}  5.82020  {2.h8}  7.61004
    3   {1.h1'}  5.82020  {1.h8}  8.13712
    5   {2.h1'}  5.90291  {2.h8}  7.61004
    8   {1.h1'}  5.82020  {2.h8}  7.61004
    11  {4.h1'}  5.74125  {3.h6}  7.53261
    12  {3.h1'}  5.54935  {4.h8}  7.49932
    15  {3.h1'}  5.54935  {3.h6}  7.53261
    18  {2.h1'}  5.90291  {3.h6}  7.53261
    21  {4.h1'}  5.74125  {4.h8}  7.49932
    24  {3.h1'}  5.54935  {4.h8}  7.49932

Then, if you want, you can export the result to a CSV file as follows:

result.to_csv("result.csv")

I am not sure if this is the code you need, but it may be a start on how to use pandas.
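If the goal is the tclust.txt file rather than a CSV, the masked rows could be turned into "atom" lines directly. This is only a sketch under a few assumptions: the file layout matches the sample in the question, the output format is the "atom <n> <name1> <name2>" line that the question's code builds, and the helper names atom_lines and append_to_tclust are made up for illustration. A three-row copy of the sample is read from memory here so the example stands alone.

```python
import io

import pandas as pd

# Three rows in the same layout as peaks_ee.xpk: 5 preamble lines, then the column header.
SAMPLE = """label dataset sw sf
1h 1h_2
noesy_f1ef2e.nv
4807.69238281 4803.07373047
600.402832031 600.402832031
1h.l 1h.p 1h.w 1h.b 1h.e 1h.j 1h.u 1h_2.l 1h_2.p 1h_2.w 1h_2.b 1h_2.e 1h_2.j 1h_2.u vol int stat comment flag0 flag8 flag9
0 {1.h1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
1 {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.h1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
3 {1.h1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {1.h8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
"""


def atom_lines(df, range1, range2):
    """Build one 'atom' line per row whose two shifts lie strictly inside the ranges."""
    lo1, hi1 = range1
    lo2, hi2 = range2
    mask = ((df["1h.p"] > lo1) & (df["1h.p"] < hi1)
            & (df["1h_2.p"] > lo2) & (df["1h_2.p"] < hi2))
    hits = df[mask]
    # Labels look like "{1.h1'}"; drop the surrounding braces.
    names1 = hits["1h.l"].str.strip("{}")
    names2 = hits["1h_2.l"].str.strip("{}")
    return ["atom %d %s %s\n" % (j, a, b)
            for j, (a, b) in enumerate(zip(names1, names2), start=1)]


def append_to_tclust(lines, path="tclust.txt"):
    """Append the prepared lines to the output file."""
    with open(path, "a") as out:
        out.writelines(lines)


df = pd.read_csv(io.StringIO(SAMPLE), sep=" ", skiprows=5)
lines = atom_lines(df, (5.1, 6), (7, 8.25))
print(lines)

if __name__ == "__main__":
    append_to_tclust(lines)
```

With the real file, pd.read_csv("peaks_ee.xpk", sep=" ", skiprows=5) would replace the in-memory sample. The data rows have one more field than the header line, so pandas takes the first field as the row index, which is why the peak numbers show up on the left of the result above.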
