%(n) | Jmol SMILES adds unlimited branching. Daylight SMILES allows indication of "rings" using the digits 1-9, for example, C1CCCC1. These numbers do not necessarily indicate rings, however. In association with "." component notation, they may simply indicate connectivity. For example, ethane can be CC or C1.C1. The original SMILES notation allows up to 99 open connections using %nn, where nn is 10-99. In generalizing SMILES to Jmol SMILES and Jmol bioSMILES, since connections can represent hydrogen bonds between nucleic acid chains, it was necessary to allow more than 99 open connections. Adding parentheses, for example %(130), allows for an unlimited number of open connections. Note that despite this allowance, Jmol itself will not generate SMILES strings using this notation unless it is absolutely necessary. |
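The three connection forms described above (a single digit, %nn, and %(n)) can be told apart with a small scanner. The following is a minimal Python sketch, not Jmol's own parser; the function names `connection_numbers` and `open_connections` are hypothetical, and the sketch assumes a plain SMILES string without comments or measurement extensions.

```python
import re
from collections import Counter

# The three pointer forms: %(n) with any number of digits (Jmol extension),
# %nn with exactly two digits, and a single digit.
_POINTER = re.compile(r"%\((\d+)\)|%(\d\d)|(\d)")


def connection_numbers(smiles):
    """Return the connection numbers in the order they appear.

    Bracketed atoms such as [13C] or [NH3+] are skipped so that isotope,
    hydrogen-count, and charge digits are not mistaken for connections.
    """
    numbers, i = [], 0
    while i < len(smiles):
        if smiles[i] == "[":
            close = smiles.find("]", i)
            i = len(smiles) if close < 0 else close + 1
            continue
        match = _POINTER.match(smiles, i)
        if match:
            numbers.append(int(next(g for g in match.groups() if g is not None)))
            i = match.end()
        else:
            i += 1
    return numbers


def open_connections(smiles):
    """Return connection numbers that appear an odd number of times."""
    counts = Counter(connection_numbers(smiles))
    return sorted(n for n, c in counts.items() if c % 2)
```

Because a connection is simply a pair of matching numbers, the same check works whether the two ends sit in one component (a ring) or in two components separated by ".".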
//*...*// | Jmol SMILES is free-formatted, allowing
standard whitespace as well as general comments in the form //*....*// anywhere within the string.
For example, note the difference when Jmol debugging is set ON for the show SMILES command:
$ show SMILES
[n](C)1c2=O.c23=c4[n](C)c1=O.[n](C)3c=[n]4
$ set debug; show SMILES
//* N1 #1 *// [n]( //* C2 #2 *// C)1 //* C13 #13 *// c2= //* O14 #14 *// O. //* C12 #12 *// c23= //* C7 #7 *// c4 //* N5 #5 *// [n]( //* C6 #6 *// C) //* C3 #3 *// c1= //* O4 #4 *// O. //* N10 #10 *// [n]( //* C11 #11 *// C)3 //* C9 #9 *// c= //* N8 #8 *// [n]4
This allows a direct correlation between an actual atom in the 3D structure and its contribution to the SMILES string. Comments are used in Jmol bioSMILES representations for indicating the Jmol version used for its creation as well as chain and residue information:
$ load =1crn; print {*}.find("SEQUENCE")
//* Jmol bioSMILES 14.3.16_2015.08.25 2015-08-25 09:07 1 *// //* chain A protein 1 *// ~p~TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN //* 46 *// |
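Since comments and whitespace carry no chemical meaning, a debug-annotated string can be reduced to its compact form before being handed to another tool. This is a small Python sketch under that assumption; `strip_comments` is a hypothetical helper name, not a Jmol function.

```python
import re

# A Jmol SMILES comment runs from "//*" to the next "*//".
_COMMENT = re.compile(r"//\*.*?\*//", re.DOTALL)


def strip_comments(jmol_smiles):
    """Remove //* ... *// comments and all whitespace from a Jmol SMILES string."""
    without_comments = _COMMENT.sub("", jmol_smiles)
    return "".join(without_comments.split())
```

Applied to the start of the debug output above, `strip_comments('//* N1 #1 *// [n]( //* C2 #2 *// C)1')` gives `[n](C)1`.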
~X~ |
Jmol bioSMILES separates all protein, nucleic, and carbohydrate polymers into separate SMILES components,
separated by ".".
The Jmol bioSEQUENCE type, consisting of a character surrounded by two tildes,
introduces each Jmol bioSEQUENCE component.
The character X may be one of p, d, r, or c,
indicating a protein, DNA, RNA, or carbohydrate sequence, respectively.
Generally, the string will be
a sequence of standard single-character group symbols appropriate for that sequence type.
For example, the Jmol bioSEQUENCE
string created using the commands load =1crn; print {*}.find("SEQ") is:
~p~TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN
When a group is a non-standard amino acid, it is indicated by its residue name in brackets. For example, the Jmol bioSEQUENCE returned from load =4zyg; print {:A and protein}.find("SEQ") is:
~p~MLVYGLYKSPLGYITVAKDDKGFIMLDFCDCVEGNSRDDSSFTEFFHK
LDLYFEGKPINLREPINLKTYPFRLSVFKEVMKIPWGKVMTYKQIADSLGTSPRAVGMALSKNPILLIIP[SMC]HR
VIAENGIGGYSRGVKLKRALLELEGVKIPE
where [SMC] indicates S-methylcysteine.
Components that are not bioSEQUENCE types will indicate connectivity to a bioSEQUENCE (if such connection exists) via a fully qualified bioSMILES designation for the connected atoms. For example, the magnesium atom component in PDB structure 1p9b is described by:
[Mg]123456.O3[C]N([O])[C][C]([O])[O].[IMO.O1#8]4.[GDP.O2B#8]2.[GLY.O#8]5.[ASP.OD1#8]6.[GDP.O2A#8]1.
(The element number, #8 here, allows searching these Jmol bioSMILES strings themselves, in the absence of the associated model.) Notice that connection 3 here is to an unidentified ligand. Jmol bioSMILES only abbreviates groups that are within polymers. Unconnected components such as water molecules will not be repeated; the Jmol bioSMILES representation of a protein with associated water molecules will only show one component in the form: [O]. |
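The bioSEQUENCE layout described above (a ~X~ type marker followed by single-letter codes and bracketed residue names) is simple enough to split with a regular expression. The sketch below is illustrative Python, not part of Jmol; `parse_bio_sequence` is a hypothetical name, and cross-link markers such as ":1" are simply ignored rather than interpreted.

```python
import re

# Type characters as described for the ~X~ marker.
_BIO_TYPES = {"p": "protein", "d": "DNA", "r": "RNA", "c": "carbohydrate"}

# A residue is either a bracketed name such as [SMC] or a single letter.
_RESIDUE = re.compile(r"\[([^\]]+)\]|([A-Za-z])")


def parse_bio_sequence(component):
    """Split one typed bioSEQUENCE component into (type, list of residues)."""
    marker = re.match(r"~([pdrc])~", component)
    if marker is None:
        raise ValueError("not a typed bioSEQUENCE component: %r" % component)
    body = component[marker.end():]
    residues = [name or code for name, code in _RESIDUE.findall(body)]
    return _BIO_TYPES[marker.group(1)], residues
```

For instance, `parse_bio_sequence("~p~IIP[SMC]HR")` separates the non-standard residue SMC from its single-letter neighbors.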
":" |
Jmol bioSEQUENCES utilize the bond type ":" to indicate "cross-linked groups".
Recognized cross-linking includes hydrogen bonding between the purine N1 and pyrimidine N3 in nucleic acids,
hydrogen bonds created with create hbonds, cysteine-cysteine disulfide bonds
in proteins, and ether linkages between carbohydrate residues.
For instance, the commands load =3LL2; print {carbohydrate}.find("SEQ", true) reports for this branched carbohydrate:
~c~[MAN]:1[MAN]:2[MAN].~c~[MAN]:2[MAN].~c~[MAN]:1[MAN][MAN]
indicating a branched mannose hexamer. No indication is given for exact atom-atom connectivity in the Jmol bioSEQUENCE; all connectivity is at the level of residues. |
Var x = '$R1="[CH3,NH2]";$R2="[OH]"; {a}[$R1]' // select aromatic atoms attached to CH3 or NH2
select within(SMARTS,@x)
Note that these variables are any string whatsoever, not just atom sets. The syntax is simply:
Var x = '$R1="[CH3,NH2]";$R2="[$($R1),OH]"; {a}[$R2]' // select aromatic atoms attached to CH3, NH2, or OH
select within(SMARTS,@x)
Var x = '$R1="[CH3,NH2]";$R2="[OH]"; {a}[$([$R1]),$([$R2])]' // aromatic attached to CH3, NH2, or OH
select within(SMARTS,@x)
Note that $(...) need not be within [...], and wherever it is, it always means "just the first atom".
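Because the variables are plain strings, the substitution step can be pictured as text replacement done before the pattern is parsed. The Python sketch below illustrates that idea only; `expand_variables` is a hypothetical name, and the rule that a whole bracketed reference [$R1] is replaced before a bare $R1 is an assumption made here so that the examples above expand to well-formed SMARTS, not a statement of how Jmol implements it.

```python
import re

# One definition: $LABEL="value" [optional comment] ;
_DEFINITION = re.compile(r'\s*\$([A-Z][^=($]*)="([^"]*)"[^;]*;')


def expand_variables(pattern):
    """Read leading $NAME="..." definitions and substitute them into the rest."""
    variables, pos = {}, 0
    while (match := _DEFINITION.match(pattern, pos)):
        variables[match.group(1)] = match.group(2)
        pos = match.end()
    body = pattern[pos:].strip()
    # Longest names first, so that $R10 is not clobbered by $R1.
    names = sorted(variables, key=len, reverse=True)
    # Repeat so that variables referring to other variables are resolved.
    for _ in range(len(variables) + 1):
        expanded = body
        for name in names:
            # Assumption: a whole bracketed reference is replaced first.
            expanded = expanded.replace("[$" + name + "]", variables[name])
            expanded = expanded.replace("$" + name, variables[name])
        if expanded == body:
            break
        body = expanded
    return body
```

With this sketch, the first example above expands to `{a}[CH3,NH2]` and the last to `{a}[$([CH3,NH2]),$([OH])]`.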
[Element] | capitalized - standard notation Na, Si, etc. -- specific non-aromatic atom |
[element] | uncapitalized - specific aromatic atom (as for standard notation, no limitations) |
* | any atom |
A | any non-aromatic atom |
a | any aromatic atom |
# | atomic number |
(integer) | mass number -- Note, however, that [H1] is [*H1], "any atom with one attached hydrogen", not unlabeled hydrogen, [1H]. |
D | degree - total number of connections |
H | exact hydrogen count |
h | "implicit" hydrogen count (atoms are not in structure) |
R | in the specified number of rings |
r | in ring of a given size (not restricted to SSSR set) |
v | valence (total bond order) |
X | calculated connectivity, including implicit hydrogens |
x | number of ring bonds |
@ | stereochemistry |
d | non-hydrogen degree -- number of non-hydrogen connections |
= | Jmol atom index, for example: [=23] |
"xxx" | atom type, in double quotes, for example: ["39"r5] (After calculate partialcharge this will be the MMFF94 atom type.) [Jmol 12.3.24] |
$(select xxx) | external selection method. For Jmol, this is an atom expression. For example: [c$(select temperature>10)] [Jmol 12.3.26] |
r500 | a specifically aromatic 5-membered ring [Jmol 12.3.24] |
r600 | a specifically aromatic 6-membered ring [Jmol 12.3.24] |
number? | mass number or undefined (so, for example, [C12?] means any carbon that isn't explicitly C13 or C14) |
$n(pattern) | A specific number of occurrences of pattern. For example, C[$3(C=C)]C is synonymous with CC=CC=CC=CC. |
$min-max(pattern) | A variable number of occurrences of pattern. For example: A[$0-2(C:G)]A is synonymous with AA or AC(:G)A or AC(:G)C(:G)A. |
residueName#resno^insCode.atomName#atomicNumber | All five fields are optional; only the period itself is required. This primitive may appear with other primitives provided (a) it is first, and (b) it is followed by an operator (",", "&", or ";"). This allows searching a bioSMILES string using SMARTS patterns that only involve standard atom types. In the above example, notice that the atoms within the non-bioSEQUENCE component that connect to protein chains indicate those connections using this extended notation. Thus, both the actual 3D model and the bioSMILES string for 1d66 can be searched using the SMARTS search "[Cd][S]" as well as the more specific search "Cd[*.SG]". Wild cards provide additional options: [*#35.], [ALA.*], [*#*^A.], [*.*], [*.CA]; however, their presence is optional: [#35.], [ALA.], [^A.], [.], [.CA]. The special designation "0" for an atom name, as in [GLY.0], indicates the "lead atom" -- the alpha carbon for proteins, the phosphorus atom in nucleic acids, or the anomeric carbon in carbohydrates. |
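The five optional fields of this primitive can be pulled apart with one regular expression. The following Python sketch is illustrative only; `BioAtom` and `parse_bio_atom` are hypothetical names, wild cards are mapped to None, and the sketch handles the primitive on its own, not the case where it is followed by an operator and further primitives.

```python
import re
from typing import NamedTuple, Optional


class BioAtom(NamedTuple):
    residue_name: Optional[str]
    res_no: Optional[int]
    ins_code: Optional[str]
    atom_name: Optional[str]
    atomic_number: Optional[int]


# residueName#resno^insCode.atomName#atomicNumber; only "." is required.
_BIO_ATOM = re.compile(
    r"(?P<res>[^#^.]*)"
    r"(?:#(?P<resno>\*|-?\d+))?"
    r"(?:\^(?P<ins>[^.]))?"
    r"\."
    r"(?P<atom>[^#]*)"
    r"(?:#(?P<elem>\d+))?$"
)


def parse_bio_atom(text):
    """Parse a primitive such as [IMO.O1#8] or [*#35.] into its fields."""
    match = _BIO_ATOM.match(text.strip("[]"))
    if match is None:
        raise ValueError("not a bio atom primitive: %r" % text)

    def clean(value):
        # Empty fields and the wild card "*" both mean "unspecified".
        return None if value in (None, "", "*") else value

    res_no = clean(match["resno"])
    elem = clean(match["elem"])
    return BioAtom(
        clean(match["res"]),
        int(res_no) if res_no is not None else None,
        clean(match["ins"]),
        clean(match["atom"]),
        int(elem) if elem is not None else None,
    )
```

Under this sketch [*#35.] and [#35.] parse to the same tuple, which mirrors the statement that the wild cards are optional.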
/..../ |
Processing directives. Jmol recognizes /..../ at the beginning of a pattern as processing directives.
These directives can be introduced individually or as groups. They are not case-sensitive.
/noaromatic/ /nostereo/ is read the same as /noAromatic,noStereo/.
|
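As a rough illustration of how leading directives might be separated from the pattern proper, here is a Python sketch. `split_directives` is a hypothetical helper; the flag names are those listed in the grammar for [processingFlag], and stopping at the first unrecognized /.../ group is a choice made here so that SMILES directional bonds are not swallowed.

```python
import re

# Flag names from the [processingFlag] rule, lower-cased (case-insensitive).
KNOWN_FLAGS = {"noaromatic", "aromaticdefined", "aromaticstrict",
               "nostereo", "invertstereo"}

_DIRECTIVE = re.compile(r"\s*/([^/]*)/")


def split_directives(pattern):
    """Return (set of lower-cased flags, remaining pattern)."""
    flags, pos = set(), 0
    while (match := _DIRECTIVE.match(pattern, pos)):
        names = [n.lower() for n in re.split(r"[,\s]+", match.group(1)) if n]
        # Stop at anything that is not a recognized flag group.
        if not names or any(n not in KNOWN_FLAGS for n in names):
            break
        flags.update(names)
        pos = match.end()
    return flags, pattern[pos:].strip()
```

Both spellings mentioned above, `/noaromatic/ /nostereo/` and `/noAromatic,noStereo/`, produce the same flag set in this sketch.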
{...} |
Jmol atom selection. The general way within Jmol to select atoms based on SMARTS searches is to use select search("..."). To assign variables to the results of a search, one can use the find() function. However, to select one or more atoms within the found pattern, simply enclose the desired atoms in { }: select search("{C}C=O"), for example, returns all alpha carbons, and select search("~d~G{C}A") returns all DNA cytidines that are in GCA sequences. No valence calculation is done to add any additional hydrogens to unbracketed atoms. "CCC" is the same as "[C][C][C]". Only unbracketed or bracketed hydrogen atoms such as H[C]C or [H] or [2H] are selected; connected hydrogen atoms as in [CH3] are not selected. |
(.measure) | The extension capitalizes on the fact that in a standard SMARTS string, period "." cannot
ever appear immediately following an open parenthesis "(". Using this fact, the format involves the following:
"(." [single character type - "d" (distance), "a" (angle), or "t" (torsion)] [optional numeric identifier] ":" [optional "!" (not)] [ranges] ")"
where [ranges] is one or more ranges in the form [minimum value], [maximum value] separated by commas. This extension must appear immediately following an element symbol or a bracketed atom expression. The separators "," or "-" between minimum and maximum values are equivalent. For example, the following will find all aliphatic carbon-carbon bonds that are between 1.5 and 1.6 angstroms long.
select search("C(.d:1.5-1.6)C")
The following will select for all 1,2-trans-diaxial methyl groups on a cyclohexane ring, finding all torsions that are outside the range -160 to 160 degrees:
select search("{[CH3]}(.t:!-160,160)CC{[CH3]}")
The following will select for all 1,2-trans-diequatorial methyl groups on a cyclohexane ring by selecting for all adjacent methyl groups that are anti to a ring atom:
select search("{[$([CH3](.t:!-160,160)CC[Cr6])]}CC{[$([CH3](.t:!-160,160)CC[Cr6])]}")
The following will select for all gauche-dimethyl groups on a cyclohexane ring:
select search("{[CH3]}(.t:50,70,-50,-70)CC{[CH3]}")
and the following prints the number of gauche interactions. Division by two is necessary in this case because of the symmetry involved.
print compare({*},{*},"MAP","[CH3](.t:50,70,-70,-50)CC[CH3]").length/2
The default in terms of specifying which atoms are involved is simply "the next N-1 atoms," where N is 2, 3, or 4. For more complicated patterns, one can designate the specific atoms in the measurement using a numeric identifier after the measurement type.
The following will target the bond angle across the carbonyl group in the backbone of a peptide:
select search("[*.CA](.a1:105-110)C(.a1)(=O)N(.a1)")
Designations can overlap; one simply adds whatever (.xn) designation is wanted after the desired atoms:
select search("C(.a1:105,108)C(.a1)(.a2:110,130)C(.a1)(.a2)C(.a2)")
In Jmol, this capability is extended to the measure command for easy access to SMARTS-based measurements:
select *
measure search("C(.a1:110,130)C(.a1)(=O)C(.a1)")
Note that the atoms in no way have to be connected. The only restriction is that the three markers for an angle or the four markers for a torsion will be identified in order from left to right within the SMARTS string. The following, for example, will find all carbonyl oxygen atoms that are within 5 angstroms of each other:
select search("{O}(.d1:0,5)=C.{O}(.d1)=C")
The "." here indicates "not bonded." {O} specifies that although we want to find the entire set, we only want to select the oxygen atoms. The close of the selection brace may appear before or after the (.x) designation. |
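To make the (.measure) format concrete, the sketch below parses one such extension in Python and tests a value against it. It is an illustration only, not Jmol's parser: `Measure` and `parse_measure` are hypothetical names, and the sketch assumes each range is written as minimum then maximum, treating a "-" as a separator only when it follows a number.

```python
import re
from dataclasses import dataclass

_KINDS = {"d": "distance", "a": "angle", "t": "torsion"}
_MEASURE = re.compile(
    r"\(\.(?P<kind>[dat])(?P<ident>\d*)(?::(?P<neg>!?)(?P<ranges>[^)]*))?\)"
)
_NUMBER = re.compile(r"-?(?:\d+\.?\d*|\.\d+)")


@dataclass
class Measure:
    kind: str
    ident: object   # int identifier, or None when omitted
    negate: bool    # True when "!" precedes the ranges
    ranges: list    # list of (minimum, maximum) pairs; empty for a marker

    def matches(self, value):
        inside = any(lo <= value <= hi for lo, hi in self.ranges)
        return inside != self.negate


def _read_numbers(text):
    """Read numbers separated by "," or "-"; a leading "-" is a sign."""
    values, pos = [], 0
    while pos < len(text):
        match = _NUMBER.match(text, pos)
        if match is None:
            raise ValueError("bad number in ranges: %r" % text)
        values.append(float(match.group()))
        pos = match.end()
        if pos < len(text):
            if text[pos] not in ",-":
                raise ValueError("bad separator in ranges: %r" % text)
            pos += 1
    return values


def parse_measure(text):
    match = _MEASURE.fullmatch(text)
    if match is None:
        raise ValueError("not a measure extension: %r" % text)
    values = _read_numbers(match["ranges"] or "")
    if len(values) % 2:
        raise ValueError("ranges need minimum/maximum pairs: %r" % text)
    ranges = list(zip(values[0::2], values[1::2]))
    ident = int(match["ident"]) if match["ident"] else None
    return Measure(_KINDS[match["kind"]], ident, match["neg"] == "!", ranges)
```

For example, `parse_measure("(.t:!-160,160)").matches(175)` is true, since 175 degrees lies outside the excluded range.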
pattern1 || pattern2 | "||" indicates "or" and allows searching for multiple patterns, which may overlap. For example: select search("c{O} || c{C}"). Note that the "||" syntax is an alternative to using "[,]", in this case being equivalent to (and slightly slower than) select search("c{[O,C]}"). |
"~" | Any biopolymer. |
"~n~" | DNA or RNA |
"+" | Jmol bioSMARTS adds the "+" bond type to indicate standard sequence. The Jmol bioSMARTS pattern "~p~C+C+C" is the same as "~p~CCC". In conjunction with the cross-linking type ":", one can do searches for double-stranded nucleic acids quite easily. ~d~CCC:GGG would be three CG base-pairs (because the two strands are going in opposite directions). Note that atom selection can be specified within a bioSMARTS pattern. For example, select search("[CYS.CA][PRO.CB]") would select just the alpha carbon of cysteine and the beta carbon of an adjacent proline. |
branching | Branching (cross-linking) can also be indicated using the standard
SMILES (...) notation. So, for example, ~d~C(G)C(G)C(G) indicates three CG base pairs.
Ring notation can also be used: C:1CC(GGG:1) is the same three CG base pairs.
An empty branch, ~C(), indicates "not cross-linked" -- in this case a cysteine without a disulphide bond or a cytidine that is not base-paired. |
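The two ways of writing base pairs shown above, the cross-linked form (~d~CCC:GGG) and the branched form (~d~C(G)C(G)C(G)), can be generated from a single strand. The Python sketch below is a hedged illustration: `duplex_patterns` is a hypothetical name, standard Watson-Crick pairing for DNA is assumed, and the generalization of the cross-linked form to sequences other than CCC is an inference from the "opposite direction" remark, not something the text states.

```python
# Watson-Crick partners for DNA (assumption: standard pairing only).
_PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}


def duplex_patterns(strand):
    """Return (cross-linked form, branched form) for a DNA strand."""
    partners = [_PARTNER[base] for base in strand]
    # The second strand runs in the opposite direction, so after the ":"
    # cross-link the partner residues are listed in reverse order.
    cross_linked = "~d~" + strand + ":" + "".join(reversed(partners))
    # In the branched form, each residue carries its partner in parentheses.
    branched = "~d~" + "".join(
        "%s(%s)" % (base, partner) for base, partner in zip(strand, partners)
    )
    return cross_linked, branched
```

For the strand CCC this reproduces both patterns given in the text.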
# note: prior to parsing, all white space is removed

[smilesDef] == [preface] [smiles]

[preface] == { [directiveDefs] | NULL }
[directiveDefs] == { [directiveDef] | [directiveDef] [directiveDefs] }
[directiveDef] == "/" [processingDirectives] "/"
[processingDirectives] == { [processingFlag] | [processingDirective] [processingDirectives] }
[processingFlag] == { "noAromatic" | "aromaticDefined" | "aromaticStrict" | "noStereo" | "invertStereo" } (case-insensitive)
   # note: the noAromatic directive indicates to not distinguish between
   #       aromatic/aliphatic searches -- "C" and "c"
   # note: the noStereo directive turns off all stereochemical testing
   # note: thus, both "/noAromatic//noStereo/" and "/noAromatic noStereo/" are valid

[smiles] == { [entity] | [entity] "." [entity] }
[entity] == { [bioSequence] | [molecularSequence] }
[molecularSequence] = [node][connections]
[node] == { [atomExpression] | [connectionPointer] }
[atomExpression] = { [unbracketedAtomType] | "[" [bracketedExpression] "]" }
[unbracketedAtomType] == [atomType] & ! { "Ac" | "Ba" | "Ca" | "Na" | "Pa" | "Sc" | "ac" | "ba" | "ca" | "na" | "pa" | "sc" }
   # note: Brackets are required for these elements: [Na], [Ca], etc.
   #       These elements Xy are instead interpreted as "X" "y", a single-letter
   #       element followed by an aromatic atom.
[atomType] == { [validElementSymbol] | [aromaticType] }
[validElementSymbol] == (see Elements.java; including Xx and only through element 109)
[aromaticType] == { [validElementSymbol].toLowerCase() }
[bracketedExpression] == { "[" [atomPrimitives] "]" }
[atomPrimitives] == { [atom] | [atom] [atomModifiers] }
[atom] == { [isotope] [atomType] | [atomType] }
[isotope] == [digits]
   # note -- isotope mass must come before the element symbol.
[digits] == { [digit] | [digit] [digits] }
[digit] == { "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" }
[atomModifiers] == { [atomModifier] | [atomModifier] [atomModifiers] }
[atomModifier] == { [charge] | [stereochemistry] | [H_Prop] }
[charge] == { "-" [digits] | "+" [digits] | [plusSet] | [minusSet] }
[plusSet] == { "+" | "+" [plusSet] }
[minusSet] == { "-" | "-" [minusSet] }
[stereochemistry] == { "@"    # anticlockwise
                     | "@@"   # clockwise
                     | "@" [stereochemistryDescriptor]
                     | "@@" [stereochemistryDescriptor] }
[stereochemistryDescriptor] == [stereoClass] [stereoOrder]
[stereoClass] == { "AL" | "TH" | "SP" | "TP" | "OH" }
[stereoOrder] == [digits]

[connectionPointer] == { "%" [digit][digit] | [digit] | "%(" [digits] ")" }
   # note: all connectionPointers must have a second matching connectionPointer
   #       and must be preceded by an atomExpression for the
   #       first occurrence and either an atomExpression or a bond
   #       for the second occurrence
   # note: Jmol bioSMARTS extends the possible number of rings to > 100 by
   #       allowing %(n)
[connections] == { [connection] | NULL }
[connection] == { [branch] | [bond] [node] } [connections]
[branch] == { "(" { [smiles] | [bond] [smiles] } ")" | "()" }
   # note: empty parentheses "()" are ignored in SMILES and bioSMILES
[bond] == { "-" | "=" | "#" | "." | "/" | "\\" | ":" | NULL }
   # note: Jmol will not match two totally independent molecular pieces. For example,
   #       Jmol will not match [Na+].[Cl-]. However, "." can be used to clarify a
   #       structure that has "ring" bond notation:
   #       CC1CCC.C1CC is a valid structure.
   # note: bioSEQUENCE uses ":" to indicate "cross-linked", which is the default for branches

[bioSequence] == [bioCode] [bioNode] [connections]
[bioCode] == { "~" | "~" [bioType] "~" }
   # note: The "~" must be the first character in a component and must be repeated
   #       for each component (separated by ".")
[bioType] == { "p" | "n" | "r" | "d" }
   # note: protein, nucleic, RNA, DNA
[bioNode] == { "[" [bioResidueName] "." [bioAtomName] "]"
             | "[" [bioResidueName] "." [bioAtomName] "#" [atomicNumber] "]"
             | [bioResidueCode] }
[atomicNumber] == [digits]
[bioResidueName] == { "ARG" | "GLY" ... } (case-insensitive)
[bioAtomName] == { "C" | "CA" | "N" ... } (case-insensitive)
[bioResidueCode] == { "A" | "R" | "G" ... } (case-sensitive)
   # note: In a BioSEQUENCE, residues are designated using standard 1-letter-code group names
   #       or bracketed residues [xxx] with optional atoms specified: [ARG], [CYS.SG].
######## GENERAL ########

# note: prior to parsing, all white space is removed

[smartDef] == [preface] [smartsSet]

[preface] == { [directiveDefs] [variableDefs] | [variableDefs] | NULL }
[directiveDefs] == { [directiveDef] | [directiveDef] [directiveDefs] }
[directiveDef] == "/" [processingDirectives] "/"
[processingDirectives] == { [processingDirective] | [processingDirective] [processingDirectives] }
[processingFlag] == { "noAromatic" | "aromaticDefined" | "aromaticStrict" | "noStereo" | "invertStereo" } (case-insensitive)
   # note: the noAromatic directive indicates to not distinguish between
   #       aromatic/aliphatic searches -- "C" and "c"
   # note: the noStereo directive turns off all stereochemical testing
   # note: thus, both "/noAromatic//noStereo/" and "/noAromatic noStereo/" are valid

[variableDefs] == [variableDef] | [variableDef] [variableDefs]
[variableDef] == "$" [label] "=" "\"" [smarts] "\"" [comments] ";"
[label] == 'A-Z' [any characters other than "=", "(", or "$"]
[comments] == [any characters other than ";"]
   # note: Variable definitions must be parsed first.
   #       After that, all variable references [$XXXX] are replaced

[smartsSet] == { [smarts] | [smarts] "||" [smartsSet] }
   # note: Jmol adds the "or" operation "||", for example: "C=O || C=N"
   #       which, in this case, could also be written as "C=[O,N]"
   #       Jmol preprocesses these sets, evaluates them independently, and then
   #       combines them.

[smarts] == { [node3D] [connections] | [bioSequence] }
[connections] == { [connection] | NULL }
[connection] == { [branch] | [bondExpression] [node3D] } [connections]
[branch] == { "(" { [smarts] | [bondExpression] [smarts] } ")" | "()" }
   # note: Default bonding for a branch is single for SMARTS or cross-linked (:) for bioSEQUENCE
   # note: "()" is ignored in SMARTS and indicates "not cross-linked" in bioSEQUENCE

######## ATOMS ########

[node3D] == { [atomExpression] | [atomExpression] "(." [measure] ")" | [connectionPointer] }
[atomExpression] = { [unbracketedAtomType] | [bracketedExpression] | [multipleExpression] | [nestedExpression] }
[unbracketedAtomType] == [atomType] & ! { "Ac" | "Ba" | "Ca" | "Na" | "Pa" | "Sc" | "ac" | "ba" | "ca" | "na" | "pa" | "sc" }
   # note: Brackets are required for these elements: [Na], [Ca], etc.
   #       These elements Xy are instead interpreted as "X" "y", a single-letter
   #       element followed by an aromatic atom.
   # note: in a bioSEQUENCE, all atom types are 1-letter code group names
[atomType] == { [validElementSymbol] | "A" | [aromaticType] | "*" }
[validElementSymbol] == (see Elements.java; including Xx and only through element 109)
[aromaticType] == { "a" | [validElementSymbol].toLowerCase() }
[bracketedExpression] == "[" { [atomOrSet] | [atomOrSet] ";" [atomAndSet] } "]"
[atomOrSet] == { [atomAndSet] | [atomAndSet] "," [atomAndSet] }
[atomAndSet] == { [atomPrimitives] | [atomPrimitives] "&" [atomAndSet] | "!" [atomPrimitive] | "!" [atomPrimitive] "&" [atomAndSet] }

######## ATOM PRIMITIVES ########

[atomPrimitives] == { [atomPrimitive] | [atomPrimitive] [atomPrimitives] }
   # note -- if & is not used, certain combinations of primitiveDescriptors
   #         are not allowed. Specifically, combinations that together
   #         form the symbol for an element will be read as the element (Ar, Rh, etc.)
   #         when NOT followed by a digit and no element has already been defined
   #         So, for example, [Ar] is argon, [Ar3] is [A&r3], [ORh] is [O&R&h],
   #         but [Ard2] is [Ar&d2] -- "argon with two non-hydrogen connections"
   #         Also, "!" may not be used with implied "&".
   #         Thus, [!a], [!a&!h2], and [h2&!a] are all valid, but [!ah2] is invalid.
[atomPrimitive] == { [isotope] | [atomType] | [charge] | [stereochemistry]
                   | [a_Prop] | [A_Prop] | [D_Prop] | [H_Prop] | [h_Prop]
                   | [R_Prop] | [r_Prop] | [v_Prop] | [X_Prop] | [x_Prop]
                   | [at_Prop] | [nestedExpression] }
[isotope] == [digits] | [digits] "?"
   # note -- isotope mass may come before or after element symbol,
   #         EXCEPT "H1" which must be parsed as "an atom with a single H"
[digits] == { [digit] | [digit] [digits] }
[digit] == { "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" }
[charge] == { "-" [digits] | "+" [digits] | [plusSet] | [minusSet] }
[plusSet] == { "+" | "+" [plusSet] }
[minusSet] == { "-" | "-" [minusSet] }
[stereochemistry] == { "@"    # anticlockwise
                     | "@@"   # clockwise
                     | "@" [stereochemistryDescriptor]
                     | "@@" [stereochemistryDescriptor] }
[stereochemistryDescriptor] == [stereoClass] [stereoOrder]
[stereoClass] == { "AL" | "TH" | "SP" | "TP" | "OH" }
[stereoOrder] == [digits]
   # note -- "?" here (unspecified) is not relevant in Jmol SMARTS, only Jmol bioSMARTS
[A_Prop] == "#" [digits]             # elemental atomic number
[a_Prop] == "=" [digits]             # Jmol atom index (starts with 0)
[D_Prop] == { "D" [digits] | "D" }   # degree -- total number of connections
                                     # excludes implicit H atoms; default 1
[d_Prop] == { "d" [digits] | "d" }   # degree -- non-hydrogen connections
                                     # default 1
[H_Prop] == { "H" [digits] | "H" }   # exact hydrogen count
                                     # excludes implicit H atoms
[h_Prop] == { "h" [digits] | "h" }   # implicit hydrogens -- "h" indicates "at least one"
                                     # (see note below)
[R_Prop] == { "R" [digits] | "R" }   # ring membership; e.g. "R2" indicates "in two rings"
                                     # "R" indicates "in a ring"
                                     # "!R" or "R0" indicates "not in any ring"
[r_Prop] == { "r" [digits] | "r" }   # in ring of size [digits]; "r" indicates "in a ring"
                                     # r500 and r600 match specifically aromatic
                                     # 5- and 6-membered rings, respectively [Jmol 12.3.24]
[v_Prop] == { "v" [digits] | "v" }   # valence -- total bond order (counting double as 2, e.g.)
[X_Prop] == { "X" [digits] | "X" }   # connectivity -- total number of connections
                                     # includes implicit H atoms
[x_Prop] == { "x" [digits] | "x" }   # ring connectivity -- total ring connections
[at_Prop] == "\"" [charsExceptDoubleQuote] "\""   # atom type (in double quotes) [Jmol 12.3.24]

######## Nested and Multiple Expressions ########

[nestedExpression] == "$(" [atomExpression] ")" | "$(select" [contextualSearchPhrase] ")"
   # note: nestedExpressions return only the first atom as a match when an atom expression
   #       is involved, not all atoms in the expression.
[contextualSearchPhrase] == [any characters with well-matched "(" and ")"]
   # note: the contextual search phrase is to be processed by the context implementing
   #       the SMARTS. In the case of Jmol, [contextualSearchPhrase] is any Jmol
   #       atom expression that can be in a standard Jmol SELECT command.
[multipleExpression] == { "[$" [nTimes] "(" [orExpression] ")]"
                        | "[$" [nMinimum] "-" [nMaximum] "(" [orExpression] ")]" }
[orExpression] = { [atomExpression] | [atomExpression] "|" [orExpression] | [atomExpression] "||" [orExpression] }
   # note: "|" and "||" are synonymous in this inner context; "|" is preferred simply
   #       for readability (whereas "||" is required for the [smartsSet] context).
   # note: This syntax is carefully written to exclude $(xxx) by itself, which
   #       is a nestedExpression, not a multipleExpression. The difference is that
   #       the nestedExpression only returns the first atom, while the multipleExpression
   #       returns all atoms. To return only the first atom within this context
   #       it is necessary to use a nested expression within the multiple expression.
   #       For example: "CC[$2( $(C=O) | $(C=N) )]"
   #       is the same as "CC$(C=[O,N])$(C=[O,N])", although Jmol preprocesses it as
   #       "CC$(C=O)$(C=O)||CC$(C=O)$(C=N)||CC$(C=N)$(C=O)||CC$(C=N)$(C=N)"
[nTimes] == [digits]
[nMinimum] == [digits]
[nMaximum] == [digits]
   # note: multipleExpressions allow for searching a given number of expressions or
   #       a variable number of expressions (including 0, perhaps)
   #       Jmol pre-processes these expressions and turns them into a set:
   #       pattern1 || pattern2 || pattern3....

######## BioSEQUENCE ########

[bioSequence] == [bioCode] [bioNode] [connections]
[bioCode] == { "~" | "~" [bioType] "~" }
   # note: The "~" must be the first character in a component and must be repeated
   #       for each component (separated by ".")
[bioType] == { "p" | "n" | "r" | "d" }
   # note: protein, nucleic, RNA, DNA
[bioNode] == { "[" [bioResidueName] "]"
             | "[" [bioResidueName] "." [bioAtomName] "]"
             | "[" [bioResidueName] "." [bioAtomName] [A_Prop] "]"
             | [bioResidueCode] }
[bioResidueName] == { "*" | "ARG" | "GLY" ... } (case-insensitive)
[bioAtomName] == { "*" | "0" | "C" | "CA" | "N" ... } (case-insensitive)
   # note: "0" indicates the "lead atom":
   #       nucleic: P if present, or H5T if present, or O5'/O5*
   #       protein: CA
   #       carbohydrate: the first atom of the group listed in the model file
[bioResidueCode] == { "*" | "A" | "R" | "G" ... } (case-sensitive)
   # note: wildcard or standard group 1-letter-code
   #       or, in the case of RNA or DNA:
   #       "N" (any residue; same as "*"),
   #       "R" (any purine -- A or G)
   #       "Y" (any pyrimidine -- C or T or U)

######## CONNECTIONS (aka "rings") ########

[connectionPointer] == { [digit] | "%" [digit][digit] | "%(" [digits] ")" }
   # note: All connectionPointers must have a second matching connectionPointer
   #       and must be preceded by an atomExpression for the
   #       first occurrence and either an atomExpression or a bondExpression
   #       for the second occurrence. The matching connectionPointers may be
   #       in different "components" (separated by "."), in which case they
   #       represent general connections and not necessarily rings.

######## BONDS ########

[bondExpression] == { [bondOrSet] | [bondOrSet] ";" [bondAndSet] }
[bondOrSet] == { [bondAndSet] | [bondAndSet] "," [bondAndSet] }
[bondAndSet] == { [bondPrimitives] | [bondPrimitives] "&" [bondAndSet] | "!" [bondPrimitive] | "!" [bondPrimitive] "&" [bondAndSet] }

######## BOND PRIMITIVES ########

[bondPrimitives] == { [bondPrimitive] | [bondPrimitive] [bondPrimitives] }
[bondPrimitive] == { "-" | "=" | "#" | "." | "/" | "\\" | ":" | "~" | "@" | "+" | "^" | NULL }
   # note: Not all bondExpressions are valid. Stereochemistry should not
   #       be mixed with the others, as it represents a single bond always.
   #       In addition, "." ("no bond") cannot be mixed with any bond type.
   #       Nothing would be retrieved by "-&=", as a bond cannot be both single
   #       and double. However, "-@" is potentially very useful -- "ring single-bonds"
   #       or "=&!@" -- "doubly-bonded atoms where the double bond is not in a ring"
   # note: Jmol will not match two totally independent molecular pieces. For example,
   #       Jmol will not match [Na+].[Cl-]
   # note: "+" indicates "adjacent biomolecular groups in a chain"
   # note: a bioSEQUENCE ends with "." or the end of the string. A new bioSEQUENCE
   #       can continue with "~" immediately following this "."
   # note: For a SMARTS search, "." indicates the start of a new subset, not necessarily a
   #       new component.
   # note: "^" indicates atropisomer bond with positive dihedral angle

######## MEASURES ########

[measure] == { [measureId] | [measureId] ":" [ranges] | [measureId] ":!" [range] }
[measureId] == { [measureCode] | [measureCode] [digits] }
[measureCode] == { "d" | "a" | "t" }
[ranges] == { [range] | [ranges] { "," | "-" } [range] }
[range] == [minimumValue] { "," | "-" } [maximumValue]
[minimumValue] == [decimalNumber]
[maximumValue] == [decimalNumber]
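The grammar notes state that Jmol pre-processes a multipleExpression into a set of patterns joined by "||". The Python sketch below imitates that step for the simple case in which repetition is plain concatenation; `expand_multiple` is a hypothetical name, and the sketch does not reproduce the bioSEQUENCE behavior in which a repeated cross-link such as C:G becomes a branch, C(:G).

```python
import re
from itertools import product

# "[$n(" or "[$min-max(" opens a multipleExpression.
_HEAD = re.compile(r"\[\$(\d+)(?:-(\d+))?\(")


def _split_alternatives(text):
    """Split on top-level "|" (or "||"), ignoring pipes inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "|" and depth == 0:
            if current:            # the second "|" of "||" adds nothing
                parts.append("".join(current))
                current = []
            continue
        current.append(ch)
    parts.append("".join(current))
    return parts


def expand_multiple(pattern):
    """Expand every [$n(...)] or [$min-max(...)] into a list of patterns."""
    pattern = "".join(pattern.split())   # whitespace is removed before parsing
    head = _HEAD.search(pattern)
    if head is None:
        return [pattern]
    depth, i = 1, head.end()
    while depth:                         # find the ")" that closes the group
        if i >= len(pattern):
            raise ValueError("unbalanced parentheses in %r" % pattern)
        depth += {"(": 1, ")": -1}.get(pattern[i], 0)
        i += 1
    if pattern[i:i + 1] != "]":
        raise ValueError("expected ')]' in %r" % pattern)
    alternatives = _split_alternatives(pattern[head.end():i - 1])
    low = int(head.group(1))
    high = int(head.group(2) or low)
    prefix, suffix = pattern[:head.start()], pattern[i + 1:]
    results = []
    for count in range(low, high + 1):
        for combo in product(alternatives, repeat=count):
            for tail in expand_multiple(suffix):
                results.append(prefix + "".join(combo) + tail)
    return results
```

Joining the result for "CC[$2( $(C=O) | $(C=N) )]" with "||" reproduces the four-pattern set quoted in the grammar note.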
        1
       / \
      2   6 -- 6a
      |   |
5a -- 5   4
       \ /
        3
with arbitrary order and up to N substituents...