Thursday, 12 September 2013

Need some help find Open Reading frame in DNA using Regex?

Need some help find Open Reading frame in DNA using Regex?

I'm trying to find all the possible reading frames of a minimum nucleotide
length.
"A[TU]G(?:(...){3}){%d,}?(?:[TU]AG|[TU]AA|[TU]GA)" % (minimal_aa)
this pretty much does what I want, but for some reason, some reading
frames do not acknowledge some stop codons.
I'm sure it has something to do with the (...) portion. How do I tell it
to ALWAYS stop at [TU]AG|[TU]AA|[TU]GA, although passing through multiple
start codons is fine.
I'm using Python on Eclipse.
I'm using Pythex.org to check my strings, but here's a sample of what I'm
talking about:
AUGGAGAGCCUUGUUCUUGGUGUCAACGAGAAAACACACGUCCAACUCAGUUUGCCUGUCCUUCAGGUUAGAGACGUGCUAGUGCGUGGCUUCGGGGACUCUGUGGAAGAGGCCCUAUCGGAGGCACGUGAACACCUCAAAAAUGGCACUUGUGGUCUAGUAGAGCUGGAAAAAGGCGUACUGCCCCAGCUUGAACAGCCCUAUGUGUUCAUUAAACGUUCUGAUGCCUUAAGCACCAAUCACGGCCACAAGGUCGUUGAGCUGGUUGCAGAAAUGGACGGCAUUCAGUACGGUCGUAGCGGUAUAACACUGGGAGUACUCGUGCCACAUGUGGGCGAAACCCCAAUUGCAUACCGCAAUGUUCUUCUUCGUAAGAACGGUAAUAAGGGAGCCGGUGGUCAUAGCUAUGGCAUCGAUCUAAAGUCUUAUGACUUAGGUGACGAGCUUGGCACUGAUCCCAUUGAAGAUUAUGAACAAAACUGGAACACUAAGCAUGGCAGUGGUGCACUCCGUGAACUCACUCGUGAGCUCAAUGGAGGUGCAGUCACUCGCUAUGUCGACAACAAUUUCUGUGGCCCAGAUGGGUACCCUCUUGAUUGCAUCAAAGAUUUUCUCGCACGCGCGGGCAAGUCAAUGUGCACUCUUUCCGAACAACUUGAUUACAUCGAGUCGAAGAGAGGUGUCUACUGCUGCCGUGACCAUGAGCAUGAAAUUGCCUGGUUCACUGAGCGCUCUGAUAAGAGCUACGAGCACCAGACACCCUUCGAAAUUAAGAGUGCCAAGAAAUUUGACACUUUCAAAGGGGAAUGCCCAAAGUUUGUGUUUCCUCUUAACUCAAAAGUCAAAGUCAUUCAACCACGUGUUGAAAAGAAAAAGACUGAGGGUUUCAUGGGGCGUAUACGCUCUGUGUACCCUGUUGCAUCUCCACAGGAGUGUAACAAUAUGCACUUGUCUACCUUGAUGAAAUGUAAUCAUUGCGAUGAAGUUUCA
UGGCAGACGUGCGACUUUCUGAAAGCCACUUGUGAACAUUGUGGCACUGAAAAUUUAGUUAUUGAAGGACCUACUACAUGUGGGUACCUACCUACUAAUGCUGUAGUGAAAAUGCCAUGUCCUGCCUGUCAAGACCCAGAGAUUGGACCUGAGCAUAGUGUUGCAGAUUAUCACAACCACUCAAACAUUGAAACUCGACUCCGCAAGGGAGGUAGGACUAGAUGUUUUGGAGGCUGUGUGUUUGCCUAUGUUGGCUGCUAUAAUAAGCGUGCCUACUGGGUUCCUCGUGCUAGUGCUGAUAUUGGCUCAGGCCAUACUGGCAUUAA
wait. this is a bad example. because it actually checks out right now that
I eyeball the output. I had to shorten this, but there was one code a few
thousands of nucleotides that was riddled with stop codons and nothing was
working properly. I hope you get what I mean, if not don't worry.
Thanks in advance amigos!

No comments:

Post a Comment