
Building and using DNA Motifs

Background

Sequence motifs are short, recurring (that is, conserved) patterns in DNA that are presumed to have a biological function. They often indicate sequence-specific binding sites for proteins and other important markers. However, motifs are not always exactly conserved: a motif in a particular organism can carry mutations, which may be substitutions, deletions, or insertions. For this reason, the sequences are usually aligned and a consensus pattern for the motif is calculated over all the examples collected from different organisms.

The following are examples of a transcription factor binding (TFB) site for the lexA repressor in E. coli, stored in a file called lexA.fasta:

>dinD 32->52
aactgtatataaatacagtt
>dinG 15->35
tattggctgtttatacagta
>dinH 77->97
tcctgttaatccatacagca
>dinI 19->39
acctgtataaataaccagta
>lexA-1 28->48
tgctgtatatactcacagca
>lexA-2 7->27
aactgtatatacacccaggg
>polB(dinA) 53->73
gactgtataaaaccacagcc
>recA 59->79
tactgtatgagcatacagta
>recN-1 49->69
tactgtatataaaaccagtt
>recN-2 27->47
tactgtacacaataacagta
>recN-3 9-29
TCCTGTATGAAAAACCATTA
>ruvAB 49->69
cgctggatatctatccagca
>sosC 18->38
tactgatgatatatacaggt
>sosD 14->34
cactggatagataaccagca
>sulA 22->42
tactgtacatccatacagta
>umuDC 20->40
tactgtatataaaaacagta
>uvrA 83->103
tactgtatattcattcaggt
>uvrB 75->95
aactgtttttttatccagta
>uvrD 57->77
atctgtatatatacccagct

Each line that starts with “>” is a header stating which gene the sequence lies upstream of and where it is located relative to that gene. (For your purposes the headers can be ignored, and your code should skip these lines when parsing the DNA sequences.) Each line between two headers is the nucleotide sequence of one TFB site. Each nucleotide has a position in the sequence. You can assume that all sequences are the same length.

You only need to do very minimal input error checking; I won’t be testing extensively for it. However, if a function relies on another function having been run first, make sure it runs that function itself.

Creating DNAMOTIF class

You will create a DNAMOTIF class that has the following attributes and functions:

  1. __init__(self): Initialize the class.

self.instances=[] # A list of DNA sequence strings (no headers)
self.consensus=[] # A DNA sequence string
self.counts= {'A': [], 'C': [], 'G':[],'T':[]} # A dictionary of nucleotide counts

  2. __str__(self): Return a string with each sequence instance of the motif on its own line.

  3. __len__(self): Return the length of the motif, which is the length of one of the sequences in the collection.

Example Input:

lexA=DNAMOTIF()
lexA.parse("lexA.fasta")
print(len(lexA))

Output:

20
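As a rough illustration of items 2 and 3, the sketch below joins the stored instances with newlines for __str__ and reports the length of the first instance for __len__. Returning 0 for an empty motif is an assumption the assignment does not state, and the two sequences are filled in by hand here instead of being parsed from the file.

```python
class DNAMOTIF:
    def __init__(self):
        self.instances = []
        self.consensus = []
        self.counts = {'A': [], 'C': [], 'G': [], 'T': []}

    def __str__(self):
        # One sequence per line, in the order the instances were stored.
        return "\n".join(self.instances)

    def __len__(self):
        # All instances are assumed to share one length.
        # Assumption: a motif with no instances has length 0.
        return len(self.instances[0]) if self.instances else 0


motif = DNAMOTIF()
# Filled in by hand for the demo; parse() would normally do this.
motif.instances = ["aactgtatataaatacagtt", "tattggctgtttatacagta"]
print(len(motif))
print(motif)
```

Because print() calls str() on its argument, print(motif) shows one sequence per line.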

  4. parse(self,filename): Read in DNA instances from a FASTA file.

Example Usage:

lexA.parse("lexA.fasta")
print(lexA)

Output:

aactgtatataaatacagtt
tattggctgtttatacagta
tcctgttaatccatacagca
acctgtataaataaccagta
tgctgtatatactcacagca
aactgtatatacacccaggg
gactgtataaaaccacagcc
tactgtatgagcatacagta
tactgtatataaaaccagtt
tactgtacacaataacagta
TCCTGTATGAAAAACCATTA
cgctggatatctatccagca
tactgatgatatatacaggt
cactggatagataaccagca
tactgtacatccatacagta
tactgtatataaaaacagta
tactgtatattcattcaggt
aactgtttttttatccagta
atctgtatatatacccagct
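One possible way to read the file is sketched below as a standalone helper. The name read_fasta_sequences is hypothetical and not part of the assignment. It assumes each sequence sits on a single line, as in lexA.fasta, and it keeps the original letter case because the example output above still shows the mixed-case entry. The demo writes a two-record temporary file so the block runs without lexA.fasta being present.

```python
import os
import tempfile


def read_fasta_sequences(filename):
    """Return the sequence lines of a FASTA file, skipping '>' headers.

    Hypothetical helper: assumes one sequence per line (no wrapped records).
    """
    sequences = []
    with open(filename) as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines and header lines.
            if not line or line.startswith(">"):
                continue
            sequences.append(line)
    return sequences


# Demo on a small temporary file with two records taken from lexA.fasta.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.fasta")
    with open(path, "w") as out:
        out.write(">dinD 32->52\naactgtatataaatacagtt\n")
        out.write(">recN-3 9-29\nTCCTGTATGAAAAACCATTA\n")
    demo = read_fasta_sequences(path)

print(demo)
```

Inside the class, parse could then reduce to assigning the helper's result to self.instances.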

  5. count(self): Count the occurrences of A’s, C’s, G’s, and T’s at each position and store them in a dictionary. Convert all sequences to upper case for consistency.

Example Input:

lexA.count()

To Access Result:

lexA.counts={'A': [5, 13, 0, 0, 0, 1, 15, 1, 15, 4, 12, 6, 16, 6, 10, 0, 19, 0, 0, 12],
'C': [2, 3, 18, 0, 0, 0, 1, 2, 0, 1, 3, 6, 1, 4, 8, 19, 0, 0, 6, 1],
'G': [1, 2, 0, 0, 19, 3, 0, 1, 3, 1, 1, 0, 0, 0, 0, 0, 0, 18, 3, 1],
'T': [11, 1, 1, 19, 0, 15, 3, 15, 1, 13, 3, 7, 2, 9, 1, 0, 0, 1, 10, 5]}
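The per-position counting can be sketched as a standalone function. The name count_positions is hypothetical, and the three five-letter sequences are made-up demo input, not data from lexA.fasta. The sketch assumes all sequences have the same length and silently ignores any character other than A, C, G, or T, which is a choice the assignment leaves open.

```python
def count_positions(instances):
    """Count A, C, G and T at each position across equal-length sequences."""
    if not instances:
        return {base: [] for base in "ACGT"}
    length = len(instances[0])
    counts = {base: [0] * length for base in "ACGT"}
    for seq in instances:
        # Upper-case first so 'a' and 'A' land in the same bucket.
        for i, base in enumerate(seq.upper()):
            if base in counts:  # assumption: other characters are skipped
                counts[base][i] += 1
    return counts


demo_counts = count_positions(["aactg", "TACTG", "tattg"])
print(demo_counts)
# {'A': [1, 3, 0, 0, 0], 'C': [0, 0, 2, 0, 0], 'G': [0, 0, 0, 0, 3], 'T': [2, 0, 1, 3, 0]}
```

At every position the four counts add up to the number of sequences, which is a quick sanity check on the result.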

  6. compute_consensus(self): Return an UPPERCASE sequence made of the most frequent nucleotide at each position of the motif. If two or more nucleotides are tied at a position, use the one that comes first alphabetically.

Example Input:

lexA.compute_consensus()

To Access Result:

print(lexA.consensus)
TACTGTATATATATACAGTA
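A minimal sketch of the consensus step, written as a standalone function over a counts dictionary. The name consensus_from_counts is hypothetical. It relies on Python's max() returning the first maximal item, so scanning the nucleotides in alphabetical order resolves ties alphabetically. The demo reuses the lexA counts listed above.

```python
def consensus_from_counts(counts):
    """Build the consensus string from per-position nucleotide counts."""
    bases = sorted(counts)  # alphabetical order: A, C, G, T
    length = len(counts[bases[0]]) if bases else 0
    consensus = []
    for i in range(length):
        # max() keeps the first item with the highest count, so scanning
        # the bases alphabetically breaks ties alphabetically.
        consensus.append(max(bases, key=lambda base: counts[base][i]))
    return "".join(consensus)


lexa_counts = {
    'A': [5, 13, 0, 0, 0, 1, 15, 1, 15, 4, 12, 6, 16, 6, 10, 0, 19, 0, 0, 12],
    'C': [2, 3, 18, 0, 0, 0, 1, 2, 0, 1, 3, 6, 1, 4, 8, 19, 0, 0, 6, 1],
    'G': [1, 2, 0, 0, 19, 3, 0, 1, 3, 1, 1, 0, 0, 0, 0, 0, 0, 18, 3, 1],
    'T': [11, 1, 1, 19, 0, 15, 3, 15, 1, 13, 3, 7, 2, 9, 1, 0, 0, 1, 10, 5],
}
print(consensus_from_counts(lexa_counts))  # TACTGTATATATATACAGTA

# Tie at the only position: A and C both occur once, so A is chosen.
print(consensus_from_counts({'A': [1], 'C': [1], 'G': [0], 'T': [0]}))  # A
```

In the class version, compute_consensus would presumably call count() first whenever self.counts is still empty, in line with the requirement that a function run whatever it depends on.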

main.py

class DNAMOTIF:
    def __init__(self):
        self.instances=[]
        self.consensus=[]
        self.counts= {'A': [], 'C': [], 'G':[],'T':[]}

    def __str__(self):
        pass # todo: insert your code here, e.g. return ...

    def __len__(self):
        pass # todo: insert your code here, e.g. return ...

    def count(self):
        pass # todo

    def compute_consensus(self):
        pass # todo

    def parse(self, filename):
        pass # todo: insert your code here, e.g. self.instances = <raw DNA sequences>