- Do NOT hand in the program output - just answer the questions!
Add your name and e-mail address on the front page!!!
Don't forget! Read through the assignment first, so you get a better picture of what it's all about!
In this assignment, we will be working with the Needle, Matcher and Water programs (from the EMBOSS package) to compare two sequences on the DNA and protein levels. We will also play with the parameters, to understand better how pairwise alignment works.
NOTE: The order of the sequences makes a difference! Make sure you are consistent whenever you copy and paste!
NOTE: The links in the assignment will open new windows, so if you need a new one for a given program or database, go back to where it was first given, and it will open another one for you.
Your lab has been looking for the gene that causes the ocean smell. The reaction has been known for a while, and an enzyme (DMSP lyase) has been found in bacteria, but not in eukaryotes. The bacterial activity isn't enough to explain all of the DMS (the product of the reaction) found in the ocean. You suspect that the alga Emiliania huxleyi is a major producer, but it has no homolog of the bacterial gene. You perform biochemical fractionation of the cells in order to isolate the active fraction. After a lot of work, including transcriptomics and proteomics, the enzyme is found and named Alma1 (for more details see: https://pubmed.ncbi.nlm.nih.gov/26113722/).
The new question: is Emiliania huxleyi the only DMS producer in the ocean, or are there other eukaryotic species that have homologs?
You perform various database searches, and find an interesting candidate in a coral symbiont, Symbiodinium. Now we have to compare the two sequences, and see how similar they really are.
Take the Emiliania huxleyi protein sequence (accession number: AKO62592.1) from NCBI in fasta format. If you give it a simple name, like the species, it will be easier to tell things apart afterwards. You can change the name after you paste, or you can take the sequence without the header line and create your own. Make sure to keep a window with each sequence open, since you will have to copy/paste them a few times.
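If you prefer to rename the header outside the browser, here is a minimal sketch of how that could be done. The function name `rename_fasta` is hypothetical, and the toy record is made up for illustration (it is not the real AKO62592.1 entry).

```python
def rename_fasta(record: str, new_name: str) -> str:
    """Replace the header line of a single-sequence FASTA record."""
    lines = record.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA record starting with '>'")
    # Keep the sequence lines untouched; only the header changes.
    return "\n".join([">" + new_name] + lines[1:]) + "\n"


# Made-up toy record, NOT the real database entry.
example = ">some_long_database_header with a description\nMKTAYIAK\nQRQISFVK\n"
print(rename_fasta(example, "Ehux"))
```

A short name like this shows up in the alignment output instead of the long database header, which makes it easier to see which line belongs to which species.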
The Symbiodinium sequence comes from the TSA database (next generation transcript assemblies), and the translation isn't in NCBI. You can get the translation here: symbio.pep
First of all, we'd like to see how similar these sequences are overall, so we'll run a global alignment program, Needle. We'll use the EBI website for this: NEEDLE.
Make sure the program is set on protein, and check the parameters (but use the defaults):
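To get a feeling for what a global alignment program does, here is a simplified sketch of the dynamic programming idea (Needleman-Wunsch). It is not Needle itself: it assumes a flat match/mismatch score and a single gap penalty per position, whereas Needle uses a substitution matrix and separate gap open and gap extend penalties. The function name and the default scores are made up for illustration.

```python
def global_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best score for aligning ALL of a against ALL of b (simplified scoring)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing means paying for gaps only.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # a[i-1] aligned to a gap
            left = score[i][j - 1] + gap    # b[j-1] aligned to a gap
            score[i][j] = max(diag, up, left)
    # Global alignment: the answer is in the bottom-right corner.
    return score[-1][-1]


print(global_score("ACGT", "ACGT"))
print(global_score("ACGT", "AGT"))
```

Because both sequences have to be aligned from end to end, every unmatched position costs something. Keep this in mind when you compare the Needle result with the local alignment programs.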
Open a window at EBI by clicking here
Make sure you choose protein. Paste in the sequences, and run using the default parameters (Open them so that you know what they are).
Open a Matcher window here
Make sure you choose protein, and in the parameters, set alternative matches to 3.
Now we'll start changing the parameters.
Go back to Matcher, and try to get the parameters as close as possible to the others (Gap open of 10, gap extend of 1).
Now we'll really play with the parameters using Water.
Start by trying to make the parameters more stringent, like the Matcher default: Gap open of 15, gap extend of 5.
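It can help to see what these numbers mean for a single gap. The sketch below assumes the common affine convention in which the first position of a gap costs the gap open penalty and every further position costs the gap extend penalty; this is an assumption, so check the program documentation for the exact definition it uses. The function name `gap_cost` is made up for illustration.

```python
def gap_cost(length: int, gap_open: float, gap_extend: float) -> float:
    """Penalty for one gap of the given length (assumed affine convention)."""
    if length < 1:
        raise ValueError("a gap has at least one position")
    # First position pays gap_open, each additional position pays gap_extend.
    return gap_open + (length - 1) * gap_extend


# The two parameter sets mentioned in the assignment so far.
settings = {"open 10, extend 1": (10, 1), "open 15, extend 5": (15, 5)}
for name, (gap_open, gap_extend) in settings.items():
    costs = [gap_cost(n, gap_open, gap_extend) for n in (1, 5, 20)]
    print(name, "-> gaps of length 1, 5, 20 cost", costs)
```

Under this assumption the difference between the two settings is small for a one-residue gap, but grows quickly for long gaps, because the extend penalty is paid for every additional position.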
Let's go even higher and lower: run Water twice more, once with gap open of 50 and gap extend of 5, and once with gap open of 1 and gap extend of 0.1.
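To see why the gap parameters can change a local alignment so much, here is a toy sketch of local alignment scoring with separate gap open and gap extend penalties (the Smith-Waterman approach that Water is based on, using Gotoh-style recurrences). It is a simplification, not Water itself: it assumes a flat match/mismatch score instead of a substitution matrix, it assumes that a gap of length n costs open + (n - 1) * extend, and it returns only the best score, not the alignment. The function name and the toy sequences are made up for illustration.

```python
def local_score(a, b, gap_open, gap_extend, match=5, mismatch=-4):
    """Best local alignment score with affine gap penalties (toy scoring)."""
    neg_inf = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]        # best score ending at (i, j)
    e = [[neg_inf] * cols for _ in range(rows)]  # ending with a gap in a
    f = [[neg_inf] * cols for _ in range(rows)]  # ending with a gap in b
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            # Either open a new gap or extend one that is already open.
            e[i][j] = max(h[i][j - 1] - gap_open, e[i][j - 1] - gap_extend)
            f[i][j] = max(h[i - 1][j] - gap_open, f[i - 1][j] - gap_extend)
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets a local alignment start fresh anywhere.
            h[i][j] = max(0, diag, e[i][j], f[i][j])
            best = max(best, h[i][j])
    return best


# Two made-up sequences that match perfectly except for a 3-letter insertion.
seq_a = "AAAAGGGTTTT"
seq_b = "AAAATTTT"
for gap_open, gap_extend in [(10, 1), (15, 5), (50, 5), (1, 0.1)]:
    print(gap_open, gap_extend, local_score(seq_a, seq_b, gap_open, gap_extend))
```

With these toy sequences, open 10 / extend 1 gives a score of 28, because bridging the insertion with one gap is worth it. With open 15 / extend 5 the score drops to 20, because the best local alignment is now just one of the ungapped halves. This is the kind of change to look for in the real Water output as you raise and lower the penalties.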
Now we'll look at the DNA sequences. Both are available on NCBI (accession numbers: KR703620.1 and GAKY01006775.1).
We'll use all three programs with the default parameters. Make sure the programs are set on DNA!
The standard questions for each program...
(for anyone interested in the real answer, you can read the paper mentioned in the beginning....)