PepData translates DNA sequence(s) in all six frames, concatenates the translations, and creates a single protein output file for each sequence.
It is sometimes necessary to look for protein sequence patterns in nucleic acid sequences where coding regions have not been defined or where a coding sequence is suspected of containing a frame-shift error. PepData translates the entire nucleotide sequence in all six frames, so that every possible protein translation can be examined, and concatenates the six amino acid sequences into one output sequence. Stop codons appear as asterisks (*) in the output sequence.
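The six-frame translation described above can be sketched in a few lines of Python. This is a minimal illustration, not PepData's actual implementation: the names `reverse_complement`, `translate_frame`, and `six_frame_translate` are hypothetical, the standard genetic code is hard-coded rather than read from translate.txt, and writing codons that contain ambiguous bases as 'X' is an assumption. The frame order (three forward frames, then three frames of the reverse strand) follows the frame map printed at the top of a PepData output file.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate_frame(dna, offset):
    """Translate complete codons starting at a 0-based offset."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    # Assumption: codons not in the table (ambiguous bases) become 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translate(dna):
    """Concatenate three forward and three reverse-strand translations."""
    dna = dna.upper().replace("U", "T")
    strands = (dna, reverse_complement(dna))
    return "".join(
        translate_frame(strand, offset)
        for strand in strands
        for offset in range(3)
    )


print(six_frame_translate("ATGAAATAG"))  # MK**NEILFHYFIS
```

In the printed result the two asterisks are the stop codons TAG (end of frame 1) and TGA (start of frame 2), which shows how frame boundaries are not marked in the concatenated sequence itself.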
If you translate several different nucleotide sequences, the translations are written into separate output files with the same name as the nucleotide sequence but with the file name extension .pdt.
You can find out how to specify a group of sequences that interest you in Chapter 2, Using Sequence Files and Databases in the User's Guide.
Here is a session using PepData to translate some of the human globin sequences in GenBank:
% pepdata

 PEPDATA from what sequence(s) ?  GenBank:Humhb*

 Humhb16aa   1,216 bp   2,428 aa
 Humhb1az      483 bp     962 aa
 Humhb24     2,231 bp   4,458 aa

 ///////////////////////////////

 PEPDATA complete with

        Input files: 103
        Amino acids: 431,184
       Output files: 103
  Output file names: *.pdt

%
!!AA_SEQUENCE 1.0

 PEPDATA from: humhb16aa  check: 8158  from: 1  to: 1,216

 M31630 Human cyclic AMP response element-binding protein (HB16) mRNA, 3' end. 6/90

 bases 1 to: 1216  translated into:  1 to: 405
 bases 2 to: 1216  translated into:  406 to: 810
 bases 3 to: 1216  translated into:  811 to: 1214
 reverse of bases 1 to: 1216  translated into:  1215 to: 1619
 reverse of bases 1 to: 1215  translated into:  1620 to: 2024
 reverse of bases 1 to: 1214  translated into:  2025 to: 2428

 humhb16aa.pdt  Length: 2428  September 30, 1998 15:35  Type: P  Check: 3306  ..

       1  VPGPFPLLLH LPNGQTMPVA IPASITSSNV HVPAAVPLVR PVTMVPSVPG
      51  IPGPSSPQPV QSEAKMRLKA ALTQQHPPVT NGDTVKGHGS GLVRTQSEES
     101  RPQSLQQPAT STTETPASPA HTTPQTQSTS GRRRRAANED PDEKRRKFLE

 ///////////////////////////////////////////////////////////
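The frame map at the top of the output file follows directly from the length of the nucleotide sequence, so it can be recomputed to find which frame a given position in the concatenated protein belongs to. The Python sketch below is an illustration only; `frame_segments` and `locate` are hypothetical helpers and are not part of the Wisconsin Package.

```python
def frame_segments(n_bases):
    """Return (label, first_aa, last_aa) for the six frames, 1-based."""
    segments = []
    next_aa = 1
    for strand in ("forward", "reverse"):
        for offset in range(3):
            # Only complete codons are translated in each frame.
            n_codons = max(0, n_bases - offset) // 3
            if strand == "forward":
                label = f"bases {offset + 1} to: {n_bases}"
            else:
                label = f"reverse of bases 1 to: {n_bases - offset}"
            segments.append((label, next_aa, next_aa + n_codons - 1))
            next_aa += n_codons
    return segments


def locate(n_bases, aa_position):
    """Return the frame label and the 1-based position within that frame."""
    for label, first, last in frame_segments(n_bases):
        if first <= aa_position <= last:
            return label, aa_position - first + 1
    raise ValueError("position lies outside the concatenated translation")


for label, first, last in frame_segments(1216):
    print(f"{label}  translated into: {first} to: {last}")

print(locate(1216, 1300))  # ('reverse of bases 1 to: 1216', 86)
```

For a 1,216 bp sequence the loop reproduces the six ranges listed in the humhb16aa.pdt header, ending at amino acid 2428.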
PepData accepts multiple (one or more) nucleotide sequences as input. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example GenEMBL:*. If PepData rejects your nucleotide sequence, turn to Appendix VI to see how to change or set the type of a sequence.
Translate and Map are other programs that translate nucleotide sequences into protein.
BLAST searches one or more nucleic acid or protein databases for sequences similar to one or more query sequences of any type. BLAST can produce gapped alignments for the matches it finds.
TFastA does a Pearson and Lipman search for similarity between a protein query sequence and any group of nucleotide sequences. TFastA translates the nucleotide sequences in all six reading frames before performing the comparison. It is designed to answer the question, "What implied protein sequences in a nucleotide sequence database are similar to my protein sequence?"
FrameSearch searches a group of protein sequences for similarity to one or more nucleotide query sequences, or searches a group of nucleotide sequences for similarity to one or more protein query sequences. For each sequence comparison, the program finds an optimal alignment between the protein sequence and all possible codons on each strand of the nucleotide sequence. Optimal alignments may include reading frame shifts.
FrameAlign creates an optimal alignment of the best segment of similarity (local alignment) between a protein sequence and the codons in all possible reading frames on a single strand of a nucleotide sequence. Optimal alignments may include reading frame shifts.
Since GCG sequences cannot contain more than 350,000 symbols, PepData cannot translate sequences longer than 175,000 bp.
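The limit arises because the concatenated output holds about two amino acids for every input base (six frames, each roughly one third of the sequence length). The short Python check below works through the arithmetic using the sequence lengths from the example session; `concatenated_length` is a hypothetical helper written for this illustration.

```python
MAX_SYMBOLS = 350_000  # GCG sequence size limit quoted above


def concatenated_length(n_bases):
    """Amino acids in the six-frame concatenation of an n_bases sequence."""
    # Frames start at bases 1, 2, and 3; only complete codons are counted.
    per_strand = sum(max(0, n_bases - offset) // 3 for offset in range(3))
    return 2 * per_strand  # the reverse strand yields the same three lengths


for n_bases in (1_216, 483, 2_231, 175_000):
    print(n_bases, concatenated_length(n_bases))
# 1216 2428
# 483 962
# 2231 4458
# 175000 349996
```

The first three results match the bp and aa columns in the example session, and a 175,000 bp input produces an output just under the 350,000-symbol ceiling.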
PepData names output files with the base name of the input file (without the directory) and the file name extension .pdt and writes them in your current working directory.
All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
Minimal Syntax: % pepdata [-INfile=]genbank:humhb*

Prompted Parameters: None

Local Data Files:

  [-TRANslate=]translate.txt   contains the genetic code

Optional parameters:

  -EXTension=.pdt      lets you specify an output file name extension
  -DIRectory=dirname   writes output to another directory
The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.
The translation of codons to amino acids, the identification of potential start codons and stop codons, and the mappings of one-letter to three-letter amino acid codes are all defined in a translation table in the file translate.txt. If the standard genetic code does not apply to your sequence, you can provide a modified version of this file in your working directory or name an alternative file on the command line with an expression like -TRANSlate=mycode.txt. Translation tables are discussed in more detail in Appendix VII.
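To illustrate why the choice of translation table matters, the sketch below translates the same codons with the standard code and with a modified table. It is a Python illustration only: it does not read or write the translate.txt format, the names `STANDARD`, `VERTEBRATE_MITO`, and `translate` are hypothetical, and the vertebrate mitochondrial code is used simply as a familiar example of a non-standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
STANDARD = dict(zip(
    ("".join(codon) for codon in product(BASES, repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))

# Vertebrate mitochondrial code: four codons differ from the standard code.
VERTEBRATE_MITO = {**STANDARD, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}


def translate(dna, table):
    """Translate complete codons of dna, read from the first base."""
    return "".join(
        table.get(dna[i:i + 3], "X")  # assumption: unknown codons become 'X'
        for i in range(0, len(dna) - 2, 3)
    )


dna = "ATGATATGAAGA"
print(translate(dna, STANDARD))         # MI*R
print(translate(dna, VERTEBRATE_MITO))  # MMW*
```

The same twelve bases give a different protein and a different stop position under each table, which is why a sequence that does not use the standard code needs its own translation table.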
You can set the parameters listed below from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
-TRANslate=translate.txt names the file that contains the translation table. Usually, translation is based on the translation table in a default or local data file called translate.txt. This parameter allows you to use a translation table in a different file. (See Appendix VII for information about translation tables.)
-EXTension=.pdt allows you to choose a file name extension for the output files other than the .pdt that PepData uses by default.
-DIRectory=dirname writes the output files into a directory other than your current working directory.
Technical Support: support-us@accelrys.com or support-eu@accelrys.com
Copyright (c) 1982-2002 Accelrys Inc. A subsidiary of Pharmacopeia, Inc. All rights reserved.
Licenses and Trademarks: Wisconsin Package is a trademark and GCG and the GCG logo are registered trademarks of Accelrys Inc.
All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.