Reading the Data in a GenBank Format File
I. LOCUS
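In GenBank format, a record begins with its LOCUS line, followed by the DEFINITION line:
LOCUS       NM_001469               2156 bp    mRNA    linear   PRI 16-DEC-2004
DEFINITION  Homo sapiens thyroid autoantigen 70kDa (Ku antigen) (G22P1), mRNA.
The LOCUS field contains a number of different data elements, including locus name, sequence length, molecule type, GenBank division, and modification date. In this example, the locus name is NM_001469; the sequence is 2156 bp long; the molecule type is mRNA, with linear topology; the GenBank division is PRI (primate sequences); and the record was last modified on 16-DEC-2004.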
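As an illustration of how the LOCUS elements line up, here is a minimal sketch that splits the example LOCUS line into named fields. It assumes the fields are whitespace-separated and appear in the order shown above; real GenBank files place them in fixed columns, and older records may omit the topology, so the class `Locus` and the function `parse_locus_line` are hypothetical helpers, not part of any library. For real files, a maintained parser such as Biopython's `Bio.SeqIO` (format `"genbank"`) is the safer choice.

```python
from dataclasses import dataclass


@dataclass
class Locus:
    name: str        # locus name, e.g. NM_001469
    length_bp: int   # sequence length in base pairs
    mol_type: str    # molecule type, e.g. mRNA
    topology: str    # linear or circular
    division: str    # GenBank division, e.g. PRI (primate)
    date: str        # modification date, DD-MON-YYYY


def parse_locus_line(line: str) -> Locus:
    """Split a LOCUS line into its elements.

    Assumption: tokens are whitespace-separated and ordered as
    LOCUS, name, length, "bp", molecule type, topology, division, date.
    """
    tokens = line.split()
    if len(tokens) != 8 or tokens[0] != "LOCUS" or tokens[3] != "bp":
        raise ValueError(f"unexpected LOCUS line: {line!r}")
    _, name, length, _, mol_type, topology, division, date = tokens
    return Locus(name, int(length), mol_type, topology, division, date)


locus = parse_locus_line(
    "LOCUS       NM_001469               2156 bp    mRNA    linear   PRI 16-DEC-2004"
)
print(locus)
```

Raising `ValueError` on any line that does not match the assumed layout keeps the sketch from silently mislabeling fields.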
II. COMMENT
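1. REVIEWED REFSEQ: states how this RefSeq record was produced (its review status and the sequences it was derived from).
2. Summary: describes the function of the gene that the sequence represents.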
III. FEATURES
Definition: Information about genes and gene products, as well as regions of biological significance reported in the sequence. These can include regions of the sequence that code for proteins and RNA molecules.
The entries under FEATURES are too varied to cover in full here; when needed, look them up in The DDBJ/EMBL/GenBank Feature Table.
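1. Key: the FEATURES header line reads "FEATURES Location/Qualifiers". Each entry starts with a feature key (for example source, gene, or CDS), followed by the feature's location; the lines beneath it are qualifiers of the form /name="value".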
2. complement: the feature lies on the complementary (reverse) strand. If a feature is located on the complementary strand, the word "complement" will appear before the base span.
3. < and >: mark partial sequences. If the "<" symbol precedes the start of a base span, the sequence is partial on the 5' end (e.g., CDS <1..206). If the ">" symbol precedes the end of a base span, the sequence is partial on the 3' end (e.g., CDS 435..>915).
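4. /db_xref: the value is a cross-reference (link) to another database, written as DATABASE:IDENTIFIER.
/db_xref="taxon:9606" links to the NCBI Taxonomy entry for the organism (9606 is Homo sapiens).
/db_xref="GeneID:2547" links to NCBI Gene.
/db_xref="LocusID:2547" links to LocusLink.
/db_xref="MIM:152690" links to OMIM.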
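The /db_xref values above all share the form DATABASE:IDENTIFIER, so they can be grouped by database with a simple split. The sketch below assumes each qualifier arrives as one string such as `/db_xref="taxon:9606"`; the function name `parse_db_xrefs` is hypothetical, and only the first colon is treated as the separator in case an identifier itself contains a colon.

```python
def parse_db_xrefs(qualifiers: list[str]) -> dict[str, list[str]]:
    """Group /db_xref qualifier values by database name.

    Assumption: each qualifier is a single string of the form
    /name="value"; qualifiers other than /db_xref are ignored.
    """
    xrefs: dict[str, list[str]] = {}
    for qualifier in qualifiers:
        name, _, value = qualifier.partition("=")
        if name != "/db_xref":
            continue
        value = value.strip('"')
        # Split on the first colon only: DATABASE:IDENTIFIER
        database, _, identifier = value.partition(":")
        xrefs.setdefault(database, []).append(identifier)
    return xrefs


example = [
    '/db_xref="taxon:9606"',
    '/db_xref="GeneID:2547"',
    '/db_xref="LocusID:2547"',
    '/db_xref="MIM:152690"',
    '/gene="adhI"',
]
print(parse_db_xrefs(example))
# {'taxon': ['9606'], 'GeneID': ['2547'], 'LocusID': ['2547'], 'MIM': ['152690']}
```

Values are collected into lists because a single feature can carry several cross-references to the same database.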
IV. Two examples
Example 1:
Key             Location/Qualifiers
CDS             23..400
                /product="alcohol dehydrogenase"
                /gene="adhI"
This might be read as: the feature CDS is a coding sequence beginning at base 23 and ending at base 400; it has a product called "alcohol dehydrogenase" and is coded for by a gene called "adhI".
Example 2, a more complex description:
Key             Location/Qualifiers
CDS             join(544..589,688..>1032)
                /product="T-cell receptor beta-chain"
This might be read as: this feature, which is a partial coding sequence, is formed by joining the indicated elements into one contiguous sequence encoding a product called T-cell receptor beta-chain.
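To see how complement, the partial markers "<" and ">", and join() combine, here is a minimal location parser. It is a sketch that only handles spans written as start..end (optionally with "<" before the start or ">" before the end), an optional join(...) of such spans, and an optional outer complement(...); the full Feature Table syntax (single bases, order(), between-base sites, references to other entries) is not covered. The names `Location` and `parse_location` are hypothetical. Note that on the complementary strand the feature's 5' end is its highest coordinate, so the 5'/3' partial flags swap.

```python
import re
from dataclasses import dataclass

# One span such as "23..400", "<1..206" or "688..>1032"
SPAN_RE = re.compile(r"^(<?)(\d+)\.\.(>?)(\d+)$")


@dataclass
class Location:
    spans: list[tuple[int, int]]  # 1-based, inclusive (low, high) pairs
    strand: int                   # +1 = strand as given, -1 = complement(...)
    open_low: bool                # "<" seen: extends past a low coordinate
    open_high: bool               # ">" seen: extends past a high coordinate

    @property
    def length(self) -> int:
        # Total number of bases covered by all joined spans
        return sum(high - low + 1 for low, high in self.spans)

    @property
    def partial_5prime(self) -> bool:
        # On the complementary strand the 5' end is the high coordinate
        return self.open_low if self.strand == 1 else self.open_high

    @property
    def partial_3prime(self) -> bool:
        return self.open_high if self.strand == 1 else self.open_low


def parse_location(text: str) -> Location:
    text = text.replace(" ", "")
    strand = 1
    if text.startswith("complement(") and text.endswith(")"):
        strand = -1
        text = text[len("complement("):-1]
    if text.startswith("join(") and text.endswith(")"):
        parts = text[len("join("):-1].split(",")
    else:
        parts = [text]

    spans: list[tuple[int, int]] = []
    open_low = open_high = False
    for part in parts:
        match = SPAN_RE.match(part)
        if match is None:
            raise ValueError(f"unsupported location element: {part!r}")
        lt, low, gt, high = match.groups()
        open_low = open_low or lt == "<"
        open_high = open_high or gt == ">"
        spans.append((int(low), int(high)))
    return Location(spans, strand, open_low, open_high)


for loc in ["23..400", "join(544..589,688..>1032)", "<1..206", "complement(<1..206)"]:
    parsed = parse_location(loc)
    print(loc, parsed.length, parsed.partial_5prime, parsed.partial_3prime)
# 23..400 378 False False
# join(544..589,688..>1032) 391 False True
# <1..206 206 True False
# complement(<1..206) 206 False True
```

For the second example, the joined CDS covers 46 + 345 = 391 bases and is open at its 3' end, which matches the reading "a partial coding sequence".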