Getting ready

Our simple example will use data from the region of the human genome where the LCT gene is located. The LCT gene encodes lactase, an enzyme involved in the digestion of lactose.

We will take this information from Ensembl. Go to http://uswest.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000115850 and choose Export data. The Output format should be BED Format. Gene information should be selected (you can choose more if you want). For convenience, a downloaded file is available in the Chapter02 directory, called LCT.bed.

The Notebook for this code is called Chapter02/Processing_BED_with_HTSeq.ipynb.

Take a look at the file before we start. An example of a few lines of this file is as follows:

track name=gene description="Gene information"
2 135836529 135837180 ENSE00002202258 0 -
2 135833110 135833190 ENSE00001660765 0 -
2 135789570 135789798 NM_002299.2.16 0 -
2 135787844 135788544 NM_002299.2.17 0 -
2 135836529 135837169 CCDS2178.117 0 -
2 135833110 135833190 CCDS2178.116 0 -

The fourth column is the feature name. Naming conventions vary widely from file to file, so you will have to check them each and every time. In our case, it seems apparent that we have Ensembl exons (ENSE...), RefSeq mRNA records (NM_...), and coding region information (CCDS) from the CCDS database (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi).
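As a quick sanity check on the feature names, the sample lines shown earlier can be tallied by name prefix. The following is a minimal sketch using only the Python standard library, not the HTSeq-based code from the Notebook. The `parse_bed` and `classify` helpers and the category labels are hypothetical names introduced for illustration, and the sample text is embedded as a string so the snippet runs without LCT.bed.

```python
from collections import Counter

# The sample lines from above, embedded so the sketch is self-contained.
SAMPLE_BED = """\
track name=gene description="Gene information"
2 135836529 135837180 ENSE00002202258 0 -
2 135833110 135833190 ENSE00001660765 0 -
2 135789570 135789798 NM_002299.2.16 0 -
2 135787844 135788544 NM_002299.2.17 0 -
2 135836529 135837169 CCDS2178.117 0 -
2 135833110 135833190 CCDS2178.116 0 -
"""


def parse_bed(text):
    """Return (chrom, start, end, name, strand) tuples, skipping header lines."""
    records = []
    for line in text.splitlines():
        # Skip blank lines and the track/browser/comment lines BED files may carry.
        if not line.strip() or line.startswith(('track', 'browser', '#')):
            continue
        chrom, start, end, name, score, strand = line.split()[:6]
        records.append((chrom, int(start), int(end), name, strand))
    return records


def classify(name):
    """Map a feature name to a coarse category by prefix (hypothetical labels)."""
    if name.startswith('ENSE'):
        return 'Ensembl exon'
    if name.startswith('NM_'):
        return 'NM_ record'
    if name.startswith('CCDS'):
        return 'CCDS'
    return 'other'


records = parse_bed(SAMPLE_BED)
counts = Counter(classify(name) for _, _, _, name, _ in records)
for category, count in sorted(counts.items()):
    print(category, count)
```

To run the same tally on the downloaded file, the embedded string could be replaced with the contents of Chapter02/LCT.bed. A non-zero count in the `other` category would indicate a naming convention that is not covered by the three prefixes above.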